Word embeddings are numerical representations of words in a high-dimensional space. They encode semantic information, which means that if two words have a similar meaning, their embeddings are close to each other. When you select a word, the closest ones (in the embedding space) are displayed too.
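As a rough sketch of that "closest ones" lookup (assuming embeddings are stored as rows of a NumPy array and cosine similarity measures closeness; the tool's actual implementation may differ):

```python
import numpy as np

def nearest_neighbors(embeddings: np.ndarray, index: int, k: int = 5) -> np.ndarray:
    """Return the indices of the k embeddings closest to embeddings[index],
    using cosine similarity as the closeness measure."""
    # Normalize rows so a plain dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[index]
    # Exclude the selected word itself, then take the k highest similarities.
    sims[index] = -np.inf
    return np.argsort(sims)[::-1][:k]
```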
I'm working on it! I'm currently building a tool to visualize document embeddings.
How does the visualization work?
The visualization uses Principal Component Analysis (PCA) to reduce the high-dimensional embeddings down to 2 dimensions. When you select a word or document, the tool shows both the selected item and its nearest neighbors in the embedding space. The distance between points reflects semantic similarity: items that are closer together have more similar meanings according to the embedding model. Note that because PCA discards the variance along the dropped dimensions, the nearest neighbors in the original embedding space might not be the nearest neighbors in the 2-dimensional plot.
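A minimal sketch of this pipeline, assuming scikit-learn's PCA and a made-up `embeddings` matrix (illustrative, not the tool's actual code):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical setup: one high-dimensional vector per word,
# e.g. 1000 words embedded in 300 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300))

# Project everything down to 2 dimensions for plotting.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(embeddings)  # shape (1000, 2)

# Caveat from above: distances in points_2d can rank neighbors
# differently than distances in the original 300-dimensional space,
# because the dropped components still carried variance.
```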
What does "Recalculate PCA" do?
The "Recalculate PCA" button re-computes the PCA using only the words or documents the user has selected. Since PCA spreads data points along its axes of maximum variance, this produces a new layout in which the selected items are more likely to sit at the edges of the plot, while all the others are more likely to cluster at the center.
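One plausible way to implement such a recomputation, sketched with scikit-learn's PCA (the `recalculate_pca` helper and the fit-on-selection strategy are illustrative assumptions, not the tool's actual code):

```python
import numpy as np
from sklearn.decomposition import PCA

def recalculate_pca(embeddings: np.ndarray, selected: list[int]) -> np.ndarray:
    """Fit PCA on the selected rows only, then project every embedding
    into that space."""
    pca = PCA(n_components=2).fit(embeddings[selected])
    return pca.transform(embeddings)
```

Fitting only on the selection means the principal axes capture the variance among the selected items, which is why those items tend to spread toward the edges while unselected items collapse toward the center.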