Word2Vec performance getting synonyms

dantuzi · February 1, 2022, 3:49pm

Hello everybody,
I’m new to Word2Vec and I’m using the model to retrieve the list of synonyms of a term.

I ran some benchmarks on a model with about 800k vectors of dimension 100 and noticed that searching for the top 10 nearest words takes more than 80 ms using the wordsNearest(word, 10) method.
I ran a similar benchmark using a library that implements HNSW. Despite a bigger dataset than in the previous benchmark (2M vectors of dimension 300), each search took about 5 ms.

The difference is impressive so I was wondering if there is a reason for DL4J not implementing HNSW for searching the nearest vectors… or maybe I’m using the wrong method to get the word synonyms.
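For reference, here is a minimal sketch of an exhaustive (brute-force) top-k search by cosine similarity. It only illustrates why a linear scan gets slower as the number of vectors and their dimension grow. It is not taken from DL4J, and I am not claiming that wordsNearest works this way. The class name BruteForceNearest and the methods cosine and topK are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration of a linear-scan nearest-neighbour search.
// Not DL4J code: every query touches every stored vector once.
class BruteForceNearest {

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Small epsilon avoids division by zero for all-zero vectors.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }

    // Returns the indices of the k vectors most similar to the query, best first.
    static int[] topK(float[][] vectors, float[] query, int k) {
        final double[] sims = new double[vectors.length];
        // Min-heap ordered by similarity, so the weakest candidate is evicted first.
        Comparator<Integer> bySimilarity = (x, y) -> Double.compare(sims[x], sims[y]);
        PriorityQueue<Integer> heap = new PriorityQueue<>(bySimilarity);
        for (int i = 0; i < vectors.length; i++) {
            sims[i] = cosine(vectors[i], query);
            heap.add(i);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // Drain the heap from worst to best, filling the result back to front.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] vectors = { {1f, 0f}, {0f, 1f}, {1f, 1f}, {-1f, 0f} };
        float[] query = {1f, 0.1f};
        int[] nearest = topK(vectors, query, 2);
        for (int index : nearest) {
            System.out.println(index);
        }
    }
}
```

With N vectors of dimension d, each query in this sketch costs on the order of N × d multiplications, whereas a graph index such as HNSW only visits a small part of the dataset per query, at the price of approximate results and extra time to build the index.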

Thanks in advance
