Hello everybody,
I’m new to Word2Vec and I’m using the model to retrieve the list of synonyms of a term.
I ran some benchmarks on a model with about 800k vectors of dimension 100 and noticed that searching for the top 10 nearest words takes more than 80 ms with the wordsNearest(word, 10) method.
I ran a similar benchmark using a library that implements HNSW. Despite a bigger dataset than in the previous benchmark (2M vectors of dimension 300), each search took about 5 ms.
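To make the comparison concrete, here is a minimal sketch of an exhaustive cosine-similarity search, which is what I assume a lookup without an index amounts to. I have not checked how DL4J actually implements wordsNearest, so this is only an illustration; the class and method names (BruteForceNearest, cosine, topK) are hypothetical and not part of any library.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical illustration of a linear-scan nearest-neighbour search.
// This is NOT DL4J code; it only shows the cost of comparing against every vector.
class BruteForceNearest {

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Small epsilon avoids division by zero for all-zero vectors.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }

    // Returns the indices of the k rows most similar to the query, best first.
    static int[] topK(float[][] vectors, float[] query, int k) {
        // Min-heap on similarity: the weakest of the current candidates is on top.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x[0], y[0]));
        for (int i = 0; i < vectors.length; i++) {   // every vector is visited
            double sim = cosine(vectors[i], query);
            if (heap.size() < k) {
                heap.add(new double[] {sim, i});
            } else if (sim > heap.peek()[0]) {
                heap.poll();
                heap.add(new double[] {sim, i});
            }
        }
        // Polling yields ascending similarity, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = (int) heap.poll()[1];
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}, {-1f, 0f}};
        System.out.println(Arrays.toString(topK(vectors, new float[] {1f, 0f}, 2)));
    }
}
```

With 800k vectors of dimension 100, a scan like this performs about 80 million multiply-adds per query, whereas a graph index such as HNSW compares the query against only a small part of the dataset, at the price of approximate results.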
The difference is impressive, so I was wondering whether there is a reason why DL4J does not implement HNSW for nearest-vector search, or whether I’m simply using the wrong method to get the word synonyms.
Thanks in advance