FastText - Getting vectors for misspelled / unknown words?

I’m using the FastText.java class included in DL4J to access FastText functionality:

One of FastText's main features is that it can provide vectors even for misspelled or unknown words. For example, according to the FastText FAQ:

However, if I try to get a vector for an unknown word using fast.getWordVectorMatrix(word) in FastText.java, it just returns a default vector. I'd like it to retrieve a vector from the underlying FastText implementation instead.

Is this possible?

What type of FastText vectors are you loading?

Keep in mind that there are two types of FastText vectors: text (.vec) and binary (.bin).
The text vectors are word-level only, with no subword information.
Only the binary vectors include subword information, and so only they can return real (non-default) vectors for unknown words.

That's not a limitation of DL4J in any way; it's a limitation of the FastText text (.vec) format.
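To make the .vec vs .bin difference concrete, here is a toy sketch in plain Java (not DL4J code, and not DL4J's API) of the subword idea FastText is built on. A word is wrapped in `<` and `>`, split into character n-grams of length 3 to 6, each n-gram is hashed into a bucket of an n-gram embedding table, and the word's vector is the average of those rows. A typo like "langauge" still has n-grams, so it still gets a vector. That n-gram table is what a .bin file stores and a .vec file lacks, which is why the text format can only fall back to a default. Everything here is an assumption for illustration: the class name, the small table filled with seeded random numbers instead of trained weights, and the hyperparameters. The hash is FNV-1a, as in the reference fastText implementation, but this is not a byte-for-byte reimplementation. For example, the real model also adds the word's own row when the word is in the vocabulary, which this sketch skips.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SubwordVectors {
    // Toy hyperparameters (assumed). Reference fastText defaults are
    // minn=3, maxn=6, bucket=2,000,000, dim=100.
    static final int MINN = 3;
    static final int MAXN = 6;
    static final int BUCKETS = 20_000;
    static final int DIM = 50;

    // Stand-in for the trained n-gram matrix kept in a .bin file:
    // seeded random values instead of learned weights.
    static final float[][] NGRAM_TABLE = new float[BUCKETS][DIM];
    static {
        Random rnd = new Random(42);
        for (float[] row : NGRAM_TABLE) {
            for (int d = 0; d < DIM; d++) {
                row[d] = (float) rnd.nextGaussian();
            }
        }
    }

    // 32-bit FNV-1a hash over the UTF-8 bytes of an n-gram.
    static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= b;        // the byte is sign-extended to int here
            h *= 16777619; // FNV prime; int overflow wraps like uint32
        }
        return h;
    }

    // Character n-grams of length MINN..MAXN over "<word>".
    static List<String> charNgrams(String word) {
        String padded = "<" + word + ">";
        List<String> grams = new ArrayList<>();
        for (int n = MINN; n <= MAXN; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                grams.add(padded.substring(i, i + n));
            }
        }
        return grams;
    }

    // Vector for any string, known or not: the average of its n-gram rows.
    static float[] vectorFor(String word) {
        float[] v = new float[DIM];
        List<String> grams = charNgrams(word);
        if (grams.isEmpty()) {
            return v; // too short to have any n-grams
        }
        for (String g : grams) {
            float[] row = NGRAM_TABLE[Integer.remainderUnsigned(fnv1a(g), BUCKETS)];
            for (int d = 0; d < DIM; d++) {
                v[d] += row[d];
            }
        }
        for (int d = 0; d < DIM; d++) {
            v[d] /= grams.size();
        }
        return v;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        System.out.println(charNgrams("where").subList(0, 5)); // [<wh, whe, her, ere, re>]
        // "langauge" is stored nowhere, yet it still gets a vector, and it
        // shares n-grams such as "<lan" and "lang" with "language".
        float[] known = vectorFor("language");
        float[] typo = vectorFor("langauge");
        System.out.printf("cos(language, langauge) = %.3f%n", cosine(known, typo));
    }
}
```

With a word-only table (the .vec situation) there would be nothing to look up for "langauge" at all, which is exactly the default-vector behaviour described in the question.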

@AlexBlack, thanks for the clarification. getWordVectorMatrix() does indeed work for unknown words when I use the binary model!

Is it possible to call the other methods on the FastText class (e.g., wordsNearest(), similarity(), etc.) for unknown words? I get an NPE when I call wordsNearest(), because modelUtils is null. There's a setModelUtils() method; do I need to provide a modelUtils implementation myself?

@aliakhtar Not sure, sorry, but that's a good question. I've opened an issue here; keep an eye on that.
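As a possible stopgap while that issue is open: since getWordVectorMatrix() does return real vectors for unknown words with a .bin model, similarity and nearest-neighbour queries can be computed by hand from those vectors. Below is a minimal sketch in plain Java of that computation, cosine similarity plus a brute-force top-k scan. It assumes the vectors have already been pulled out of the model and converted to float[]. The ManualNearest class, its method names, and the three-word toy vocabulary are hypothetical. This is not DL4J's ModelUtils API, and it is not a claim about how DL4J implements wordsNearest() internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManualNearest {

    // Cosine similarity between two vectors (0.0 if either is all zeros).
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Brute-force top-k: score every vocabulary word, O(|vocab| * dim).
    static List<String> wordsNearest(float[] query, Map<String, float[]> vocab, int k) {
        List<String> words = new ArrayList<>(vocab.keySet());
        words.sort(Comparator.comparingDouble((String w) -> cosine(query, vocab.get(w))).reversed());
        return new ArrayList<>(words.subList(0, Math.min(k, words.size())));
    }

    // Hypothetical toy vocabulary; real vectors would come from the model.
    static Map<String, float[]> sampleVocab() {
        Map<String, float[]> vocab = new LinkedHashMap<>();
        vocab.put("cat", new float[] {1.0f, 0.9f, 0.0f});
        vocab.put("dog", new float[] {0.9f, 1.0f, 0.1f});
        vocab.put("car", new float[] {0.0f, 0.1f, 1.0f});
        return vocab;
    }

    public static void main(String[] args) {
        // Pretend this is the vector returned for the misspelling "catt".
        float[] query = {1.0f, 0.8f, 0.0f};
        Map<String, float[]> vocab = sampleVocab();
        System.out.println(wordsNearest(query, vocab, 2)); // [cat, dog]
        System.out.printf("similarity(catt, car) = %.3f%n", cosine(query, vocab.get("car")));
    }
}
```

A full linear scan is fine for a sanity check, but over a vocabulary of millions of words each query touches every vector, so for repeated queries precomputing normalized vectors or using an approximate nearest-neighbour index would be worth considering.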