I tried it - still no useful info on the console and no error logs in the project folder.
Tried it using the corresponding modifications for ParagraphVectors. As a result I got the following observations:
- If the ParagraphVectors builder does not explicitly set resetModel to false, the vocabulary is not extended and execution succeeds
- If the ParagraphVectors builder has resetModel=false, the vocabulary is extended, and the next call to fit() after the extension causes the native crash mentioned earlier
- Saving/loading the model to/from the file (the same as in the example) has no impact on the behavior
- Updating an existing model with a new iterator (which points to new sequences) and running fit() doesn’t extend the vocabulary when resetModel=false, and there is no crash.
- Updating an existing model with a new iterator (which points to new sequences), explicitly calling buildVocab() to extend the vocabulary, and then running fit() causes a crash.
So I played with different combinations and noticed that recreating the model with the same builder that created it, while changing the sequence iterator and rebuilding the vocabulary in a specific way, gets me to the result I need. And it was successful!
I had to re-implement the vocabulary building because fit() doesn’t call buildVocab() if you don’t want to reset the model. I also noticed that VocabConstructor.buildJointVocabulary() doesn’t mark tokens that are already in the vocabulary as labels. So I fixed that part, and it all worked: all the labels and all the new words were there. The lookup table was not reset, all labels were preserved, and the vocabulary was extended. The main difference between the example you referenced and my implementation is that I recreate the model from scratch using the same builder that created it, and after that I call
And it works! I tried the builder without this line and got the crash. I also tried updating the existing model with a new iterator, as in the example, while using this line, but it still crashed. Only recreating the model with the builder (which itself preserves the lookup table and the vocabulary), together with consuming its own lookup table, works fine. That suggests the problem lies in one of the ParagraphVectors fields that the builder’s build() call re-initializes, which is what keeps the model from crashing during the next fit() after the vocabulary extension. I couldn’t identify exactly which field or fields are crucial here, but if they are not re-initialized (as in the example you provided), the fit() after the vocabulary extension causes the crash.
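To make the label-marking fix easier to discuss, here is a minimal, self-contained sketch of the idea in plain Java. It does not use DL4J at all: ToyVocab, Entry and merge() are hypothetical names invented for illustration, not the VocabConstructor or VocabCache API. It only shows the behavior I was after: when a new batch is merged into an existing vocabulary, a token that is already present still gets flagged as a label.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a vocabulary. Hypothetical; not DL4J's VocabCache. */
class ToyVocab {
    /** One vocabulary element: an occurrence count plus a label flag. */
    static final class Entry {
        long count;
        boolean label;
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    /**
     * Merges a new batch into the vocabulary without resetting it.
     * New tokens are added and counts of existing tokens are incremented.
     * Every label is flagged as a label even if the token already existed.
     */
    void merge(String[] tokens, String[] labels) {
        for (String token : tokens) {
            entries.computeIfAbsent(token, k -> new Entry()).count++;
        }
        for (String label : labels) {
            Entry e = entries.computeIfAbsent(label, k -> new Entry());
            // The fix: set the flag regardless of whether the entry is new.
            e.label = true;
        }
    }

    boolean isLabel(String token) {
        Entry e = entries.get(token);
        return e != null && e.label;
    }

    long count(String token) {
        Entry e = entries.get(token);
        return e == null ? 0 : e.count;
    }

    int size() {
        return entries.size();
    }
}
```

In this toy version, merging a second batch whose label was already seen as an ordinary token in the first batch leaves that token in the vocabulary with its count intact and the label flag set, which is the outcome I needed from the joint vocabulary.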