I started playing around with deeplearning4j. I want to load a pretrained word2vec model (in Dutch) and do some tests. Loading the model (binary, 1 Gb) took my system over two hours, which is way too long, I presume?
This the code (in Eclipse). The line with the invoke-statement takes over two hours:
Can anyone help me out? Can I speed things up? What I would like to build is a system in Java that creates word-associations from a starting word, in Dutch.
Your dependencies look weird, why do you have an explicit dependency on nd4j-buffer? Why are you mixing cuda versions?
Where are you reading the data from? What kind of storage is it?
Even though your pom.xml says beta6, are you entirely sure you arenât somehow on beta7 (tried to downgrade after seeing something like this: Beta 7 - Glove Word Vector - #4 by treo, but Eclipse didnât properly pick up on that)
Typically loading a 1GB binary takes about as long as it takes to read the file - 2 hours obviously is way too long.
If you can, running your application with a profiler should also shed some light into why it takes so long to load it.
Thanks a lot for looking into my situation. My answers:
What do you mean with âreflectionâ? And where am I using that?
Mixed Cuda versions, youâre right. I checked my version and took 10.1 out. I also deleted the nd4j-buffer dependency.
Iâm reading from a file that I downloaded from NLPL word embeddings repository (tested both the Dutch .bin and .txt downloads, same long loading time). I assumed that I can use these files also for deeplearning4j. If not: do you know of a Duch model that is suited? I understand that the model from Google news is English only.
How can I check âwhere Iâm onâ?
NB. I also tested this line for loading the model, with the same effect:
As Iâm running version 1.0.0-beta7, I applied the workaround from the linked post (Beta 7 - Glove Word Vector - #4 by treo), loaded the txt file and stopped the time: