opened 04:01PM - 03 Nov 20 UTC
Bug
NLP
User Support
#### Issue Description
I use WordVectorSerializer.readWord2VecModel to read m…odel which is saved by WordVectorSerializer.writeWord2VecModel, then I got an exception.
WordVectorSerializer.readWord2VecModel is able to read a small word vector model, but failed to read a large one(about 1.2G).
- expected behavior
read model normally
- encountered behavior
throw an exception
#### Version Information
Please indicate relevant versions, including, if relevant:
* Deeplearning4j version
1.0.0-beta7
* Platform information (OS, etc)
Windows 10
* CUDA version, if used
* NVIDIA driver version, if in use
#### Additional Information
Where applicable, please also provide:
* Full log or exception stack trace (ideally in a Gist: gist.github.com)
`
23:40:13.762 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
23:40:14.968 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 4
23:40:15.046 [main] INFO org.nd4j.nativeblas.Nd4jBlas - Number of threads used for OpenMP BLAS: 4
23:40:15.066 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
23:40:15.066 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Cores: [8]; Memory: [7.1GB];
23:40:15.066 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
23:40:36.342 [main] DEBUG o.d.m.e.loader.WordVectorSerializer - Trying simplified model restoration...
23:41:44.454 [main] DEBUG o.d.m.e.loader.WordVectorSerializer - Trying CSV model restoration...
23:41:44.463 [main] DEBUG o.d.m.e.loader.WordVectorSerializer - First line is a header
23:41:44.464 [main] DEBUG o.d.m.e.loader.WordVectorSerializer - Trying binary model restoration...
23:41:44.474 [main] ERROR o.d.m.e.loader.WordVectorSerializer - Cannot read binary model
java.lang.NumberFormatException: For input string: "PK 7�cQ syn0.txt\�۪��q����u��|ba�-����bH�`;$�}�W���?E�"i}k�ﳙ�鮮���j)?��������o��O{��~�����=ok�9K)������{e�����W��y�]?�c���3�s/~��Z����Ywۥ�S�<��>�Y}���muV���zZ��ٺ.����Z����v��g�V]����yΨ��^�,k�y�ޭ��z٭���8\�ҫ�r�{�Z����eꊗ�����ݣ�~��e��/��w���v���W�m�楧��7�6?���}���v]�c���뛟U]�=e�1���w�[�c�����}�����ԯ��[����t�<s�Qܸ�yz;G�2g�×���zf�.]������u�����f�Ʈ��z����K-�E}�>x�z=g��9�_����{>��o�z���r���ZM��M������\�M]��w���7�<֢>�-�=�I�?nK�ͭ]ЍO��������l�>�ђ�zl��:t���lz!��}�ڧ�����ئ���R��=��sG�1����]OGKu��oizc����9���ͷ�v�Һ�Zq�zˬ`�ȫ���j�����%��fm��+S��U1��]�q��k�{h)v=ס����Г��=���[��G[8�^����w���Z�Z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readBinaryModel(WordVectorSerializer.java:278)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2444)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2341)
at com.tigerobo.javadl.inference.Dl4jWord2VecInference.<clinit>(Dl4jWord2VecInference.java:23)
at com.tigerobo.javadl.Word2VecTest.nearestWords(Word2VecTest.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
23:41:44.475 [main] ERROR o.d.m.e.loader.WordVectorSerializer - Unable to guess input file format
java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2447)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2341)
at com.tigerobo.javadl.inference.Dl4jWord2VecInference.<clinit>(Dl4jWord2VecInference.java:23)
at com.tigerobo.javadl.Word2VecTest.nearestWords(Word2VecTest.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
java.lang.ExceptionInInitializerError
at com.tigerobo.javadl.Word2VecTest.nearestWords(Word2VecTest.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Caused by: java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2416)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2341)
at com.tigerobo.javadl.inference.Dl4jWord2VecInference.<clinit>(Dl4jWord2VecInference.java:23)
... 23 more
`
* pom.xml file or similar (also in a Gist)
`
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tigerobo</groupId>
<artifactId>javaDL</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<dl4j.version>1.0.0-beta7</dl4j.version>
<djl.version>0.8.0</djl.version>
</properties>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-parent</artifactId>
<version>2.3.4.RELEASE</version>
</parent>
<dependencies>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.14</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.11</version>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.74</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>model-zoo</artifactId>
<version>${djl.version}</version>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-model-zoo</artifactId>
<version>${djl.version}</version>
</dependency>
<dependency>
<groupId>ai.djl.mxnet</groupId>
<artifactId>mxnet-native-auto</artifactId>
<version>1.7.0-backport</version>
</dependency>
<dependency>
<groupId>ai.djl.pytorch</groupId>
<artifactId>pytorch-model-zoo</artifactId>
<version>${djl.version}</version>
</dependency>
<dependency>
<groupId>ai.djl.pytorch</groupId>
<artifactId>pytorch-native-auto</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>ai.djl.tensorflow</groupId>
<artifactId>tensorflow-model-zoo</artifactId>
<version>${djl.version}</version>
</dependency>
<dependency>
<groupId>ai.djl.tensorflow</groupId>
<artifactId>tensorflow-native-auto</artifactId>
<version>2.3.0</version>
</dependency>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>basicdataset</artifactId>
<version>${djl.version}</version>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-nlp-chinese</artifactId>
<version>${dl4j.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native</artifactId>
<version>${dl4j.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native</artifactId>
<version>${dl4j.version}</version>
<classifier>macosx-x86_64-avx2</classifier>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native</artifactId>
<version>${dl4j.version}</version>
<classifier>windows-x86_64-avx2</classifier>
</dependency>
</dependencies>
</project>
`
#### Contributing
If you'd like to help us fix the issue by contributing some code, but would
like guidance or help in doing so, please mention it!
I'm glad to do it!