ParagraphVectors predict exiting with EXCEPTION_ACCESS_VIOLATION

UPDATE at the end

Hey, and thanks for this cool library :)

I started with DL4J last week and managed to set up a ParagraphVectors text classifier. However, the execution often just stops during the learning process. Sometimes I get the following error message:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000006960194e, pid=19700, tid=0x0000000000004d5c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_241-b07) (build 1.8.0_241-b07)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.241-b07 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [libnd4jcpu.dll+0x18194e]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\Ich\Documents\CleanVersion2\TextClassifierv2\hs_err_pid19700.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I have managed to trigger the error at will with the following input:

“ADHOC ADC Therapeutics veranstaltet am 6. Mai 2021 eine Telefonkonferenz über die Ergebnisse des 1. Quartals 2021: ADC Therapeutics veranstaltet am 6. Mai 2021 eine Telefonkonferenz über die Ergebnisse des 1. Quartals 2021. ADC Therapeutics SA (NYSE: ADCT), ein kommerziell tätiges Biotechnologieunternehmen, das in der Entwicklung neuartiger Antikörper-Wirkstoff-Konjugate (AWK) zur Behandlung hämatologischer Malignome und solider Tumore führend ist, gab heute bekannt, dass es am Donnerstag, den 6. Mai 2021 um 8:30 Uhr EDT (USA) eine Telefonkonferenz und einen Live-Webcast veranstalten wird, um über die Finanzergebnisse für das 1. Quartal 2021 sowie aktuelle Geschäftsinformationen zu berichten.”

Output before the crash:

1)
19:26:13.771 [main] DEBUG org.deeplearning4j.models.sequencevectors.SequenceVectors - Similarity inside: [BAD] → 0.17727841436862946
19:26:13.771 [main] DEBUG org.deeplearning4j.models.sequencevectors.SequenceVectors - Similarity inside: [GOOD] → -0.2962224781513214
19:26:13.793 [main] INFO org.deeplearning4j.models.sequencevectors.SequenceVectors - Creating new PV-DM learner…
19:26:13.793 [main] INFO org.deeplearning4j.models.sequencevectors.SequenceVectors - Building learning algorithms:
19:26:13.793 [main] INFO org.deeplearning4j.models.sequencevectors.SequenceVectors - building SequenceLearningAlgorithm: [PV-DM]
19:26:13.794 [main] INFO org.deeplearning4j.models.sequencevectors.SequenceVectors - building ElementsLearningAlgorithm: [CBOW]

The hs_err log shows the following stack trace for that occurrence:

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 3287  org.nd4j.nativeblas.Nd4jCpu$NativeOps.execAggregateFloat(Lorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/IntPointer;ILorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/FloatPointer;I)V (0 bytes) @ 0x00000000037ad341 [0x00000000037ad240+0x101]
J 3374 C2 org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/aggregates/Aggregate;)V (732 bytes) @ 0x0000000003454d10 [0x0000000003454140+0xbd0]
J 3695 C2 org.deeplearning4j.models.embeddings.learning.impl.elements.CBOW.iterateSample(Lorg/deeplearning4j/models/sequencevectors/sequence/SequenceElement;[ILjava/util/concurrent/atomic/AtomicLong;DZIZLorg/nd4j/linalg/api/ndarray/INDArray;)V (418 bytes) @ 0x0000000003736128 [0x00000000037355c0+0xb68]
J 3363 C2 org.deeplearning4j.models.embeddings.learning.impl.sequence.DM.dm(ILorg/deeplearning4j/models/sequencevectors/sequence/Sequence;ILjava/util/concurrent/atomic/AtomicLong;DLjava/util/List;ZLorg/nd4j/linalg/api/ndarray/INDArray;)V (307 bytes) @ 0x00000000037d817c [0x00000000037d7b40+0x63c]
j  org.deeplearning4j.models.embeddings.learning.impl.sequence.DM.inferSequence(Lorg/deeplearning4j/models/sequencevectors/sequence/Sequence;JDDI)Lorg/nd4j/linalg/api/ndarray/INDArray;+172
j  org.deeplearning4j.models.paragraphvectors.ParagraphVectors.inferVector(Ljava/util/List;DDI)Lorg/nd4j/linalg/api/ndarray/INDArray;+210
j  org.deeplearning4j.models.paragraphvectors.ParagraphVectors.inferVector(Ljava/util/List;)Lorg/nd4j/linalg/api/ndarray/INDArray;+36
j  org.deeplearning4j.models.paragraphvectors.ParagraphVectors.predictSeveral(Ljava/util/List;I)Ljava/util/Collection;+21
j  org.deeplearning4j.models.paragraphvectors.ParagraphVectors.predictSeveral(Ljava/lang/String;I)Ljava/util/Collection;+112
j  org.deeplearning4j.models.paragraphvectors.ParagraphVectors.predictSeveral(Lorg/deeplearning4j/text/documentiterator/LabelledDocument;I)Ljava/util/Collection;+37
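Reading the frames bottom-up, the crash happens inside `inferVector` (`DM.inferSequence` → `CBOW.iterateSample` → the native `execAggregateFloat`), i.e. while the document vector is being inferred, before any label is scored. As far as I understand it, `predictSeveral` afterwards ranks the labels by cosine similarity between the inferred vector and each label vector, which is what the `Similarity inside` debug lines appear to show. Below is a minimal plain-Java sketch of that ranking step, assuming that behaviour; `LabelRanking`, `cosine` and `topLabels` are hypothetical names, not DL4J code, and the vectors are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks labels by cosine similarity to an inferred document vector. */
public class LabelRanking {

    /** Cosine similarity of two equal-length vectors; 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector lengths differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to n labels, most similar first. */
    static List<String> topLabels(double[] docVector, Map<String, double[]> labelVectors, int n) {
        List<String> labels = new ArrayList<>(labelVectors.keySet());
        labels.sort(Comparator.comparingDouble(
                (String label) -> cosine(docVector, labelVectors.get(label))).reversed());
        return new ArrayList<>(labels.subList(0, Math.min(n, labels.size())));
    }

    public static void main(String[] args) {
        Map<String, double[]> labelVectors = new HashMap<>();
        labelVectors.put("GOOD", new double[] {1.0, 0.0});
        labelVectors.put("BAD", new double[] {0.0, 1.0});
        double[] docVector = {0.2, 0.9}; // made-up "inferred" vector
        System.out.println(topLabels(docVector, labelVectors, 2)); // prints [BAD, GOOD]
    }
}
```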

I do not understand what in my input string causes the classifier to fail. I would like to understand the pattern behind it so that I can exclude all strings that produce these crashes. Any help would be much appreciated.
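One way to narrow down the pattern is to shrink the crashing input until no token can be removed without the crash disappearing. The sketch below is a simplified variant of delta debugging over tokens (it only tries removing chunks, halving the chunk size when nothing can be removed). The predicate is an assumption: because an EXCEPTION_ACCESS_VIOLATION in native code takes down the whole JVM and cannot be caught as a Java exception, a real `stillFails` would have to run the prediction in a separate JVM (for example started via `ProcessBuilder`) and treat an abnormal exit as a crash. `InputMinimizer` and the fake predicate in `main` are hypothetical and only illustrate the search; they say nothing about which tokens actually trigger the crash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: shrinks a crashing token list to a smaller one that still crashes. */
public class InputMinimizer {

    /**
     * Repeatedly removes chunks of tokens while stillFails keeps returning true.
     * Assumes stillFails.test(tokens) is true for the initial input.
     */
    static List<String> minimize(List<String> tokens, Predicate<List<String>> stillFails) {
        List<String> current = new ArrayList<>(tokens);
        int chunk = Math.max(1, current.size() / 2);
        while (current.size() > 1) {
            boolean removedSomething = false;
            for (int start = 0; start < current.size(); ) {
                int end = Math.min(start + chunk, current.size());
                List<String> candidate = new ArrayList<>(current.subList(0, start));
                candidate.addAll(current.subList(end, current.size()));
                if (!candidate.isEmpty() && stillFails.test(candidate)) {
                    current = candidate;     // keep the smaller failing input
                    removedSomething = true; // retry the same position on the shorter list
                } else {
                    start = end;             // this chunk seems needed, move past it
                }
            }
            if (!removedSomething) {
                if (chunk == 1) {
                    break;                   // no single token can be removed any more
                }
                chunk = chunk / 2;           // try finer-grained removals
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("ADC", "Therapeutics", "6.", "Mai", "2021", "Quartals");
        // Fake stand-in predicate: "crashes" whenever both "Mai" and "2021" are present.
        Predicate<List<String>> fakeCrash = t -> t.contains("Mai") && t.contains("2021");
        System.out.println(minimize(tokens, fakeCrash)); // prints [Mai, 2021]
    }
}
```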

UPDATE

I have managed to fix that particular instance of the error and reduced the error frequency a lot. I am using multiple models to predict, and I noticed that the error mostly occurred for one particular model. Removing that model from the model set made things a lot better, so it seems the issue was, at least partially, a corrupt model.

However, I am also getting these crashes during training, and they become more likely when I run multiple instances of training/prediction at the same time.
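To check whether concurrency is the trigger, one experiment is to funnel all predictions within a JVM through a single lock and see whether the crashes go away. A minimal sketch, assuming the predictions are made from several threads inside one JVM (if the "instances" are separate processes, a lock like this will not help). `Predictor`, `SerializedPredictor` and `maxConcurrentCalls` are hypothetical names; `Predictor` merely stands in for code that calls something like `predictSeveral(text, n)`, and the demo uses a fake predictor to show that calls no longer overlap.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for code that calls ParagraphVectors.predictSeveral(text, n). */
interface Predictor {
    Collection<String> predict(String text, int n);
}

/** Lets at most one prediction run at a time across all threads sharing this wrapper. */
class SerializedPredictor implements Predictor {
    private final Predictor delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedPredictor(Predictor delegate) {
        this.delegate = delegate;
    }

    @Override
    public Collection<String> predict(String text, int n) {
        lock.lock();
        try {
            return delegate.predict(text, n);
        } finally {
            lock.unlock(); // released even if the delegate throws a Java exception
        }
    }
}

public class SerializedPredictorDemo {
    /** Fires `calls` predictions from `threads` threads; returns the peak number of overlapping calls. */
    static int maxConcurrentCalls(int threads, int calls) throws InterruptedException {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Fake predictor that records how many calls are running at once.
        Predictor fake = (text, n) -> {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                Thread.sleep(2); // simulate work so overlapping calls would be visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            active.decrementAndGet();
            return Collections.singletonList("GOOD");
        };
        Predictor safe = new SerializedPredictor(fake);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < calls; i++) {
            pool.submit(() -> safe.predict("some text", 1));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent calls: " + maxConcurrentCalls(4, 20)); // prints 1
    }
}
```

If the crashes persist even with every call serialized, concurrent use within one JVM is probably not the (only) trigger.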

Have you tried the latest Snapshots? If it still crashes with them, can you please create a self-contained reproducer project (ideally one that crashes as often as possible :)) so we can figure out what causes the crash?