Error in JVM when getting output from a Keras model

Hi, I'm here again. I ran into an error when trying to get output from ComputationGraph.output(inputTensor);
The logs are below:

[thread 140634763167488 also had an error]#
# A fatal error has been detected by the Java Runtime Environment:

#
#  [thread 140631614166784 also had an error]
SIGSEGV (0xb)[thread 140631622559488 also had an error] at pc=0x00007fe7f40700d7, pid=12046
, tid=0x00007fe758e80700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x00007fe7f40700d7
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/qq12cvhj/710/Nd4jTest/hs_err_pid12046.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

The log file can be found here.

The model was exported from Keras, and it can be loaded back into Keras in a Python environment, where it works. Now I create an INDArray via ND4J as the input tensor for the model, but I get this error from Java.

Can you share a bit more about your code?

After you've successfully loaded your Keras model, it isn't really much different from a model that was created with DL4J, so the fact that it was originally a Keras model shouldn't be the source of the error.

Unfortunately the JVM crashed hard enough that it can’t even collect a proper stack trace here, but from the other entries in the log, it looks like you are using this in a concurrent context. Can you tell us more about that?

OK, this time I entered `ulimit -c unlimited` on the command line before running the Java code, and the log is more detailed. Here is the log file.
This time, in the log file you can see this:

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.nd4j.nativeblas.Nd4jCpu.execCustomOp2(Lorg/bytedeco/javacpp/PointerPointer;JLorg/bytedeco/javacpp/Pointer;)I+0
j org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/CustomOp;Lorg/nd4j/linalg/api/ops/OpContext;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+98
j org.nd4j.linalg.factory.Nd4j.exec(Lorg/nd4j/linalg/api/ops/CustomOp;Lorg/nd4j/linalg/api/ops/OpContext;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+5
j org.deeplearning4j.nn.layers.mkldnn.MKLDNNConvHelper.preOutput(Lorg/nd4j/linalg/api/ndarray/INDArray;Lorg/nd4j/linalg/api/ndarray/INDArray;Lorg/nd4j/linalg/api/ndarray/INDArray;[I[I[ILorg/deeplearning4j/nn/conf/layers/ConvolutionLayer$AlgoMode;Lorg/deeplearning4j/nn/conf/layers/ConvolutionLayer$FwdAlgo;Lorg/deeplearning4j/nn/conf/ConvolutionMode;[ILorg/deeplearning4j/nn/conf/CNN2DFormat;Lorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+498
j org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ZZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/common/primitives/Pair;+609
j org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+61
j org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(ZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+23
j org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ZLorg/deeplearning4j/nn/api/FwdPassType;[I[Lorg/nd4j/linalg/api/ndarray/INDArray;[Lorg/nd4j/linalg/api/ndarray/INDArray;[Lorg/nd4j/linalg/api/ndarray/INDArray;ZZLorg/nd4j/linalg/api/memory/MemoryWorkspace;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+1172
j org.deeplearning4j.nn.graph.ComputationGraph.output(ZZ[Lorg/nd4j/linalg/api/ndarray/INDArray;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+27
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(ZZ[Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+43
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(Z[Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+4
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle([Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+3
j Separator.separate([[F)[Lorg/nd4j/linalg/api/ndarray/INDArray;+111
j Separator.main([Ljava/lang/String;)V+9
v ~StubRoutines::call_stub

My code is simple: it just loads the model and gives it an input of the right shape. See below:

ComputationGraph computationGraph1 = KerasModelImport.importKerasModelAndWeights("kerasModels/model1.h5");
INDArray outputTensor1 = computationGraph1.outputSingle(inputTensor);

Because the model requires an input shape of (?, 512, 1024, 2), I created an INDArray of shape (1, 512, 1024, 2).
The .h5 model file is here.

What's more interesting: with the same code running on Windows, I don't get the log at all. There is no output in the terminal; it just hangs, telling me the code is running, but I get nothing after a long wait, and it never hits the JVM error. On Linux, I do get the error log.

Hmm, the code is not running in a concurrent environment, yet it still produced this error log.

Thanks, I hope you can understand what I mean; my English is not great.

Below are the log information and the model. Thanks.

What happens if you use beta7?

I tried to reproduce your problem with beta7 like this and it appears to work:

ComputationGraph computationGraph1 = KerasModelImport.importKerasModelAndWeights("X:/model.h5");
INDArray outputTensor1 = computationGraph1.outputSingle(Nd4j.rand(1,512,1024,2));

And it works, even though you get a warning:

MKL-DNN execution failed - falling back on built-in implementation java.lang.RuntimeException: could not create a descriptor for a dilated convolution forward propagation primitive


Could you please share some information about your ND4J profile? I'm using the CPU backend; I don't know if this is related to ND4J. I am using a SNAPSHOT version.

Thanks. The problem is related to the difference between NCHW and NHWC. But I wonder why it crashes the JVM rather than throwing an exception. In any case, my problem is now solved.
Another problem: on the Android platform, when I changed the version of ND4J, DataVec, and DL4J to beta7, it throws an UnsatisfiedLinkError about the C++ shared library, so I changed back to beta6.
Thanks.
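The NCHW/NHWC mismatch mentioned above is easy to hit because both layouts have the same shape and element count but a different memory order, so native code can read (or index past) the wrong data. A minimal, stdlib-only Java sketch (not from this thread; names are illustrative) of converting a flattened NHWC buffer to NCHW:

```java
// Sketch: converting an NHWC-ordered buffer to NCHW.
// Keras (TensorFlow backend) defaults to NHWC, while many DL4J conv paths
// expect NCHW, so a tensor of the "right" shape can still be laid out wrong.
public class LayoutConvert {
    // Copies src, flattened as [n][h][w][c], into an array flattened as [n][c][h][w].
    static float[] nhwcToNchw(float[] src, int n, int h, int w, int c) {
        float[] dst = new float[src.length];
        for (int in = 0; in < n; in++)
            for (int ih = 0; ih < h; ih++)
                for (int iw = 0; iw < w; iw++)
                    for (int ic = 0; ic < c; ic++) {
                        int nhwc = ((in * h + ih) * w + iw) * c + ic;
                        int nchw = ((in * c + ic) * h + ih) * w + iw;
                        dst[nchw] = src[nhwc];
                    }
        return dst;
    }

    public static void main(String[] args) {
        // Tiny 1x2x2x2 example: interleaved channels in NHWC order.
        float[] nhwc = {0, 1, 2, 3, 4, 5, 6, 7};
        float[] nchw = nhwcToNchw(nhwc, 1, 2, 2, 2);
        // Channel-0 plane first, then channel-1 plane:
        System.out.println(java.util.Arrays.toString(nchw));
        // [0.0, 2.0, 4.0, 6.0, 1.0, 3.0, 5.0, 7.0]
    }
}
```

In ND4J itself you would typically use `permute` on the INDArray rather than copying buffers by hand, but the index arithmetic above is what that reordering amounts to.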

@sshepel Did you make sure we’re linking statically with C++ runtime on Android?

@saudet changes were merged to the Konduit repo master… We need to wait for the next repo sync.


Sync has been done, so changes should be in snapshots.


I’ve had almost the exact same issue:

If you're still getting this issue, a temporary fix is to duplicate the input array before passing it to the model:
iter.next().getFeatures().dup();

This doesn’t seem to have too much of an effect on overall memory usage.
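The reason a defensive copy like `.dup()` can help is that an iterator may hand out a view of an internal buffer that it later reuses. A stdlib-only Java analogy (not ND4J code; names are illustrative):

```java
import java.util.Arrays;

// Sketch: why duplicating an input can help when an iterator reuses its
// underlying buffer. A view changes when the buffer is overwritten; a copy
// (analogous to INDArray.dup()) keeps the earlier values intact.
public class DupDemo {
    static float[] buffer = new float[]{1f, 2f, 3f};

    // Simulates an iterator returning a view of an internal, reused buffer.
    static float[] nextFeaturesView() { return buffer; }

    public static void main(String[] args) {
        float[] view = nextFeaturesView();
        float[] copy = Arrays.copyOf(view, view.length); // analogous to .dup()
        buffer[0] = 99f;                 // iterator advances, overwriting its buffer
        System.out.println(view[0]);     // 99.0 — the view changed underneath us
        System.out.println(copy[0]);     // 1.0 — the copy is unaffected
    }
}
```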

In my case, I think it is not only the shape of the tensor that determines the result. When I change the order (layout) of the tensor, it works, so there are other factors that influence it.