JVM error when getting output from a Keras model

Hi, I'm back again. I get an error when trying to get the output from ComputationGraph.output(inputTensor);
The log is below:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe7f40700d7, pid=12046, tid=0x00007fe758e80700
[thread 140634763167488 also had an error]
[thread 140631614166784 also had an error]
[thread 140631622559488 also had an error]
# JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x00007fe7f40700d7
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# /home/qq12cvhj/710/Nd4jTest/hs_err_pid12046.log
# If you would like to submit a bug report, please visit:

The log file can be found here.

The model was exported from Keras. It can be loaded back into Keras in a Python environment, and it works there. Now I create an INDArray with ND4J as the model's input tensor, but I get this native error from outside Java.

Can you share a bit more about your code?

Once you've successfully loaded your Keras model, it isn't really different from a model created with DL4J. So the fact that it originally came from Keras shouldn't be the cause of the error.

Unfortunately the JVM crashed hard enough that it can’t even collect a proper stack trace here, but from the other entries in the log, it looks like you are using this in a concurrent context. Can you tell us more about that?

OK, this time I ran 'ulimit -c unlimited' on the command line before running the Java code, and the log is much more detailed. Here is the log file.
In it you can see this:

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.nd4j.nativeblas.Nd4jCpu.execCustomOp2(Lorg/bytedeco/javacpp/PointerPointer;JLorg/bytedeco/javacpp/Pointer;)I+0
j org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/CustomOp;Lorg/nd4j/linalg/api/ops/OpContext;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+98
j org.nd4j.linalg.factory.Nd4j.exec(Lorg/nd4j/linalg/api/ops/CustomOp;Lorg/nd4j/linalg/api/ops/OpContext;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+5
j org.deeplearning4j.nn.layers.mkldnn.MKLDNNConvHelper.preOutput(Lorg/nd4j/linalg/api/ndarray/INDArray;Lorg/nd4j/linalg/api/ndarray/INDArray;Lorg/nd4j/linalg/api/ndarray/INDArray;[I[I[ILorg/deeplearning4j/nn/conf/layers/ConvolutionLayer$AlgoMode;Lorg/deeplearning4j/nn/conf/layers/ConvolutionLayer$FwdAlgo;Lorg/deeplearning4j/nn/conf/ConvolutionMode;[ILorg/deeplearning4j/nn/conf/CNN2DFormat;Lorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+498
j org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ZZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/common/primitives/Pair;+609
j org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+61
j org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(ZLorg/deeplearning4j/nn/workspace/LayerWorkspaceMgr;)Lorg/nd4j/linalg/api/ndarray/INDArray;+23
j org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ZLorg/deeplearning4j/nn/api/FwdPassType;[I[Lorg/nd4j/linalg/api/ndarray/INDArray;[Lorg/nd4j/linalg/api/ndarray/INDArray;[Lorg/nd4j/linalg/api/ndarray/INDArray;ZZLorg/nd4j/linalg/api/memory/MemoryWorkspace;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+1172
j org.deeplearning4j.nn.graph.ComputationGraph.output(ZZ[Lorg/nd4j/linalg/api/ndarray/INDArray;)[Lorg/nd4j/linalg/api/ndarray/INDArray;+27
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(ZZ[Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+43
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(Z[Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+4
j org.deeplearning4j.nn.graph.ComputationGraph.outputSingle([Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+3
j Separator.separate([[F)[Lorg/nd4j/linalg/api/ndarray/INDArray;+111
j Separator.main([Ljava/lang/String;)V+9
v ~StubRoutines::call_stub

My code is simple: it just loads the model and feeds it an input of the right shape. See below:

ComputationGraph computationGraph1 = KerasModelImport.importKerasModelAndWeights("kerasModels/model1.h5");
INDArray outputTensor1 = computationGraph1.outputSingle(inputTensor);

Because the model expects an input of shape (?, 512, 1024, 2), I create an INDArray of shape (1, 512, 1024, 2).
The .h5 model file is here.

What's more interesting: when I run the same code on Windows, I don't get this log at all. Nothing is printed in the terminal. The program just hangs as if it is still running, but I get no output even after waiting a long time, and the JVM never crashes. On Linux, however, I get the error log.

Also, the code is not running in a concurrent environment, but it still produced this error log.

Thanks, I hope you can understand what I mean. My English is not very good.

Below are the log and the model. Thanks!

What happens if you use beta7?

I tried to reproduce your problem with beta7 like this and it appears to work:

ComputationGraph computationGraph1 = KerasModelImport.importKerasModelAndWeights("X:/model.h5");
INDArray outputTensor1 = computationGraph1.outputSingle(Nd4j.rand(1,512,1024,2));

It does work, although you will get this warning: MKL-DNN execution failed - falling back on built-in implementation java.lang.RuntimeException: could not create a descriptor for a dilated convolution forward propagation primitive


Could you please share the details of your ND4J setup? I'm using the CPU backend, and I don't know whether this is related to ND4J. I am using a SNAPSHOT version.

Thanks. The problem turned out to be the difference between the NCHW and NHWC layouts. I still wonder why it crashes the JVM with a native error instead of throwing a Java exception, but my problem is solved now.
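Here is some background for readers hitting the same crash. A Keras model built with the TensorFlow backend normally takes channels-last input (NHWC), which is what the (1, 512, 1024, 2) shape above is. DL4J's convolution layers default to channels-first (NCHW).

In ND4J the conversion is a single call, input.permute(0, 3, 1, 2). It returns a view, so call dup() on it if you want a contiguous copy. The dependency-free sketch below spells out the same reordering on a flat row-major float array, so the index mapping is visible. It assumes the imported graph expects NCHW input, which is how this thread was resolved. The class and method names are hypothetical and not part of DL4J.

```java
import java.util.Arrays;

/** Hypothetical helper: converts a row-major NHWC buffer to row-major NCHW. */
class LayoutConverter {

    /**
     * Copies src (shape [n, h, w, c], row-major) into a new array laid out
     * as [n, c, h, w], row-major. Same effect as ND4J's permute(0, 3, 1, 2)
     * followed by a contiguous copy.
     */
    static float[] nhwcToNchw(float[] src, int n, int h, int w, int c) {
        if (src.length != n * h * w * c) {
            throw new IllegalArgumentException(
                "Expected " + (n * h * w * c) + " elements, got " + src.length);
        }
        float[] dst = new float[src.length];
        for (int ni = 0; ni < n; ni++) {
            for (int hi = 0; hi < h; hi++) {
                for (int wi = 0; wi < w; wi++) {
                    for (int ci = 0; ci < c; ci++) {
                        int srcIdx = ((ni * h + hi) * w + wi) * c + ci; // NHWC offset
                        int dstIdx = ((ni * c + ci) * h + hi) * w + wi; // NCHW offset
                        dst[dstIdx] = src[srcIdx];
                    }
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // Tiny 1x2x2x2 example: each value encodes (h, w, c) as h*100 + w*10 + c.
        int n = 1, h = 2, w = 2, c = 2;
        float[] nhwc = new float[n * h * w * c];
        for (int hi = 0; hi < h; hi++)
            for (int wi = 0; wi < w; wi++)
                for (int ci = 0; ci < c; ci++)
                    nhwc[(hi * w + wi) * c + ci] = hi * 100 + wi * 10 + ci;
        // Prints [0.0, 10.0, 100.0, 110.0, 1.0, 11.0, 101.0, 111.0]
        System.out.println(Arrays.toString(nhwcToNchw(nhwc, n, h, w, c)));
    }
}
```

After conversion, the buffer corresponds to shape (1, 2, 512, 1024) instead of (1, 512, 1024, 2). Whether your imported graph needs this depends on the DL4J version and on how the Keras model was configured.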
Another problem: on Android, when I upgraded ND4J, DataVec, or DL4J to beta7, it threw an UnsatisfiedLinkError about c++_shared, so I went back to beta6.

@sshepel Did you make sure we're linking statically with the C++ runtime on Android?

@saudet The changes were merged into the Konduit repo's master branch… We need to wait for the next repo sync.


The sync has been done, so the changes should now be in the snapshots.


I've had almost exactly the same issue.

If you're still getting this issue, a temporary fix is to duplicate the input array (for example with INDArray.dup()) before passing it to the model.

This doesn’t seem to have too much of an effect on overall memory usage.
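Nobody in the thread explains why dup() helps. One plausible explanation, offered only as an assumption, involves memory layout. An array produced by permute() or other view operations shares its buffer with the original and carries non-standard strides. By contrast, dup('c') returns a fresh contiguous copy, which native code paths such as the MKLDNNConvHelper frame in the stack trace above may handle more reliably.

The plain-Java sketch below (hypothetical names, no ND4J dependency) computes row-major strides. It then compares a permuted view of an NHWC array with a contiguous copy of the same NCHW shape.

```java
import java.util.Arrays;

/** Hypothetical illustration: strides of a permuted view vs. a contiguous copy. */
class StrideDemo {

    /** Row-major ('c' order) strides, in elements, for the given shape. */
    static long[] cOrderStrides(long[] shape) {
        long[] strides = new long[shape.length];
        long acc = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = acc;
            acc *= shape[i];
        }
        return strides;
    }

    /** A permuted view only reorders shape and strides; no data moves. */
    static long[] permute(long[] values, int... order) {
        long[] out = new long[order.length];
        for (int i = 0; i < order.length; i++) {
            out[i] = values[order[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] nhwcShape = {1, 512, 1024, 2};
        long[] nhwcStrides = cOrderStrides(nhwcShape);

        // View: same buffer, shape and strides permuted to NCHW.
        long[] viewShape = permute(nhwcShape, 0, 3, 1, 2);
        long[] viewStrides = permute(nhwcStrides, 0, 3, 1, 2);

        // Copy in 'c' order (as dup('c') would produce): standard strides.
        long[] copyStrides = cOrderStrides(viewShape);

        System.out.println("view shape   " + Arrays.toString(viewShape));   // [1, 2, 512, 1024]
        System.out.println("view strides " + Arrays.toString(viewStrides)); // [1048576, 1, 2048, 2]
        System.out.println("copy strides " + Arrays.toString(copyStrides)); // [1048576, 524288, 1024, 1]
    }
}
```

Both arrays have the same logical shape, but only the copy has strides that decrease from left to right. Code that silently assumes that layout would read the view incorrectly.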

In my case, I don't think the shape of the tensor alone determines the result: when I change the order of the tensor, it works. So other factors influence it as well.
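"Order" here might refer to ND4J's memory ordering. That is an assumption, since the post doesn't say whether it means that or the dimension order. An INDArray is stored in either 'c' (row-major) or 'f' (column-major) order. You can check it with ordering() and convert with dup('c') or dup('f').

The same logical element sits at a different buffer offset in each ordering, as this dependency-free sketch shows. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of 'c' (row-major) vs 'f' (column-major) ordering. */
class OrderingDemo {

    /** Strides in elements for the given shape and ordering ('c' or 'f'). */
    static long[] strides(long[] shape, char order) {
        long[] s = new long[shape.length];
        long acc = 1;
        if (order == 'c') {
            // Last dimension varies fastest.
            for (int i = shape.length - 1; i >= 0; i--) { s[i] = acc; acc *= shape[i]; }
        } else if (order == 'f') {
            // First dimension varies fastest.
            for (int i = 0; i < shape.length; i++) { s[i] = acc; acc *= shape[i]; }
        } else {
            throw new IllegalArgumentException("order must be 'c' or 'f'");
        }
        return s;
    }

    /** Flat buffer offset of the element at the given index. */
    static long offset(long[] index, long[] strides) {
        long off = 0;
        for (int i = 0; i < index.length; i++) off += index[i] * strides[i];
        return off;
    }

    public static void main(String[] args) {
        long[] shape = {1, 512, 1024, 2};
        long[] idx = {0, 0, 1, 0}; // element at h=0, w=1, c=0
        for (char order : new char[] {'c', 'f'}) {
            long[] s = strides(shape, order);
            // c: strides=[1048576, 2048, 2, 1] offset=2
            // f: strides=[1, 1, 512, 524288] offset=512
            System.out.println(order + ": strides=" + Arrays.toString(s)
                + " offset=" + offset(idx, s));
        }
    }
}
```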