JVM SIGSEGV when running an imported TF2 frozen graph model

Hi, I’m trying to run a trained TensorFlow 2 model using DL4J.
I followed the example1.py code in the TensorFlow_v2 directory of leimao/Frozen-Graph-TensorFlow on GitHub to save the pb model. (I found that link in https://github.com/eclipse/deeplearning4j-examples/blob/master/tensorflow-keras-import-examples/README.md )
Then, in Java, using version 1.0.0-M1.1 of ND4J/DL4J, I imported the graph using:

    byte[] content = IOUtils.toByteArray(new FileInputStream(file));
    List<String> inputs = Arrays.asList(INPUTS);
    try (GraphRunner graphRunner = GraphRunner.builder().graphBytes(content).inputNames(inputs).build()) {
...

This loads with no errors.
However, when I try to run it using:

      Map<String, INDArray> output = graphRunner.run(testInput);

where testInput is the map of INDArrays for a sample input, I get:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000014c451cc4, pid=54941, tid=10499
#
# JRE version: OpenJDK Runtime Environment Corretto-11.0.11.9.1 (11.0.11+9) (build 11.0.11+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-11.0.11.9.1 (11.0.11+9-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C  [libtensorflow_cc.so.1+0x4b4ecc4]  _ZNK10tensorflow4Node4nameEv+0x4
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/fabrizio/tactile/ai-models/hs_err_pid54941.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-11/issues/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I also tried to import the same pb file using:

TFGraphMapper.importGraph(file);

as indicated in other examples, but in that case I get this error:

Exception in thread "main" java.lang.IllegalStateException: Could not find class for TF Ops: TensorListFromTensor
	at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:638)
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:301)
	at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:284)
	at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:141)
	at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:87)
	at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:73)

I cannot tell what the issue is either way.

I’d appreciate it if you could point me in the right direction to debug this issue.

Thank you

PS: here is the detailed stack trace from the log file mentioned above:

---------------  T H R E A D  ---------------

Current thread (0x00007fe408809000):  JavaThread "main" [_thread_in_native, id=10499, stack(0x00007000096bc000,0x00007000097bc000)]

Stack: [0x00007000096bc000,0x00007000097bc000],  sp=0x00007000097bb500,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libtensorflow_cc.so.1+0x4b4ecc4]  _ZNK10tensorflow4Node4nameEv+0x4
C  [libtensorflow_cc.so.1+0x78824]  TF_SessionRun+0x264
C  [libjnitensorflow.dylib+0xa97de0]  Java_org_bytedeco_tensorflow_global_tensorflow_TF_1SessionRun__Lorg_bytedeco_tensorflow_TF_1Session_2Lorg_bytedeco_tensorflow_TF_1Buffer_2Lorg_bytedeco_tensorflow_TF_1Output_2Lorg_bytedeco_javacpp_PointerPointer_2ILorg_bytedeco_tensorflow_TF_1Output_2Lorg_bytedeco_javacpp_PointerPointer_2ILorg_bytedeco_javacpp_PointerPointer_2ILorg_bytedeco_tensorflow_TF_1Buffer_2Lorg_bytedeco_tensorflow_TF_1Status_2+0x3f0
j  org.bytedeco.tensorflow.global.tensorflow.TF_SessionRun(Lorg/bytedeco/tensorflow/TF_Session;Lorg/bytedeco/tensorflow/TF_Buffer;Lorg/bytedeco/tensorflow/TF_Output;Lorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/tensorflow/TF_Output;Lorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/tensorflow/TF_Buffer;Lorg/bytedeco/tensorflow/TF_Status;)V+0
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.runTfTensor(Ljava/util/Map;)Ljava/util/Map;+1329
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.run(Ljava/util/Map;)Ljava/util/Map;+102
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.run(Ljava/util/Map;)Ljava/util/Map;+18
j  com.tactile.aimodels.tf.TfModel.main([Ljava/lang/String;)V+56
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x3bf51e]  _ZN9JavaCalls11call_helperEP9JavaValueRK12methodHandleP17JavaCallArgumentsP6Thread+0x220
V  [libjvm.dylib+0x4034dc]  _ZL17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread+0x122
V  [libjvm.dylib+0x4062fb]  jni_CallStaticVoidMethod+0x17f
C  [libjli.dylib+0x4831]  JavaMain+0xad1
C  [libjli.dylib+0x6be4]  ThreadJavaMain+0x9
C  [libsystem_pthread.dylib+0x6109]  _pthread_start+0x94
C  [libsystem_pthread.dylib+0x1b8b]  thread_start+0xf

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.bytedeco.tensorflow.global.tensorflow.TF_SessionRun(Lorg/bytedeco/tensorflow/TF_Session;Lorg/bytedeco/tensorflow/TF_Buffer;Lorg/bytedeco/tensorflow/TF_Output;Lorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/tensorflow/TF_Output;Lorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/javacpp/PointerPointer;ILorg/bytedeco/tensorflow/TF_Buffer;Lorg/bytedeco/tensorflow/TF_Status;)V+0
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.runTfTensor(Ljava/util/Map;)Ljava/util/Map;+1329
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.run(Ljava/util/Map;)Ljava/util/Map;+102
j  org.nd4j.tensorflow.conversion.graphrunner.GraphRunner.run(Ljava/util/Map;)Ljava/util/Map;+18
j  com.tactile.aimodels.tf.TfModel.main([Ljava/lang/String;)V+56
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000090

@fmorbinitc Note that nd4j-tensorflow (which lets you use TF Java 1 under the covers with ND4J arrays) is different from model import and, from what I’m seeing, is what’s causing the bottom stack trace.
That crash is related to TensorFlow, not our model import. For now, let’s focus on the model import.

Regarding the model import, could you try the new model import framework:

If you still have issues, I’d be happy to look at your pb file.

Hi, I’m trying that path and hitting a silly roadblock. I get this error:

Exception in thread "main" java.util.ServiceConfigurationError: org.nd4j.samediff.frameworkimport.opdefs.OpDescriptorLoader: Provider org.nd4j.samediff.frameworkimport.tensorflow.opdefs.TensorflowOpDescriptorLoader could not be instantiated
	at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
	at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:804)
	at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:722)
	at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1395)
	at org.nd4j.samediff.frameworkimport.opdefs.OpDescriptorLoaderHolder.loadDescriptorLoaders(OpDescriptorLoaderHolder.kt:38)
	at org.nd4j.samediff.frameworkimport.opdefs.OpDescriptorLoaderHolder.<clinit>(OpDescriptorLoaderHolder.kt:28)
	at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.<init>(TensorflowFrameworkImporter.kt:40)
	at com.tactile.aimodels.tf.TfModel.main(TfModel.java:76)
Caused by: java.io.FileNotFoundException: nd4j-op-def.pbtxt cannot be opened because it does not exist
	at org.nd4j.common.io.ClassPathResource.getInputStream(ClassPathResource.java:248)
	at org.nd4j.common.io.ClassPathResource.getInputStream(ClassPathResource.java:235)
	at org.nd4j.samediff.frameworkimport.tensorflow.opdefs.TensorflowOpDescriptorLoader.nd4jOpList(TensorflowOpDescriptorLoader.kt:55)
	at org.nd4j.samediff.frameworkimport.tensorflow.opdefs.TensorflowOpDescriptorLoader.<init>(TensorflowOpDescriptorLoader.kt:46)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:780)
	... 6 more

Basically, I cannot find a jar that contains nd4j-op-def.pbtxt.

I tried including:

    implementation group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-native-platform', version: '1.0.0-M1.1'
    implementation 'org.nd4j:nd4j-tensorflow:1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-tensorflow', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-api', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-tensorflow', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-api', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-backend-impls', version: '1.0.0-M1.1', ext: 'pom'

I see that file here: deeplearning4j/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/resources (master branch of eclipse/deeplearning4j on GitHub)

but I cannot tell which jar contains it. Can you please point me to the right one?

thank you
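
For anyone with the same "which jar has this file?" question, one generic way to answer it is to scan the local dependency cache for jars that contain the entry. The sketch below is only an illustration: the class name JarResourceFinder is made up for this example, and the default directory assumes Gradle's usual cache location under the user home, which may differ on your machine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of DL4J): lists the jars that contain a given entry.
class JarResourceFinder {

  // Returns every .jar file under root that has an entry named exactly entryName.
  static List<Path> findJarsContaining(Path root, String entryName) throws IOException {
    List<Path> jars;
    try (Stream<Path> walk = Files.walk(root)) {
      jars = walk.filter(p -> p.toString().endsWith(".jar")).collect(Collectors.toList());
    }
    List<Path> hits = new ArrayList<>();
    for (Path jar : jars) {
      try (JarFile jarFile = new JarFile(jar.toFile())) {
        if (jarFile.getEntry(entryName) != null) {
          hits.add(jar);
        }
      } catch (IOException e) {
        // Skip archives that cannot be opened (corrupt or not really a jar).
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: Gradle's default cache directory; pass another directory as the first argument.
    Path root = Paths.get(args.length > 0 ? args[0]
        : System.getProperty("user.home") + "/.gradle/caches");
    String entry = args.length > 1 ? args[1] : "nd4j-op-def.pbtxt";
    if (!Files.isDirectory(root)) {
      System.out.println("No such directory: " + root);
      return;
    }
    for (Path jar : findJarsContaining(root, entry)) {
      System.out.println(jar);
    }
  }
}
```

If the scan prints nothing, none of the downloaded jars ships the file at the jar root, which would be consistent with the FileNotFoundException above.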

After downloading the following files from the DL4J GitHub repo:
nd4j-op-def.pbtxt
tensorflow-mapping-ruleset.pbtxt
tensorflow-op-def.pbtxt
I can run java -cp build/libs/ai-models-1.0-SNAPSHOT.jar:. com.tactile.aimodels.tf.TfModel

I no longer get the file-not-found error, but I get the following exception:

Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.importFromGraph(TensorflowFrameworkImporter.kt:58)
	at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.runImport(TensorflowFrameworkImporter.kt:64)
	at com.tactile.aimodels.tf.TfModel.main(TfModel.java:77)
Caused by: java.lang.IllegalArgumentException: Found invalid output tensor named permuteDims for rule ndarraymapping and mapping process for op transpose and input framework name Transpose with definition being name: "transpose"
argDescriptor {
  name: "dtype"
  argType: DATA_TYPE
}
argDescriptor {
  name: "outputs"
  argType: OUTPUT_TENSOR
}
argDescriptor {
  name: "permuteDims"
  argType: INT64
}
argDescriptor {
  name: "input"
  argType: INPUT_TENSOR
}
argDescriptor {
  name: "permutationVector"
  argType: INPUT_TENSOR
  argIndex: 1
}

	at org.nd4j.samediff.frameworkimport.process.AbstractMappingProcess.<init>(AbstractMappingProcess.kt:84)
	at org.nd4j.samediff.frameworkimport.tensorflow.process.TensorflowMappingProcess.<init>(TensorflowMappingProcess.kt:49)
	at org.nd4j.samediff.frameworkimport.tensorflow.process.TensorflowMappingProcess.<init>(TensorflowMappingProcess.kt:48)
	at org.nd4j.samediff.frameworkimport.tensorflow.TensorflowProtobufExtensionsKt.mapTensorNamesWithOp(TensorflowProtobufExtensions.kt:258)
	at org.nd4j.samediff.frameworkimport.tensorflow.definitions.TensorflowOpDeclarationsKt.<clinit>(TensorflowOpDeclarations.kt:2268)
	... 3 more

Does that help identify the issue?

@fmorbinitc Ah yes, thanks. Sorry about that, and I’m glad you already knew the solution! That will be fixed in the next release. It was due to the way the resource was being loaded.

We’re due for a release in a week or two, so you can also try snapshots if you have issues there.

Regarding the output tensor issue, also make sure you have the right mapping rules.
Add both of the files from this directory to your project as well:

Again, you can also try snapshots otherwise. Sorry about that. I will try to make the resource loading more explicit in the next release.

The idea is that if people run into issues, they can override the defaults using system properties whose values are paths to the respective configuration files.

You can find more here:
https://github.com/eclipse/deeplearning4j/blob/master/nd4j/samediff-import/samediff-import-tensorflow/src/main/kotlin/org/nd4j/samediff/frameworkimport/tensorflow/opdefs/TensorflowOpDescriptorLoader.kt#L41
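
As a rough illustration of that override pattern (not the actual loader code), a resource lookup that prefers a system property and falls back to the classpath might look like the sketch below. The class name OverridableResource and the property name example.import.opdef are placeholders invented for this example; the real property names are the ones defined in the linked TensorflowOpDescriptorLoader.kt.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of "system property overrides classpath default".
class OverridableResource {

  // Opens the file named by the system property if it is set,
  // otherwise opens classpathName from the classpath.
  static InputStream open(String property, String classpathName) throws IOException {
    String override = System.getProperty(property);
    if (override != null && !override.isEmpty()) {
      return Files.newInputStream(Paths.get(override));
    }
    InputStream in = OverridableResource.class.getClassLoader().getResourceAsStream(classpathName);
    if (in == null) {
      throw new FileNotFoundException(
          classpathName + " is not on the classpath and -D" + property + " is not set");
    }
    return in;
  }

  public static void main(String[] args) throws IOException {
    // "example.import.opdef" is a placeholder property name, not a real DL4J property.
    try (InputStream in = open("example.import.opdef", "nd4j-op-def.pbtxt")) {
      System.out.println("Opened definition file, first byte: " + in.read());
    } catch (FileNotFoundException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape, running with -Dexample.import.opdef=/some/path/nd4j-op-def.pbtxt would bypass the classpath copy entirely, which is the kind of escape hatch described above.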

Basically, there are three definition files for model import:

  1. nd4j-op-def.pbtxt: maps input and output names to input/output indices for the set of our ops that each framework’s ops get mapped to.
  2. A framework op list (tensorflow-op-def.pbtxt): a protobuf text file that lists all the ops of a given framework, including their number of inputs, names, etc.
  3. A mapping ruleset (tensorflow-mapping-ruleset.pbtxt): describes how to map each framework op to our ops. These rules are discoverable on the classpath and are by default exported from here:
    https://github.com/eclipse/deeplearning4j/blob/master/nd4j/samediff-import/samediff-import-tensorflow/src/main/kotlin/org/nd4j/samediff/frameworkimport/tensorflow/definitions/TensorflowOpDeclarations.kt

Hopefully that helps!
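
A quick way to confirm what the classpath actually provides for the three files listed above is to ask the class loader for every location of each name. This is a minimal sketch under my own assumptions: the class name DefinitionFileCheck is invented here, and it assumes the files sit at the classpath root under exactly these names.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic (not part of DL4J): shows where each definition file is found.
class DefinitionFileCheck {

  // The three file names mentioned in the thread.
  static final String[] DEFINITION_FILES = {
    "nd4j-op-def.pbtxt",
    "tensorflow-op-def.pbtxt",
    "tensorflow-mapping-ruleset.pbtxt"
  };

  // Maps each name to every classpath location that provides it (empty list if none).
  static Map<String, List<URL>> locate(ClassLoader loader, String... names) throws IOException {
    Map<String, List<URL>> result = new LinkedHashMap<>();
    for (String name : names) {
      result.put(name, Collections.list(loader.getResources(name)));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    ClassLoader loader = DefinitionFileCheck.class.getClassLoader();
    locate(loader, DEFINITION_FILES).forEach((name, urls) ->
        System.out.println(name + " -> " + (urls.isEmpty() ? "NOT FOUND" : urls)));
  }
}
```

Run it with the same -cp as the application. If a name is listed at more than one location (for example a downloaded copy in the working directory and a copy inside a jar), it is worth checking that the copies agree with each other.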

Thank you. I have exactly those two files (diffed to make sure) from that GitHub repo:

I still get the same error, so I created a gist with the pb file I’m importing:

Could you try to import it?

I use this code:

package com.tactile.aimodels.tf;

import java.io.File;
import java.util.Collections;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter;

public class TfModel {

  public static void main(String[] args) {

    // Frozen TF2 graph saved from Python
    File file = new File("/Users/fabrizio/Downloads/complex_frozen_graph.pb");

    // Import with the new SameDiff framework importer; no dynamic input variables are supplied
    TensorflowFrameworkImporter tensorflowFrameworkImporter = new TensorflowFrameworkImporter();
    SameDiff graph = tensorflowFrameworkImporter.runImport(file.getAbsolutePath(), Collections.emptyMap());

    System.out.println(graph.summary());
    System.exit(0);
  }
}

and the Gradle dependencies are:

    implementation group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-native-platform', version: '1.0.0-M1.1'
    implementation 'org.nd4j:nd4j-tensorflow:1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-tensorflow', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-api', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'samediff-import-tensorflow', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-api', version: '1.0.0-M1.1'
    implementation group: 'org.nd4j', name: 'nd4j-backend-impls', version: '1.0.0-M1.1', ext: 'pom'

I tried both M1 and M1.1; both cause the same error.
Does this mean I need to extend something? I didn’t use any special custom layers in the original model.

Thank you