NPE when loading a TF model into SameDiff

SidneyLann · October 26, 2021, 8:37am

SameDiff sd = SameDiff.importFrozenTF(new File(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb"));

This works in beta7, but the code below does not work in SNAPSHOT:
TensorflowFrameworkImporter tensorflowFrameworkImporter = new TensorflowFrameworkImporter();
SameDiff sd = tensorflowFrameworkImporter.runImport(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb", Collections.emptyMap());

java.lang.NullPointerException
at org.nd4j.samediff.frameworkimport.registry.OpMappingRegistry.lookupInputFrameworkOpDef(OpMappingRegistry.kt:99)
at org.nd4j.samediff.frameworkimport.ir.IRFunctionsKt.importInfoForEachNodeInGraph(IRFunctions.kt:63)
at org.nd4j.samediff.frameworkimport.tensorflow.ir.TensorflowIRGraph.importInfoForEachNode(TensorflowIRGraph.kt:162)
at org.nd4j.samediff.frameworkimport.ImportGraph.importGraph(ImportGraph.kt:238)
at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.importFromGraph(TensorflowFrameworkImporter.kt:58)
at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.runImport(TensorflowFrameworkImporter.kt:64)

agibsonccc · October 26, 2021, 8:39am

@SidneyLann can you DM me the model? We're importing BERT models fine, so I'd be happy to look at the variant here.

SidneyLann · October 26, 2021, 10:06am

But I can't upload a zip file here.

SidneyLann · October 26, 2021, 9:14pm

@agibsonccc I sent the model to you. Please help check it. Thanks.

agibsonccc · October 26, 2021, 10:26pm

@SidneyLann please check your DMs, the model download was giving me some difficulties.

SidneyLann · October 28, 2021, 1:39am

@agibsonccc Please download from dl4j - Google Drive

SidneyLann · October 29, 2021, 1:39pm

@agibsonccc Can you import the model in SNAPSHOT?

agibsonccc · October 29, 2021, 1:40pm

@SidneyLann I got your model. Sorry, I will be testing it after my current batch of PRs is merged.

SidneyLann · November 8, 2021, 12:48pm

@agibsonccc Could you fix this issue so that I can test other issues? Thanks.

agibsonccc · November 11, 2021, 9:48pm

@SidneyLann I haven't tried yet. I've been working on overhauling various aspects of the training and ensuring that's solid (basically addressing your other concerns first). I wanted to try out various models after getting that done as a litmus test. Luckily I'm on the tail end of that work now and will be able to take a look at this.

agibsonccc · November 23, 2021, 2:15am

@SidneyLann I finally got some time to look into this. With the latest changes I did here: https://github.com/eclipse/deeplearning4j/pull/9535 it appears to import this fine now. If you want the converted model let me know. Thanks for reporting this!

SidneyLann · November 23, 2021, 2:28pm

@agibsonccc Is the PR available in SNAPSHOT now? I still can't import the model.

tensorflowFrameworkImporter.runImport(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb", Collections.emptyMap(), true); // also tried false

This does not work with either value.

agibsonccc · November 24, 2021, 12:48am

@SidneyLann again, if you want the model, I'm happy to send it. Otherwise, snapshots run every 2 days.

SidneyLann · November 27, 2021, 3:34am

The BERT model imports successfully now, but it consumes 1.5g heap + 7g off-heap memory in SNAPSHOT, versus only 1g heap + 4g off-heap memory in beta7.

agibsonccc · December 3, 2021, 10:58am

@SidneyLann is this at runtime or just passive? Anything to reproduce this would be nice. There have barely been any changes between beta7 and M1.1.

SidneyLann · December 3, 2021, 11:40am

@agibsonccc Just import the BERT model I sent you in both beta7 and SNAPSHOT and you should see the difference in memory consumption.

agibsonccc · December 3, 2021, 11:42am

@SidneyLann how were you measuring though? Were you using anything like yourkit? Were you using our built in performance listener?

SidneyLann · December 3, 2021, 11:45am

One PC has 6g RAM and can't import the model, but another PC with 8g RAM can import it successfully.

The import succeeds using:
-Xmx1500m -Dorg.bytedeco.javacpp.maxbytes=6656m -Dorg.bytedeco.javacpp.maxphysicalbytes=7168m

It fails in SNAPSHOT but succeeds in beta7 using:
-Xmx1024m -Dorg.bytedeco.javacpp.maxbytes=4096m -Dorg.bytedeco.javacpp.maxphysicalbytes=4096m
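Limits like these show only where the import stops fitting, not how much memory it actually uses. A minimal sketch of logging heap usage around the import, using only the JDK's management API, might look like the following. The class and method names (HeapSnapshot, usedHeapBytes, format) are made up for illustration, and the import call itself is left as a placeholder comment. This covers the JVM heap only. Off-heap memory allocated through JavaCPP is not included here and would have to be read from JavaCPP itself or from a profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of ND4J or SameDiff.
final class HeapSnapshot {

    // Heap bytes currently in use, as reported by the JVM.
    static long usedHeapBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    // Upper bound the JVM will try to use for the heap (driven by -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Formats one log line, converting bytes to whole MiB.
    static String format(String label, long usedBytes, long maxBytes) {
        long mib = 1024L * 1024L;
        return String.format("%s: heap used=%d MiB, heap max=%d MiB",
                label, usedBytes / mib, maxBytes / mib);
    }

    public static void main(String[] args) {
        System.out.println(format("before import", usedHeapBytes(), maxHeapBytes()));
        // Placeholder: run the model import here (assumption, not shown).
        System.out.println(format("after import", usedHeapBytes(), maxHeapBytes()));
    }
}
```

Running the same program against beta7 and SNAPSHOT with identical flags would give two sets of numbers that can be compared directly.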

agibsonccc · December 3, 2021, 11:50am

@SidneyLann ok…I can’t really use that. I would need code I can run where I can clearly see profiler output to figure out the source of the problem if there is one or if this is just an opinion.

The goal I have when I ask a question like that is to figure out how to reply to you. If I don't have clear examples I can run, or an explanation of how you are determining that memory usage is higher, I am not going to chase that down.

My only guess is that it’s the new eager mode that was added. During import we execute graphs 1 op at a time and store the results. This is for use in more complex use cases where we need to know the shape of something during import.

This won't affect actually saving the model. I would recommend just running the conversion in a standalone process; then that won't be a problem.
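One way to run the conversion in a standalone process is to launch a separate JVM with its own memory limits, so the import's memory is released when that process exits. The sketch below uses only the JDK's ProcessBuilder. The class name StandaloneImport, the converter main class, and the classpath are hypothetical placeholders, and the memory flags are simply the ones reported as working earlier in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of ND4J or SameDiff.
final class StandaloneImport {

    // Builds the command line for a child JVM that runs the conversion.
    // mainClass is assumed to be a user-written class that imports and saves the model.
    static List<String> buildCommand(String classpath, String mainClass, String modelPath) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx1500m");
        cmd.add("-Dorg.bytedeco.javacpp.maxbytes=6656m");
        cmd.add("-Dorg.bytedeco.javacpp.maxphysicalbytes=7168m");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.add(modelPath);
        return cmd;
    }

    // Starts the child JVM, forwards its output to this console, and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

The parent application then only needs to load the converted model that the child process saved, rather than paying the import-time memory cost itself.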

SidneyLann · December 3, 2021, 11:56am

The BERT model was generated in 2018, so I'm not sure whether it used eager mode or not.