NPE when loading a TF model into SameDiff

SameDiff sd = SameDiff.importFrozenTF(new File(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb"));

This works in beta7, but the code below does not work in SNAPSHOT:
TensorflowFrameworkImporter tensorflowFrameworkImporter = new TensorflowFrameworkImporter();
SameDiff sd = tensorflowFrameworkImporter.runImport(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb", Collections.emptyMap());

at org.nd4j.samediff.frameworkimport.registry.OpMappingRegistry.lookupInputFrameworkOpDef(OpMappingRegistry.kt:99)
at org.nd4j.samediff.frameworkimport.ImportGraph.importGraph(ImportGraph.kt:238)
at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.importFromGraph(TensorflowFrameworkImporter.kt:58)
at org.nd4j.samediff.frameworkimport.tensorflow.importer.TensorflowFrameworkImporter.runImport(TensorflowFrameworkImporter.kt:64)

@SidneyLann can you DM me the model? We’re importing BERT models fine, so I’d be happy to look at the variant here.

But I can’t upload a zip file here.

@agibsonccc I sent the model to you, please help check it. Thanks.

@SidneyLann please check your DMs, the model download was giving me some difficulties.

@agibsonccc Please download it from the dl4j Google Drive folder.

@agibsonccc Can you import the model in SNAPSHOT?

@SidneyLann I got your model; sorry, I will be testing it after my current batch of PRs is merged.

@agibsonccc Could you fix this issue first? Then I can test the other issues. Thanks.

@SidneyLann I haven’t tried yet. I’ve been working on overhauling various aspects of training and making sure that’s solid (basically addressing your other concerns first). I wanted to try out various models as a litmus test after getting that done. Luckily I’m on the tail end of that work now and will be able to take a look at this.

@SidneyLann I finally got some time to look into this. With the latest changes I made in eclipse/deeplearning4j Pull Request #9535, "Fix shape resolution when shapes are < 0 when suggesting input arrays", it appears to import fine now. If you want the converted model, let me know. Thanks for reporting this!

@agibsonccc Is the PR available in SNAPSHOT now? I still can’t import the model.

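I tried both values of the third argument:

tensorflowFrameworkImporter.runImport(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb", Collections.emptyMap(), true);
tensorflowFrameworkImporter.runImport(NerUtil.DATA_HOME + "/bert/bert_pretrain_model_zh.pb", Collections.emptyMap(), false);

Neither works.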

@SidneyLann again, if you want the model, I’m happy to send it. Otherwise, the snapshot build runs every 2 days.

The BERT model imports successfully now, but it consumes 1.5 GB heap + 7 GB off-heap memory in SNAPSHOT, versus only 1 GB heap + 4 GB off-heap memory in beta7.

@SidneyLann is this at runtime or just passive? Anything to reproduce this would be nice. There have barely been any changes between beta7 and M1.1.

@agibsonccc Just import the BERT model I sent you in both beta7 and SNAPSHOT, and you should see the difference in memory consumption.

@SidneyLann how were you measuring, though? Were you using anything like YourKit? Were you using our built-in performance listener?
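A repeatable measurement is easier to act on than an overall impression. Below is a minimal stdlib-only sketch (the class `HeapProbe` and its methods are hypothetical, not part of DL4J) that records JVM heap usage before and after a single task via `MemoryMXBean`. It only sees the Java heap: off-heap memory allocated through JavaCPP, which ND4J uses, is invisible here, and JavaCPP's static `Pointer.totalBytes()` and `Pointer.physicalBytes()` counters would be the place to look for that (worth verifying against the JavaCPP version in use).

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Records JVM heap usage around a single task (heap only, no off-heap). */
final class HeapProbe {
    /** The task's result plus heap usage measured before and after it ran. */
    static final class Measurement<T> {
        final T result;
        final long heapBefore;
        final long heapAfter;

        Measurement(T result, long heapBefore, long heapAfter) {
            this.result = result;
            this.heapBefore = heapBefore;
            this.heapAfter = heapAfter;
        }

        /** Retained heap growth in bytes (negative if the task freed more than it kept). */
        long deltaBytes() {
            return heapAfter - heapBefore;
        }
    }

    /** Currently used heap in bytes, as reported by the platform MemoryMXBean. */
    static long usedHeap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Runs the task once and measures heap usage before and after it. */
    static <T> Measurement<T> measure(Supplier<T> task) {
        System.gc(); // only a hint to the JVM; reduces noise from leftover garbage
        long before = usedHeap();
        T result = task.get();
        System.gc();
        long after = usedHeap();
        return new Measurement<>(result, before, after);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real reproducer would call the model import here instead.
        Measurement<long[]> m = measure(() -> new long[4_000_000]); // roughly 32 MB, kept alive via m.result
        System.out.printf("max heap: %d MB, retained heap delta: %d MB%n",
                Runtime.getRuntime().maxMemory() >> 20, m.deltaBytes() >> 20);
    }
}
```

Running the same harness once against beta7 and once against SNAPSHOT, with identical JVM flags, gives numbers that can be compared directly.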

One PC has 6 GB RAM and can’t import the model, but another PC with 8 GB RAM imports it successfully.

The import succeeds with:
-Xmx1500m -Dorg.bytedeco.javacpp.maxbytes=6656m -Dorg.bytedeco.javacpp.maxphysicalbytes=7168m

It fails in SNAPSHOT but succeeds in beta7 with:
-Xmx1024m -Dorg.bytedeco.javacpp.maxbytes=4096m -Dorg.bytedeco.javacpp.maxphysicalbytes=4096m
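The two flag sets can be compared with a little arithmetic. Below is a minimal stdlib-only sketch (the class `JvmMemoryFlags` and its methods are hypothetical helpers, not part of DL4J or JavaCPP) that parses the size strings and checks the `maxphysicalbytes` cap against the RAM of the two PCs. It assumes, based on our reading of JavaCPP's documentation, that `maxphysicalbytes` limits the physical memory of the whole process including the Java heap, while `maxbytes` limits off-heap memory tracked by JavaCPP. It also ignores RAM used by the OS and other processes.

```java
import java.util.Locale;

/** Hypothetical helper for comparing -Xmx / JavaCPP memory settings. */
final class JvmMemoryFlags {
    /** Parses a size such as "1500m", "4g" or "1024" (plain bytes) into bytes. */
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        long multiplier = 1L;
        switch (unit) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default: break; // no suffix: value is already in bytes
        }
        String digits = Character.isDigit(unit) ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    /** True if the physical-memory cap fits into the machine's RAM (OS usage ignored). */
    static boolean fitsInRam(String maxPhysicalBytes, String machineRam) {
        return parseSize(maxPhysicalBytes) <= parseSize(machineRam);
    }

    public static void main(String[] args) {
        // {label, -Xmx, maxbytes, maxphysicalbytes} as reported in this thread
        String[][] configs = {
            {"import succeeds", "1500m", "6656m", "7168m"},
            {"fails in SNAPSHOT, succeeds in beta7", "1024m", "4096m", "4096m"},
        };
        for (String[] c : configs) {
            long heapPlusOffHeap = parseSize(c[1]) + parseSize(c[2]);
            System.out.printf(Locale.ROOT,
                    "%s: heap + maxbytes = %.2f GiB, maxphysicalbytes = %.2f GiB, fits 6g: %b, fits 8g: %b%n",
                    c[0], toGiB(heapPlusOffHeap), toGiB(parseSize(c[3])),
                    fitsInRam(c[3], "6g"), fitsInRam(c[3], "8g"));
        }
    }
}
```

With the reported values, the 7168m physical cap does not fit a 6 GB machine but does fit an 8 GB one, which is consistent with (though not proof of) only the 8 GB PC being able to import the model.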

@SidneyLann ok… I can’t really use that. I would need code I can run where I can clearly see profiler output, to figure out the source of the problem if there is one, or whether this is just an opinion.

The goal I have when I ask a question like that is to figure out how to reply to you. If I don’t have clear examples I can run, or an explanation of how you are measuring “more memory usage”, I am not going to chase that down.

My only guess is that it’s the new eager mode that was added. During import we execute the graph one op at a time and store the results. This is for more complex use cases where we need to know the shape of something during import.
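To illustrate why storing every intermediate result can raise peak memory, here is a toy sketch. It is not SameDiff's import code; `EagerImportSketch`, `Op`, `run`, and `demoChain` are hypothetical. It executes a small graph one op at a time and either keeps every result or frees each input as soon as no later op needs it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;
import java.util.function.Function;

/** Toy illustration of executing a graph op by op, with and without keeping intermediates. */
final class EagerImportSketch {
    /** One node of a toy graph: its output name, input names, and the function it computes. */
    static final class Op {
        final String name;
        final List<String> inputs;
        final Function<List<double[]>, double[]> fn;

        Op(String name, List<String> inputs, Function<List<double[]>, double[]> fn) {
            this.name = name;
            this.inputs = inputs;
            this.fn = fn;
        }
    }

    /** Builds an element-wise op with a single input. */
    static Op unary(String name, String input, DoubleUnaryOperator f) {
        return new Op(name, List.of(input), in -> Arrays.stream(in.get(0)).map(f).toArray());
    }

    /** Total number of doubles currently held in the store. */
    static long size(Map<String, double[]> store) {
        long n = 0;
        for (double[] v : store.values()) n += v.length;
        return n;
    }

    /**
     * Executes ops one at a time in the given (topological) order and returns the peak number
     * of stored doubles. keepAll=true keeps every result; false frees a value once unused.
     */
    static long run(List<Op> ops, Map<String, double[]> feeds, boolean keepAll) {
        Map<String, double[]> store = new HashMap<>(feeds);
        Map<String, Integer> remainingUses = new HashMap<>();
        for (Op op : ops) {
            for (String in : op.inputs) remainingUses.merge(in, 1, Integer::sum);
        }
        long peak = size(store);
        for (Op op : ops) {
            List<double[]> opArgs = new ArrayList<>();
            for (String in : op.inputs) opArgs.add(store.get(in));
            store.put(op.name, op.fn.apply(opArgs));
            peak = Math.max(peak, size(store));
            if (!keepAll) {
                for (String in : op.inputs) {
                    if (remainingUses.merge(in, -1, Integer::sum) == 0) store.remove(in);
                }
            }
        }
        return peak;
    }

    /** A four-op chain x -> a -> b -> c -> d, all element-wise. */
    static List<Op> demoChain() {
        return List.of(
                unary("a", "x", v -> v * 2),
                unary("b", "a", v -> v + 1),
                unary("c", "b", v -> v * v),
                unary("d", "c", v -> v - 1));
    }

    public static void main(String[] args) {
        Map<String, double[]> feeds = Map.of("x", new double[1000]);
        System.out.println("keep all results: peak " + run(demoChain(), feeds, true) + " doubles");  // 5000
        System.out.println("free when unused: peak " + run(demoChain(), feeds, false) + " doubles"); // 2000
    }
}
```

Keeping every result makes the peak grow with the number of ops (5000 vs 2000 doubles in this chain). Whether the eager import behaves like the first mode is only the guess stated above.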

This won’t affect saving the model itself. I would recommend running the conversion in a standalone process; then that won’t be a problem.
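Running the conversion in its own JVM means the import-time peak only has to fit the child process, and all of its heap and off-heap memory is returned to the OS when it exits. Below is a minimal sketch using `ProcessBuilder`. It assumes a hypothetical converter class `com.example.ConvertBertModel` on the same classpath whose `main` performs the `runImport` call from this thread and saves the result (for example with `SameDiff.save`). The class name, file names, and flag values are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Launches a model conversion in a separate JVM with its own memory limits. */
final class StandaloneConversion {
    /** Builds the command line for a child JVM that only performs the conversion. */
    static List<String> buildCommand(String mainClass, List<String> jvmFlags, List<String> programArgs) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.addAll(jvmFlags);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path")); // reuse the current classpath
        cmd.add(mainClass);
        cmd.addAll(programArgs);
        return cmd;
    }

    /** Starts the child JVM, forwards its stdout/stderr, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "com.example.ConvertBertModel", // hypothetical converter main class
                List.of("-Xmx1500m",
                        "-Dorg.bytedeco.javacpp.maxbytes=6656m",
                        "-Dorg.bytedeco.javacpp.maxphysicalbytes=7168m"),
                List.of("bert_pretrain_model_zh.pb", "bert_converted_output")); // hypothetical paths
        System.out.println(String.join(" ", cmd));
        // int exitCode = run(cmd); // enable once the converter class exists
    }
}
```

The main application then only loads the already converted model and never needs the import-time memory settings.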

The BERT model was generated in 2018; I’m not sure whether it used eager mode or not.