Hi there, I am hitting a roadblock while using DL4J + ND4J to deploy a Keras Functional model on Spark at runtime. Let me walk you through a couple of things I have tried and explain the problem with each.
NOTE: Model is built using Python + Keras
python version: 3.8.0
keras version: 2.4.0
tensorflow version: 2.4.0
NOTE: The whole model is saved in H5 format:
model.save('model.h5')
1. Using latest available release artifacts: 1.0.0-beta7
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
Model import using:
val model = KerasModelImport.importKerasModelAndWeights("model.h5", false)
But 1.0.0-beta7 fails to import the Functional model, with the following error:
Expected model class name Model (found Functional). For more information, see https://deeplearning4j.konduit.ai/keras-import/overview
org.deeplearning4j.nn.modelimport.keras.exceptions.InvalidKerasConfigurationException: Expected model class name Model (found Functional). For more information, see https://deeplearning4j.konduit.ai/keras-import/overview
At this point I switched to the latest artifacts available from the SNAPSHOT repository, i.e. https://oss.sonatype.org/content/repositories/snapshots
2. Using latest available snapshot artifacts: 1.0.0-SNAPSHOT
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
This resolved the Keras Functional model import problem and let me get model import and prediction fully working in my local environment. Great feeling! But I hit another problem when deploying this model JAR to the runtime Spark environment:
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j
The above is encountered at model import, i.e. the very first point of contact.
java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j
at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.readDataSet(Hdf5Archive.java:295)
at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.readDataSet(Hdf5Archive.java:109)
at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelUtils.importWeights(KerasModelUtils.java:284)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:190)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:99)
at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:150)
at com.example.Model$.apply(Model.scala:29)
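For context, the JVM reports "Could not initialize class" when the static initializer of that class has already failed once in the same JVM, so the root cause is normally carried by an earlier ExceptionInInitializerError (on Spark, probably somewhere in the executor log). Below is a minimal Scala sketch of a check that could surface it. `Nd4jInitCheck` and `causeChain` are hypothetical names introduced for illustration, and the sketch uses only the standard library, so it simply prints a ClassNotFoundException if ND4J is not on the classpath at all.

```scala
object Nd4jInitCheck {
  // Hypothetical helper: one "ClassName: message" line per throwable in the cause chain.
  // The take(16) guards against a (rare) cyclic cause chain.
  def causeChain(t: Throwable): List[String] =
    Iterator.iterate(t)(_.getCause)
      .takeWhile(_ != null)
      .take(16)
      .map(e => s"${e.getClass.getName}: ${e.getMessage}")
      .toList

  def main(args: Array[String]): Unit = {
    try {
      // Class.forName loads the class AND runs its static initializer.
      Class.forName("org.nd4j.linalg.factory.Nd4j")
      println("Nd4j class initialized")
    } catch {
      // LinkageError subclasses (NoClassDefFoundError, ExceptionInInitializerError)
      // are not Exceptions, so Throwable is caught deliberately here.
      case t: Throwable => causeChain(t).foreach(println)
    }
  }
}
```

Running the same check once on the driver and once inside an executor task might show whether the two classpaths behave differently.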
Note that I verified that both the JAR and the classpath contain the class in question. At this point I suspected that Spark might be looking for a platform-specific ND4J implementation on the classpath, so I tried including the nd4j-native-platform artifact in the JAR instead of the basic nd4j-native. Correct me if step 3 was an incorrect move.
3. Using nd4j-native-platform artifact instead of basic nd4j-native
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-modelimport</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
This fails to build, with the following dependency resolution error:
[ERROR] Failed to execute goal on project apple: Could not resolve dependencies for project com.example.ls:apple:jar:2.3-SNAPSHOT: The following artifacts could not be resolved: org.nd4j:nd4j-native:jar:android-arm:1.0.0-SNAPSHOT, org.nd4j:nd4j-native:jar:android-arm64:1.0.0-SNAPSHOT, org.nd4j:nd4j-native:jar:android-x86:1.0.0-SNAPSHOT, org.nd4j:nd4j-native:jar:android-x86_64:1.0.0-SNAPSHOT, org.nd4j:nd4j-native:jar:linux-ppc64le:1.0.0-SNAPSHOT: Could not transfer artifact org.nd4j:nd4j-native:jar:android-arm:1.0.0-SNAPSHOT from/to maven-local-release : Failed to transfer file: http://artifactory.example.com:8000/artifactory/maven-local-release/org/nd4j/nd4j-native/1.0.0-SNAPSHOT/nd4j-native-1.0.0-SNAPSHOT-android-arm.jar. Return code is: 409 , ReasonPhrase:Conflict. → [Help 1]
Sorry for this long thread, but if I can obtain help with any of these outstanding roadblocks, it would be much appreciated.
PS: I read in another post that a new beta or RC release is expected within a few weeks, but it would be nice if the team could push the latest SNAPSHOT to the release repo with, say, a beta or alpha tag to help address #1 and potentially #3. Working with SNAPSHOTs is like living on the edge, since things can break tomorrow morning. Regardless, I would appreciate your guidance on the scenario above.