Version `GLIBC_2.23' not found when using dl4j-cuda

My attempt at fixing that was downgrading the gcc version, but the build is getting stuck here.

Normally, how long does it take to finish the build?

Building DL4J’s cuda modules takes quite a while. You can speed it up a bit by only building for the compute capability of the cards that are available in the cluster.

At the moment you are building for CC 5.3, 6.0, 6.1, 7.0, 7.5, 8.0 and 8.6. That even takes a long time on a very fast system.

You can add -Dlibnd4j.compute="8.6" to the mvn command to tell it to support only RTX 30 series GPUs.

Check CUDA - Wikipedia for a list of GPUs and their compute capabilities.
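
For reference, appended to whatever mvn invocation you are already using, that would look something like this (a sketch; -DskipTests here is just an illustrative extra to speed things up):

mvn clean install -DskipTests -Dlibnd4j.compute="8.6"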

Thank you so much.
I would like to ask as well: after building from source, how does my project know about this build?
I mean, after building from source, do I just go back to my project directory, generate my jar-with-dependencies file with mvn clean package as I was doing previously, and then submit the jar file with spark-submit?

After the build, you should have a 1.0.0-SNAPSHOT version of dl4j in your ~/.m2 folder. Anything that references it will pull it from there.

So if you are building your jar file for spark-submit in the same environment, it should pick up on that.
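
One way to sanity-check that the snapshot actually landed in the local repository (the module path here is just an example):

ls ~/.m2/repository/org/nd4j/nd4j-cuda-11.4/1.0.0-SNAPSHOT/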

Okay I understand, thank you.
Regarding the Java version: the build required JDK 11.

So I loaded JDK 11 instead of 1.8. But in the pom.xml file of my project, should I set up Java 11 as well, or is it fine with Java 1.8?

@Nour-Rekik the more recent sources use jdk 11. What version is your spark cluster running?

If you need JDK 1.8 you can build the sources with it by adding this to your mvn command:

-Dmaven.compiler.source=1.8 -Dmaven.compiler.target=1.8

This tells Maven (which itself runs on Java) to override what’s in the pom.xml via system properties.
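
The full invocation could then look roughly like this (a sketch; -DskipTests is an assumption, drop it if you want the tests):

mvn clean install -DskipTests -Dmaven.compiler.source=1.8 -Dmaven.compiler.target=1.8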

Yes, I changed it in my pom.xml to Java 11 and I am loading JDK 11 from my HPC cluster.
For Spark I am using the Spark/3.0.1 version.

@Nour-Rekik I think you’re a bit confused as to the nature of the error. Your reply didn’t tell me anything new…let me try rewording this.

What I just told you to do is make dl4j try to use java 8 for compilation.
I thought you needed JDK 8 on your cluster.

If your spark cluster is java 11 then use java 11 for the sources as well.

You should not need to do anything with the latest release.

Could you verify that you are using the correct JDK with the right JAVA_HOME?

If you managed to get the c++ libraries compiled, that’s the hardest part. There shouldn’t be anything else to do once the JDK is correct.
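
For example, you can verify this on the build machine with standard commands like:

echo $JAVA_HOME
"$JAVA_HOME/bin/java" -version
mvn -version   # also shows which JDK maven itself is running on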

Edit: also, so you don’t try to recompile the cuda libraries, ensure you use -rf :nd4j. That resumes the Maven build from the nd4j module and prevents the needless recompilation of the c++ libraries.

Otherwise you can also speed up the c++ build (libnd4j) with:
-Dlibnd4j.buildthreads=4

Set “4” to however many CPU cores your build node in the cluster has. That should help a lot if you need to recompile the sources. If you’ve ever compiled a c++ library before, it’s like using make -j 4.
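
Putting those together, the two scenarios could look roughly like this (a sketch; -DskipTests is an assumption):

# resume after the c++ libraries are already built, skipping their recompilation
mvn install -DskipTests -rf :nd4j

# or, if libnd4j does need recompiling, parallelise the c++ build
mvn clean install -DskipTests -Dlibnd4j.buildthreads=4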

Yes I understand.
I didn’t know that I could build the sources with JDK 1.8. The build was done with JDK 11, so to not waste time redoing it with JDK 1.8, I changed my project’s pom.xml to Java 11.
The build from source completed successfully.
I moved back to my project and I am getting this error:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/software/haswell/Spark/2.4.3-Hadoop-2.7-Java-11-Python-3.6.6-fosscuda-2018b/lib/python3.6/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.3.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/09/01 01:59:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Versions of org.bytedeco:javacpp:1.5.7 and org.bytedeco:cuda:11.4-8.2-1.5.6 do not match.
java.lang.UnsatisfiedLinkError: /home/h4/nore667e/.javacpp/cache/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar/org/bytedeco/cuda/linux-x86_64/libjnicudart.so: libcudart.so.11.0: cannot open shared object file: No such file or directory
        at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
        at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2430)
        at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2487)
        at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2684)
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2617)
        at java.base/java.lang.Runtime.load0(Runtime.java:767)
        at java.base/java.lang.System.load(System.java:1831)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1747)
        at org.bytedeco.javacpp.Loader.load(Loader.java:1402)
        at org.bytedeco.javacpp.Loader.load(Loader.java:1214)
        at org.bytedeco.javacpp.Loader.load(Loader.java:1190)
        at org.bytedeco.cuda.global.cudart.<clinit>(cudart.java:14)
        at org.nd4j.linalg.jcublas.JCublasBackend.canRun(JCublasBackend.java:66)
        at org.nd4j.linalg.jcublas.JCublasBackend.isAvailable(JCublasBackend.java:51)
        at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:175)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5068)
        at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:290)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.entryPoint(MnistRetrainingMain.java:27)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.main(MnistRetrainingMain.java:22)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/09/01 01:59:23 WARN Nd4jBackend: Skipped [JCublasBackend] backend (unavailable): java.lang.UnsatisfiedLinkError: /home/h4/nore667e/.javacpp/cache/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar/org/bytedeco/cuda/linux-x86_64/libjnicudart.so: libcudart.so.11.0: cannot open shared object file: No such file or directory
Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.entryPoint(MnistRetrainingMain.java:27)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.main(MnistRetrainingMain.java:22)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5072)
        at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:290)
        ... 14 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
        at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:211)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5068)
        ... 15 more
22/09/01 01:59:23 INFO ShutdownHookManager: Shutdown hook called
22/09/01 01:59:23 INFO ShutdownHookManager: Deleting directory /tmp/spark-729df63b-8c13-4644-ac1d-54a189efed63

@Nour-Rekik what’s your pom.xml? Are you trying to use nd4j-cuda-platform or just nd4j-cuda? If you are using just nd4j-cuda then you also need to include the relevant version of javacpp’s cuda in your pom.xml. We normally do this for you in the nd4j-cuda-platform dependency.

Your problem could also be a cuda mismatch. I think you mentioned compiling for cuda 11.1 or something? Your cuda version should match exactly, otherwise you will get errors like this.
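
A quick way to check which cuda version an environment actually provides (standard CUDA tooling, assuming it is on the PATH):

nvcc --version   # toolkit version used for compiling
nvidia-smi       # driver version and the highest cuda version it supports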

I am using just nd4j-cuda. I built from source using cuda 11.4 and I am loading cuda 11.4 from the HPC.
Here is my pom.xml:

    <properties>
        <dl4j-master.version>1.0.0-SNAPSHOT</dl4j-master.version>
        <!-- Change the nd4j.backend property to nd4j-cuda-X-platform to use CUDA GPUs -->
        <!--nd4j.backend>nd4j-cuda-11.0</nd4j.backend-->
        <!--nd4j.backend>nd4j-native</nd4j.backend-->
        <nd4j.backend>1.0.0-SNAPSHOT</nd4j.backend>
        <java.version>11</java.version>
        <shadedClassifier>bin</shadedClassifier>
        <scala.binary.version>2.12</scala.binary.version>
        <maven-compiler-plugin.version>3.8.1</maven-compiler-plugin.version>
        <maven.minimum.version>3.3.1</maven.minimum.version>
        <exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
        <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
        <jcommon.version>1.0.23</jcommon.version>
        <jfreechart.version>1.0.13</jfreechart.version>
        <logback.version>1.1.7</logback.version>
        <jcommander.version>1.81</jcommander.version>
        <spark.version>2.4.0</spark.version>
        <jackson.version>2.5.1</jackson.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <repositories>
        <repository>
            <id>snapshots-repo</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
            </snapshots>
        </repository>
    </repositories>

    <build>
        <plugins>


            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>${exec-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>org/datanucleus/**</exclude>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!--      Added to enable jar creation using mvn command-->

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.3.0</version>
                <configuration>
                    <!--outputDirectory>target/my-target-dir</outputDirectory-->
                    <archive>
                        <manifest>
                            <mainClass>fully.qualified.MainClass</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <!--finalName>test</finalName>
                    <appendAssemblyId>false</appendAssemblyId-->
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <!-- bind to the packaging phase -->
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>


    <dependencies>


        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
      
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.4</artifactId>
            <version>${nd4j.backend}</version>
            <classifier>linux-x86_64</classifier>
        </dependency>

        <dependency>
            <groupId>org.datavec</groupId>
            <artifactId>datavec-spark_${scala.binary.version}</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>dl4j-spark_${scala.binary.version}</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>dl4j-spark-parameterserver_${scala.binary.version}</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>com.beust</groupId>
            <artifactId>jcommander</artifactId>
            <version>${jcommander.version}</version>
        </dependency>


        <!-- Used for patent classification example -->
        
        
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-core</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-cuda-11.4</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>

    </dependencies>

</project>

@Nour-Rekik first of all…get rid of deeplearning4j-cuda. In order to even access that you have to be using an old dl4j build. Could you tell me where you got that? If you clone from source you shouldn’t see that anymore. You can see all the relevant modules here: deeplearning4j/deeplearning4j at master · deeplearning4j/deeplearning4j · GitHub

Second of all, you need to change 3 more things:

  1. Add nd4j-cuda-11.4 without the classifier. This contains the actual classes; the dependency with the classifier just contains the relevant c++ libraries for a given platform.

  2. Remove deeplearning4j-cuda-11.4 - that shouldn’t even be in the most recent source tree as mentioned above.

  3. (Optionally) Depending on whether you want to use cudnn or not (maybe try this later?), you can also specify -Dlibnd4j.helper=cudnn for your build; then you’ll need the linux-x86_64-cudnn classifier instead.
    So add this:

    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-11.4</artifactId>
        <version>${nd4j.backend}</version>
    </dependency>

As mentioned before, since you’re not using -platform you’ll also need the relevant cuda dependency. Add:

    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>cuda-platform</artifactId>
        <version>11.4-8.2-1.5.6</version>
    </dependency>

Ensure you also specify -Djavacpp.platform=linux-x86_64 when you build your uber jar (the jar-with-dependencies); that ensures that when using the cuda-platform module only the jars you need get included.
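
For example:

mvn clean package -Djavacpp.platform=linux-x86_64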

Note: you can find these javacpp/cuda versions in change-cuda-versions.sh for reference.

Try that and let me know if you run into any issues.

Okay thank you I will try and let you know!

I am sorry, I had problems with the HPC.
I have now made the changes you suggested and I am getting this error:

22/09/02 13:09:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Versions of org.bytedeco:javacpp:1.5.7 and org.bytedeco:cuda:11.4-8.2-1.5.6 do not match.
22/09/02 13:10:01 INFO Nd4jBackend: Loaded [JCublasBackend] backend
22/09/02 13:10:05 INFO NativeOpsHolder: Number of threads used for linear algebra: 32
Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.entryPoint(MnistRetrainingMain.java:27)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.main(MnistRetrainingMain.java:22)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5205)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5070)
        at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:290)
        ... 14 more
Caused by: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5148)
        ... 16 more
22/09/02 13:10:05 INFO ShutdownHookManager: Shutdown hook called

@Nour-Rekik how are you setting up your uber jar? That looks like it could be a resource inclusion problem. Nd4j reads from a configuration when it starts to figure out what you’re running and what classes to create for different functions.

This is throwing an error trying to load one of those classes.

This is the build plugin configuration I am using to generate the jar-with-dependencies file:

<build>
        <plugins>


            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>${exec-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>org/datanucleus/**</exclude>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!--      Added to enable jar creation using mvn command-->

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.3.0</version>
                <configuration>
                    <!--outputDirectory>target/my-target-dir</outputDirectory-->
                    <archive>
                        <manifest>
                            <mainClass>fully.qualified.MainClass</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <!--finalName>test</finalName>
                    <appendAssemblyId>false</appendAssemblyId-->
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <!-- bind to the packaging phase -->
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

Then I am running mvn clean package -Djavacpp.platform=linux-x86_64
Then I am submitting the jar file with spark-submit like this: spark-submit --conf spark.driver.memory=400g --class com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain target/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar

I want to ask about the Spark version as well: in the build it’s version 3.3.0 and my maven dependency is the same, but from the HPC I am loading version 3.0.1. Is that fine?

Not too sure about the compatibility there. There might be something we have to do in order to make 3.x work but you’re welcome to try.

Your maven-shade-plugin doesn’t look any different from the norm… are you sure that was the full stack trace?

Sometimes cuda linking against different libraries can be an issue, and those kinds of errors get buried. Could you add
-Dorg.bytedeco.javacpp.logger.debug=true
to the output to see? This will show what’s being loaded when the worker starts.
Thanks!

I am getting the same error

nore667e@taurusi8005:~/bigdl_pipeline> spark-submit --conf spark.driver.memory=400g --class com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain  target/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar
22/09/02 15:39:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Versions of org.bytedeco:javacpp:1.5.7 and org.bytedeco:cuda:11.4-8.2-1.5.6 do not match.
22/09/02 15:39:02 INFO Nd4jBackend: Loaded [JCublasBackend] backend
22/09/02 15:39:04 INFO NativeOpsHolder: Number of threads used for linear algebra: 32
Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.entryPoint(MnistRetrainingMain.java:27)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.main(MnistRetrainingMain.java:22)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5205)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5070)
        at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:290)
        ... 14 more
Caused by: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5148)
        ... 16 more
22/09/02 15:39:04 INFO ShutdownHookManager: Shutdown hook called
22/09/02 15:39:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea1d8020-1433-4b20-b25d-692df3ef2a4e

Could you do what I asked and set the additional property? Your stack trace and logs not changing means you either didn’t do it or you might need to do it on the workers. Depending on whether the problem is in the driver or the workers, you can set it as described here: scala - How to pass -D parameter or environment variable to Spark job? - Stack Overflow

Until I know more, your stack trace here still isn’t telling me a root cause. The property it’s complaining about has been around for years at this point, so it shouldn’t be anything related to recent changes either.
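
For example, reusing your earlier submit command, you could set the property on both the driver and the executors (a sketch; adjust depending on where the failure actually occurs):

spark-submit --conf spark.driver.memory=400g \
  --conf spark.driver.extraJavaOptions=-Dorg.bytedeco.javacpp.logger.debug=true \
  --conf spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.logger.debug=true \
  --class com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain \
  target/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar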

Here is my stack trace after adding the property:

nore667e@taurusi8023:~/bigdl_pipeline> spark-submit --conf spark.driver.memory=400g --conf spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.logger.debug=true  --class com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain  target/deepLearningSimpleOne-1.0-SNAPSHOT-jar-with-dependencies.jar 
22/09/02 18:56:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Versions of org.bytedeco:javacpp:1.5.7 and org.bytedeco:cuda:11.4-8.2-1.5.6 do not match.
22/09/02 18:56:49 INFO Nd4jBackend: Loaded [JCublasBackend] backend
22/09/02 18:56:54 INFO NativeOpsHolder: Number of threads used for linear algebra: 32
Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.entryPoint(MnistRetrainingMain.java:27)
        at com.examples.DeepLearningOnSpark.mnist_image.streaming_approach.MnistRetrainingMain.main(MnistRetrainingMain.java:22)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5205)
        at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5070)
        at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:290)
        ... 14 more
Caused by: java.lang.NullPointerException
        at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5148)
        ... 16 more
22/09/02 18:56:54 INFO ShutdownHookManager: Shutdown hook called
22/09/02 18:56:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-5aede481-64fa-4d97-86ad-b532d468bff4