Requesting help with POM for GPU support GeForce RTX 3090

Hello everyone, hope someone could steer me in the right direction. I’ve been at it for a few (unsuccessful) days. As the topic line suggests, I’m trying to use DL4J with GPUs. I have a very simple sample program to attempt to initialize CUDA:

import org.nd4j.linalg.factory.Nd4j;

public class Word2VecNeuralNetwork {
    public static void main(String[] args) {
        // Creating any INDArray forces ND4J to load and initialize its backend
        Nd4j.create(1);
        System.out.println("HERE");
    }
}

It compiles fine but when I attempt to run it I get:

02:37:10.354 [Word2VecNeuralNetwork.main()] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
02:37:10.423 [Word2VecNeuralNetwork.main()] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
02:37:10.424 [Word2VecNeuralNetwork.main()] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
02:37:10.517 [Word2VecNeuralNetwork.main()] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
02:37:10.518 [Word2VecNeuralNetwork.main()] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.

and it just hangs there. I can’t seem to figure out where to find the missing dependencies. My POM file follows. Any suggestions appreciated …

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>dl4j-cuda-specific-examples</artifactId>
    <version>1.0.0-beta7</version>
    <name>DeepLearning4j CUDA special examples</name>



<profiles>
  <profile>
     <id>allow-snapshots</id>
        <activation><activeByDefault>true</activeByDefault></activation>
     <repositories>
       <repository>
         <id>snapshots-repo</id>
         <url>https://oss.sonatype.org/content/repositories/snapshots</url>
         <releases><enabled>true</enabled></releases>
         <snapshots><enabled>true</enabled></snapshots>
       </repository>
     </repositories>
   </profile>
</profiles>




    <properties>
      <dl4j-master.version>1.0.0-SNAPSHOT</dl4j-master.version>
      <nd4j.backend>nd4j-cuda-11.0</nd4j.backend>
      <java.version>1.8</java.version>
      <exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
      <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
      <jcommon.version>1.0.23</jcommon.version>
      <logback.version>1.1.7</logback.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-9.2</artifactId>
                <version>${dl4j-master.version}</version>
            </dependency>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-10.0</artifactId>
                <version>${dl4j-master.version}</version>
            </dependency>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-10.1</artifactId>
                <version>${dl4j-master.version}</version>
            </dependency>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-10.2</artifactId>
                <version>${dl4j-master.version}</version>
            </dependency>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-11.0</artifactId>
                <version>${dl4j-master.version}</version>
            </dependency>
            <dependency>
                <groupId>org.freemarker</groupId>
                <artifactId>freemarker</artifactId>
                <version>2.3.29</version>
            </dependency>
            <dependency>
                <groupId>io.netty</groupId>
                <artifactId>netty-common</artifactId>
                <version>4.1.42.Final</version>
            </dependency>
      </dependencies>
    </dependencyManagement>

    <dependencies>
        <!-- Dependency for parallel wrapper (for multi-GPU parameter averaging) -->
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-parallel-wrapper</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-core</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-ui</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-zoo</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>${nd4j.backend}</artifactId>
        </dependency>
        <!-- datavec-data-codec: used only in video example for loading video data -->
        <dependency>
            <artifactId>datavec-data-codec</artifactId>
            <groupId>org.datavec</groupId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>${logback.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>${exec-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                        <mainClass>Word2VecNeuralNetwork</mainClass>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>org/datanucleus/**</exclude>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

The error doesn’t match your pom.xml.

I’d expect to see an error like that with a DL4J version from 2016, not with beta7 or snapshots.

What are you doing? How have you set up your project? Are you using an IDE? Have you tried using maven from the command line?

treo
February 3

*** Hi Treo, thanks for the very quick response (much appreciated). In answer to your questions:

What are you doing?

*** At this point, not much. All I want it to do is the simple code example I gave, which just calls Nd4j.create(1) to initialize the library/CUDA and then prints “HERE”… It never gets to printing “HERE”. It just hangs on the create(1) call with the library load errors shown.

How have you set up your project?

*** Completely at the BASH shell (no IDE). Using Maven and a POM file. I’ve been using DL4J for a while now (great framework) but always with the -native platform (CPU only), and it worked fine. I finally got access to GPUs, and that’s when my problem began. The POM file I gave was obviously copied from “-examples”, but that was a last-resort attempt to get my POM working. The error occurs just the same (it hangs on the missing/not-found libraries).

Are you using an IDE?

*** No, as above … 100% CLI with a Maven POM

Ok, then let’s try the following:

  1. Copy the following example: the mvn-project-template from the eclipse/deeplearning4j-examples repository on GitHub
  2. Change it to use the CUDA backend by removing the native dependency, uncommenting the CUDA dependencies, and using CUDA version 10.2 instead of the 9.2 that is still used there

And then build the project with mvn package and run it with java -jar target/deeplearning4j-example-sample-1.0.0-beta7.jar (or whatever the actual uberjar name will be).

And then please share the full output.

I have a hunch, that it is recompiling the ptx code for your gpu, and that can easily take forever. But once that forever has passed, it should be cached somewhere on the system and from then on load faster.
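If that hunch is right, the recompiled PTX can be kept between runs via NVIDIA’s documented CUDA_CACHE_* environment variables. Below is a hedged sketch (the jar path and helper name are made up for illustration) of launching the run with the JIT cache explicitly enabled and enlarged:

```java
import java.util.HashMap;
import java.util.Map;

public class JitCacheEnv {
    // Builds the extra environment entries for NVIDIA's JIT compile cache.
    // The variable names are NVIDIA's documented CUDA_CACHE_* settings.
    static Map<String, String> jitCacheEnv(long maxBytes) {
        Map<String, String> env = new HashMap<>();
        env.put("CUDA_CACHE_DISABLE", "0");                // keep the cache enabled
        env.put("CUDA_CACHE_MAXSIZE", Long.toString(maxBytes));
        return env;
    }

    public static void main(String[] args) {
        // Sketch: run the uberjar with a 4 GB JIT cache so the PTX recompiled
        // for a new GPU (e.g. cc 8.6) is kept between runs, not rebuilt each time.
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", "target/app.jar");
        pb.environment().putAll(jitCacheEnv(4L << 30));
        System.out.println(pb.environment().get("CUDA_CACHE_MAXSIZE")); // 4294967296
        // pb.inheritIO().start();  // would actually launch the run
    }
}
```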

Unfortunately, the last snapshot with Cuda 11 support is outdated, so if version 10.2 doesn’t work for you, you’ll have to wait for the next release.

Thanks again for your help. I first tried to run the example with CPU native support to make sure it works. It works fine. After making the changes for CUDA as per the instructions, I get this:

o.d.e.s.LeNetMNIST - Load data…
o.d.e.s.LeNetMNIST - Build model…
o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 64
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [255]; Memory: [30.0GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89
o.n.l.j.JCublasBackend - CUDA device 0: [GeForce RTX 3090]; cc: [8.6]; Total memory: [25447170048]
o.n.l.j.JCublasBackend - CUDA device 1: [GeForce RTX 3090]; cc: [8.6]; Total memory: [25447170048]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
Exception in thread “main” java.lang.RuntimeException: cudaGetSymbolAddress(…) failed; Error code: [13]
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.createShapeInfo(CudaExecutioner.java:2162)
at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3280)
at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:74)
at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:92)
at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:73)
at org.nd4j.linalg.jcublas.CachedShapeInfoProvider.createShapeInformation(CachedShapeInfoProvider.java:42)
at org.nd4j.linalg.api.ndarray.BaseNDArray.&lt;init&gt;(BaseNDArray.java:212)
at org.nd4j.linalg.jcublas.JCublasNDArray.&lt;init&gt;(JCublasNDArray.java:386)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1524)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1519)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4298)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3986)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:693)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:609)
at org.deeplearning4j.examples.sample.LeNetMNIST.main(LeNetMNIST.java:115)

Unfortunately, it looks like it doesn’t work with the 10.2 version of CUDA, so you will have to wait for the next DL4J release (which is fortunately scheduled within the next few weeks) or build from source.

However, building from source isn’t exactly easy, and given that you’ve struggled to get this far, I suggest waiting for the next release.

Is there any update on this? I’m getting the same error with my 3090, trying with this pom:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>cuda-platform-redist</artifactId>
    <version>11.2-8.1-1.5.5</version>
</dependency>

@chinproisbestpro this error always (and I mean always) stems from clashing cuda versions. You likely have multiple versions you’re loading.

I tried uninstalling all other CUDA versions from my computer (Windows) then restarting my computer but I’m still getting the same error.
My nvcc output

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

nvidia-smi output

| NVIDIA-SMI 466.63       Driver Version: 466.63       CUDA Version: 11.3

@chinproisbestpro that’s proof right there you have a different cuda version. Uninstall cuda completely just leaving the driver and use the redist artifact. We’re only supporting cuda 11.2 till @saudet updates the presets for 11.3 in a new release.

Again, only 1 version of cuda should be present. If you don’t want to use the redist artifact, then install cuda 11.2 and only cuda 11.2.

According to “Different CUDA versions shown by nvcc and NVIDIA-smi” on Stack Overflow, the CUDA version printed by nvidia-smi is the highest CUDA version the driver supports,
whereas the output of nvcc -V is the CUDA toolkit version actually installed on the machine.
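To compare the two numbers programmatically, the toolkit version can be pulled out of the `nvcc --version` text with a small parser. This is only a sketch; the helper name is invented, and it assumes you capture the nvcc output yourself (e.g. via ProcessBuilder):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NvccVersion {
    // Extracts the toolkit version from `nvcc --version` output,
    // e.g. "Cuda compilation tools, release 11.2, V11.2.152" -> "11.2".
    static String toolkitVersion(String nvccOutput) {
        Matcher m = Pattern.compile("release (\\d+\\.\\d+)").matcher(nvccOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "Cuda compilation tools, release 11.2, V11.2.152";
        System.out.println(toolkitVersion(sample)); // 11.2
    }
}
```

If the string parsed here differs from the version ND4J was built against, that mismatch is the first thing to fix.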

I tried uninstalling all CUDA from my machine then reinstalling only CUDA 11.2 but I’m still running into the same issue.

Is this the redist artifact?

<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>cuda-platform-redist</artifactId>
    <version>11.2-8.1-1.5.5</version>
</dependency>

When I uninstall all CUDA versions and use the above dependency, I get
java.lang.UnsatisfiedLinkError: C:\Users\Name\.javacpp\cache\nd4j-cuda-11.2-1.0.0-M1-windows-x86_64.jar\org\nd4j\nativeblas\windows-x86_64\jnind4jcuda.dll: Can't find dependent libraries

@chinproisbestpro if you use redist you shouldn’t install anything. Instead just have no cuda installations on your computer. Any cuda install in your path will cause problems normally.

Unfortunately, that didn’t work either. Maybe I’ll have to try a fresh install of Windows sometime to make sure there’s no leftover CUDA somewhere messing everything up.

@chinproisbestpro sorry about that. I’m not really sure without actually looking at your whole computer, but on my local install there is just nothing under the CUDA toolkit directory.
On my computer this directory:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA

is completely empty and that’s generally how I leave it.

Drivers are installed separately under:
C:\Program Files\NVIDIA Corporation

Beyond that, you could try docker or WSL as alternatives if you can’t get things working.
Note that windows typically looks up DLLs on your path so you could double check that as well.
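A quick way to double-check that is to scan PATH for CUDA entries. This is a hypothetical helper, not DL4J API; on the real machine you would pass System.getenv("PATH") and File.pathSeparator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CudaPathCheck {
    // Returns the PATH entries that mention CUDA. Windows resolves dependent
    // DLLs through PATH, so a stray toolkit entry there can load the wrong
    // cudart/cublas version alongside the bundled redist libraries.
    static List<String> cudaEntries(String path, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : path.split(Pattern.quote(separator))) {
            if (entry.toLowerCase().contains("cuda")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "C:\\Windows\\system32;"
                + "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\bin";
        System.out.println(cudaEntries(sample, ";")); // flags the v11.0 entry
    }
}
```

Any entry this reports that points at a toolkit version other than the one ND4J expects is a candidate for the clash described above.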

I was able to get the redist artifact working by uninstalling the NVIDIA FrameView SDK.
I’ll try reinstalling CUDA later to see if it was just the NVIDIA FrameView SDK messing everything up for a local CUDA install too.

Reinstalled CUDA 11.2 and cuDNN 8.2 and it works. However, the performance is 4x slower than my CPU… My CPU is a 5600X and my GPU is a 3090.

My Pom file

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-M1</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-11.2</artifactId>
    <version>1.0.0-M1</version>
</dependency>

Checking MSI Afterburner, the GPU clock and GPU memory clock are maxing out, so the GPU is being used. Not sure why it’s so slow, though.

Great that you’ve figured it out!

It depends on the structure of your model and your mini-batch size. For small and recurrent models, the communication overhead can easily eat up all the benefits of using the GPU.
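That trade-off can be illustrated with a toy cost model: each batch pays a fixed launch/transfer overhead plus a per-sample compute cost. All the constants below are invented purely for illustration, not measured DL4J numbers:

```java
public class GpuOverheadModel {
    // Toy throughput model in samples/sec: a batch costs a fixed overhead
    // (kernel launch + host/device transfer) plus a per-sample compute time.
    static double throughput(int batchSize, double overheadSec, double perSampleSec) {
        return batchSize / (overheadSec + batchSize * perSampleSec);
    }

    public static void main(String[] args) {
        double cpuPerSample = 1e-3;                      // CPU: no transfer overhead
        double gpuOverhead = 5e-3, gpuPerSample = 1e-4;  // GPU: fast compute, costly launch
        for (int batch : new int[]{1, 32, 512}) {
            System.out.printf("batch %4d  cpu %8.0f/s  gpu %8.0f/s%n",
                    batch,
                    throughput(batch, 0.0, cpuPerSample),
                    throughput(batch, gpuOverhead, gpuPerSample));
        }
        // With these made-up constants, the GPU loses at batch 1 (overhead
        // dominates) and wins comfortably by batch 512.
    }
}
```

The practical takeaway matches the advice above: if the GPU is slower, try a larger mini-batch so the per-batch overhead is amortized over more samples.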