CUDA on Jetson Nano


I got the following issue:

Could not resolve dependencies for project test-SNAPSHOT: The following artifacts could not be resolved: org.nd4j:nd4j-cuda-10.2:jar:linux-arm64:1.0.0-beta7, org.bytedeco:cuda:jar:linux-arm64:10.2-7.6-1.5.3: Failure to find org.nd4j:nd4j-cuda-10.2:jar:linux-arm64:1.0.0-beta7 in was cached in the local repository, resolution will not be reattempted until the update interval of has elapsed or updates are forced

Basically, I am trying to use CUDA 10.2 on a Jetson Nano board, which comes preinstalled with CUDA 10.2.
Using 10.0 (from the pom) resolves fine, but then it’s not able to load the correct runtime.

Do you think it would be possible to add the 10.2 version for ARM to the repo? Or is there any other way to get it?

Thanks for helping!

At the time of the beta7 release, there was no CUDA 10.2 for the Nano yet, so that’s why it isn’t available.

As we’ve changed our CI since the last release, it will probably take a while before we have a new version that supports the newest JetPack version for the Jetson Nano.

Maybe @agibsonccc can comment on that, as he has been working on the new CI infrastructure.

Thanks Treo,

Is there any way to compile it locally? Is it going to be a hard task?

I’m not sure if there were any special things you needed to do to make it work on the Jetson Nano.

If you are building on the Nano itself, it should be as simple as the two steps below (a quick verification sketch follows them):

  1. Set the correct CUDA version with:
./change-cuda-versions.sh 10.2
  2. Build and install it into your local repository with:
mvn -Djavacpp.platform=linux-arm64 -Dlibnd4j.compute=5.3 -Dlibnd4j.chip=cuda -DskipTests clean install
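
Once that install finishes, a quick way to check that the locally built CUDA backend actually loads is a tiny smoke test like the one below. This is only a minimal sketch: the class name and array shapes are arbitrary, and it assumes your project depends on the freshly installed 10.2 artifacts.

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class CudaSmokeTest {
    public static void main(String[] args) {
        // Printing the backend class confirms which native backend was picked up.
        System.out.println("Backend: " + Nd4j.getBackend().getClass().getSimpleName());
        // A small matrix multiply forces the native CUDA library to load and run.
        INDArray a = Nd4j.rand(2, 2);
        System.out.println(a.mmul(a));
    }
}

If everything is wired up, the startup log should show the JCublasBackend being loaded.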

Thanks for your support.

I did try, but I got:
root@rama-jetson:/home/rama/Scaricati/deeplearning4j# ./change-cuda-versions.sh 10.2
Updating CUDA versions in pom.xml files to CUDA 10.2
sed: can’t read : No such file or directory
(the same line repeated indefinitely)

Do you have any hint on that?

After checking pom.xml, it seems that the version did get updated, so maybe it’s safe to ignore the warning. I’ll let you know if the build goes fine; I think it’s going to take a while.

@ramarro123 we can compile our C++ code base for the Jetson Nano, and have made sure that this step works. However, the CUDA bindings we use from JavaCPP only support CUDA 10.0 on the Jetson Nano.
You can see those here: Central Repository: org/bytedeco/cuda/10.0-7.4-1.5

In order to make this work, you would have to compile those CUDA bindings for the latest version of CUDA.

There isn’t enough ROI for us to do all of this right now given other priorities. If you would like to attempt this yourself, I’m happy to try to point you at the particular steps. Otherwise, there aren’t a lot of alternatives right now.
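
For reference, the general shape of that process (a rough sketch, not something verified on the Nano itself) is to clone bytedeco/javacpp-presets, check out the tag matching the JavaCPP version your ND4J build expects, and then build just the cuda module on the device against the locally installed CUDA 10.2, along the lines of:

mvn clean install --projects .,cuda -Djavacpp.platform=linux-arm64

The exact flags and preset version depend on your setup, so treat this as a starting point rather than a recipe.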

I want to leave a little feedback; maybe it’s useful for others.

The easy fix here is just to install a previous version of the JetPack SDK; the correct image is the one that ships with CUDA 10.0.

With this version, after adding all the deps, I got:

2021-05-06 10:32:47.956 INFO [ main] d4jBackend : Loaded [JCublasBackend] backend
2021-05-06 10:32:52.826 INFO [ main] eOpsHolder : Number of threads used for linear algebra: 32
2021-05-06 10:32:52.952 INFO [ main] xecutioner : Backend used: [CUDA]; OS: [Linux]
2021-05-06 10:32:52.954 INFO [ main] xecutioner : Cores: [4]; Memory: [4,0GB];
2021-05-06 10:32:52.955 INFO [ main] xecutioner : Blas vendor: [CUBLAS]
2021-05-06 10:32:52.986 INFO [ main] lasBackend : ND4J CUDA build version: 10.0.326
2021-05-06 10:32:52.991 INFO [ main] lasBackend : CUDA device 0: [NVIDIA Tegra X1]; cc: [5.3]; Total memory: [4156817408]

And I think this is the correct log (correct me if I am wrong).


@ramarro123 yeah that looks great! Thanks for posting your solution.

Here I am again :slight_smile:

I had time to play with the Jetson again, and unfortunately, after the logging that I posted above, the code suddenly crashes (it doesn’t even run fit; it crashes just creating a MultiLayerNetwork instance).

That’s the output:

May 18, 2021 8:38:20 PM com.github.fommil.jni.JniNamer arch
WARNING: unrecognised architecture: aarch64
May 18, 2021 8:38:21 PM com.github.fommil.netlib.ARPACK
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemARPACK
May 18, 2021 8:38:21 PM com.github.fommil.jni.JniNamer arch
WARNING: unrecognised architecture: aarch64
May 18, 2021 8:38:21 PM com.github.fommil.netlib.ARPACK
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefARPACK

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x0000007ef34398b8, pid=10062, tid=10147

JRE version: OpenJDK Runtime Environment (11.0.11+9) (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)

Java VM: OpenJDK 64-Bit Server VM (11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, tiered, compressed oops, g1 gc, linux-aarch64)

Problematic frame:

C [] samediff::ticket::acquiredThreads(unsigned int)+0x0

Core dump will be written. Default location: Core dumps may be processed with “/usr/share/apport/apport %p %s %c %d %P” (or dumping to /home/rama/java/test/core.10062)

An error report file with more information is saved as:


If you would like to submit a bug report, please visit:

Bugs : openjdk-lts package : Ubuntu

The crash happened outside the Java Virtual Machine in native code.

See problematic frame for where to report the bug.

Let me know if the core dump can help.

@ramarro123 Nd4j does not use this library (the com.github.fommil netlib one from your warnings). We used to, but it hasn’t been maintained, and the way it interfaces with native memory is at best subpar.

It especially does not work with GPUs. That is a completely CPU-based library.

Could you give me an idea whether something in our docs suggested we were related to this library somehow? If so, I’d like to fix it. I appreciate you working with me here.

I think I made a mistake in pom.xml, as the modified version of the code didn’t correctly report “cublas” and nvidia.

After modifying it, things started working again, but I am still facing an issue with a core dump after about 10 minutes of training.

I will be back on this matter when I find some more evidence.

Just a quick question: can I start multiple threads, each with its own MultiLayerNetwork? Is that allowed? I guess yes, since I create new objects for every thread and they don’t share anything, but better to double-check before starting other tests.

@ramarro123 you should generally assume that networks are not thread-safe. The Jetson has so few resources; why are you multi-threading? Is there a need for that? The point of it is just to offload work to the GPU. Multiple networks in RAM will push the Nano to its limits (which are only 4GB of unified host and device memory), and you have to account for the JVM process taking up space as well.
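
As an aside, if memory becomes the bottleneck, the off-heap allocations ND4J makes can be capped with the standard JavaCPP system properties. The values below are purely illustrative, and your-app.jar is just a placeholder:

java -Xmx1G -Dorg.bytedeco.javacpp.maxbytes=2G -Dorg.bytedeco.javacpp.maxphysicalbytes=3G -jar your-app.jar

Keeping the JVM heap small leaves more of the Nano’s 4GB for native and CUDA allocations.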

It’s just to train two different networks at the same time (different params, different input; just two different problems with some common logic for getting the data from the source). But yeah, I can run one at a time and play it safe :smiley:

@ramarro123 yeah, just play it safe then. It’s easier to deal with as a whole. We wrote ParallelWrapper for larger machines, for people who want to do parallel training of a single model (multiple minibatches in multiple threads, averaging the changes); a sketch of its usage follows.
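
For completeness, the ParallelWrapper API looks roughly like this. It is a sketch only: model and trainData are assumed to already exist, and the settings are illustrative rather than tuned (and certainly too heavy for a Nano):

import org.deeplearning4j.parallelism.ParallelWrapper;

// model: an initialized MultiLayerNetwork; trainData: a DataSetIterator
ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
        .workers(2)            // number of concurrent training threads
        .prefetchBuffer(4)     // minibatches prefetched per worker
        .averagingFrequency(3) // average parameters every 3 minibatches
        .build();
wrapper.fit(trainData);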

Parallel training of, say, 100 models is in some cases a better way to train than training a single model: a model can get stuck in a local minimum that is not the global one, so training multiple models from different starting points reduces that chance or its effect.
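
A minimal sketch of that random-restart idea in DL4J terms, assuming trainData is a DataSetIterator you already have (the one-layer config is only a placeholder; any real architecture would go there):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class RandomRestarts {
    // Train the same architecture from several seeds and keep the best model.
    static MultiLayerNetwork trainWithRestarts(DataSetIterator trainData) {
        MultiLayerNetwork best = null;
        double bestScore = Double.MAX_VALUE;
        for (long seed : new long[]{11L, 42L, 123L}) {
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .seed(seed) // different seed -> different init -> possibly a different minimum
                    .list()
                    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                            .nIn(10).nOut(1).activation(Activation.IDENTITY).build())
                    .build();
            MultiLayerNetwork net = new MultiLayerNetwork(conf);
            net.init();
            net.fit(trainData, 5); // a few epochs per restart
            if (net.score() < bestScore) { // lower loss wins
                bestScore = net.score();
                best = net;
            }
            trainData.reset();
        }
        return best;
    }
}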