nd4j-native took 35 seconds to process a file, while nd4j-cuda-11.6 took 6 seconds. I installed cuDNN with sudo rpm -ivh app/cudnn-local-repo-rhel8-126.96.36.199-1.0-1.x86_64.rpm, but I don’t know where cuDNN was installed. How can I check whether cuDNN is being used during training?
You have to set that up yourself by manually declaring the CUDA dependencies: nd4j-cuda-11.6, plus nd4j-cuda-11.6 again with the correct classifier.
I did use linux-x86_64-cudnn, but the performance (speed) is the same as with linux-x86_64, so I am not sure whether cuDNN was actually used.
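On the question of where cuDNN ended up: as far as I know, a cudnn-local-repo package only registers a local package repository, and the libcudnn packages still have to be installed from it, so it is worth checking that the shared libraries exist at all. Here is a minimal sketch in plain Java that lists files named libcudnn* in a few directories. The directory list (/usr/lib64, /usr/local/cuda/lib64, plus LD_LIBRARY_PATH) and the class name FindCudnn are assumptions for illustration, not something ND4J prescribes.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindCudnn {

    // Returns every file whose name starts with "libcudnn" that sits
    // directly inside one of the given directories (no recursion).
    static List<Path> findCudnnLibraries(List<Path> dirs) {
        List<Path> found = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip paths that do not exist on this machine
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "libcudnn*")) {
                for (Path p : stream) {
                    found.add(p);
                }
            } catch (IOException e) {
                // unreadable directory: ignore it and keep searching
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed candidate locations; adjust them for your system.
        List<Path> dirs = new ArrayList<>(Arrays.asList(
                Paths.get("/usr/lib64"),
                Paths.get("/usr/local/cuda/lib64")));
        String ldPath = System.getenv("LD_LIBRARY_PATH");
        if (ldPath != null) {
            for (String entry : ldPath.split(":")) {
                if (!entry.isEmpty()) {
                    dirs.add(Paths.get(entry));
                }
            }
        }
        List<Path> hits = findCudnnLibraries(dirs);
        if (hits.isEmpty()) {
            System.out.println("No libcudnn* files found in: " + dirs);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

An empty result only means nothing was found in the directories searched; the libraries may still live somewhere else.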
@SidneyLann could you post the top of the run log with nd4j initialization? It will print what it’s using in the build information at the top.
2022-07-05 03:58:18.626 INFO 10523 — [ main] org.nd4j.linalg.factory.Nd4jBackend : Loaded [JCublasBackend] backend
2022-07-05 03:58:41.887 INFO 10523 — [ main] org.nd4j.nativeblas.NativeOpsHolder : Number of threads used for linear algebra: 32
2022-07-05 03:58:41.961 INFO 10523 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Backend used: [CUDA]; OS: [Linux]
2022-07-05 03:58:41.962 INFO 10523 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Cores: ; Memory: [7.8GB];
2022-07-05 03:58:41.962 INFO 10523 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Blas vendor: [CUBLAS]
2022-07-05 03:58:41.975 INFO 10523 — [ main] org.nd4j.linalg.jcublas.JCublasBackend : ND4J CUDA build version: 11.6.112
2022-07-05 03:58:41.978 INFO 10523 — [ main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 0: [NVIDIA GeForce GTX 1080 Ti]; cc: [6.1]; Total memory: 
2022-07-05 03:58:41.978 INFO 10523 — [ main] org.nd4j.linalg.jcublas.JCublasBackend : Backend build information:
STD version: 201103L
2022-07-05 03:58:57.842 WARN 10523 — [ main] o.n.i.converters.ImportClassMapping : Duplicate TF op mapping found for op Pow: org.nd4j.linalg.api.ops.impl.scalar.Pow vs org.nd4j.linalg.api.ops.impl.transforms.custom.Pow
2022-07-05 03:58:57.850 WARN 10523 — [ main] o.n.i.converters.ImportClassMapping : Duplicate TF op mapping found for op FloorMod: org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.FModOp vs org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.FloorModOp
2022-07-05 03:58:59.447 WARN 10523 — [ main] o.n.a.functions.DifferentialFunction : No fields found for property name labelsSmoothing for class org.nd4j.linalg.api.ops.impl.loss.SoftmaxCrossEntropyLoss
@SidneyLann this shows how it was compiled, and it appears to have cuDNN there. If you have any questions about that, I’d be happy to walk you through the full build process. In general, though, the speed being the same may be due to the way you’re feeding the data in.
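If you want to check a saved log for this automatically, here is a rough sketch in plain Java. It scans the untimestamped lines that follow "Backend build information:" and reports whether any of them contains the substring cudnn. The exact wording of the cuDNN entry in the build information is an assumption on my part, which is why the check is a case-insensitive substring match; the class name CudnnLogCheck is made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class CudnnLogCheck {

    // Lines starting with a date such as "2022-07-05 " are treated as new log
    // entries; the build information itself is printed without a timestamp.
    private static final Pattern TIMESTAMPED = Pattern.compile("^\\d{4}-\\d{2}-\\d{2} .*");

    // True if any line of the build-information block mentions "cudnn".
    static boolean buildInfoMentionsCudnn(List<String> logLines) {
        boolean inBuildInfo = false;
        for (String line : logLines) {
            if (line.contains("Backend build information:")) {
                inBuildInfo = true; // the block starts on the next line
                continue;
            }
            if (!inBuildInfo) {
                continue;
            }
            if (TIMESTAMPED.matcher(line).matches()) {
                inBuildInfo = false; // next regular log entry ends the block
                continue;
            }
            if (line.toLowerCase(Locale.ROOT).contains("cudnn")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CudnnLogCheck <path-to-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("build info mentions cuDNN: " + buildInfoMentionsCudnn(lines));
    }
}
```

A positive result only says the native library was built with cuDNN support. It does not prove that a given layer actually ran through cuDNN.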
Yes. Two years ago I trained a neural network, and cuDNN only sped things up for Activation.TANH. For the other activation functions, the speed was the same as plain CUDA.
So I would conclude that cuDNN runs at the same speed as plain CUDA in some cases.
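One way to test the data-feeding explanation suggested earlier in the thread is to time batch loading and the training step separately. The sketch below is plain Java with no ND4J calls: loadBatch and trainStep are hypothetical placeholders that you would replace with your own iterator call and fit call. Note that GPU work can be queued asynchronously, so the training time may be under-reported unless the step reads a result back (a score, for example).

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class StageTimer {

    // Runs `iterations` rounds of load-then-train and returns
    // { total nanoseconds spent loading, total nanoseconds spent training }.
    static <T> long[] timeStages(int iterations, Supplier<T> loadBatch, Consumer<T> trainStep) {
        long loadNanos = 0;
        long trainNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            T batch = loadBatch.get();
            long t1 = System.nanoTime();
            trainStep.accept(batch);
            long t2 = System.nanoTime();
            loadNanos += t1 - t0;
            trainNanos += t2 - t1;
        }
        return new long[] { loadNanos, trainNanos };
    }

    public static void main(String[] args) {
        // Placeholder stages: swap in your real data loading and training calls.
        Supplier<double[]> loadBatch = () -> new double[1024];
        Consumer<double[]> trainStep = batch -> {
            for (int i = 0; i < batch.length; i++) {
                batch[i] = Math.tanh(i);
            }
        };
        long[] result = timeStages(100, loadBatch, trainStep);
        System.out.printf("loading: %.2f ms, training: %.2f ms%n",
                result[0] / 1e6, result[1] / 1e6);
    }
}
```

If loading dominates the total, switching between the CUDA and cuDNN classifiers would not be expected to change the overall run time much.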