Low utilization isn’t entirely unexpected when you have a rather small model working on rather small data, but I would still expect a bit more than 1 to 2 percent on your GPU. How are you measuring that?
As for your crash, that is pretty weird, especially because the crash dump you shared previously happens on pointer deallocation and you have plenty of free memory.
I have the following issue with the latest snapshot:
I have also been using CUDA 10.2 so far. Could I possibly upgrade to 11.2 as well?
Exception in thread "main" java.lang.NoClassDefFoundError: org/nd4j/common/config/ND4JClassLoading
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:137)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5092)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:270)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:577)
Caused by: java.lang.ClassNotFoundException: org.nd4j.common.config.ND4JClassLoading
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 6 more
12:22:58.799 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
12:23:00.379 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper] of provided class-loader.
Exception in thread "main" java.lang.NullPointerException: Attempted to load class org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper but failed. No class found with this name.
at org.nd4j.common.base.Preconditions.throwNullPointerEx(Preconditions.java:643)
at org.nd4j.common.base.Preconditions.checkNotNull(Preconditions.java:467)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:100)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:89)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.initializeHelper(ConvolutionLayer.java:76)
@ForceUpdate1 we’re publishing docs on this for the release. I apologize for some of the side effects of being an early adopter.
Basically, there are two ways to use cuDNN:
Now, as for what you’re seeing: this is a bug that was fixed recently. Please run mvn -U to ensure you get the updated dependencies. I would recommend trying #1 to see if that works for you.
It doesn’t look like it’s going to work. I built the project once with Maven using “clean install -U”, which should update the dependencies.
When I start the project from IntelliJ using Run, the same error occurs, even after I have invalidated the caches.
If I start the jar that Maven outputs, I get this error:
13:50:25.205 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
13:50:25.208 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
13:50:25.208 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
13:50:25.209 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
13:50:25.209 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
13:50:26.757 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [8]; Memory: [7,1GB];
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
13:50:26.795 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 11.2.142
13:50:26.796 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [NVIDIA GeForce GTX 1070]; cc: [6.1]; Total memory: [8589934592]
13:50:26.797 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - Backend build information:
MSVC: 192829914
STD version: 201402L
CUDA: 11.2.142
DEFAULT_ENGINE: samediff::ENGINE_CUDA
HAVE_FLATBUFFERS
13:50:26.830 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
13:50:27.317 [main] ERROR org.nd4j.linalg.compression.BasicNDArrayCompressor - Error loading ND4J Compressors via service loader: No compressors were found. This usually occurs when running ND4J UI from an uber-jar, which was built incorrectly (without services resource files being included)
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.nd4j.linalg.factory.Nd4j.getCompressor(Nd4j.java:5314)
at org.nd4j.linalg.api.ndarray.BaseNDArray.get(BaseNDArray.java:4122)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:706)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
Caused by: java.lang.RuntimeException: Error loading ND4J Compressors via service loader: No compressors were found. This usually occurs when running ND4J UI from an uber-jar, which was built incorrectly (without services resource files being included)
at org.nd4j.linalg.compression.BasicNDArrayCompressor.loadCompressors(BasicNDArrayCompressor.java:66)
at org.nd4j.linalg.compression.BasicNDArrayCompressor.<init>(BasicNDArrayCompressor.java:46)
at org.nd4j.linalg.compression.BasicNDArrayCompressor.<clinit>(BasicNDArrayCompressor.java:39)
... 6 more
@ForceUpdate1 that particular stack trace is completely unrelated to your problem. It’s complaining about some files that are missing; it actually looks like it loads the build fine. How are you building your jar exactly?
Edit: Just so you know what you’re looking at, nd4j leverages service providers for its various implementations of different interfaces like the compressor:
You can find those in every backend.
Here’s the cpu version:
I would encourage you to try again and ensure the right files are present. I highly doubt this is a bug (hence the very specific error message provided there).
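To illustrate the mechanism, here is a minimal, self-contained Java sketch of how ServiceLoader discovers implementations at runtime; the Compressor interface below is only a stand-in for illustration, not the real ND4J interface.

import java.util.Iterator;
import java.util.ServiceLoader;

// Minimal sketch of Java's ServiceLoader mechanism, which ND4J relies on to discover
// implementations (compressors, backends, ...) at runtime via META-INF/services files.
public class ServiceLoaderSketch {

    public interface Compressor {   // stand-in interface, not the real ND4J one
        String name();
    }

    public static void main(String[] args) {
        // ServiceLoader scans the classpath for files named
        // META-INF/services/<fully.qualified.InterfaceName>, each listing implementation
        // classes one per line, and instantiates them.
        ServiceLoader<Compressor> loader = ServiceLoader.load(Compressor.class);
        Iterator<Compressor> it = loader.iterator();
        if (!it.hasNext()) {
            // If an uber-jar is built without merging those resource files, nothing is found,
            // which is exactly what the "No compressors were found" error above complains about.
            System.out.println("No implementations found on the classpath");
        }
        while (it.hasNext()) {
            System.out.println("Found implementation: " + it.next().name());
        }
    }
}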
I don’t know if anything else has changed, but today it suddenly works with the snapshot version. The high ETL time is also gone. Thanks a lot anyway.
I just have a few questions now:
Are the models I trained with the 1.0.0-SNAPSHOT compatible with the 1.0.0-beta7?
I also took a closer look at the paper and came across the following:
Any character exceeding length l0 is ignored, and any characters
that are not in the alphabet including blank characters are quantized as all-zero vectors.
Is it possible to somehow remove the zero index from my one-hot encoding and instead represent those characters as an all-zero vector?
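For reference, this is roughly the quantization I understand the paper to describe, as a minimal sketch; the alphabet and the value of l0 below are placeholders, not my actual setup.

// Sketch of the paper's quantization: each in-alphabet character becomes a one-hot column
// of size |alphabet|, out-of-alphabet characters (including blanks) become all-zero columns,
// and characters beyond position l0 are ignored. ALPHABET and L0 are placeholder values.
public class CharQuantizationSketch {

    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    static final int L0 = 102; // maximum sequence length (placeholder)

    static double[][] quantize(String text) {
        double[][] encoded = new double[ALPHABET.length()][L0];
        int len = Math.min(text.length(), L0); // anything beyond l0 is ignored
        for (int t = 0; t < len; t++) {
            int idx = ALPHABET.indexOf(Character.toLowerCase(text.charAt(t)));
            if (idx >= 0) {
                encoded[idx][t] = 1.0; // in-alphabet: one-hot column
            }
            // out-of-alphabet (including blanks): column stays all zeros
        }
        return encoded;
    }

    public static void main(String[] args) {
        double[][] m = quantize("hello world!"); // the space and '!' become zero columns
        System.out.println(m.length + " x " + m[0].length);
    }
}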
Now that the CUDA version is running, can I still somehow improve the performance / GPU utilization of my model?
I just had a look in the Task Manager, and GPU utilization is now between 2 and 3%.
I had already read through the wiki article and reduced the batch size, but maybe there are tips beyond that, e.g. specifically for my model.
@ForceUpdate1 yes, the models are compatible. After the release, I’d encourage you to use newer versions where possible.
Generally, utilization comes down to always ensuring the data is ready and that each batch is big enough to warrant being sent to the GPU. It shouldn’t be your only metric though; focus on bringing down overall training times. That can involve things like pre-saving the output of your ETL pipeline so it only runs once, and asynchronously preparing the next batch during training, among other things.
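As a rough sketch of the async part (trainData, model and numEpochs are placeholders for your own objects, and the prefetch queue size of 4 is just an example value):

import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Wrap the existing iterator so the next few minibatches are prepared on a background
// thread while the GPU is still busy with the current one.
DataSetIterator asyncTrain = new AsyncDataSetIterator(trainData, 4);
model.fit(asyncTrain, numEpochs);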
So I went through the wiki article on performance again. Currently one epoch takes about 3 minutes. There are 25,000 records, and my minibatch size is currently 64.
According to the performance listener, my ETL time is consistently at zero, and my GC time is at zero as well.
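For context, this is roughly how I collect those numbers, as a sketch; the frequency of 10 iterations is just an example and 'model' stands in for my network.

import org.deeplearning4j.optimize.listeners.PerformanceListener;

// Reports per-iteration timing info (ETL time, samples/sec, etc.) every 10 iterations.
model.setListeners(new PerformanceListener(10, true));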
One thing you might want to try here is to reduce the stats collection; especially on CUDA, that sometimes becomes a bottleneck.
However, the log you’ve shared clearly says that you are on CPU, so I’m a bit confused. If you are using the CPU backend, your CUDA utilization will obviously be close to zero.
OK, I have removed all the listeners, but it makes almost no difference.
I currently want to tune the hyperparameters, and it’s a bit annoying that it always takes 15 minutes to get results. Maybe you have a tip on which parameters I could choose better?
I switched back to 1.0.0-beta7 for now, because there were issues with the snapshot version. And with 1.0.0-beta7 I can’t use CUDA, because I currently have CUDA 11.2 installed.
You should have said that in the previous post, because until now you were asking about CUDA performance.
As for training your model, I’d suggest you get rid of L2 for now: it is a regularization method, and if you can’t even get your loss low enough yet, it may be part of the problem.
From your screenshot, it also looks like your learning rate is about an order of magnitude too high. I’d also suggest you try a different optimizer; maybe Nadam will work better for you.
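A minimal sketch of what that could look like in the configuration; the 1e-3 learning rate and the seed are only example values, not a prescription for your model.

import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.learning.config.Nadam;

// Drop the .l2(...) call entirely and switch the updater to Nadam with a smaller learning rate.
NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Nadam(1e-3));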
Thanks, I have adjusted the parameters and I think it looks better. I have set L2 to zero and now use Nadam. It’s just that the parameter updates are, I think, lower than they should be, aren’t they?
This is my current network configuration again. What still bothers me is the GlobalPoolingLayer. Do I need this layer at all? Currently I only use it because otherwise the InputType for the DenseLayer does not fit. But maybe I have to set a different nIn for the DenseLayer.
No, I don’t get that error anymore, but I get another one.
Exception in thread "main" java.lang.IllegalArgumentException: Labels and preOutput must have equal shapes: got shapes [64, 2] vs [6528, 2]
at org.nd4j.common.base.Preconditions.throwEx(Preconditions.java:636)
at org.nd4j.linalg.lossfunctions.impl.LossMCXENT.computeGradient(LossMCXENT.java:149)
at org.deeplearning4j.nn.layers.BaseOutputLayer.getGradientsAndDelta(BaseOutputLayer.java:175)
at org.deeplearning4j.nn.layers.BaseOutputLayer.backpropGradient(BaseOutputLayer.java:147)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1946)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2761)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2704)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1715)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636)
That looks interesting; maybe there is a bug in the automatic reshaping. But I can see no way that it would result in the shape you are getting.
You can try setting the CnnToFeedForwardPreProcessor on your first dense layer and specifying its arguments manually. But please also share the output of model.summary() for the case where the preprocessor is added automatically.
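A hedged sketch of what setting it manually could look like; all shapes here (height 1, width 102, 50 channels, hence 5100 flattened inputs) and the layer sizes are placeholders and have to match the actual output of the layer feeding into your dense layer.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Attach the preprocessor to a layer by index; index 0 here means "applied to the input of
// the first layer in this list". In the real network the index has to point at the first
// dense layer, and the convolution/pooling layers would precede it in the list.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new DenseLayer.Builder().nIn(1 * 102 * 50).nOut(128).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nIn(128).nOut(2).build())
        .inputPreProcessor(0, new CnnToFeedForwardPreProcessor(1, 102, 50))
        .build();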
Exception in thread "main" java.lang.IllegalStateException: Invalid input type: Expected input of type CNN, got InputTypeRecurrent(50,timeSeriesLength=102,format=NCW)
at org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor.getOutputType(CnnToFeedForwardPreProcessor.java:173)
at org.deeplearning4j.nn.conf.MultiLayerConfiguration$Builder.build(MultiLayerConfiguration.java:699)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$ListBuilder.build(NeuralNetConfiguration.java:258)
Interesting. Apparently the Convolution1DLayer is doing more than I had remembered. It actually does not convert the recurrent input into convolutional input; it keeps it as recurrent.
And in that case the automatically inserted RnnToFeedForwardPreProcessor does something unexpected: it turns [miniBatchSize, layerSize, timeSeriesLength] into [miniBatchSize*timeSeriesLength, layerSize]. That is exactly where your [6528, 2] shape comes from: 64 * 102 = 6528.
Looks like your case has come up most often in Keras import, so often in fact that the preprocessor for that case has it in its name: KerasFlattenRnnPreprocessor.
Use that preprocessor, and it should properly flatten the data.
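Relative to the earlier CnnToFeedForwardPreProcessor sketch, the only change would be swapping in this preprocessor. This is a hedged fragment: the constructor argument order (feature depth, then time-series length) is my assumption, and the 50/102 values are simply taken from the InputTypeRecurrent(50, timeSeriesLength=102) in the error above; check the javadoc and adjust to your real shapes.

import org.deeplearning4j.nn.modelimport.keras.preprocessors.KerasFlattenRnnPreprocessor;

// Flattens recurrent activations [miniBatchSize, layerSize, timeSeriesLength] into
// [miniBatchSize, layerSize * timeSeriesLength] before the dense layer.
.inputPreProcessor(0, new KerasFlattenRnnPreprocessor(50, 102))

The dense layer’s nIn would then have to be 50 * 102 = 5100 in that sketch.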