Training CNN error / CNN text classification

Low utilization isn’t unexpected when you have a rather small model working on rather small data. But I would still expect a bit more than 1 to 2 percent on your GPU. How are you measuring that?

As for your crash… that is pretty weird, especially because the crash dump you shared previously shows it happening on pointer deallocation, and you’ve got plenty of free memory.

In case you are hitting a bug that has already been fixed, can you please try using snapshots? (See

I have been using CUDA 10.2 so far. Can I also upgrade to 11.2?
I get the following issue with the latest snapshot:

Exception in thread "main" java.lang.NoClassDefFoundError: org/nd4j/common/config/ND4JClassLoading
	at org.nd4j.linalg.factory.Nd4jBackend.load(
	at org.nd4j.linalg.factory.Nd4j.initContext(
	at org.nd4j.linalg.factory.Nd4j.<clinit>(
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(
Caused by: java.lang.ClassNotFoundException: org.nd4j.common.config.ND4JClassLoading
	at java.lang.ClassLoader.loadClass(
	at sun.misc.Launcher$AppClassLoader.loadClass(
	at java.lang.ClassLoader.loadClass(
	... 6 more

Yes, for snapshots you want to use CUDA 11.0 or 11.2.

Which cuDNN version do I need for CUDA 11.2?

12:22:58.799 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
12:23:00.379 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper] of provided class-loader.
Exception in thread "main" java.lang.NullPointerException: Attempted to load class org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper but failed. No class found with this name.
	at org.nd4j.common.base.Preconditions.throwNullPointerEx(
	at org.nd4j.common.base.Preconditions.checkNotNull(
	at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(
	at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.initializeHelper(

@ForceUpdate1 We’re publishing docs on this for the release. I apologize for some of the side effects of being an early adopter.
Basically, there are two ways to use cuDNN:

  1. Use the new classifiers for nd4j found here:
    Index of /repositories/snapshots/org/nd4j/nd4j-cuda-11.2/1.0.0-SNAPSHOT
    (see anything with -cudnn in it)
  2. Use deeplearning4j-cuda. This is the old way. This can be found here:
    Index of /repositories/snapshots/org/deeplearning4j/deeplearning4j-cuda-11.0/1.0.0-SNAPSHOT
    and here:
    Index of /repositories/snapshots/org/deeplearning4j/deeplearning4j-cuda-11.2/1.0.0-SNAPSHOT

Now, as for what you’re seeing: this is a bug that has been fixed recently. Please run mvn -U to ensure you get the updated dependencies. I would recommend trying option 1 first to see if that works for you.

It doesn’t look like that works. I rebuilt the project with Maven using “clean install -U”, which should update the dependencies.
When I start the project from IntelliJ using Run, the same error occurs, even after invalidating the cache.

If I start the jar built by Maven, I get this error:

13:50:25.205 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
13:50:25.208 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
13:50:25.208 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [] of provided class-loader.
13:50:25.209 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
13:50:25.209 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [] of provided class-loader.
13:50:26.757 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [8]; Memory: [7,1GB];
13:50:26.790 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
13:50:26.795 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 11.2.142
13:50:26.796 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [NVIDIA GeForce GTX 1070]; cc: [6.1]; Total memory: [8589934592]
13:50:26.797 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - Backend build information:
 MSVC: 192829914
STD version: 201402L
CUDA: 11.2.142
13:50:26.830 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
13:50:27.317 [main] ERROR org.nd4j.linalg.compression.BasicNDArrayCompressor - Error loading ND4J Compressors via service loader: No compressors were found. This usually occurs when running ND4J UI from an uber-jar, which was built incorrectly (without services resource files being included)
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.nd4j.linalg.factory.Nd4j.getCompressor(
        at org.nd4j.linalg.api.ndarray.BaseNDArray.get(
        at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(
        at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(
Caused by: java.lang.RuntimeException: Error loading ND4J Compressors via service loader: No compressors were found. This usually occurs when running ND4J UI from an uber-jar, which was built incorrectly (without services resource files being included)
        at org.nd4j.linalg.compression.BasicNDArrayCompressor.loadCompressors(
        at org.nd4j.linalg.compression.BasicNDArrayCompressor.<init>(
        at org.nd4j.linalg.compression.BasicNDArrayCompressor.<clinit>(
        ... 6 more

@ForceUpdate1 That particular stack trace is completely unrelated to your problem. It’s complaining about some files that are missing. It actually looks like it loads the build fine. How exactly are you building your jar?

Edit: Just so you know what you’re looking at, nd4j leverages service providers for its various implementations of different interfaces like the compressor:

You can find those in every backend.

Here’s the cpu version:

I would encourage you to try again and ensure the right files are present. I highly doubt this is a bug (hence the very specific error message provided there).
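For context, here is a minimal sketch of how this kind of lookup behaves in plain Java, using java.util.ServiceLoader directly. The Codec interface is a hypothetical stand-in, not ND4J’s actual compressor interface, and ND4J’s own loading code may differ. The point is only that when no META-INF/services file for the interface is on the classpath (for example because an uber-jar build dropped or overwrote it), the lookup quietly finds nothing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLoaderCheck {
    // Hypothetical service interface, standing in for something like a compressor.
    public interface Codec {
        String name();
    }

    // Collects every implementation that ServiceLoader can discover for the given interface.
    static <T> List<T> discover(Class<T> service) {
        List<T> found = new ArrayList<>();
        for (T impl : ServiceLoader.load(service)) {
            found.add(impl);
        }
        return found;
    }

    public static void main(String[] args) {
        List<Codec> codecs = discover(Codec.class);
        if (codecs.isEmpty()) {
            // Discovery relies on a resource file named after the interface.
            System.out.println("No providers found; expected a file at META-INF/services/"
                    + Codec.class.getName());
        } else {
            for (Codec c : codecs) {
                System.out.println("Found provider: " + c.name());
            }
        }
    }
}
```

If a check like this comes back empty inside the built jar but not in the IDE, the jar packaging is the likely culprit rather than the library.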

I’ve looked at your memory crash log again, and I’ve noticed that it says that your periodic GC is disabled.

In principle it shouldn’t be a problem, since workspaces are being used, but as a workaround you could try enabling it (see:

I don’t know whether anything else has changed, but today it worked right away with the snapshot version. The high ETL time is gone too. Thanks a lot anyway.

I just have a few questions now:

Are the models I trained with the 1.0.0-SNAPSHOT compatible with the 1.0.0-beta7?

I also took a closer look at the paper and came across the following:

Any character exceeding length l0 is ignored, and any characters that are not in the alphabet including blank characters are quantized as all-zero vectors.

Is it possible that I can somehow remove the zero from my one-hot encoding and replace it with a zero vector?
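One way to read the quoted passage, as a plain-Java sketch rather than anything DL4J-specific: give only the alphabet characters a one-hot position, and leave the row for everything else (unknown characters, blanks, padding) at all zeros, so there is no dedicated “unknown” index. CharQuantizer and its method names are made up for illustration, and the [length][alphabetSize] layout is an assumption that would likely need transposing before it is fed to a real network.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of DL4J.
class CharQuantizer {
    private final Map<Character, Integer> index = new HashMap<>();
    private final int maxLength; // l0 in the quoted passage

    CharQuantizer(String alphabet, int maxLength) {
        for (int i = 0; i < alphabet.length(); i++) {
            // putIfAbsent keeps the column indices dense even if the alphabet repeats a character
            index.putIfAbsent(alphabet.charAt(i), index.size());
        }
        this.maxLength = maxLength;
    }

    // Returns a [maxLength][alphabetSize] matrix. Rows for characters outside the
    // alphabet (including blanks) and for padding stay all-zero.
    float[][] encode(String text) {
        float[][] out = new float[maxLength][index.size()];
        int n = Math.min(text.length(), maxLength); // characters beyond l0 are ignored
        for (int t = 0; t < n; t++) {
            Integer col = index.get(text.charAt(t));
            if (col != null) {
                out[t][col] = 1.0f;
            }
        }
        return out;
    }
}
```

For example, new CharQuantizer("abc", 4).encode("a cbz") gives a 4x3 matrix in which the row for the blank is all zeros and the trailing "z" is dropped because it lies past the maximum length.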

Now that the Cuda version is running, can I somehow still improve performance / GPU utilization with my model?

I just had a look in the task manager, and it now shows between 2 and 3%.

I had already read through the wiki article and reduced the batch size, but maybe there are tips aside from that, e.g. specifically for my model.

@ForceUpdate1 Yes, the models are compatible. After the release, I’d encourage you to use newer versions where possible.

Generally, utilization comes down to making sure the data is ready when the GPU needs it and that each batch is big enough to be worth sending to the GPU. It shouldn’t be your only metric, though. Focus on bringing down overall training time. That can mean things like pre-saving the output of your ETL pipeline so it only runs once, or preparing the next batch asynchronously while the current one is training.
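To illustrate the idea of preparing the next batch asynchronously in isolation, here is a minimal plain-Java sketch of a prefetching iterator. PrefetchIterator is a hypothetical class written for this example, not a DL4J API. It assumes non-null batches and, to keep the sketch short, treats a failing source as end of data.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example class: a background thread fills a bounded queue
// while the consumer works on the previous element.
class PrefetchIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // marks the end of the source
    private final BlockingQueue<Object> queue;
    private Object next;

    PrefetchIterator(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(END); // always try to signal the end to the consumer
                } catch (InterruptedException ignored) {
                    // nothing more to do in this sketch
                }
            }
        });
        producer.setDaemon(true);
        producer.start();
        this.next = take();
    }

    private Object take() {
        try {
            return queue.take(); // blocks until the producer has something ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return END;
        }
    }

    @Override
    public boolean hasNext() {
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (next == END) {
            throw new NoSuchElementException();
        }
        T result = (T) next;
        next = take();
        return result;
    }
}
```

Wrapping a slow source as new PrefetchIterator<>(slowBatches, 2) lets up to two batches be prepared ahead of the consumer; the order of elements is unchanged.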

So I went through the wiki article on performance again. Currently one epoch takes about 3 minutes. There are 25000 records, my minibatch size is currently 64.
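As a quick sanity check on those numbers (assuming the 3 minutes cover exactly one pass over all 25000 records, with the last batch only partly filled), the per-iteration cost can be worked out like this. EpochMath is just a throwaway helper for the arithmetic:

```java
class EpochMath {
    // Number of minibatches needed to cover all records (ceiling division).
    static int iterationsPerEpoch(int records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    // Average wall-clock time spent on one minibatch.
    static double secondsPerIteration(double epochSeconds, int iterations) {
        return epochSeconds / iterations;
    }

    public static void main(String[] args) {
        int iterations = iterationsPerEpoch(25_000, 64);
        double perIteration = secondsPerIteration(3 * 60, iterations);
        System.out.println("iterations per epoch: " + iterations);
        System.out.println("seconds per iteration: " + perIteration);
    }
}
```

That is 391 iterations at a bit under half a second each, which is a more useful number to track than the utilization percentage when comparing configurations.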

According to the performance listener, my ETL time is already constantly at zero and also my GC time is at zero.

This is the output for one epoch:

One thing you might want to try here is to reduce the stats collection. Especially on CUDA, it sometimes becomes a bottleneck.

The log you’ve shared does however clearly say that you are on CPU, so I’m a bit confused. If you are using the CPU backend, your cuda utilization obviously will be close to zero.

Ok, I have removed all the listeners, but it makes almost no difference.

I currently want to tune the hyperparameters, and it’s a bit annoying when it always takes 15 minutes to get results. Do you have a tip on which parameters I could choose better?

I switched back to 1.0.0-beta7 for now, because there were issues with the snapshot version. And with 1.0.0-beta7 I can’t use CUDA, because I currently have CUDA 11.2 installed.

You should have said that in the previous post, because until now you were asking about CUDA performance.

As for training your model, I’d suggest you get rid of L2 for now. It is a regularization method, and if you can’t even get your loss low enough, regularization may be part of the problem.

From your screenshot, it also looks like your learning rate is about an order of magnitude too high. And I’d suggest you try a different optimizer, maybe Nadam will work better for you.

Thanks, I have adjusted the parameters and I think it looks better. I have set L2 to zero and now use Nadam. Only the parameter updates seem lower than they should be, or am I wrong?

This is my current network configuration again. What still bothers me is the GlobalPoolingLayer. Do I need this layer at all? Currently I only use it because otherwise the InputType for the DenseLayer does not fit. But maybe I have to set a different nIn for the DenseLayer.

So you are still getting this when you try it without the global pooling?

I’m asking because you currently shouldn’t even have a mask at all.

No, I don’t get that error anymore, but I get another one.

Exception in thread "main" java.lang.IllegalArgumentException: Labels and preOutput must have equal shapes: got shapes [64, 2] vs [6528, 2]
	at org.nd4j.common.base.Preconditions.throwEx(
	at org.nd4j.linalg.lossfunctions.impl.LossMCXENT.computeGradient(
	at org.deeplearning4j.nn.layers.BaseOutputLayer.getGradientsAndDelta(
	at org.deeplearning4j.nn.layers.BaseOutputLayer.backpropGradient(
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(
	at org.deeplearning4j.optimize.Solver.optimize(
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(

That looks interesting, maybe there is a bug in the automatic reshaping. But I can see no way that it would result in the shape you are getting.

You can try setting the CnnToFeedForwardPreProcessor on your first dense layer and manually specify its arguments. But please also share the output of model.summary() when it is done automatically in your case.

I hope I did it right. But there is now another error.

Exception in thread "main" java.lang.IllegalStateException: Invalid input type: Expected input of type CNN, got InputTypeRecurrent(50,timeSeriesLength=102,format=NCW)
	at org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor.getOutputType(
	at org.deeplearning4j.nn.conf.MultiLayerConfiguration$
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration$

Interesting. Apparently the Convolution1DLayer is doing more than I had remembered. It actually does not convert the recurrent input into convolutional input. It keeps it as recurrent.

And in that case the automatically inserted RnnToFeedForwardPreProcessor does something unexpected. It turns [miniBatchSize,layerSize,timeSeriesLength] into [miniBatchSize*timeSeriesLength,layerSize].
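Plugging in the numbers from the two error messages above (minibatch 64 from the label shape, layer size 50 and time series length 102 from the InputTypeRecurrent) shows where the 6528 comes from. This is only shape arithmetic in plain Java, not the actual preprocessor code, and the class and method names are made up for illustration:

```java
import java.util.Arrays;

class RnnFlattenShapes {
    // Per-time-step flatten as described above:
    // [miniBatch, layerSize, timeSeriesLength] -> [miniBatch * timeSeriesLength, layerSize]
    static long[] perTimeStep(long miniBatch, long layerSize, long timeSeriesLength) {
        return new long[]{miniBatch * timeSeriesLength, layerSize};
    }

    // Per-example flatten, which keeps one row per example:
    // [miniBatch, layerSize, timeSeriesLength] -> [miniBatch, layerSize * timeSeriesLength]
    static long[] perExample(long miniBatch, long layerSize, long timeSeriesLength) {
        return new long[]{miniBatch, layerSize * timeSeriesLength};
    }

    public static void main(String[] args) {
        System.out.println("per time step: " + Arrays.toString(perTimeStep(64, 50, 102)));
        System.out.println("per example:   " + Arrays.toString(perExample(64, 50, 102)));
    }
}
```

The first result has 6528 rows, matching the first dimension of the preOutput shape in the exception, while the second keeps 64 rows, one per example, which is what the labels expect.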

Looks like your case has come up in Keras import most often, so often in fact that the Preprocessor for that case has it in its name:

Use that preprocessor, and it should properly flatten the data.