Training CNN error / CNN text classification

I’ve looked at your memory crash log again, and I’ve noticed that it says your periodic GC is disabled.

In principle it shouldn’t be a problem, since workspaces are being used, but as a workaround you could try enabling it (see: https://deeplearning4j.konduit.ai/config/config-memory/config-workspaces#garbage-collector)
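
For reference, a minimal sketch of what enabling it could look like via the ND4J memory manager (the 10-second window is just an illustrative value, not a recommendation):

    import org.nd4j.linalg.factory.Nd4j;

    // Re-enable the periodic GC that the crash log reports as disabled,
    // and let it run at most once every 10 seconds.
    Nd4j.getMemoryManager().togglePeriodicGc(true);
    Nd4j.getMemoryManager().setAutoGcWindow(10000);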

I do not know if anything else has changed, but today it worked right away with the snapshot version. The high ETL time is also gone. Thanks a lot anyway.

I just have a few questions now:

Are the models I trained with the 1.0.0-SNAPSHOT compatible with the 1.0.0-beta7?

I also took a closer look at the paper and came across the following:

Any character exceeding length l0 is ignored, and any characters that are not in the alphabet including blank characters are quantized as all-zero vectors.

Is it possible for me to somehow remove the blank character from my one-hot encoding and replace it with an all-zero vector?
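
To illustrate what I mean, roughly (this helper method and the alphabet handling are just a hypothetical sketch, not my actual code):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Hypothetical sketch: one-hot encode a string over a fixed alphabet.
    // Characters that are not in the alphabet (including blanks) simply keep
    // an all-zero column instead of getting their own one-hot index.
    public static INDArray quantize(String text, String alphabet, int maxLen) {
        INDArray encoded = Nd4j.zeros(alphabet.length(), maxLen);
        for (int i = 0; i < Math.min(text.length(), maxLen); i++) {
            int idx = alphabet.indexOf(text.charAt(i));
            if (idx >= 0) { // out-of-alphabet characters stay all-zero
                encoded.putScalar(idx, i, 1.0);
            }
        }
        return encoded;
    }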

Now that the Cuda version is running, can I somehow still improve performance / GPU utilization with my model?

I just had a look at the Task Manager, and GPU utilization is currently between 2 and 3%.

I had already read through the wiki article and reduced the batch size, but maybe there are other tips besides that, e.g. specifically for my model.

@ForceUpdate1 yes, the models are compatible. After the release, I’d encourage you to use newer versions where possible.

Generally, utilization comes down to ensuring that data is always prepared in time and that each batch is big enough to warrant being sent to the GPU. It shouldn’t be your only metric though; focus on bringing down overall training time. That can involve things like pre-saving the output of your ETL pipeline so it only runs once, and asynchronously preparing the next batch for training, among other things.
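
To make the pre-saving part concrete, here is a rough sketch (the variable names are assumptions: trainIter would be your existing DataSetIterator and model your MultiLayerNetwork; double-check the exact packages in your version):

    import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
    import org.deeplearning4j.datasets.iterator.ExistingMiniBatchDataSetIterator;
    import org.nd4j.linalg.dataset.DataSet;
    import java.io.File;

    // 1) Run the expensive ETL pipeline once and save every minibatch to disk.
    File dir = new File("presaved-data");
    dir.mkdirs();
    int i = 0;
    while (trainIter.hasNext()) {
        DataSet ds = trainIter.next();
        ds.save(new File(dir, "train-" + i++ + ".bin"));
    }

    // 2) In later runs, train from the pre-saved batches, wrapped in an async
    //    iterator so the next batch is prepared while the current one trains.
    ExistingMiniBatchDataSetIterator presaved =
            new ExistingMiniBatchDataSetIterator(dir, "train-%d.bin");
    model.fit(new AsyncDataSetIterator(presaved, 4), numEpochs);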

So I went through the wiki article on performance again. Currently one epoch takes about 3 minutes. There are 25000 records, my minibatch size is currently 64.

According to the PerformanceListener, my ETL time is already consistently at zero, and my GC time is at zero as well.

This is the output for one epoch:
https://gist.github.com/ForceUpdate1/cf299ba739834918de856ad7a0c2d5cf

One thing you might want to try here is reducing the stats collection; especially on CUDA it can sometimes become a bottleneck.
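
For example, during long runs you could keep only lightweight listeners with a low reporting frequency instead of streaming full stats to the UI (model being your MultiLayerNetwork; the frequency of 100 iterations is just an example):

    import org.deeplearning4j.optimize.listeners.PerformanceListener;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

    // Report score and performance only every 100 iterations instead of
    // collecting full UI stats on every iteration.
    model.setListeners(new ScoreIterationListener(100), new PerformanceListener(100));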

The log you’ve shared does, however, clearly say that you are on CPU, so I’m a bit confused. If you are using the CPU backend, your CUDA utilization will obviously be close to zero.

Ok, I have removed all the listeners, but it makes almost no difference.

I currently want to tune the hyperparameters, and it’s a bit annoying that it always takes 15 minutes to get results. Maybe you have a tip on which parameters I could choose better?

I switched back to 1.0.0-beta7 for now, because there were issues with the snapshot version. And with 1.0.0-beta7 I can’t use CUDA, because I currently have CUDA 11.2 installed.

You should have said that in the previous post, because until now you were asking about CUDA performance.

As for training your model, I’d suggest you get rid of L2 for now: it is a regularization method, and if you can’t even get your loss low enough yet, then that may be part of the problem.

From your screenshot, it also looks like your learning rate is about an order of magnitude too high. And I’d suggest you try a different optimizer, maybe Nadam will work better for you.
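
In terms of configuration, that would roughly amount to something like this (the 1e-3 learning rate is only an example starting point):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Nadam;

    // Drop the .l2(...) call entirely for now, lower the learning rate,
    // and switch the updater to Nadam.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .seed(42)
            .updater(new Nadam(1e-3));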

Thanks, I have adjusted the parameters and I think it looks better. I set L2 to zero and now use Nadam. Only the parameter updates seem lower than they should be, or am I wrong?

This is my current network configuration again. What still bothers me is the GlobalPoolingLayer. Do I need this layer at all? Currently I only use it because otherwise the InputType for the DenseLayer does not fit. But maybe I have to set a different nIn for the DenseLayer.

https://gist.github.com/ForceUpdate1/b2660bce28a2ed0385c16a48103d0e4b

So you are still getting this when you try it without the global pooling?

I’m asking because you currently shouldn’t even have a mask at all.

No, I don’t get that error anymore, but I get another one.

Exception in thread "main" java.lang.IllegalArgumentException: Labels and preOutput must have equal shapes: got shapes [64, 2] vs [6528, 2]
	at org.nd4j.common.base.Preconditions.throwEx(Preconditions.java:636)
	at org.nd4j.linalg.lossfunctions.impl.LossMCXENT.computeGradient(LossMCXENT.java:149)
	at org.deeplearning4j.nn.layers.BaseOutputLayer.getGradientsAndDelta(BaseOutputLayer.java:175)
	at org.deeplearning4j.nn.layers.BaseOutputLayer.backpropGradient(BaseOutputLayer.java:147)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1946)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2761)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2704)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1715)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636)

That looks interesting, maybe there is a bug in the automatic reshaping. But I can see no way that it would result in the shape you are getting.

You can try setting the CnnToFeedForwardPreProcessor on your first dense layer manually and specifying its arguments. But please also share the output of model.summary() for the case where it is set up automatically.
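
Setting it manually would look roughly like this (the layer index and the height/width/channel values are placeholders; you would use the actual output shape of your convolution stack):

    import org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor;

    // Explicitly attach the preprocessor to the input of the first DenseLayer
    // (assumed to be layer 2 here) instead of relying on the automatic one.
    listBuilder.inputPreProcessor(2, new CnnToFeedForwardPreProcessor(1, 102, 50));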

I hope I did it right. But there is now another error.

https://gist.github.com/ForceUpdate1/b2660bce28a2ed0385c16a48103d0e4b

Exception in thread "main" java.lang.IllegalStateException: Invalid input type: Expected input of type CNN, got InputTypeRecurrent(50,timeSeriesLength=102,format=NCW)
	at org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor.getOutputType(CnnToFeedForwardPreProcessor.java:173)
	at org.deeplearning4j.nn.conf.MultiLayerConfiguration$Builder.build(MultiLayerConfiguration.java:699)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration$ListBuilder.build(NeuralNetConfiguration.java:258)

Interesting. Apparently the Convolution1DLayer is doing more than I had remembered. It actually does not convert the recurrent input into convolutional input. It keeps it as recurrent.

And in that case the automatically inserted RnnToFeedForwardPreProcessor does something unexpected: it turns [miniBatchSize,layerSize,timeSeriesLength] into [miniBatchSize*timeSeriesLength,layerSize]. That is exactly where your [6528, 2] shape came from: 64 * 102 = 6528.

Looks like your case has come up in Keras import most often, so often in fact that the Preprocessor for that case has it in its name:
KerasFlattenRnnPreprocessor.

Use that preprocessor, and it should properly flatten the data.
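
If I remember its constructor correctly, that would be roughly the following (depth = number of feature maps of the preceding layer, tsLength = your sequence length; the layer index is a placeholder):

    import org.deeplearning4j.nn.modelimport.keras.preprocessors.KerasFlattenRnnPreprocessor;

    // Flatten [miniBatchSize, depth, tsLength] into [miniBatchSize, depth * tsLength]
    // before the first DenseLayer.
    listBuilder.inputPreProcessor(2, new KerasFlattenRnnPreprocessor(50, 102));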

Ok thank you, it works. For the depth parameter I now pass the number of feature maps, and for tsLength the 102.

It feels like it is also a little faster now, but that could also be because I am currently testing with the snapshot version and CUDA again.

But I have another issue: after I set dropout on my DenseLayer, the CUDA version spams the following error in the console during training:

16:42:04.438 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.dropout.CudnnDropoutHelper] of provided class-loader.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Unable to find class org.deeplearning4j.cuda.dropout.CudnnDropoutHelper  using the classloader set for Dl4jClassLoading. Trying to use class loader that loaded the  class org.deeplearning4j.nn.conf.dropout.DropoutHelper instead.
16:42:04.438 [main] DEBUG org.deeplearning4j.common.config.DL4JClassLoading - Global class-loader for DL4J was changed.
16:42:04.438 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.dropout.CudnnDropoutHelper] of provided class-loader.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Unable to use  helper implementation org.deeplearning4j.cuda.dropout.CudnnDropoutHelper for helper type org.deeplearning4j.nn.conf.dropout.DropoutHelper, please check your classpath. Falling back to built in  normal  methods for now.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Returning class loader to original one.
16:42:04.438 [main] DEBUG org.deeplearning4j.common.config.DL4JClassLoading - Global class-loader for DL4J was changed.
(this block of messages then repeats continuously during training)

And without CUDA, on the CPU, this error comes up directly:

https://gist.github.com/ForceUpdate1/e5bceae7908ccafea61404f9a9d996c5

Both with the current snapshot version

You should be able to alleviate the cuda training issue by adding deeplearning4j-cuda-11.2 to your dependencies (with the correct cuda version for you).

The op helper issue you are seeing on the snapshots is currently being worked on in Pull Request #9308 (eclipse/deeplearning4j): “Fix up helper instantiation, add tests for helpers created by reflection” by agibsonccc.

Ok thanks.

With the changes, I now have an issue with training not working properly. No matter how I change the learning rate, it only gets worse. Maybe you have a tip for me again.

The huge step at the beginning followed by going nowhere looks suspicious. Maybe try an even lower starting learning rate (10^-4 perhaps?) or even a smaller initialization.

The paper you are trying to implement says:

The algorithm used is stochastic gradient descent (SGD) with a minibatch of size 128, using momentum 0.9 and initial step size 0.01 which is halved every 3 epoches for 10 times

So you might want to use that.

I’ve looked for their implementation, and actually found it:

Maybe their learning rate schedule is important. Maybe not.

Another implementation of the same paper is this:

And there you see a truncated normal initialization and a fixed 10^-3 learning rate with an Adam optimizer, and apparently that has worked too.
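
In DL4J terms, that setup would look roughly like this (the 0.05 standard deviation is a placeholder; use whatever that implementation chooses):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.distribution.TruncatedNormalDistribution;
    import org.deeplearning4j.nn.weights.WeightInitDistribution;
    import org.nd4j.linalg.learning.config.Adam;

    // Truncated normal weight initialization and a fixed 1e-3 learning rate with Adam.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .weightInit(new WeightInitDistribution(new TruncatedNormalDistribution(0, 0.05)))
            .updater(new Adam(1e-3));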

How can I set these values?

I had activated the option already:

.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)

Is the SoftMax function the same as the LogSoftMax function? At least I could not find the latter.

That is not what is meant there. To get the same effect, you would use the Sgd updater instead of Adam or Nadam.

Instead of a learning rate you can pass a Schedule to the optimizer.
For possible options see: https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule
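
For the paper’s exact recipe, a rough sketch (note: plain Sgd takes no momentum, so the momentum-capable Nesterovs updater is used here; the constructor overloads may differ slightly between versions):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Nesterovs;
    import org.nd4j.linalg.schedule.ScheduleType;
    import org.nd4j.linalg.schedule.StepSchedule;

    // Initial learning rate 0.01, multiplied by 0.5 every 3 epochs, momentum 0.9.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .updater(new Nesterovs(new StepSchedule(ScheduleType.EPOCH, 0.01, 0.5, 3), 0.9));

    // Without momentum, the same schedule could be passed to plain Sgd instead:
    // .updater(new Sgd(new StepSchedule(ScheduleType.EPOCH, 0.01, 0.5, 3)))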

Thanks for the help, I think it’s going quite well so far. But I still have the issue that with 25,000 records and a batchSize of 128, one epoch still takes about 10 minutes on the CPU. So I wanted to try the CUDA version again and applied your changes. But now another error comes up again:

https://gist.github.com/ForceUpdate1/5eb5083e7e329ef52857ff15b42e0aa3

My current pom.xml:

    <properties>
        <dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-core</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-ui</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-cuda-11.2</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.2</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.4</version>
        </dependency>

        <dependency>
            <groupId>org.datavec</groupId>
            <artifactId>datavec-local</artifactId>
            <version>${dl4j.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.datavec</groupId>
                    <artifactId>datavec-arrow</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

Edit:
In JProfiler you can see that the backpropagation takes a lot of time, so I came across the following article:

https://deeplearning4j.konduit.ai/models/recurrent#truncated-back-propagation-through-time

Unfortunately, it doesn’t quite work for me:

[main] WARN org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [128, 69, 128]	and labels with shape [128, 2]
(this warning is repeated for every full minibatch)
[main] WARN org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [60, 69, 128]	and labels with shape [60, 2]
[main] INFO de.foorcee.chatfilter.trainer.ChatfilterTrainer - Epoch 1 complete in 6598ms (0 min). Starting evaluation:
Exception in thread "main" java.lang.IllegalStateException: Illegal set of indices for array: need at least 2 point/interval/all/specified indices for rank 2 array ([128, 2]), got indices [all(), all(), Interval(b=0,e=20,s=1)]
	at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:641)
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:412)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.get(BaseNDArray.java:4140)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.getSubsetsForTbptt(MultiLayerNetwork.java:2112)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluationHelper(MultiLayerNetwork.java:3472)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluation(MultiLayerNetwork.java:3400)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.evaluate(MultiLayerNetwork.java:3595)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.evaluate(MultiLayerNetwork.java:3505)