Training CNN error / CNN text classification

I’ve looked at your memory crash log again, and I’ve noticed that it says your periodic GC is disabled.

In principle it shouldn’t be a problem, since workspaces are being used, but as a workaround you could try enabling it (see: https://deeplearning4j.konduit.ai/config/config-memory/config-workspaces#garbage-collector)
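
For reference, a minimal sketch of what enabling it could look like via the ND4J memory manager (the 10-second window is just an illustrative value, not a recommendation):

    import org.nd4j.linalg.factory.Nd4j;

    // Re-enable the periodic GC that the crash log reports as disabled,
    // and let it run at most once every 10 seconds.
    Nd4j.getMemoryManager().togglePeriodicGc(true);
    Nd4j.getMemoryManager().setAutoGcWindow(10000);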

I do not know if anything else has changed, but today it worked right away with the snapshot version. The high ETL time is also gone. Thanks a lot anyway.

I just have a few questions now:

Are the models I trained with the 1.0.0-SNAPSHOT compatible with the 1.0.0-beta7?

I also took a closer look at the paper and came across the following:

Any character exceeding length l0 is ignored, and any characters that are not in the alphabet including blank characters are quantized as all-zero vectors.

Is it possible for me to somehow remove the blank character from my one-hot encoding and replace it with an all-zero vector?
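
To illustrate what I mean, roughly (this helper method and the alphabet handling are just a hypothetical sketch, not my actual code):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Hypothetical sketch: one-hot encode a string over a fixed alphabet.
    // Characters that are not in the alphabet (including blanks) simply keep
    // an all-zero column instead of getting their own one-hot index.
    public static INDArray quantize(String text, String alphabet, int maxLen) {
        INDArray encoded = Nd4j.zeros(alphabet.length(), maxLen);
        for (int i = 0; i < Math.min(text.length(), maxLen); i++) {
            int idx = alphabet.indexOf(text.charAt(i));
            if (idx >= 0) { // out-of-alphabet characters stay all-zero
                encoded.putScalar(idx, i, 1.0);
            }
        }
        return encoded;
    }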

Now that the Cuda version is running, can I somehow still improve performance / GPU utilization with my model?

I just had a look at the Task Manager, and GPU utilization is currently between 2 and 3%.

I had already read through the wiki article and reduced the batch size, but maybe there are other tips besides that, e.g. specifically for my model.

@ForceUpdate1 yes, the models are compatible. After the release, I’d encourage you to use newer versions where possible.

Generally, utilization comes down to ensuring that data is always prepared in time and that each batch is big enough to warrant being sent to the GPU. It shouldn’t be your only metric though; focus on bringing down overall training time. That can involve things like pre-saving the output of your ETL pipeline so it only runs once, and asynchronously preparing the next batch for training, among other things.
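
To make the pre-saving part concrete, here is a rough sketch (the variable names are assumptions: trainIter would be your existing DataSetIterator and model your MultiLayerNetwork; double-check the exact packages in your version):

    import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
    import org.deeplearning4j.datasets.iterator.ExistingMiniBatchDataSetIterator;
    import org.nd4j.linalg.dataset.DataSet;
    import java.io.File;

    // 1) Run the expensive ETL pipeline once and save every minibatch to disk.
    File dir = new File("presaved-data");
    dir.mkdirs();
    int i = 0;
    while (trainIter.hasNext()) {
        DataSet ds = trainIter.next();
        ds.save(new File(dir, "train-" + i++ + ".bin"));
    }

    // 2) In later runs, train from the pre-saved batches, wrapped in an async
    //    iterator so the next batch is prepared while the current one trains.
    ExistingMiniBatchDataSetIterator presaved =
            new ExistingMiniBatchDataSetIterator(dir, "train-%d.bin");
    model.fit(new AsyncDataSetIterator(presaved, 4), numEpochs);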

So I went through the wiki article on performance again. Currently one epoch takes about 3 minutes. There are 25000 records, my minibatch size is currently 64.

According to the PerformanceListener, my ETL time is already consistently at zero, and my GC time is at zero as well.

This is the output for one epoch:
https://gist.github.com/ForceUpdate1/cf299ba739834918de856ad7a0c2d5cf

One thing you might want to try here is reducing the stats collection; especially on CUDA it can sometimes become a bottleneck.
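
For example, during long runs you could keep only lightweight listeners with a low reporting frequency instead of streaming full stats to the UI (model being your MultiLayerNetwork; the frequency of 100 iterations is just an example):

    import org.deeplearning4j.optimize.listeners.PerformanceListener;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

    // Report score and performance only every 100 iterations instead of
    // collecting full UI stats on every iteration.
    model.setListeners(new ScoreIterationListener(100), new PerformanceListener(100));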

The log you’ve shared does, however, clearly say that you are on CPU, so I’m a bit confused. If you are using the CPU backend, your CUDA utilization will obviously be close to zero.

Ok, I have removed all the listeners, but it makes almost no difference.

I currently want to tune the hyperparameters, and it’s a bit annoying that it always takes 15 minutes to get results. Maybe you have a tip on which parameters I could choose better?

I switched back to 1.0.0-beta7 for now, because there were issues with the snapshot version. And with 1.0.0-beta7 I can’t use CUDA, because I currently have CUDA 11.2 installed.

You should have said that in the previous post, because until now you were asking about CUDA performance.

As for training your model, I’d suggest you get rid of L2 for now: it is a regularization method, and if you can’t even get your loss low enough yet, then that may be part of the problem.

From your screenshot, it also looks like your learning rate is about an order of magnitude too high. And I’d suggest you try a different optimizer, maybe Nadam will work better for you.
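
In terms of configuration, that would roughly amount to something like this (the 1e-3 learning rate is only an example starting point):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Nadam;

    // Drop the .l2(...) call entirely for now, lower the learning rate,
    // and switch the updater to Nadam.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .seed(42)
            .updater(new Nadam(1e-3));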

Thanks, I have adjusted the parameters and I think it looks better. I set L2 to zero and now use Nadam. Only the parameter updates seem lower than they should be, or am I wrong?

This is my current network configuration again. What still bothers me is the GlobalPoolingLayer. Do I need this layer at all? Currently I only use it because otherwise the InputType for the DenseLayer does not fit. But maybe I have to set a different nIn for the DenseLayer.

https://gist.github.com/ForceUpdate1/b2660bce28a2ed0385c16a48103d0e4b

So you are still getting this when you try it without the global pooling?

I’m asking because you currently shouldn’t even have a mask at all.

No, I don’t get that error anymore, but I get another one.

Exception in thread "main" java.lang.IllegalArgumentException: Labels and preOutput must have equal shapes: got shapes [64, 2] vs [6528, 2]
	at org.nd4j.common.base.Preconditions.throwEx(Preconditions.java:636)
	at org.nd4j.linalg.lossfunctions.impl.LossMCXENT.computeGradient(LossMCXENT.java:149)
	at org.deeplearning4j.nn.layers.BaseOutputLayer.getGradientsAndDelta(BaseOutputLayer.java:175)
	at org.deeplearning4j.nn.layers.BaseOutputLayer.backpropGradient(BaseOutputLayer.java:147)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1946)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2761)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2704)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1715)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636)

That looks interesting, maybe there is a bug in the automatic reshaping. But I can see no way that it would result in the shape you are getting.

You can try setting the CnnToFeedForwardPreProcessor on your first dense layer manually and specifying its arguments. But please also share the output of model.summary() for the case where it is set up automatically.
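
Setting it manually would look roughly like this (the layer index and the height/width/channel values are placeholders; you would use the actual output shape of your convolution stack):

    import org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor;

    // Explicitly attach the preprocessor to the input of the first DenseLayer
    // (assumed to be layer 2 here) instead of relying on the automatic one.
    listBuilder.inputPreProcessor(2, new CnnToFeedForwardPreProcessor(1, 102, 50));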

I hope I did it right. But there is now another error.

https://gist.github.com/ForceUpdate1/b2660bce28a2ed0385c16a48103d0e4b

Exception in thread "main" java.lang.IllegalStateException: Invalid input type: Expected input of type CNN, got InputTypeRecurrent(50,timeSeriesLength=102,format=NCW)
	at org.deeplearning4j.nn.conf.preprocessor.CnnToFeedForwardPreProcessor.getOutputType(CnnToFeedForwardPreProcessor.java:173)
	at org.deeplearning4j.nn.conf.MultiLayerConfiguration$Builder.build(MultiLayerConfiguration.java:699)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration$ListBuilder.build(NeuralNetConfiguration.java:258)

Interesting. Apparently the Convolution1DLayer is doing more than I had remembered. It actually does not convert the recurrent input into convolutional input. It keeps it as recurrent.

And in that case the automatically inserted RnnToFeedForwardPreProcessor does something unexpected: it turns [miniBatchSize,layerSize,timeSeriesLength] into [miniBatchSize*timeSeriesLength,layerSize]. That is exactly where your [6528, 2] shape came from: 64 * 102 = 6528.

Looks like your case has come up in Keras import most often, so often in fact that the Preprocessor for that case has it in its name:
KerasFlattenRnnPreprocessor.

Use that preprocessor, and it should properly flatten the data.
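
If I remember its constructor correctly, that would be roughly the following (depth = number of feature maps of the preceding layer, tsLength = your sequence length; the layer index is a placeholder):

    import org.deeplearning4j.nn.modelimport.keras.preprocessors.KerasFlattenRnnPreprocessor;

    // Flatten [miniBatchSize, depth, tsLength] into [miniBatchSize, depth * tsLength]
    // before the first DenseLayer.
    listBuilder.inputPreProcessor(2, new KerasFlattenRnnPreprocessor(50, 102));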

Ok thank you, it works. For the depth parameter I now pass the number of feature maps, and for tsLength the 102.

It feels like it is also a little faster now, but that could also be because I am currently testing with the snapshot version and CUDA again.

But I have another issue: after I set dropout on my DenseLayer, the CUDA version spams the following error in the console during training:

16:42:04.438 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.dropout.CudnnDropoutHelper] of provided class-loader.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Unable to find class org.deeplearning4j.cuda.dropout.CudnnDropoutHelper  using the classloader set for Dl4jClassLoading. Trying to use class loader that loaded the  class org.deeplearning4j.nn.conf.dropout.DropoutHelper instead.
16:42:04.438 [main] DEBUG org.deeplearning4j.common.config.DL4JClassLoading - Global class-loader for DL4J was changed.
16:42:04.438 [main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot find class [org.deeplearning4j.cuda.dropout.CudnnDropoutHelper] of provided class-loader.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Unable to use  helper implementation org.deeplearning4j.cuda.dropout.CudnnDropoutHelper for helper type org.deeplearning4j.nn.conf.dropout.DropoutHelper, please check your classpath. Falling back to built in  normal  methods for now.
16:42:04.438 [main] WARN org.deeplearning4j.nn.layers.HelperUtils - Returning class loader to original one.
16:42:04.438 [main] DEBUG org.deeplearning4j.common.config.DL4JClassLoading - Global class-loader for DL4J was changed.
(this block of messages then repeats continuously during training)

And without CUDA, on the CPU, this error comes up directly:

https://gist.github.com/ForceUpdate1/e5bceae7908ccafea61404f9a9d996c5

Both with the current snapshot version

You should be able to alleviate the cuda training issue by adding deeplearning4j-cuda-11.2 to your dependencies (with the correct cuda version for you).

The op helper issue you are seeing on the snapshots is currently being worked on in Pull Request #9308 (eclipse/deeplearning4j): “Fix up helper instantiation, add tests for helpers created by reflection” by agibsonccc.

Ok thanks.

With the changes, I now have an issue with training not working properly. No matter how I change the learning rate, it only gets worse. Maybe you have a tip for me again.

The huge step at the beginning followed by going nowhere looks suspicious. Maybe try an even lower starting learning rate (10^-4 perhaps?) or even a smaller initialization.

The paper you are trying to implement says:

The algorithm used is stochastic gradient descent (SGD) with a minibatch of size 128, using momentum 0.9 and initial step size 0.01 which is halved every 3 epoches for 10 times

So you might want to use that.

I’ve looked for their implementation, and actually found it:

Maybe their learning rate schedule is important. Maybe not.

Another implementation of the same paper is this:

And there you see a truncated normal initialization and a fixed 10^-3 learning rate with an Adam optimizer, and apparently that has worked too.
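
In DL4J terms, that setup would look roughly like this (the 0.05 standard deviation is a placeholder; use whatever that implementation chooses):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.distribution.TruncatedNormalDistribution;
    import org.deeplearning4j.nn.weights.WeightInitDistribution;
    import org.nd4j.linalg.learning.config.Adam;

    // Truncated normal weight initialization and a fixed 1e-3 learning rate with Adam.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .weightInit(new WeightInitDistribution(new TruncatedNormalDistribution(0, 0.05)))
            .updater(new Adam(1e-3));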

How can I set these values?

I had activated the option already:

.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)

Is the SoftMax function the same as the LogSoftMax function? At least I could not find the latter.

That is not what is meant there. To get the same effect, you would use the Sgd updater instead of Adam or Nadam.

Instead of a learning rate you can pass a Schedule to the optimizer.
For possible options see: https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule
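
For the paper’s exact recipe, a rough sketch (note: plain Sgd takes no momentum, so the momentum-capable Nesterovs updater is used here; the constructor overloads may differ slightly between versions):

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Nesterovs;
    import org.nd4j.linalg.schedule.ScheduleType;
    import org.nd4j.linalg.schedule.StepSchedule;

    // Initial learning rate 0.01, multiplied by 0.5 every 3 epochs, momentum 0.9.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .updater(new Nesterovs(new StepSchedule(ScheduleType.EPOCH, 0.01, 0.5, 3), 0.9));

    // Without momentum, the same schedule could be passed to plain Sgd instead:
    // .updater(new Sgd(new StepSchedule(ScheduleType.EPOCH, 0.01, 0.5, 3)))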

Thanks for the help, I think it’s going quite well so far. But I still have the issue that with 25,000 records and a batchSize of 128, one epoch still takes about 10 minutes on the CPU. So I wanted to try the CUDA version again and applied your changes. But now another error comes up again:

https://gist.github.com/ForceUpdate1/5eb5083e7e329ef52857ff15b42e0aa3

My current pom.xml:

    <properties>
        <dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-core</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-ui</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-cuda-11.2</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.2</artifactId>
            <version>${dl4j.version}</version>
        </dependency>

        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.4</version>
        </dependency>

        <dependency>
            <groupId>org.datavec</groupId>
            <artifactId>datavec-local</artifactId>
            <version>${dl4j.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.datavec</groupId>
                    <artifactId>datavec-arrow</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

Edit:
In JProfiler you can see that the backpropagation takes a lot of time, so I came across the following article:

https://deeplearning4j.konduit.ai/models/recurrent#truncated-back-propagation-through-time

Unfortunately, it doesn’t quite work for me:

[main] WARN org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [128, 69, 128]	and labels with shape [128, 2]
(this warning is repeated for every full minibatch)
[main] WARN org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [60, 69, 128]	and labels with shape [60, 2]
[main] INFO de.foorcee.chatfilter.trainer.ChatfilterTrainer - Epoch 1 complete in 6598ms (0 min). Starting evaluation:
Exception in thread "main" java.lang.IllegalStateException: Illegal set of indices for array: need at least 2 point/interval/all/specified indices for rank 2 array ([128, 2]), got indices [all(), all(), Interval(b=0,e=20,s=1)]
	at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:641)
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:412)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.get(BaseNDArray.java:4140)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.getSubsetsForTbptt(MultiLayerNetwork.java:2112)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluationHelper(MultiLayerNetwork.java:3472)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluation(MultiLayerNetwork.java:3400)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.evaluate(MultiLayerNetwork.java:3595)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.evaluate(MultiLayerNetwork.java:3505)