Training CNN error / CNN text classification

I am currently trying to implement a network for text classification from a paper in Java. My network looks like this so far:

Unfortunately, I always get an error when I try to train the network. Could it be due to the missing flatten layer? I have not found anything about this, and the text classification examples did not help me either.

If I add a GlobalPoolingLayer between the CNN and the dense layer, the training works, but unfortunately one class is never predicted at all. Therefore I think it is not correct.

    Expected rank 4 mask array for 2D CNN layer activations. Got rank 4 mask array (shape [150, 1, 6, 1]) - when used in conjunction with input data of shape [batch,channels,h,w] 4d masks passing through CnnToFeedForwardPreProcessor should have shape [batchSize,1,1,1]

It does tell you what its problem is: your mask array doesn’t have the shape that it expects.

That may be because your l2 regularization is quite large compared to your learning rate.

Thanks for the answer. So I cannot do without the global pooling layer? What about the general structure of the network? According to the literature, the word/sentence vector should be a 2D array where X is the character position in the word and Y is the alphabet from a-z.
As far as I can tell from the paragraph vectors I have looked at, they are always 1D arrays. Is it also possible to get a representation as a 2D array?

As long as the shapes match what it expects you can do whatever you want.

Given that we only have your config, there may be plenty of other problems, but this one stands out too:

setInputType(InputType.convolutional(batchSize, vec.vectorSize(), 1))

I think you misunderstand what this sets.

The height, width and depth are all about a single example. But you are introducing your batch size here as the height.

So, yes, you can have 2D data, but you have to provide it correctly and set up the network to receive it correctly.
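
To make the distinction between batch and example dimensions concrete, here is a minimal plain-Java sketch. It does not call DL4J; the class name, the method name, and the shape numbers are made up for illustration. It strips the batch dimension off a feature array shape, leaving the per-example dimensions that an input type is meant to describe.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DL4J: separates the batch dimension
// from the per-example dimensions of a feature array shape.
class ExampleShape {
    // shape is assumed to be [batch, channels, height, width]
    static long[] perExample(long[] shape) {
        return Arrays.copyOfRange(shape, 1, shape.length);
    }

    public static void main(String[] args) {
        long[] batchShape = {150, 1, 28, 300}; // illustrative numbers
        // Only the remaining three numbers describe a single example.
        System.out.println(Arrays.toString(perExample(batchShape)));
    }
}
```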

Thank you very much. That really makes sense. I changed it once, but unfortunately the error still exists. I have updated the gist again and added more code.

Is the error still exactly the same?

The gist looks somewhat all over the place if you are actually trying to do what you say you are trying to do, i.e. it is set up to receive just a single word per example.

Ok thanks. I have cleaned up and updated the gist, and I’ll try again to describe my issue in more detail. I have 2 text files with 10,000 lines each, one with the label “block” containing incorrect sentences and one with the label “good” containing correct sentences. I want to train my CNN so that it can recognize whether a sentence falls into the category block or the category good. The sentences have the property that they are at most 255 characters long and consist only of a fairly small alphabet. I noticed that the DataSet features from the CnnSentenceDataSetIterator have a shape of [150, 1, 28, 300], and I think that is where the issue is. I have set the InputType height to 255.

Of course I have more sentences, but I have limited myself to 10,000 for testing.

I am trying to implement the network as described in the literature:

The first layer of the network used in the paper is of size 1024, with a local
receptive field (called ‘kernel’ in the paper) of 7, followed by a pooling layer with a
pool of size 3. This all is called ‘layer 1’ in the paper. The authors consider pooling to
be a part of the convolutional layer, which is ok, but Keras treats pooling as a separate
layer, so we will re-enumerate the layers here so that the reader can recreate them
in Keras. The third and fourth level are the same as the first and second. The fifth,
sixth, seventh and eighth layer are the same as the first layer (they are convolutional
layers with no pooling), the ninth layer is a max pooling layer with a pool of 3 (i.e.
it is like the second layer). The tenth layer is a flattening layer, and the eleventh and
twelfth layers are fully-connected layers of size 2048. The final layer’s size depends
on the number of classes used. For sentiment this is ‘positive’ and ‘negative’, so we
may use a logistic function with a single output neuron (all other layers use ReLUs).
If we were to have more classes, we would use softmax, but we will do this in the
later chapters. There are also two dropout layers between the three fully-connected
layers and special weight initializations, but we ignore them here.

Sandro Skansi - Introduction to Deep Learning
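
The first layers described in the quote can be traced for sequence length with a small plain-Java sketch. The helper names are hypothetical, and the arithmetic assumes no padding, a convolution stride of 1, and a pooling stride equal to the pool size. None of these settings is stated in the quote, so treat the numbers as an illustration only.

```java
// Hypothetical helper for tracing the sequence length through a 1D conv stack.
// Assumptions: no padding, convolution stride 1, pooling stride == pool size.
class SeqLengthTrace {
    static int conv(int length, int kernel) {
        return length - kernel + 1;
    }

    static int pool(int length, int poolSize) {
        return (length - poolSize) / poolSize + 1;
    }

    public static void main(String[] args) {
        int len = 255;       // maximum sentence length mentioned in this thread
        len = conv(len, 7);  // convolution with kernel 7
        len = pool(len, 3);  // max pooling with pool 3
        len = conv(len, 7);
        len = pool(len, 3);
        System.out.println("length after four layers: " + len);
    }
}
```

With inputs this short the length shrinks quickly, so it may be worth running a trace like this before adding the remaining layers.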

I don’t know exactly how to implement the flatten layer. Without the GlobalPoolingLayer it doesn’t work, and with the GlobalPoolingLayer no useful results come out.

Results with my test dataset

=========================Confusion Matrix=========================
   0   1
   0 411 | 0 = block
   0 741 | 1 = good

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
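
To see what this matrix implies, the following plain-Java sketch computes per-class recall and overall accuracy from the four numbers above. The class and method names are hypothetical; the matrix layout follows the stated format (rows are actual classes, columns are predicted classes).

```java
// Hypothetical helper: basic statistics from a confusion matrix
// laid out as matrix[actual][predicted].
class ConfusionStats {
    static double recall(int[][] m, int cls) {
        int total = 0;
        for (int v : m[cls]) {
            total += v;
        }
        return total == 0 ? 0.0 : (double) m[cls][cls] / total;
    }

    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        int[][] m = {{0, 411}, {0, 741}}; // 0 = block, 1 = good
        System.out.println("recall(block) = " + recall(m, 0));
        System.out.println("recall(good)  = " + recall(m, 1));
        System.out.println("accuracy      = " + accuracy(m));
    }
}
```

A recall of zero for one class together with a recall of one for the other means the network predicts the same class for every input.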

Going by the code in your gist, you aren’t actually creating a character-based input, but let’s get your model problem sorted out first.

What you are looking for is a reshape. In principle you shouldn’t need to add anything if your input type is set up correctly.

In that case it should add the conversion from CNN format to a flat format automatically:

And your original error message actually tells us that it is added.

So it looks like your actual problem is in data loading. And as I said initially, what you are doing certainly is not a character level one-hot encoding of the input.

As your data probably fits into memory comfortably, I suggest you start with using a CollectionSequenceRecordReader, and give it a List<List<List<StringWritable>>>. The outermost list contains all of your examples. The middle list contains all the steps of your sequence. The innermost list is then a list of two elements: new StringWritable(char) and new StringWritable(label).

While this does duplicate your label for every character, it simplifies the setup a bit. Once you understand how things work you can split that out again.
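
The nesting described above can be sketched in plain Java. To keep the sketch free of library dependencies it uses String as a stand-in for the DataVec writable types, so it only illustrates the structure; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the examples -> steps -> [char, label] nesting.
// String stands in for the writable types used by the record reader.
class NestedSequences {
    // One example: a list of steps, each step holding the character and the label.
    static List<List<String>> toSequence(String sentence, String label) {
        List<List<String>> steps = new ArrayList<>(sentence.length());
        for (char c : sentence.toCharArray()) {
            steps.add(Arrays.asList(String.valueOf(c), label));
        }
        return steps;
    }

    public static void main(String[] args) {
        List<List<List<String>>> examples = new ArrayList<>();
        examples.add(toSequence("hi", "good"));
        examples.add(toSequence("x!", "block"));
        System.out.println(examples);
    }
}
```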

You then create a transform, that will turn your characters into a one hot encoding (e.g. see Quickstart with Deeplearning4J – dubs·tech) and create a SequenceRecordReaderDataSetIterator that will then create sequences of one hot encoded vectors, just as you wanted it to.
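
What the one-hot step is supposed to produce can be shown without any library. The following plain-Java sketch is not the DataVec transform; the class name, the method name, and the choice to encode unknown characters as an all-zero row are assumptions made for illustration.

```java
// Hypothetical sketch of character-level one-hot encoding.
// Assumption: characters outside the alphabet become an all-zero row.
class CharOneHot {
    // Returns an array of shape [sentence length][alphabet size].
    static int[][] encode(String sentence, String alphabet) {
        int[][] out = new int[sentence.length()][alphabet.length()];
        for (int t = 0; t < sentence.length(); t++) {
            int idx = alphabet.indexOf(sentence.charAt(t));
            if (idx >= 0) {
                out[t][idx] = 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] encoded = encode("ba", "abc");
        for (int[] row : encoded) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```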

If you want to skip all that, and just have a sanity check that your model is correct first, you can simply create INDArrays of the correct shape (which you should understand by now) for your inputs and labels, and send them through the model (features, labels).

Thank you for the answer, I have now tried it that way.
I did not find the class StringWritable, only Text.
How do I have to implement the schema for the data structure?

    List<List<List<Writable>>> list = new ArrayList<>();

    CollectionSequenceRecordReader recordReader = new CollectionSequenceRecordReader(list);

    Schema.Builder schemaBuilder = new Schema.Builder();
    schemaBuilder.addColumnCategorical("label", Arrays.asList("good,block"));

    Schema schema = schemaBuilder.build();

    TransformProcess process = new TransformProcess.Builder(schema)
            .integerToOneHot("char", Character.MIN_VALUE, Character.MAX_VALUE)
            .build();

    Schema finalSchema = process.getFinalSchema();

    TransformProcessRecordReader trainRecordReader = new TransformProcessRecordReader(recordReader, process);

    RecordReaderDataSetIterator trainIterator = new RecordReaderDataSetIterator.Builder(trainRecordReader, batchSize)
            .classification(finalSchema.getIndexOfColumn("label"), 2)
            .build();

This is how I load the data from the file:

    private static List<List<List<Writable>>> loadData(String label) {
        List<List<List<Writable>>> list = new ArrayList<>();
        try (Scanner scanner = new Scanner(new File("messages/test/" + label + "/messages.txt"))) {
            while (scanner.hasNextLine()) {
                String sequence = scanner.nextLine();
                char[] chars = sequence.toCharArray();
                List<List<Writable>> innerList = new ArrayList<>(chars.length);
                for (int i = 0; i < chars.length; i++) {
                    List<Writable> mostInnerList = new ArrayList<>(2);
                    mostInnerList.add(new Text(label));
                    mostInnerList.add(new IntWritable(chars[i]));
                    innerList.add(mostInnerList);
                }
                list.add(innerList);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        return list;
    }

I was writing things from memory, so maybe there is an inconsistency with naming :) For the number types it is always called SomethingWritable.

At a glance, your code looks like it should work. The only thing I’d change is that I’d shuffle the list before creating the record reader, so you don’t get all the block examples first and all the good examples later.
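
A minimal sketch of that shuffling step, using only java.util. The class and method names are hypothetical, and the fixed seed is an assumption added so that runs are repeatable. The two input lists stand for the block and good examples loaded separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: merges two example lists and shuffles the result
// so that the classes are interleaved before a record reader sees them.
class ShuffleExamples {
    static <T> List<T> shuffled(List<T> first, List<T> second, long seed) {
        List<T> all = new ArrayList<>(first);
        all.addAll(second);
        Collections.shuffle(all, new Random(seed)); // seeded for repeatability
        return all;
    }

    public static void main(String[] args) {
        List<String> block = List.of("block-1", "block-2", "block-3");
        List<String> good = List.of("good-1", "good-2", "good-3");
        System.out.println(shuffled(block, good, 42L));
    }
}
```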

Do you have a specific error that happens?

Ok, then everything is fine. The naming was the same for all classes; only the string type was a bit different. So that fits so far.

Yes, that is the error I am getting:

Exception in thread "main" java.lang.UnsupportedOperationException: next() not supported for CollectionSequencRecordReader; use sequenceRecord()
at org.datavec.api.records.reader.impl.collection.CollectionSequenceRecordReader.nextRecord(
at org.datavec.api.records.reader.impl.transform.TransformProcessRecordReader.hasNext(
at org.datavec.api.records.reader.impl.transform.TransformProcessRecordReader.nextRecord(
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.initializeUnderlying(
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.resetSupported(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(

Also, what width do I have to choose for the InputType now?
I have updated the code in the gist again.

And since I only have a limited character set/alphabet anyway, I think it would be better if I made the one hot encoding a little smaller later. But for testing it should actually fit.

Ah, I missed that when looking at your code. You are building Sequences, so you need to use a SequenceRecordReaderDataSetIterator (I know the name is ridiculously long).

For the input type width, you just use the total alphabet size you are currently using.
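
One way to obtain that alphabet size is to collect the distinct characters that actually occur in the data. The plain-Java sketch below does this; the class and method names are hypothetical, and the sorted ordering of indices is an arbitrary choice made only so that the mapping is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: builds a character -> index map from a corpus.
// The map size is the alphabet size.
class AlphabetBuilder {
    static Map<Character, Integer> build(List<String> sentences) {
        TreeSet<Character> chars = new TreeSet<>();
        for (String s : sentences) {
            for (char c : s.toCharArray()) {
                chars.add(c);
            }
        }
        Map<Character, Integer> index = new LinkedHashMap<>();
        for (char c : chars) {
            index.put(c, index.size()); // next free index, in sorted order
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Character, Integer> alphabet = build(List.of("ba", "ab c"));
        System.out.println("alphabet size: " + alphabet.size());
        System.out.println(alphabet);
    }
}
```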

Thank you very much. I have created an alphabet for my text. But now I have the following issue with my schema:
I have updated the gist.

Exception in thread "main" java.lang.IllegalStateException: Cannot convert categorical value to one-hot: input value ("good") is not in the list of known categories (state names/categories: [good,block])
at org.datavec.api.transform.transform.BaseTransform.mapSequence(
at org.datavec.api.transform.TransformProcess.executeSequenceToSequence(
at org.datavec.api.transform.TransformProcess.executeSequence(
at org.datavec.api.records.reader.impl.transform.TransformProcessSequenceRecordReader.nextSequence(
at org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator.initializeUnderlyingFromReader(
at org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator.hasNext(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.nd4j.linalg.dataset.AsyncDataSetIterator.<init>(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(

I think you are missing a ", " in there.
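
The difference is easy to reproduce with the standard library alone. In the sketch below (hypothetical class name), a single string containing a comma yields a one-element list, which also explains why the categories printed in the error message look plausible at first glance.

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates one string with a comma versus two separate strings.
class CategoryListDemo {
    static List<String> oneElement() {
        return Arrays.asList("good,block");
    }

    static List<String> twoElements() {
        return Arrays.asList("good", "block");
    }

    public static void main(String[] args) {
        System.out.println(oneElement() + " size=" + oneElement().size()
                + " contains good: " + oneElement().contains("good"));
        System.out.println(twoElements() + " size=" + twoElements().size()
                + " contains good: " + twoElements().contains("good"));
    }
}
```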

Oh dear, how silly of me. Sorry. But somehow my InputType still doesn’t quite fit. In your Quickstart you used the number of columns as the width, so I did the same. Did I perhaps fail to account for the sentences having different lengths? How are the arrays in the data sets all brought to the same length?

gist updated

Exception in thread "main" java.lang.IllegalStateException: Expected rank 4 mask array for 2D CNN layers. Mask arrays for 2D CNN layers must have shape [batchSize,channels,X,Y] where X = (1 or activationsHeight) and Y = (1 or activationsWidth): Got rank 2 array with shape [150, 72]
at org.deeplearning4j.util.ConvolutionUtils.cnn2dMaskReduction(
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.feedForwardMaskArray(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.feedForwardMaskArray(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.setLayerMaskArrays(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(

I just found the function:

    List<Writable> writables = new ArrayList<>();
    writables.add(new IntWritable(0));
    writables.add(new IntWritable(0));
    for (int i = 0; i < alphabet.size(); i++) {
        writables.add(new IntWritable(0));
    }

    TransformProcess process = new TransformProcess.Builder(schema)
            .integerToOneHot("char",  0, alphabet.size() - 1)
            .trimOrPadSequenceToLength(255, writables)
            .build();
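
The idea behind trimming or padding can be sketched in plain Java. This is not the DataVec implementation; the class and method names are hypothetical, padding steps are assumed to be all zeros, and every input row is assumed to have at least `width` entries.

```java
// Hypothetical sketch: bring a sequence of steps to a fixed length by
// cutting off extra steps or appending all-zero steps.
class TrimOrPad {
    static int[][] trimOrPad(int[][] steps, int targetLength, int width) {
        int[][] out = new int[targetLength][width]; // zero-filled by default
        int n = Math.min(steps.length, targetLength);
        for (int t = 0; t < n; t++) {
            System.arraycopy(steps[t], 0, out[t], 0, width);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] shortSeq = {{1, 0}, {0, 1}};
        int[][] padded = trimOrPad(shortSeq, 4, 2);
        for (int[] row : padded) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```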

But I still get an error:

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Got rank 3 array as input to ConvolutionLayer (layer name = Input, layer index = 0) with shape [150, 87, 255]. Expected rank 4 array with shape [minibatchSize, layerInputDepth, inputHeight, inputWidth]. (layer name: Input, layer index: 0, layer type: ConvolutionLayer)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.validateInputRank(
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(
at org.deeplearning4j.nn.layers.AbstractLayer.activate(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.ffToLayerActivationsInWs(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(
at org.deeplearning4j.optimize.Solver.optimize(
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(

So we are getting closer. The batch size is 150, I guess your alphabet size is 87, and you’ve got a sequence length of 255 steps. That is the input shape for recurrent networks.

So it at least looks like your data finally has the correct shape. I guess you’ve looked at it also and verified that it at least looks correct (i.e. you’ve got a proper sequence of one hot encoded vectors).
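
A check along those lines can be sketched in plain Java for a single example laid out as [alphabetSize][seqLength]. The class and method names are hypothetical, the input is assumed to be non-empty, and padded steps are assumed to be all zeros.

```java
// Hypothetical check: every time step is either one-hot (exactly one 1)
// or an all-zero padding step.
class OneHotCheck {
    // features has shape [alphabetSize][seqLength] for one example.
    static boolean isOneHotOrPadding(int[][] features) {
        int alphabetSize = features.length;
        int seqLength = features[0].length;
        for (int t = 0; t < seqLength; t++) {
            int sum = 0;
            for (int a = 0; a < alphabetSize; a++) {
                int v = features[a][t];
                if (v != 0 && v != 1) {
                    return false; // only 0 and 1 are allowed
                }
                sum += v;
            }
            if (sum > 1) {
                return false; // more than one character active at this step
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] ok = {{1, 0, 0}, {0, 1, 0}};  // last step is padding
        int[][] bad = {{1, 0}, {1, 0}};       // two characters at step 0
        System.out.println(isOneHotOrPadding(ok));
        System.out.println(isOneHotOrPadding(bad));
    }
}
```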

As you’ve got recurrent type input, you should set the input type accordingly to InputType.recurrent(alphabetSize, seqLength).

In principle it should have enough information to automatically set the proper preprocessor for you then. But if it doesn’t, you can set an input preprocessor on the model with .setInputPreProcessor(new RnnToCnnPreProcessor(alphabetSize, seqLength, 1)). That should reshape your data properly for a temporal cnn network.

On the topic of temporal CNN networks. The model you’ve defined is not one of them.

The paper that your book is referencing is explicitly talking about one-dimensional CNNs:

In this article we explore treating text as a kind of raw signal at character level, and applying temporal (one-dimensional) ConvNets to it.

You are using kernels like 7,7, i.e. 7 features wide and 7 features high. But for a CNN to be one-dimensional it needs to be something like 7,alphabetSize, which would make sure that it effectively looks at 7 characters at once. And later on you’d need to adjust that second size accordingly to look across all the feature maps that the previous steps have created.
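
To illustrate what "looking at 7 characters at once" means, here is a naive single-filter temporal convolution in plain Java. It is only a sketch: the names are hypothetical, there is no bias or activation, the stride is 1, there is no padding, and the kernel is applied without flipping (cross-correlation). The kernel spans all alphabet channels and slides along the time axis only.

```java
// Hypothetical sketch of one temporal (1D) convolution filter.
// input:  [channels][length]      e.g. [alphabetSize][seqLength]
// kernel: [channels][kernelWidth] e.g. [alphabetSize][7]
class TemporalConvSketch {
    static double[] conv1d(double[][] input, double[][] kernel) {
        int channels = input.length;
        int length = input[0].length;
        int k = kernel[0].length;
        double[] out = new double[length - k + 1];
        for (int t = 0; t < out.length; t++) {
            double sum = 0;
            for (int c = 0; c < channels; c++) {
                for (int j = 0; j < k; j++) {
                    sum += input[c][t + j] * kernel[c][j];
                }
            }
            out[t] = sum; // one value per window position
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1, 0, 1, 0, 1}, {0, 1, 0, 1, 0}}; // 2 symbols, 5 steps
        double[][] kernel = {{1, 1, 1}, {0, 0, 0}};            // counts symbol 0 in a window of 3
        System.out.println(java.util.Arrays.toString(conv1d(input, kernel)));
    }
}
```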

That is a bit tedious to implement manually. But DL4J has 1D variants of all the layers you need: deeplearning4j/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers at master · eclipse/deeplearning4j · GitHub

So you should be able to build your network using those instead.


I think I have now implemented things the way you meant, but now there is another issue. Could it be that I have to specify something different for the size?

I have updated the gist.

Exception in thread "main" java.lang.IllegalStateException: Invalid input: expected RNN input of size 21930 = (d=1 * w=255 * h=86), got InputTypeRecurrent(86,timeSeriesLength=255,format=NCW)
at org.deeplearning4j.nn.conf.preprocessor.RnnToCnnPreProcessor.getOutputType(
at org.deeplearning4j.nn.conf.MultiLayerConfiguration$
at org.deeplearning4j.nn.conf.NeuralNetConfiguration$

I’d suggest you set up the 1D convolutions first. If I remember correctly, they should accept RNN input type data without an additional input preprocessor.

gist updated

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Cannot do forward pass in Convolution layer (layer name = Input, layer index = 0): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 87, [minibatch, channels, height, width]=[150, 87, 255, 1]; expected input channels = 1) (layer name: Input, layer index: 0, layer type: Convolution1DLayer)
Note: Convolution layers can be configured for either NCHW (channels first) or NHWC (channels last) format for input images and activations.
Layers can be configured using .dataFormat(CNN2DFormat.NCHW/NHWC) when constructing the layer, or for the entire net using .setInputType(InputType.convolutional(height, width, depth, CNN2DForman.NCHW/NHWC)).
ImageRecordReader and NativeImageLoader can also be configured to load image data in either NCHW or NHWC format which must match the network
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.validateInputDepth(
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(
at org.deeplearning4j.nn.layers.convolution.Convolution1DLayer.preOutput(
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(
at org.deeplearning4j.nn.layers.convolution.Convolution1DLayer.activate(
at org.deeplearning4j.nn.layers.AbstractLayer.activate(

ugh, didn’t expect to see that error in this situation.

Set the data format on the first layer to be NHWC and it should work.
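
For readers unfamiliar with the two format names, the following plain-Java sketch shows how the same logical element lands at different offsets in a dense row-major buffer depending on the layout. It is a general illustration with hypothetical names, not a statement about how DL4J stores its arrays internally.

```java
// Hypothetical sketch: flat offsets of element (n, c, h, w) in a dense
// row-major buffer for the channels-first and channels-last layouts.
class LayoutOffsets {
    // Channels first: dimensions ordered [N, C, H, W]
    static int nchw(int n, int c, int h, int w, int C, int H, int W) {
        return ((n * C + c) * H + h) * W + w;
    }

    // Channels last: dimensions ordered [N, H, W, C]
    static int nhwc(int n, int c, int h, int w, int C, int H, int W) {
        return ((n * H + h) * W + w) * C + c;
    }

    public static void main(String[] args) {
        int C = 87, H = 255, W = 1; // sizes taken from the error message above
        System.out.println("NCHW offset of channel 1, step 0: " + nchw(0, 1, 0, 0, C, H, W));
        System.out.println("NHWC offset of channel 1, step 0: " + nhwc(0, 1, 0, 0, C, H, W));
    }
}
```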