Classifier indefinitely skips prediction on certain classes

Hello, I’ve read multiple threads about this same issue (some of them three years old), yet regardless of the dataset I use or the tuning I do, the classifier keeps skipping classes at random.

I decided to create a classifier identical to the IrisClassifier that can be found in the DL4J examples repository:

with the following modifications:
-changing the iterations
-changing the activation functions
-updating the learning rate
-removing/modifying l2 regularization
-changing the loss function
-changing the amount of data used for training vs. testing

Here is the data set I’ve primarily tried to input:
This dataset contains 1097 inputs, but 90% of them belong to class 2,
which produces the following result:

Multiple threads indicate that a biased dataset will result in the error, so I tried a different approach:
This dataset contains 143 inputs, where all three classes are evenly distributed
However, I appear to get a similar or worse result:

From what I’ve tried, nothing allows a run where all classes are considered. I’ve hit a wall and have absolutely no idea how to continue. Any help is greatly appreciated.

@chapipo could you clarify what version you’re using? Tuning results vary WILDLY depending on the individual neural network, dataset, and version of the library.

Thank you for your reply. I grabbed the library from here using Git Bash and installed it with Apache Maven 3.6.3, following this tutorial. The IrisClassifier example runs normally and gives very good results.

@chapipo this tutorial is more than 4 years old. Please avoid older content like that.

Some initial tips here: your learning rate is very high. Change that to be lower.
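As a rough illustration of why an overly high learning rate hurts, here is a minimal sketch in plain Java (not DL4J). It runs gradient descent on the toy function f(x) = x^2; the class and method names are made up for this example.

```java
/** Gradient descent on the toy function f(x) = x^2, whose gradient is 2x. */
final class LearningRateDemo {
    /** Runs the given number of update steps starting from x0 and returns the final x. */
    static double descend(double x0, double learningRate, int steps) {
        double x = x0;
        for (int i = 0; i < steps; i++) {
            x -= learningRate * 2.0 * x; // x <- x - lr * f'(x)
        }
        return x;
    }
}
```

With a learning rate of 0.1 each step multiplies x by 0.8, so it shrinks toward the minimum at 0. With 1.1 each step multiplies x by -1.2, so it overshoots further every time. A real network is far less tidy, but the same overshooting can show up as a score that never settles.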

There’s almost no reason to have 5 dense layers. Reduce your number of layers to just 1 and see how your results are first.

By “1” I mean 1 Dense Layer and 1 output layer.
Follow these pages for some more tips:

Ensure your data is normalized if needed as well. (This is needed if it’s not values between 0 and 1.)
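For reference, one common way to normalize is column-wise standardization (subtract the mean, divide by the standard deviation). The sketch below is plain Java with hypothetical names, only meant to show the idea; it assumes the statistics are fitted on the training rows and then reused for the test rows.

```java
/** Column-wise z-score standardization: fit on training rows, apply to any rows. */
final class ZScore {
    final double[] mean;
    final double[] std;

    private ZScore(double[] mean, double[] std) {
        this.mean = mean;
        this.std = std;
    }

    /** Computes the per-column mean and population standard deviation of the training rows. */
    static ZScore fit(double[][] train) {
        int cols = train[0].length;
        double[] mean = new double[cols];
        double[] std = new double[cols];
        for (double[] row : train) {
            for (int j = 0; j < cols; j++) mean[j] += row[j];
        }
        for (int j = 0; j < cols; j++) mean[j] /= train.length;
        for (double[] row : train) {
            for (int j = 0; j < cols; j++) {
                double d = row[j] - mean[j];
                std[j] += d * d;
            }
        }
        for (int j = 0; j < cols; j++) {
            std[j] = Math.sqrt(std[j] / train.length);
            if (std[j] == 0.0) std[j] = 1.0; // constant column: avoid dividing by zero
        }
        return new ZScore(mean, std);
    }

    /** Returns a standardized copy of the given rows using the fitted statistics. */
    double[][] transform(double[][] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++) {
                out[i][j] = (rows[i][j] - mean[j]) / std[j];
            }
        }
        return out;
    }
}
```

Fitting on the training split only keeps information from the test split out of the preprocessing step.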

Search around on the forums for posts like this:

if the docs aren’t helping you. Feel free to ask clarifying questions as well.

Thank you for the tuning links. I’ve gone through them, and here are the changes I’ve tried:
-reducing the hidden layers to 1
-using stochastic gradient descent (SGD) as the updater and changing the learning rate to 1e-3 and 1e-4
-10, 100, or 1000 epochs
-different amount of training data used
-attempting gradient normalization
-changing the loss function
-changing the activation function

Unfortunately, it seems the network either recognizes only a single class, or only two of the three at a time.

I already had this data normalization implemented:

I apologize if I’m missing something or if I’m slow to learn; I’m still a greenhorn at neural networks. Is there anything else I can try?

@chapipo could you post your updated network either in a reply or update your post? Thanks!

package org.deeplearning4j.examples.quickstart.modeling.feedforward.classification;

import java.io.File;
import org.apache.commons.io.FilenameUtils;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.examples.utils.DownloaderUtility;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;
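public class PolioClassifier {
    public static void main(String[] args) throws Exception {
        // Read the comma-separated file without skipping any header lines.
        char delim = ',';
        RecordReader cohortReader = new CSVRecordReader(0, delim);
        String path = FilenameUtils.concat(System.getProperty("user.home"), "dl4j-examples-data/");
        cohortReader.initialize(new FileSplit(new File(path, "temp2.txt")));

        // Batch size 1097 loads the whole file in one batch; the label is in column 5, with 3 classes.
        DataSetIterator iterate = new RecordReaderDataSetIterator(cohortReader, 1097, 5, 3);
        // The right-hand side was lost in the post; restored as in the IrisClassifier example.
        DataSet data = iterate.next();
        SplitTestAndTrain training = data.splitTestAndTrain(0.5);

        DataSet dataToTrain = training.getTrain();
        DataSet dataToTest = training.getTest();

        DataNormalization toNormalize = new NormalizerStandardize();
        // The fit/transform calls were lost in the post; restored as in the IrisClassifier example.
        toNormalize.fit(dataToTrain);
        toNormalize.transform(dataToTrain);
        toNormalize.transform(dataToTest);

        long val = 123; // random seed

        // Only the updater, the dense layer sizes and the loss function survived in the post.
        // Everything else below follows the IrisClassifier example and is an assumption.
        // An l2 setting was also present (see the reply below); its value was lost.
        MultiLayerConfiguration network = new NeuralNetConfiguration.Builder()
                .seed(val)
                .activation(Activation.TANH)
                .weightInit(WeightInit.XAVIER)
                .updater(new Sgd(0.01))
                .list()
                .layer(new DenseLayer.Builder().nIn(5).nOut(3).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .activation(Activation.SOFTMAX)
                        .nIn(3).nOut(3).build())
                .build();

        MultiLayerNetwork neuralnet = new MultiLayerNetwork(network);
        neuralnet.init();
        // neuralnet.setListeners(new ScoreIterationListener(100));

        // The post showed two loop headers (3000 and 40 iterations) whose bodies were lost,
        // plus a truncated "dataToTest =" assignment. A single training loop is assumed here.
        for (int i = 0; i < 3000; i++) {
            neuralnet.fit(dataToTrain);
        }

        // Evaluate on the held-out half and print the summary.
        Evaluation eval = new Evaluation(3);
        INDArray output = neuralnet.output(dataToTest.getFeatures());
        eval.eval(dataToTest.getLabels(), output);
        System.out.println(eval.stats());
    }
}
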

The dataset “temp2.txt” refers to the first link I sent in my post.
Thank you!

@chapipo lower your learning rate.
Get rid of your l2.
Change your optimizer/updater to adam.

Lastly, how big is your dataset?
If the whole thing fits in memory then our default minibatch knob (where we normalize the gradients by the batch size) prevents the network from fully learning.

That’s a reasonable default for most real-world problems, but not for toy problems where people are just getting started with something like 100 examples.

Set miniBatch(false) at the top part of the configuration if that’s the case.

The first link in my post shows the data set I used with 1097 samples.

This dataset contains 1097 inputs, but 90% of them belong to class 2

Multiple threads indicate that a biased dataset will result in the error, so I tried a different approach:
This dataset contains 143 inputs, where all three classes are evenly distributed
However, I appear to get a similar or worse result

Interestingly enough, I have taken your advice and if I set the learning rate to 0.004:

When I set the learning rate to 0.005:

When I set the learning rate to 0.006:

Keeping the learning rate at 0.004, I tried changing other settings, but then the same issue with class exclusion happens.

E.g., using the sigmoid activation function:

@chapipo just to get rid of some variables try to simplify your training pipeline a bit. Remove the split test and train and see if your model can learn from the whole dataset.
This is mainly to see if your model can overfit first.

One other question I have…what’s the breakdown of your dataset? It could be imbalanced classes. Sometimes you spend hours tuning just to realize your dataset is the problem.
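A quick way to get that breakdown is to count the labels per class. This is a minimal sketch in plain Java with hypothetical names; it assumes the labels have already been read into an int array of class indices.

```java
/** Counts examples per class and converts the counts to fractions of the dataset. */
final class LabelDistribution {
    /** labels[i] is the class index (0 .. numClasses-1) of example i. */
    static int[] counts(int[] labels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : labels) counts[label]++;
        return counts;
    }

    /** Fraction of the whole dataset that falls into each class. */
    static double[] fractions(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double[] fractions = new double[counts.length];
        for (int c = 0; c < counts.length; c++) {
            fractions[c] = (double) counts[c] / total;
        }
        return fractions;
    }
}
```

Running the same count on the predicted class of each test example (the index of the largest output) also shows directly which classes the network never predicts.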

Oftentimes people don’t spend enough time ensuring their datasets actually have signal.

If your model can overfit, at least it can learn something.

Beyond that I would suggest potentially using the UI and visualizing the training as well. UI examples here:

It overfits.

My dataset is imbalanced and skewed towards one specific class. Is there anything within the neural network I can do to get around this limitation?

Thank you for the UI visualization tools, I will take a look at them shortly.

@chapipo that’s definitely your problem then. Could you give a breakdown of your label distribution? For underrepresented classes you’ll want to generate a dataset using something like over- or undersampling (basically, either balance your dataset or repeatedly sample from the underrepresented label).
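Random oversampling could look roughly like the sketch below. It is plain Java with hypothetical names, not a DL4J API: it returns row indices in which minority-class examples are repeated until every class is as large as the biggest one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Random oversampling: repeat minority-class rows until every class matches the largest one. */
final class Oversampler {
    /** Returns indices into the original dataset; minority-class indices appear repeatedly. */
    static List<Integer> balancedIndices(int[] labels, int numClasses, long seed) {
        // Group example indices by class.
        List<List<Integer>> byClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) byClass.add(new ArrayList<>());
        for (int i = 0; i < labels.length; i++) byClass.get(labels[i]).add(i);

        // The largest class sets the target size for all classes.
        int target = 0;
        for (List<Integer> members : byClass) target = Math.max(target, members.size());

        Random rng = new Random(seed);
        List<Integer> result = new ArrayList<>();
        for (List<Integer> members : byClass) {
            if (members.isEmpty()) continue; // nothing to sample from
            result.addAll(members);          // keep every original example once
            for (int k = members.size(); k < target; k++) {
                result.add(members.get(rng.nextInt(members.size())));
            }
        }
        Collections.shuffle(result, rng); // avoid long runs of a single class
        return result;
    }
}
```

Apply this to the training split only. Repeating examples before splitting would put copies of the same row into both the training and the test set and inflate the test score.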

If that doesn’t work then weighted loss functions also come to mind. This example should help:

Heavily skewed, in fact, but unfortunately this is the only data set I can gather from the source I want to test:
class 0: 42/1097 (about 3.8%)
class 1: 47/1097 (about 4.3%)
class 2: 1008/1097 (about 91.9%)
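For the weighted loss route, per-class weights can be derived from these counts. The sketch below is plain Java with hypothetical names and uses inverse-frequency weighting, which is one common choice and an assumption here rather than something prescribed in this thread.

```java
/** Inverse-frequency class weights and a weighted negative log-likelihood for one example. */
final class ClassWeights {
    /**
     * weight[c] = total / (numClasses * count[c]), so rarer classes get larger weights.
     * Every count is assumed to be greater than zero.
     */
    static double[] inverseFrequency(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double[] weights = new double[counts.length];
        for (int c = 0; c < counts.length; c++) {
            weights[c] = (double) total / (counts.length * (double) counts[c]);
        }
        return weights;
    }

    /** Loss of one example: the usual -log(p[trueClass]) scaled by that class's weight. */
    static double weightedLoss(double[] probabilities, int trueClass, double[] weights) {
        return -weights[trueClass] * Math.log(probabilities[trueClass]);
    }
}
```

For counts of 42, 47 and 1008 this gives weights of roughly 8.7, 7.8 and 0.36, so a mistake on class 0 or 1 costs about 20 times more than a mistake on class 2. How the weights are passed to the loss function depends on the library version, so check the linked example for the exact call.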

I have followed your advice on repeatedly sampling from the underrepresented classes, and:

Having modified the epochs, I can now reach 60%.

@chapipo keep going with weighted loss functions and resampling. You’re headed in the right direction!