Classifier indefinitely skips prediction on certain classes

Hello, I’ve checked out multiple threads about the same issue (the most recent being about three years old), yet regardless of the dataset I feed in and the tuning I do, the classifier keeps skipping classes at random.

I decided to create a classifier identical to the IrisClassifier that can be found in the DL4J examples repository:

with the following modifications:
-changing the number of iterations
-changing the activation functions
-updating the learning rate
-removing/modifying l2 regularization
-changing the loss function
-changing the amount of data used for training vs. testing

Here is the data set I’ve primarily tried to input:
This dataset contains 1097 samples, but about 90% of them belong to class 2,
which gives the following result:

Multiple threads indicate that a biased dataset will cause this error, so I tried a different approach:
This dataset contains 143 samples, with all three classes evenly distributed.
However, I appear to get a similar or worse result:

Nothing I’ve tried produces a run where all classes are actually predicted. I’ve hit a wall and have absolutely no idea how to continue. Any help is greatly appreciated.

@chapipo could you clarify what version you’re using? Tuning results vary WILDLY for any individual neural network, dataset, and version of the library.


Thank you for your reply. I grabbed the library here using Git Bash and installed it with Apache Maven 3.6.3, following this tutorial. The IrisClassifier runs normally with very good results.

@chapipo this is a more than 4-year-old tutorial. Please avoid older content like that.

Some initial tips here: your learning rate is very high. Change that to be lower.

There’s almost no reason to have 5 dense layers. Reduce your number of layers to just 1 and see how your results look first.

By “1” I mean 1 Dense Layer and 1 output layer.
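Roughly something like this, just as a sketch of the shape I mean (I’m assuming Iris-style data here, so 4 input features and 3 classes; the 10 hidden units and the 1e-3 learning rate are only placeholder starting points, and the imports are the same ones the Iris example already uses):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .activation(Activation.TANH)
        .weightInit(WeightInit.XAVIER)
        .updater(new Sgd(1e-3))                                    // much lower learning rate
        .list()
        .layer(new DenseLayer.Builder().nIn(4).nOut(10).build())   // the single dense layer
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .activation(Activation.SOFTMAX)
                .nIn(10).nOut(3).build())                          // the output layer
        .build();

The rest of the training code can stay exactly as it is in the example.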
Follow these pages for some more tips:

Ensure your data is normalized as well, if needed. (This is needed if the values aren’t already between 0 and 1.)
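With the built-in normalizers that part looks roughly like this (sketch only; trainingData/testData stand for whatever DataSet objects you split out):

DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);        // compute mean/stdev from the training data only
normalizer.transform(trainingData);  // apply to the training data
normalizer.transform(testData);      // apply the same statistics to the test data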

Search around on the forums for posts like this:

if the docs aren’t helping you. Feel free to ask clarifying questions as well.


Thank you for the tuning links. I’ve checked them out, and here are the changes I’ve tried:
-reducing the hidden layers to 1
-switching the optimizer to stochastic gradient descent and changing the learning rate to 1e-3 and 1e-4
-training for 10, 100, or 1000 epochs
-using different amounts of training data
-attempting gradient normalization
-changing the loss function
-changing the activation function

Unfortunately, it seems like the model only ever recognizes one class on its own, or two at a time, never all three.

I already had this data normalization implemented:
[screenshot of the normalization code]

I apologize if I’m missing something or if I’m slow to learn; I’m still a greenhorn at neural networks. Is there anything else I can try?

@chapipo could you post your updated network either in a reply or update your post? Thanks!

package org.deeplearning4j.examples.quickstart.modeling.feedforward.classification;
import org.apache.commons.io.FilenameUtils;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.examples.utils.DownloaderUtility;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;
import java.io.File;
public class PolioClassifier
{
    public static void main(String[] args) throws Exception
    {
        char delim = ',';
        RecordReader cohortReader = new CSVRecordReader(0, delim);
        String path = FilenameUtils.concat(System.getProperty("user.home"), "dl4j-examples-data/");
        cohortReader.initialize(new FileSplit(new File(path, "temp2.txt")));

        DataSetIterator iterate = new RecordReaderDataSetIterator(cohortReader, 1097, 5, 3); // batch size 1097 (the whole file), label at column index 5, 3 classes
        DataSet data = iterate.next();
        data.shuffle();
        SplitTestAndTrain training = data.splitTestAndTrain(0.5);

        DataSet dataToTrain = training.getTrain();
        DataSet dataToTest = training.getTest();

        DataNormalization toNormalize = new NormalizerStandardize(); // standardize features to zero mean, unit variance
        toNormalize.fit(dataToTrain);
        toNormalize.transform(dataToTest);
        toNormalize.transform(dataToTest);

        long val = 123;

        MultiLayerConfiguration network = new NeuralNetConfiguration.Builder()
                .seed(val)
                //.activation(Activation.SIGMOID)
                .activation(Activation.TANH)
                //.activation(Activation.RELU)
                .weightInit(WeightInit.XAVIER)
                .updater(new Sgd(0.01))
                .l2(1e-4)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                /*.gradientNormalization(GradientNormalization.RenormalizeL2PerParamType)
                .gradientNormalizationThreshold(0.)*/
                .list()
                .layer(new DenseLayer.Builder().nIn(5).nOut(3)
                        .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .activation(Activation.SOFTMAX)
                        .nIn(5).nOut(3).build())
                .build();

        MultiLayerNetwork neuralnet = new MultiLayerNetwork(network);
        neuralnet.init();
        /*neuralnet.setListeners(new ScoreIterationListener(100));
        for(int i = 0; i < 3000; i++)
        {
            neuralnet.fit(dataToTrain);
        }*/
        for(int i = 0; i < 40; i++)
        {
            while(iterate.hasNext())
            {
                dataToTest = iterate.next();
            }
            neuralnet.fit(dataToTest);
        }

        Evaluation eval = new Evaluation(3);
        INDArray output = neuralnet.output(dataToTest.getFeatures());
        eval.eval(dataToTest.getLabels(), output);
        System.out.println(eval.stats());
    }
}

The dataset “temp2.txt” refers to the first link I sent in my post.
Thank you!

@chapipo lower your learning rate.
Get rid of your l2.
Change your optimizer/updater to adam.

Lastly, how big is your dataset?
If the whole thing fits in memory, then our default minibatch setting (where we normalize the gradients by the batch size) can prevent the network from fully learning.

That’s a reasonable default for most problems, but not for toy problems where people are just getting started with something like 100 examples.

Set minibatch(false) at the top part of the configuration if that’s the case.
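Put together, the top of the configuration would then look roughly like this (a sketch, not drop-in code; the 1e-3 Adam rate and the 10 hidden units are just starting points, and you’d need to import org.nd4j.linalg.learning.config.Adam):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .miniBatch(false)                 // the minibatch knob mentioned above; fine when the whole dataset fits in memory
        .activation(Activation.TANH)
        .weightInit(WeightInit.XAVIER)
        .updater(new Adam(1e-3))          // Adam instead of Sgd, with a lower learning rate
        // no .l2(...) call
        .list()
        .layer(new DenseLayer.Builder().nIn(5).nOut(10).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .activation(Activation.SOFTMAX)
                .nIn(10).nOut(3).build()) // nIn here has to match the previous layer's nOut
        .build();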

The first link in my post shows the dataset I used, with 1097 samples, about 90% of which belong to class 2. The second dataset I mentioned contains 143 samples with all three classes evenly distributed, but it gives a similar or worse result.

Interestingly enough, I took your advice. If I set the learning rate to 0.004:
[screenshot of the evaluation results]

When I set the learning rate to 0.005:

When I set the learning rate to 0.006:

Keeping the learning rate at 0.004, I tried changing other settings, but then the same class-exclusion issue happens again.

E.g., using the sigmoid activation function:

@chapipo just to eliminate some variables, try simplifying your training pipeline a bit. Remove the test/train split and see if your model can learn from the whole dataset.
This is mainly to see whether your model can overfit first.
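In other words, something roughly like this (a sketch reusing the variable names from your posted code), fitting and evaluating on the same DataSet purely to check whether the network can memorize it:

DataSet allData = data;                        // the single DataSet you already read from the iterator
toNormalize.fit(allData);
toNormalize.transform(allData);

for (int i = 0; i < 1000; i++) {
    neuralnet.fit(allData);                    // train on everything, no split
}

Evaluation eval = new Evaluation(3);
eval.eval(allData.getLabels(), neuralnet.output(allData.getFeatures()));
System.out.println(eval.stats());              // near-perfect numbers here only show that it can overfit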

One other question I have: what’s the breakdown of your dataset? It could be a class-imbalance problem. Sometimes you spend hours tuning just to realize your dataset is the problem.

Oftentimes people don’t spend enough time ensuring their datasets actually have signal.

If your model can overfit, at least it can learn something.

Beyond that, I would suggest using the UI to visualize the training as well. UI examples here:


It overfits.

My dataset is indeed imbalanced and skewed towards one specific class. Is there anything within the neural network I can do to get around this limitation?

Thank you for the UI visualization tools, I will take a look at them shortly.

@chapipo that’s definitely your problem then. Could you give a breakdown of your label distribution? For underrepresented classes you’ll want to generate a dataset using something like over- or under-sampling (basically, either balance your dataset or repeatedly sample from the underrepresented labels).
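For the oversampling route there isn’t a single built-in call I’d point you to; a naive version (a sketch in plain Java on top of the DataSet and Nd4j classes you’re already importing, plus java.util; allData stands for your full training DataSet) just duplicates minority-class rows until every class contributes about as many rows as the largest one:

List<DataSet> rows = allData.asList();                       // one single-example DataSet per row
Map<Integer, List<DataSet>> byClass = new HashMap<>();
for (DataSet row : rows) {
    int label = Nd4j.argMax(row.getLabels(), 1).getInt(0);   // class index from the one-hot label
    byClass.computeIfAbsent(label, k -> new ArrayList<>()).add(row);
}
int target = byClass.values().stream().mapToInt(List::size).max().orElse(0);
List<DataSet> balanced = new ArrayList<>();
for (List<DataSet> group : byClass.values()) {
    for (int i = 0; i < target; i++) {
        balanced.add(group.get(i % group.size()));           // cycle through the smaller classes
    }
}
Collections.shuffle(balanced, new Random(123));
DataSet balancedData = DataSet.merge(balanced);              // train on this instead of allData

Under-sampling would be the opposite: cap every class at the size of the smallest one.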

If that doesn’t work, then weighted loss functions also come to mind. This example should help:
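Independently of that example, the core of a weighted loss in DL4J is just passing per-class weights into the loss function on the output layer, e.g. LossMCXENT from org.nd4j.linalg.lossfunctions.impl (a sketch; the weight values and the nIn(10) are placeholders, with the weights set roughly inversely proportional to class frequency):

INDArray classWeights = Nd4j.create(new double[]{5.0, 5.0, 1.0});   // one weight per class; rarer classes get larger weights

OutputLayer outputLayer = new OutputLayer.Builder(new LossMCXENT(classWeights))
        .activation(Activation.SOFTMAX)
        .nIn(10).nOut(3)
        .build();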

Heavily skewed, in fact, but unfortunately this is the only data set I can gather from the source I want to test:
class 0: 42/1097
class 1: 47/1097
class 2: 1008/1097

I have followed your advice on repeatedly sampling from the underrepresented classes, and:


After adjusting the number of epochs, I can now reach 60%.

@chapipo keep going with weighted loss functions and resampling. You’re headed in the right direction!