Incomplete Confusion Matrix

Hi!
My confusion matrix is looking a little bit… confusing

=========================Confusion Matrix=========================
   0   1
---------
 506   0 | 0 = 0
   0   0 | 1 = 1

As you can see, I have just 2 classes, 0 and 1. And although my dataset (a CSV file), which I split into training and test sets, has 2354 samples (lines) with class 0 (the last value in each line) and 2354 samples with class 1, the resulting confusion matrix does not show any samples of class 1.
I mean regardless of how bad the hyperparameters are, there should be entries in the bottom row of the confusion matrix, no?
This is my dataset: ZeroBin.net

I also get these warnings:
Warning: 1 class was never predicted by the model and was excluded from average precision
Classes excluded from average precision: [1]
Warning: 1 class was never predicted by the model and was excluded from average recall
Classes excluded from average recall: [1]

Here is the code:

int numLinesToSkip = 0;
char delimiter = ',';
String datasetPath = "dataset.csv";

int numClasses = 1;
double trainPercentage = 0.7;
int nEpochs = 5000;
int batchSize = 512;

RecordReader recordReader = new CSVRecordReader(numLinesToSkip,delimiter);
recordReader.initialize(new FileSplit(new File(datasetPath)));

// Build the input schema
Schema inputDataSchema = new Schema.Builder()
    .addColumnsFloat("speed","mean_acc_x","mean_acc_y","mean_acc_z","std_acc_x","std_acc_y","std_acc_z")
        .addColumnDouble("sma")
        .addColumnFloat("mean_svm")
        .addColumnsDouble("entropyX","entropyY","entropyZ")
        .addColumnsInteger("bike_type","phone_location","incident_type")
        .build();

// Make the necessary transformations
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
        .integerToOneHot("bike_type",0,8) // bike_type can have values from 0 to 8
        .integerToOneHot("phone_location",0,6) // phone_location can have values from 0 to 6
        .build();

// Get output schema
Schema outputSchema = tp.getFinalSchema();

//Second: the RecordReaderDataSetIterator handles conversion to DataSet objects, ready for use in neural network
int labelIndex = outputSchema.getColumnNames().size() - 1;     // Each row of dataset.csv has 15 values: 14 input features followed by an integer class label, so the label is the 15th value (index 14). The one-hot transforms insert extra columns, so the label index is taken from the output schema, where the label is still the last column

TransformProcessRecordReader transformProcessRecordReader = new TransformProcessRecordReader(recordReader,tp);
DataSetIterator iterator = new RecordReaderDataSetIterator(transformProcessRecordReader,batchSize,labelIndex,numClasses);
DataSet allData = iterator.next();
allData.shuffle();
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(trainPercentage);  //Use 70% of data for training

DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();

//We need to normalize our data. We'll use NormalizeStandardize (which gives us mean 0, unit variance):
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);           //Collect the statistics (mean/stdev) from the training data. This does not modify the input data

normalizer.transform(trainingData);     //Apply normalization to the training data
normalizer.transform(testData);         //Apply normalization to the test data. This is using statistics calculated from the *training* set

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .l2(1e-4)
        .updater(new Sgd(0.1))
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(labelIndex)
                .nOut(2000)
                .activation(Activation.RELU)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(2000)
                .nOut(2000)
                .activation(Activation.RELU)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(2000)
                .nOut(2000)
                .activation(Activation.RELU)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(2000)
                .nOut(2000)
                .activation(Activation.RELU)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(2000)
                .nOut(2000)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder()
                .nIn(2000)
                .nOut(numClasses)
                .activation(Activation.SIGMOID)
                .lossFunction(LossFunctions.LossFunction.XENT)
                .build())
        .build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100));
for (int i = 0; i < conf.getEpochCount(); i++) {
    model.fit(trainingData);
}
// Evaluate the model on the test set
Evaluation eval = new Evaluation(numClasses);
INDArray output = model.output(trainingData.getFeatures());
eval.eval(trainingData.getLabels(), output);

If your network learns to always predict one class, it will result in a confusion matrix like that.

With your network setup (very bad for the given data!), it isn’t much of a surprise.

But there are even more problems with your code.

This is not doing what you think it is doing:

DataSet allData = iterator.next();

What you get here is a single mini-batch of batchSize (512) examples, not the whole dataset. Then you shuffle that single batch and split it into training and test data.

In your link it looks like your data is sorted by label, so those first 512 rows, and therefore everything in your training and test sets, belong to a single label.

Even if your model were able to learn something useful, it can simply learn to always predict label 0 and get perfect training results.

And that already explains the confusion matrix you see.
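To make that concrete, here is a small stand-alone simulation in plain Java (with hypothetical label values, not your actual data) of what a single iterator.next() call with batchSize = 512 does to a label-sorted dataset:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedBatchDemo {
    public static void main(String[] args) {
        // Simulate a dataset sorted by label: 2354 rows of class 0 followed by 2354 rows of class 1
        List<Integer> labels = new ArrayList<>();
        for (int i = 0; i < 2354; i++) labels.add(0);
        for (int i = 0; i < 2354; i++) labels.add(1);

        // iterator.next() with batchSize = 512 only ever reads the first 512 rows
        List<Integer> batch = new ArrayList<>(labels.subList(0, 512));

        // Shuffling and splitting that single batch cannot bring class 1 back
        Collections.shuffle(batch);
        int trainCount = (int) (batch.size() * 0.7);
        List<Integer> train = batch.subList(0, trainCount);
        List<Integer> test = batch.subList(trainCount, batch.size());

        System.out.println("class 1 samples in train: " + Collections.frequency(train, 1)); // 0
        System.out.println("class 1 samples in test:  " + Collections.frequency(test, 1));  // 0
    }
}
```

No matter how you shuffle or split after the fact, class 1 never enters the picture.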

So you’ll need to load your data differently. The easiest way, if you want to stick with CSV files, is to put each data point into its own file (e.g. quickstart-with-dl4j/Step0_PrepareData.java at master · treo/quickstart-with-dl4j · GitHub) and split the data at that point.
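A minimal sketch of that preparation step, using only the JDK (the class name mirrors the linked example; the paths are placeholders you would adjust):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class Step0PrepareData {
    public static void main(String[] args) throws IOException {
        // Placeholder paths: point these at your actual dataset and target directories
        Path source = Paths.get("dataset.csv");
        Path trainDir = Files.createDirectories(Paths.get("data/train"));
        Path testDir = Files.createDirectories(Paths.get("data/test"));

        List<String> lines = Files.readAllLines(source);
        Collections.shuffle(lines); // shuffle the WHOLE dataset before splitting

        int trainCount = (int) (lines.size() * 0.7);
        for (int i = 0; i < lines.size(); i++) {
            Path dir = i < trainCount ? trainDir : testDir;
            // one example per file, so the record readers can split and batch freely
            Files.write(dir.resolve(i + ".csv"), Collections.singletonList(lines.get(i)));
        }
    }
}
```

Because the shuffle happens across the entire dataset before the split, both directories end up with a mix of class 0 and class 1 examples.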

Then you can follow the description for data loading at Quickstart with Deeplearning4J – dubs·tech

The .splitTestAndTrain function on the DataSet (which is actually a mini-batch) only ever makes sense if you load your entire data set into memory at once, but then you wouldn’t be using mini-batches for your training, which has its own drawbacks. I personally think it should be deprecated and removed before 1.0 is released, as it invites exactly the kind of error you ran into.
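Once you have separate train and test sources, training uses the iterator directly instead of a single next() call. A sketch (assuming trainIter and testIter are DataSetIterators built over the split data, as in the linked guide):

```java
// Train on mini-batches for nEpochs epochs; fit(iterator, numEpochs)
// resets the iterator between epochs for you
model.fit(trainIter, nEpochs);

// Evaluate on the held-out test iterator
Evaluation eval = model.evaluate(testIter);
System.out.println(eval.stats());
```

This way every mini-batch is drawn from the full (shuffled) training data rather than from whatever the first 512 rows of the file happened to be.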

Let’s move on to the model, even though your main problem right now is loading the data properly.

Aside from the far too large learning rate and the layers that are both too wide and too numerous, the biggest problem is likely your combination of activation function and loss function.

You have two classes and you want exactly one or the other. But you set the network up in a way that allows both to be predicted at the same time. That way you don’t get a probability distribution over your outputs, and your evaluation is essentially meaningless.

If you want to keep everything else the same, at least change your activation function to Softmax and your loss function to multi-class cross entropy. The results will still be bad because of the other parameters, but at least the network gets a chance to learn a proper probability distribution.
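Concretely, only the output layer changes (note that nOut must then be 2, one output per class, which also means numClasses should be 2 rather than 1 where you pass it to the iterator and to Evaluation):

```java
.layer(new OutputLayer.Builder()
        .nIn(2000)
        .nOut(2) // one output per class
        .activation(Activation.SOFTMAX) // outputs sum to 1: a probability distribution
        .lossFunction(LossFunctions.LossFunction.MCXENT) // multi-class cross entropy
        .build())
```

With Softmax the two outputs compete with each other, so the network is forced to commit to one class instead of being able to activate both at once.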