Lower accuracy for a simple model trained with DL4J than with Keras

I implemented the same simple model in both DL4J and Keras: one dense layer with ReLU, followed by a dense output layer with softmax. The hyperparameters (batch size, epochs, early stopping, learning rate, etc.) are identical, and so is the dataset.

However, I got about 61% accuracy in DL4J but 63% in Keras. Is this normal, and is there any way to fix it? I also tried a more complex model, and the difference gets bigger (76% in DL4J vs. 83% in Keras). So I guess the gap grows as the model becomes more complex, with the DL4J-trained model falling further behind Keras.


@XJ8 we’d need to reproduce your issue to know. In case you’re not aware, DL4J can import Keras models, and we have hundreds of tests covering this. The same goes for TensorFlow.

I doubt the difference is from invalid computations. Chances are it’s a subtle difference in the way you’re setting the problem up.

A common problem people run into is that they run a toy problem with something small, like 5 examples, and see reduced accuracy on that toy example, rather than running a normal benchmark like MNIST.
If you’re running a similar problem, then make sure to set minibatch to false in your configuration.
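
As far as I understand the DL4J documentation, the miniBatch setting on NeuralNetConfiguration.Builder controls whether scores and gradients are divided by the minibatch size. The plain-Java sketch below does not use DL4J at all; it only illustrates that scaling on a hypothetical 5-example linear fit. The class name, method names, and data are made up for illustration.

```java
// Hypothetical illustration, not DL4J code: compares a gradient summed over
// all examples with the same gradient divided by the number of examples.
class MiniBatchScaling {

    /** Gradient of 0.5 * (w * x - y)^2 with respect to w, summed over all examples. */
    static double summedGradient(double w, double[] xs, double[] ys) {
        double g = 0.0;
        for (int i = 0; i < xs.length; i++) {
            g += (w * xs[i] - ys[i]) * xs[i];
        }
        return g;
    }

    /** The same gradient divided by the number of examples. */
    static double averagedGradient(double w, double[] xs, double[] ys) {
        return summedGradient(w, xs, ys) / xs.length;
    }

    public static void main(String[] args) {
        // Made-up toy data following y = 2x
        double[] xs = {1, 2, 3, 4, 5};
        double[] ys = {2, 4, 6, 8, 10};
        double w = 0.0;

        System.out.println("summed gradient:   " + summedGradient(w, xs, ys));
        System.out.println("averaged gradient: " + averagedGradient(w, xs, ys));
    }
}
```

With the same learning rate, the two variants take steps that differ by a factor equal to the number of examples, which is one way a tiny toy dataset can behave differently from what you expect.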

Beyond that, you can’t expect us to know what the difference or problem might be unless you give us the same code you are running to see for ourselves.

@agibsonccc Thank you very much, Adam.

I will use DL4J for on-device training. This is a sanity check to make sure the model trained with DL4J will have the same performance.

If you have time, please take a look at my code and dataset. The dataset is imbalanced, but of a decent size.

Thanks again.

@XJ8 I’ll have to let this run for a bit. I changed the script to use Keras import, and to load the NumPy arrays directly from the saved .npy files to reduce interop issues. I’ll compare the results tomorrow.

@XJ8 I reproduced this, and it looks like it is just numerical precision.
Please set the default data type to double before training. Nd4j’s default data type is float. Tensorflow’s is double.

I got Keras and DL4J to within 0.03 of each other before early stopping converges.
That should be the source of your problem.
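
To see why the data type alone can move results, here is a minimal plain-Java sketch (no ND4J involved) that accumulates the same value in 32-bit float and in 64-bit double. The class and method names are hypothetical. For the actual setting, my understanding is that recent ND4J versions (1.0.0-beta4 and later) expose `Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE)`; treat that as an assumption, since the thread does not show the exact line.

```java
// Hypothetical illustration of float vs. double rounding drift.
// It does not use ND4J; it only shows how precision affects repeated updates.
class PrecisionDrift {

    /** Adds value to an accumulator n times using 32-bit float arithmetic. */
    static float sumFloat(float value, int n) {
        float acc = 0f;
        for (int i = 0; i < n; i++) {
            acc += value;
        }
        return acc;
    }

    /** Adds value to an accumulator n times using 64-bit double arithmetic. */
    static double sumDouble(double value, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += value;
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        double expected = 1.0e6; // mathematically, 0.1 added ten million times

        float f = sumFloat(0.1f, n);
        double d = sumDouble(0.1, n);

        System.out.println("float  sum: " + f + "  abs error: " + Math.abs(f - expected));
        System.out.println("double sum: " + d + "  abs error: " + Math.abs(d - expected));
    }
}
```

Training applies many small updates in sequence, so rounding differences of this kind can add up, which is consistent with the gap growing for the larger model.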

One more note: for maximum reproducibility, I also directly loaded the NumPy arrays used in your script:

        //Load the training features and labels straight from the .npy files:
        INDArray x = Nd4j.createFromNpyFile(new File("x.npy"));
        INDArray y = Nd4j.createFromNpyFile(new File("y.npy"));
        DataSet dataSet = new org.nd4j.linalg.dataset.DataSet(x,y);
        //Split into minibatches and wrap them in an iterator:
        List<org.nd4j.linalg.dataset.DataSet> dataSets = dataSet.batchBy(batchSize);
        ListDataSetIterator trainIter = new ListDataSetIterator(dataSets);
        //DataSetIterator trainIter = new RecordReaderDataSetIterator(rr,batchSize,0,numOutputs);

        //Load the test/evaluation data:
        INDArray xTest = Nd4j.createFromNpyFile(new File("x_test.npy"));
        INDArray yTest = Nd4j.createFromNpyFile(new File("y_test.npy"));
        DataSet dataSetTest = new org.nd4j.linalg.dataset.DataSet(xTest,yTest);
        ListDataSetIterator testIter = new ListDataSetIterator(dataSetTest.batchBy(batchSize));

For your model, I used our keras import:

        MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights("initial_model.hdf5");

Thanks a lot, Adam.

I will try it again.

@XJ8 if you have any other workloads like that, I’m happy to take a look. The more examples we get out there of reproducing issues like this, the better. Like I said before, we have reasonable compatibility with Keras, with tests covering hundreds of different model configurations. If anything breaks, we’d like to know about it. Thanks!

@agibsonccc Thank you, Adam. I got 63%+ accuracy just by setting the data type to double, without using the npy dataset or importing the Keras model.

There is an issue you may want to look into. After a model is trained and saved with the TransferLearning class, the preProcessor entry in the JSON file describing the model structure is changed: the data size is added as a leading dimension of the shapes. As a result, the saved model can’t be used directly. For example,

"preProcessor" : {
  "inputShape" : [ 300 ],
  "targetShape" : [ 3, 100 ],
  "format" : null
}

becomes

"preProcessor" : {
  "inputShape" : [ 13, 300 ],
  "targetShape" : [ 13, 3, 100 ]
}

Nothing major; just using the original model structure works fine.

@XJ8 sorry just saw this. Could you file an issue?