I implemented a simple model in both DL4J and Keras: one dense layer with ReLU, and a dense output layer with softmax. Everything else is exactly the same: hyperparameters (batch size, epochs, early stopping, learning rate, etc.) and the dataset.
However, I got about 61% accuracy in DL4J but 63% in Keras. Is this normal, and is there any way to fix it? I also tried a more complex model, and the difference gets bigger (76% in DL4J vs. 83% in Keras). So I guess the gap grows as the model becomes more complex, and the DL4J-trained model falls further behind Keras. Is there a way to fix it?
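For reference, the DL4J side of my setup looks roughly like this (layer sizes, seed, and learning rate below are placeholders, not my exact values):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SimpleModel {
    public static void main(String[] args) {
        int numFeatures = 100;  // placeholder input size
        int numClasses = 10;    // placeholder number of classes

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .updater(new Adam(1e-3))       // placeholder learning rate
                .weightInit(WeightInit.XAVIER)
                .list()
                // hidden dense layer with ReLU
                .layer(new DenseLayer.Builder()
                        .nIn(numFeatures).nOut(128)  // placeholder hidden size
                        .activation(Activation.RELU)
                        .build())
                // output dense layer with softmax
                .layer(new OutputLayer.Builder(
                            LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(128).nOut(numClasses)
                        .activation(Activation.SOFTMAX)
                        .build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
    }
}
```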
@XJ8 we’d need to reproduce your issue to know. In case you aren’t aware, DL4J is capable of importing Keras models; we have hundreds of tests covering this, and the same for TensorFlow.
I doubt the difference comes from invalid computations. Chances are it’s a subtle difference in how you’re setting the problem up.
A common problem people run into is training on a toy problem with something small like 5 examples and seeing reduced accuracy on that toy example, rather than running a normal benchmark like MNIST.
If you’re running a similar problem, then make sure to set minibatch to false in your configuration.
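That flag lives on the configuration builder; a minimal sketch (layer sizes are placeholders):

```java
// Sketch: disable minibatch-mode gradient normalization when you're
// effectively training on the whole (tiny) dataset as one batch.
// Layer sizes are placeholders.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .miniBatch(false)  // treat the data as a single full batch
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(4).nOut(3)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder(
                    LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(3).nOut(2)
                .activation(Activation.SOFTMAX)
                .build())
        .build();
```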
Beyond that, you can’t expect us to know what the difference or problem might be unless you share the exact code you’re running so we can see for ourselves.
@XJ8 I’ll have to let this run for a bit. I changed the script to use Keras import, as well as loading the NumPy arrays directly from the saved .npy files, to reduce interop issues. I’ll compare the results tomorrow.
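Roughly what I mean by that, as a sketch (file paths are placeholders):

```java
import java.io.File;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ReproScript {
    public static void main(String[] args) throws Exception {
        // Import the exact model Keras saved (placeholder path)
        MultiLayerNetwork model =
                KerasModelImport.importKerasSequentialModelAndWeights("model.h5");

        // Load the same arrays Keras saw, straight from .npy files,
        // to rule out preprocessing/interop differences (placeholder paths)
        INDArray features = Nd4j.createFromNpyFile(new File("features.npy"));
        INDArray labels   = Nd4j.createFromNpyFile(new File("labels.npy"));

        // Run inference on identical inputs and compare against Keras output
        INDArray predictions = model.output(features);
    }
}
```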
@XJ8 if you have any other workloads like that, I’m happy to take a look. The more examples we get out there of reproducing issues like this the better. Like I said before, we have reasonable compatibility with keras testing hundreds of different model configurations. If anything breaks, we’d like to know about it. Thanks!
@agibsonccc Thank you, Adam. I got 63%+ accuracy just by setting the data type to double, without using the npy dataset or importing the Keras model.
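For anyone hitting the same thing, switching the ND4J global defaults to double precision looks like this (set it before creating any networks or arrays):

```java
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.factory.Nd4j;

// All subsequently created INDArrays and network parameters
// will use double precision instead of the float default.
Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE);
```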
There is an issue you may want to look into. After a model is trained and saved with the TransferLearning class, the preprocessor layer in the model’s JSON structure file gets changed: the data size is added to it, so the saved model can’t be used directly. For example,