Basic deeplearning4j classification example

It is quite normal for small networks like that to be slower on the GPU since it’s spending most of it’s time coordinating thousands of cores rather than doing the computation. If you use a neural network like YOLO that has over 60,000,000 parameters (and much more computationally intensive convolution operations) on a much larger data set, you’ll see the GPU outperform the CPU.

To see the difference with your data. try a large minibatch size of 1024 or 4096 (as much as fits in your GPU memory) and use try double or triple the dense layers. I can’t guarantee that such an over-parameterized network will be any good, but it should show the performance difference.