It is hard to tell with the amount of detail you are sharing. What exactly have you tried? Do you have anything we can use to reproduce your case?
Sometimes you get low utilization simply because your data loading is too slow to keep up, though I usually see that more with Python frameworks.
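If you want to rule that out, a quick check is to attach a PerformanceListener, which logs ETL time alongside samples/sec and batches/sec; if the ETL time dominates, the GPU is being starved by the data pipeline. Here is a minimal sketch, assuming `model` and `trainIter` are whatever network and iterator you already have:

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.PerformanceListener;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class EtlCheck {
    // Logs ETL time, samples/sec and batches/sec every 10 iterations.
    // If the ETL portion dominates the iteration time, the data pipeline
    // (not the LSTM math) is what is keeping GPU utilization low.
    public static void fitWithTimings(MultiLayerNetwork model, DataSetIterator trainIter, int epochs) {
        model.setListeners(new PerformanceListener(10, true));
        model.fit(trainIter, epochs);
    }
}
```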
Another reason would be that you’re not using the cuDNN variant, so it doesn’t use the more optimized LSTM code path.
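For reference, a cuDNN-eligible LSTM configuration looks roughly like the sketch below. This is only a sketch, and it assumes you have the matching deeplearning4j-cuda artifact for your CUDA version on the classpath; the exact eligibility rules depend on your DL4J version, and if the layer isn’t eligible DL4J falls back to its own built-in implementation.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class CudnnFriendlyLstm {
    public static MultiLayerConfiguration build(int nIn, int nHidden, int nOut) {
        return new NeuralNetConfiguration.Builder()
                .list()
                // Use LSTM rather than GravesLSTM: the peephole (Graves) variant
                // has no cuDNN-backed implementation.
                .layer(new LSTM.Builder()
                        .nIn(nIn).nOut(nHidden)
                        .activation(Activation.TANH)                  // tanh cell activation
                        .gateActivationFunction(Activation.SIGMOID)   // sigmoid gates
                        .build())
                .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX)
                        .nIn(nHidden).nOut(nOut)
                        .build())
                .build();
    }
}
```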
Anyway, that mgubaidullin.github.io page looks like a very old fork of the old DL4J documentation, so anything you read there is likely going to be out of date.
Tried the following examples from dl4j-examples:
AdditionModelWithSeq2Seq - LSTM - 25% utilization
SequenceAnomalyDetection - LSTM - 22% utilization
TrainLotteryModelSeqPrediction - LSTM - 26% utilization
TinyYoloHouseNumberDetection - CNN - 95% utilization
It seems to be purely an LSTM issue; the CNN example leverages the GPU very well.
I skipped installing the display driver in the NVIDIA CUDA Toolkit installer, since my current driver version (511.23) is more recent than the one bundled with the installer (460.89), and only installed the CUDA-related components. I’m not sure if that relates to my issue.
As of right now it doesn’t contain an example implementation, but you should be able to import a TensorFlow or ONNX based model rather easily on the current snapshots.
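To give a rough idea of what the TensorFlow side looks like, here is a minimal sketch using the SameDiff frozen-graph import. The file name and the "input"/"output" node names are placeholders for whatever your graph actually uses, and the ONNX route goes through a separate importer that isn’t shown here:

```java
import java.io.File;
import java.util.Collections;

import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class TfImportSketch {
    public static void main(String[] args) {
        // Placeholder path: point this at your own frozen TensorFlow graph.
        SameDiff sd = SameDiff.importFrozenTF(new File("frozen_model.pb"));

        // Placeholder node names and shape: use the real placeholder/output
        // names and input shape of your graph.
        INDArray dummyInput = Nd4j.rand(1, 784);
        INDArray result = sd.output(Collections.singletonMap("input", dummyInput), "output")
                            .get("output");
        System.out.println(result);
    }
}
```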