NVIDIA Tensor Cores Usage


Thanks for the great project. I have used DL4J in the past and I am about to start another small project with it. I hope its development will continue.

I have a rather newbie question, but I have not come across an answer here yet. It is about using the NVIDIA Tensor Cores on RTX 20/30/40-series GPUs. This page (https://developer.nvidia.com/blog/optimizing-gpu-performance-tensor-cores/) says that mixed precision must be used, and it describes how to activate it in some other frameworks. I can set the data type to half precision in DL4J. Would that be enough to make training use the Tensor Cores? And if DL4J does use them, how does that work?