Transformer neural network (TNN)

Are there any samples anywhere showing how to implement a transformer neural net (TNN) in DL4J?

@gerrie-myburgh PFA

partarstu/transformers-in-java: Experimental project for AI and NLP based on Transformer Architecture

@agibsonccc @treo please share any more resources on this.


@Ujjwal2805 I’ll publish some better examples after the release. Unfortunately, I’m still in the middle of cleaning up the code base.

Looking forward to it. Transformers are really popular right now.

Yeah, of course! Unfortunately, CUDA technical debt has been the main concern lately.
I’ve also been trying to make the test suite more robust. I ran into many limitations when running the tests: they couldn’t run reliably even on a 4090 because of the number of kernel launches. I want to make sure M3 won’t have any surprises. Once that’s under control, a lot of focus will go into transformers.