Can you please point me to end-to-end examples that use transformer layers and multi-head attention with dl4j?


I'm trying to build the following example using dl4j:

All attention models can be found here:

I would also suggest trying Keras model import to see if that fits your use case:

Thanks for the response. I am trying to build attention/transformer layers natively, using Java only, and I couldn't find any example showing how to do that.
Are you suggesting that attention models be implemented in Keras and then imported into dl4j as Keras layers, i.e. a hybrid approach?
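For reference, here is a rough sketch of what "natively in Java" involves at the lowest level. It shows scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, and multi-head self-attention built on top of it. It deliberately uses plain `double[][]` arrays instead of dl4j/ND4J classes, so it is not dl4j's API. The class and method names (`AttentionSketch`, `scaledDotProductAttention`, `multiHeadSelfAttention`) are hypothetical. The weights are fixed toy values with no training, masking, or positional encoding.

```java
import java.util.Arrays;

// Hypothetical, dependency-free sketch of multi-head self-attention (not dl4j's API).
final class AttentionSketch {

    // Multiplies an (n x k) matrix by a (k x m) matrix.
    public static double[][] matmul(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int p = 0; p < k; p++)
                for (int j = 0; j < m; j++)
                    out[i][j] += a[i][p] * b[p][j];
        return out;
    }

    public static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    // Row-wise softmax; subtracts each row's max for numerical stability.
    public static double[][] softmaxRows(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            double max = Arrays.stream(x[i]).max().getAsDouble();
            double sum = 0;
            out[i] = new double[x[i].length];
            for (int j = 0; j < x[i].length; j++) {
                out[i][j] = Math.exp(x[i][j] - max);
                sum += out[i][j];
            }
            for (int j = 0; j < x[i].length; j++) out[i][j] /= sum;
        }
        return out;
    }

    // softmax(Q K^T / sqrt(d_k)) V, with one row per sequence position.
    public static double[][] scaledDotProductAttention(double[][] q, double[][] k, double[][] v) {
        double[][] scores = matmul(q, transpose(k));
        double scale = Math.sqrt(q[0].length);
        for (double[] row : scores)
            for (int j = 0; j < row.length; j++) row[j] /= scale;
        return matmul(softmaxRows(scores), v);
    }

    // x is (seqLen x dModel); wq[h], wk[h], wv[h] are (dModel x dHead); wo is (heads*dHead x dModel).
    public static double[][] multiHeadSelfAttention(double[][] x, double[][][] wq, double[][][] wk,
                                                    double[][][] wv, double[][] wo) {
        int heads = wq.length, seqLen = x.length, dHead = wq[0][0].length;
        double[][] concat = new double[seqLen][heads * dHead];
        for (int h = 0; h < heads; h++) {
            // Each head attends using its own projections of the same input.
            double[][] headOut = scaledDotProductAttention(
                    matmul(x, wq[h]), matmul(x, wk[h]), matmul(x, wv[h]));
            // Concatenate head outputs side by side.
            for (int i = 0; i < seqLen; i++)
                System.arraycopy(headOut[i], 0, concat[i], h * dHead, dHead);
        }
        return matmul(concat, wo); // final output projection
    }

    public static void main(String[] args) {
        // Toy input: 3 tokens, dModel = 4, 2 heads with dHead = 2 (arbitrary values).
        double[][] x = {{1, 0, 1, 0}, {0, 1, 0, 1}, {1, 1, 0, 0}};
        double[][][] wq = new double[2][4][2], wk = new double[2][4][2], wv = new double[2][4][2];
        for (int h = 0; h < 2; h++)
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 2; j++) {
                    // Fixed weights so the run is deterministic; a real model would learn these.
                    wq[h][i][j] = 0.1 * (h + i + j);
                    wk[h][i][j] = 0.1 * (h + i - j);
                    wv[h][i][j] = (i == 2 * h + j) ? 1 : 0;
                }
        double[][] wo = new double[4][4];
        for (int i = 0; i < 4; i++) wo[i][i] = 1; // identity output projection
        double[][] y = multiHeadSelfAttention(x, wq, wk, wv, wo);
        for (double[] row : y) System.out.println(Arrays.toString(row));
    }
}
```

A full transformer block would also wrap this in residual connections, layer normalization, and a position-wise feed-forward layer. Those pieces are omitted here to keep the attention step itself visible.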