Transformer/Attention-NLP

data-llectual · July 21, 2020, 5:44pm

Hello,

Can you please point me to end-to-end examples that use a transformer layer and multi-head attention with DL4J?

Thanks
Ravi.

data-llectual · July 21, 2020, 8:10pm

I'm trying to build the following example with DL4J: Text classification with Transformer.
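Assuming this refers to the keras.io example of the same name, its transformer block wraps multi-head attention and a small feed-forward network, and each of them is followed by a residual connection and layer normalization ("Add & Norm"). When porting such a block it helps to look at that step on its own. Below is a minimal plain-Java sketch of it. The class and method names are hypothetical, and the learnable scale and shift of layer normalization are fixed to 1 and 0 for brevity.

```java
import java.util.Arrays;

/** Hypothetical sketch of a transformer "Add & Norm" step: layerNorm(x + sublayerOut), per time step. */
public class AddAndNorm {

    static final double EPS = 1e-6; // small constant to avoid division by zero

    /** x and sublayerOut have shape [timeSteps][features]; gamma = 1, beta = 0 for simplicity. */
    static double[][] addAndNorm(double[][] x, double[][] sublayerOut) {
        double[][] out = new double[x.length][];
        for (int t = 0; t < x.length; t++) {
            int d = x[t].length;
            double[] r = new double[d];
            // residual connection: add the sublayer output to its input
            double mean = 0.0;
            for (int f = 0; f < d; f++) {
                r[f] = x[t][f] + sublayerOut[t][f];
                mean += r[f];
            }
            mean /= d;
            // layer normalization over the feature dimension of this time step
            double var = 0.0;
            for (int f = 0; f < d; f++) var += (r[f] - mean) * (r[f] - mean);
            var /= d;
            double inv = 1.0 / Math.sqrt(var + EPS);
            for (int f = 0; f < d; f++) r[f] = (r[f] - mean) * inv;
            out[t] = r;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 2, 3, 4} };
        double[][] attn = { {1, 0, -1, 2} }; // stand-in for an attention sublayer's output
        System.out.println(Arrays.toString(addAndNorm(x, attn)[0]));
    }
}
```

The same function is applied twice per block in that example: once after attention and once after the feed-forward network.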

agibsonccc · July 23, 2020, 4:04am

All attention models can be found here:

I would also suggest trying Keras import to see if that fits your use case:
https://deeplearning4j.konduit.ai/keras-import/overview

data-llectual · July 23, 2020, 6:46pm

Thanks for the response. I am trying to build an attention/transformer model natively, using Java only, and I couldn't find any example that shows how to do that.
Are you suggesting that attention models be implemented in Keras and then imported into DL4J, as a kind of hybrid approach?
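For readers who want to see what a multi-head self-attention layer computes without any framework, here is a minimal plain-Java sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_h)) V, split across heads. To keep it short it assumes Q = K = V = the input (no learned projection matrices), so each head simply attends over its own slice of the features. Layers that project their input, such as the `SelfAttentionLayer` configured with `projectInput(true)` in the test case linked later in this thread, learn those projections instead. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: multi-head scaled dot-product self-attention in plain Java (no projections). */
public class PlainSelfAttention {

    /** Numerically stable softmax applied to each row, in place. */
    static void softmaxRows(double[][] m) {
        for (double[] row : m) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : row) max = Math.max(max, v);
            double sum = 0.0;
            for (int j = 0; j < row.length; j++) {
                row[j] = Math.exp(row[j] - max);
                sum += row[j];
            }
            for (int j = 0; j < row.length; j++) row[j] /= sum;
        }
    }

    /** x has shape [timeSteps][features]; features must be divisible by nHeads. */
    static double[][] multiHeadSelfAttention(double[][] x, int nHeads) {
        int t = x.length, d = x[0].length;
        if (d % nHeads != 0) throw new IllegalArgumentException("features must be divisible by nHeads");
        int dh = d / nHeads;                 // features per head
        double scale = 1.0 / Math.sqrt(dh);  // the 1/sqrt(d_h) scaling
        double[][] out = new double[t][d];
        for (int h = 0; h < nHeads; h++) {
            int off = h * dh;
            // scores[i][j] = <x_i, x_j> / sqrt(d_h), restricted to this head's feature slice
            double[][] scores = new double[t][t];
            for (int i = 0; i < t; i++)
                for (int j = 0; j < t; j++) {
                    double s = 0.0;
                    for (int k = 0; k < dh; k++) s += x[i][off + k] * x[j][off + k];
                    scores[i][j] = s * scale;
                }
            softmaxRows(scores); // each row becomes a set of attention weights summing to 1
            // out_i = sum_j weights[i][j] * x_j; heads write to disjoint slices (concatenation)
            for (int i = 0; i < t; i++)
                for (int j = 0; j < t; j++)
                    for (int k = 0; k < dh; k++)
                        out[i][off + k] += scores[i][j] * x[j][off + k];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 0, 0, 1}, {0, 1, 1, 0}, {1, 1, 0, 0} }; // 3 time steps, 4 features
        for (double[] row : multiHeadSelfAttention(x, 2)) System.out.println(Arrays.toString(row));
    }
}
```

A full layer would add the learned query/key/value/output projections and masking for padded time steps; the core weighting logic stays the same.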

agibsonccc · November 22, 2020, 4:32am

@data-llectual (sorry, I'm just catching up on the forum and realized I didn't get a notification for this; I'll answer it for future readers): your best bet in this case is to look at some of our test cases:

https://github.com/eclipse/deeplearning4j/blob/881a672fa13f3b37177eb2ea023e39b0de893645/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/nn/transferlearning/TransferLearningMLNTest.java#L705

```java
    // ... (tail of the preceding test)
        assertEquals("Incorrect number of outputs!", 5, newNet.layerSize(0));
        assertEquals("Incorrect number of inputs!", 5, newNet.layerInputSize(2));
        newNet.output(input);
    }

    @Test
    public void testTransferLearningSameDiffLayers(){
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .dataType(DataType.DOUBLE)
                .activation(Activation.TANH)
                .updater(new Adam(0.01))
                .weightInit(WeightInit.XAVIER)
                .list()
                // recurrent layer: 8 features per time step
                .layer(new LSTM.Builder().nOut(8).build())
                // multi-head self-attention: 2 heads, input projected to 4 output features
                .layer(new SelfAttentionLayer.Builder().nOut(4).nHeads(2).projectInput(true).build())
                // max over the time dimension -> fixed-size vector per sequence
                .layer(new GlobalPoolingLayer.Builder().poolingType(PoolingType.MAX).build())
                // softmax classifier over 2 classes
                .layer(new OutputLayer.Builder().nOut(2).activation(Activation.SOFTMAX)
                        .lossFunction(LossFunctions.LossFunction.MCXENT).build())
                // ... (excerpt ends here; the rest of the test is in the linked file)
```
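The end of that configuration turns the attention layer's per-time-step output into a single prediction: `GlobalPoolingLayer` with `PoolingType.MAX` takes the maximum of each feature over the time dimension, and the softmax `OutputLayer` maps the pooled vector to class probabilities. A plain-Java sketch of those two steps for one sequence follows. The names and weights are hypothetical, and it ignores minibatches and masking, which the real layers handle.

```java
import java.util.Arrays;

/** Hypothetical sketch of the classification head: global max pooling over time, then dense + softmax. */
public class PoolingHead {

    /** seq has shape [timeSteps][features]; returns the per-feature maximum over time. */
    static double[] globalMaxPool(double[][] seq) {
        double[] pooled = new double[seq[0].length];
        Arrays.fill(pooled, Double.NEGATIVE_INFINITY);
        for (double[] step : seq)
            for (int f = 0; f < step.length; f++)
                pooled[f] = Math.max(pooled[f], step[f]);
        return pooled;
    }

    /** Dense layer (w has shape [nIn][nOut], b has length nOut) followed by softmax over nOut classes. */
    static double[] denseSoftmax(double[] in, double[][] w, double[] b) {
        int nOut = b.length;
        double[] z = b.clone();
        for (int i = 0; i < in.length; i++)
            for (int o = 0; o < nOut; o++)
                z[o] += in[i] * w[i][o];
        double max = Arrays.stream(z).max().getAsDouble(); // subtract max for numerical stability
        double sum = 0.0;
        for (int o = 0; o < nOut; o++) { z[o] = Math.exp(z[o] - max); sum += z[o]; }
        for (int o = 0; o < nOut; o++) z[o] /= sum;
        return z;
    }

    public static void main(String[] args) {
        // 3 time steps x 4 features, e.g. the output of a 4-unit attention layer
        double[][] seq = { {0.1, 0.9, -0.2, 0.4}, {0.7, 0.3, 0.5, -0.1}, {0.2, 0.8, 0.0, 0.6} };
        double[] pooled = globalMaxPool(seq);
        double[][] w = { {1, -1}, {0.5, 0.5}, {-1, 1}, {0, 0} }; // hypothetical weights
        double[] probs = denseSoftmax(pooled, w, new double[] {0, 0});
        System.out.println(Arrays.toString(pooled) + " -> " + Arrays.toString(probs));
    }
}
```

Max pooling keeps the strongest activation of each feature anywhere in the sequence, which is why it works with variable-length input.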

We still need to build out the docs, but there are already layers for this:
