SelfAttention Token Training Example

thomas · December 23, 2022, 3:56pm

Hello,

I found a lot of examples that classify text with the self-attention layer. Is it already possible to train token-wise, as with BertIterator.Task.UNSUPERVISED? I would appreciate any code examples.

I tried to create a simple Learning Graph with Attention for Tokens like this:

    final NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .updater(new RmsProp(lr))
        .weightInit(WeightInit.XAVIER);

    final GraphBuilder graphBuilder = builder.graphBuilder()
        .backpropType(BackpropType.Standard)
        .addInputs("inputLine" ,"inputPos")
        .setInputTypes(InputType.recurrent(vocabSize) ,InputType.recurrent(maxLen))
        .addLayer("embedding",
                new EmbeddingSequenceLayer.Builder()
                    .nIn(vocabSize)
                    .activation(Activation.IDENTITY)
                    .nOut(EMBEDDING_WIDTH)
                    .build(),
                "inputLine")
        
        .addLayer("posEmbedding",
                new EmbeddingSequenceLayer.Builder()
                    .nIn(maxLen)
                    .activation(Activation.IDENTITY)
                    .nOut(EMBEDDING_WIDTH)
                    .build(),
                "inputPos")

        .addLayer("attention1",
                new SelfAttentionLayer.Builder()
                    .projectInput(true)
                    .nHeads(4)
                    .nOut(HIDDEN_LAYER_WIDTH)
                    .build(),
                "embedding", "posEmbedding")

        .addLayer("output",
                new RnnOutputLayer.Builder()
                    .nOut(vocabSize)
                    .dataFormat(RNNFormat.NCW)
                    .activation(Activation.SOFTMAX)
                    .lossFunction(LossFunction.NEGATIVELOGLIKELIHOOD)
                    .build(),
                "attention1")
        
        .setOutputs("output");

    model = new ComputationGraph(graphBuilder.build());
    model.init();

I create a BertTokenIterator for unsupervised learning, but the gradients constantly explode and I get only NaN results.

Thanks in advance.

Thomas

agibsonccc · December 30, 2022, 4:59am

@thomas sorry for the late reply. Holidays and all. Let me get back to you with an example. You’ll probably want to use SameDiff though. You’ll want to use our BertIterator like here: deeplearning4j-examples/BertInferenceExample.java at master · deeplearning4j/deeplearning4j-examples · GitHub

If you need something more specific, could you try to elaborate a bit? Thanks!

thomas · January 5, 2023, 2:24pm

@agibsonccc thanks for your reply. The examples I found before are all for sequence classification, not token classification. My problem is that, in 1000 tries and parameter setups, my model crashes every time I train with BertIterator and UNSUPERVISED. I have now written my own iterator.

A tip for you: I got similar crashes from time to time with my own iterator until I noticed that it contained something like this:

    // set input vector
    input.put(new INDArrayIndex[] { NDArrayIndex.point(j), NDArrayIndex.point(0), NDArrayIndex.interval(0, inLen) },
            Nd4j.create(ArrayUtils.toPrimitive(masked.toArray(new Double[0]))));

I thought it would be OK if the list (masked) were longer than the “inLen” variable here, as long as it was not longer than the given vector. But as soon as I cut the list (masked) to a length of exactly inLen, the crashes were gone.
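The trimming step described above might look roughly like the sketch below. It is only an illustration in plain Java: the class name `MaskedRow` and the method `toExactLength` are hypothetical and not part of DL4J or ND4J, and the sketch assumes the list holds at least `inLen` values (it throws otherwise rather than padding).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a DL4J/ND4J class): builds a row whose length
// matches the interval that is later written into the input array.
final class MaskedRow {
    private MaskedRow() {}

    /**
     * Copies the first inLen values of masked into a new double[] of
     * length exactly inLen. Extra trailing values are dropped.
     */
    static double[] toExactLength(List<Double> masked, int inLen) {
        if (inLen < 0 || masked.size() < inLen) {
            // Assumption for this sketch: too-short lists are an error.
            throw new IllegalArgumentException(
                    "expected at least " + inLen + " values, got " + masked.size());
        }
        double[] row = new double[inLen];
        for (int i = 0; i < inLen; i++) {
            row[i] = masked.get(i); // auto-unboxing Double -> double
        }
        return row;
    }

    public static void main(String[] args) {
        List<Double> masked = Arrays.asList(5.0, 103.0, 7.0, 0.0, 0.0);
        double[] row = toExactLength(masked, 3);
        System.out.println(Arrays.toString(row));
    }
}
```

The resulting array could then be passed to `Nd4j.create(...)` in place of the `ArrayUtils.toPrimitive(...)` expression, so that the value written with `input.put(...)` has the same length as `NDArrayIndex.interval(0, inLen)`.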

best regards

Thomas
