Hello,
I found a lot of examples for classifying text with a self-attention layer. Is there already a way to train token-wise, like with BertIterator.Task.UNSUPERVISED? I would appreciate any code examples.
I tried to create a simple ComputationGraph with self-attention over the tokens, like this:
```java
final NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .updater(new RmsProp(lr))
        .weightInit(WeightInit.XAVIER);

final GraphBuilder graphBuilder = builder.graphBuilder()
        .backpropType(BackpropType.Standard)
        .addInputs("inputLine", "inputPos")
        .setInputTypes(InputType.recurrent(vocabSize), InputType.recurrent(maxLen))
        // token embedding
        .addLayer("embedding", new EmbeddingSequenceLayer.Builder()
                .nIn(vocabSize)
                .nOut(EMBEDDING_WIDTH)
                .activation(Activation.IDENTITY)
                .build(), "inputLine")
        // position embedding
        .addLayer("posEmbedding", new EmbeddingSequenceLayer.Builder()
                .nIn(maxLen)
                .nOut(EMBEDDING_WIDTH)
                .activation(Activation.IDENTITY)
                .build(), "inputPos")
        // the two embedding outputs are merged (concatenated) automatically
        .addLayer("attention1", new SelfAttentionLayer.Builder()
                .projectInput(true)
                .nHeads(4)
                .nOut(HIDDEN_LAYER_WIDTH)
                .build(), "embedding", "posEmbedding")
        .addLayer("output", new RnnOutputLayer.Builder()
                .nOut(vocabSize)
                .dataFormat(RNNFormat.NCW)
                .activation(Activation.SOFTMAX)
                .lossFunction(LossFunction.NEGATIVELOGLIKELIHOOD)
                .build(), "attention1")
        .setOutputs("output");

model = new ComputationGraph(graphBuilder.build());
model.init();
```
I feed it from a BertIterator set up for unsupervised learning, but the gradients constantly explode and I only get NaN results.
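For context, my iterator setup looks roughly like this. It is only a sketch: the vocab file path, the sentence provider, and the masking probabilities are placeholders from my side, and I am not sure the BertMaskedLMMasker parameters are the recommended ones.

```java
// Sketch of the BertIterator setup for masked-token (unsupervised) training.
// "bert-vocab.txt", mySentenceProvider and the masking probabilities are placeholders.
BertWordPieceTokenizerFactory tf = new BertWordPieceTokenizerFactory(
        new File("bert-vocab.txt"), true, true, StandardCharsets.UTF_8);

BertIterator iter = BertIterator.builder()
        .tokenizer(tf)
        .vocabMap(tf.getVocab())
        .lengthHandling(BertIterator.LengthHandling.FIXED_LENGTH, maxLen)
        .minibatchSize(32)
        .sentenceProvider(mySentenceProvider)   // my own LabeledSentenceProvider
        .featureArrays(BertIterator.FeatureArrays.INDICES_MASK)
        .task(BertIterator.Task.UNSUPERVISED)
        .masker(new BertMaskedLMMasker(new Random(42), 0.15, 0.1, 0.1))
        .unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat.RANK2_IDX)
        .maskToken("[MASK]")
        .build();
```

Is this the intended way to combine BertIterator with a self-attention graph like the one above?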
Thanks in advance.
Thomas