Transformer's Encoder with Self-Attention layers

treo · December 14, 2020, 7:46pm

I’ve answered the question on when and how to apply a mask in SameDiff over at your other thread: Feature Mask application in custom SameDiff model - #2 by treo

It isn’t necessary for you to do that. They will be automatically applied during backprop with SameDiff when you apply it in the forward pass. The entire point of SameDiff is that it does the entire backprop for you so you don’t have to define it manually.

Topic		Replies	Views
SelfAttention Token Training Example DL4J	2	258	January 5, 2023
Attention and Pooling Problem with Merge on Backpropagation DL4J	9	780	October 30, 2021
Autoencode sanity check DL4J	3	507	January 29, 2023
Creating custom layer in java Deeplearning4J DL4J	1	727	January 14, 2021
LSTM Training stops working in Snapshot	4	793	June 25, 2021

Transformer's Encoder with Self-Attention layers

Related topics