Feature Mask application in custom SameDiff model

You first need to understand how feature masking works in general: it effectively zeros out “useless” outputs.

In your scenario you are masking sequences so you can work with different sequence lengths in the same minibatch: shorter sequences are padded to the length of the longest one, and the mask marks which steps are real. This means you only ever need to apply masking where the padded values would actually make a difference.
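To make that concrete, here is a minimal sketch of what that looks like in SameDiff. The variable names, shapes and feature count are illustrative, and it assumes DL4J's usual `[minibatch, features, timeSteps]` layout and broadcasting on `mul`; the mask itself is just a `[minibatch, timeSteps]` array of ones and zeros:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class MaskingSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();

        // Per-step outputs in [minibatch, features, timeSteps] layout (8 features here)
        SDVariable out  = sd.placeHolder("out",  DataType.FLOAT, -1, 8, -1);
        // Mask of shape [minibatch, timeSteps]: 1 for real steps, 0 for padding
        SDVariable mask = sd.placeHolder("mask", DataType.FLOAT, -1, -1);

        // Insert a singleton feature dimension so the mask broadcasts over features
        SDVariable expanded = sd.expandDims(mask, 1);   // [minibatch, 1, timeSteps]
        // Element-wise multiplication zeros out everything at the padded steps
        SDVariable masked = out.mul(expanded);
    }
}
```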

That is also the reason why the attention ops take a mask: they perform calculations that span the whole sequence, so they need to know which steps to ignore.
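As a hedged illustration, assuming the `dotProductAttention` op that ships with ND4J's SameDiff and its documented shape conventions, the mask is passed straight into the op rather than applied afterwards:

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class AttentionMaskSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();

        // Queries, keys and values in [minibatch, features, timeSteps] layout
        SDVariable q    = sd.placeHolder("q",    DataType.FLOAT, -1, 8, -1);
        SDVariable k    = sd.placeHolder("k",    DataType.FLOAT, -1, 8, -1);
        SDVariable v    = sd.placeHolder("v",    DataType.FLOAT, -1, 8, -1);
        // Mask of shape [minibatch, timeSteps]: 1 for real steps, 0 for padding
        SDVariable mask = sd.placeHolder("mask", DataType.FLOAT, -1, -1);

        // The op uses the mask internally to keep padded steps from
        // receiving any attention weight; "true" enables scaling
        SDVariable attn = sd.nn().dotProductAttention(q, k, v, mask, true);
    }
}
```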

If you are just applying direct transformations to each step individually, you don't actually need to apply masking at that point, because the useless results at padded steps do not influence the other steps.

In that case, you only need to apply masking at the loss step. How exactly that should be done depends on the loss function you are using, but usually a simple mul (an element-wise multiplication) is all it takes to apply the mask.
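A minimal sketch of that, assuming a squared-error loss and the same illustrative shapes as above (built only from standard SameDiff ops like `expandDims`, `sum` and `div`):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;

public class MaskedLossSketch {
    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();

        // Predictions and labels in [minibatch, features, timeSteps] layout
        SDVariable pred  = sd.placeHolder("pred",  DataType.FLOAT, -1, 8, -1);
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, -1, 8, -1);
        // Mask of shape [minibatch, timeSteps]: 1 for real steps, 0 for padding
        SDVariable mask  = sd.placeHolder("mask",  DataType.FLOAT, -1, -1);

        // Per-element squared error
        SDVariable diff  = pred.sub(label);
        SDVariable sqErr = diff.mul(diff);

        // Apply the mask with a simple mul, broadcast over the feature dimension
        SDVariable maskedErr = sqErr.mul(sd.expandDims(mask, 1));

        // Average over the real (unmasked) entries only, not the padded total;
        // 8.0 is the (illustrative) feature count
        SDVariable loss = maskedErr.sum().div("maskedLoss", mask.sum().mul(8.0));
        sd.setLossVariables("maskedLoss");
    }
}
```

Note the normalization by `mask.sum()` at the end: if you divide by the padded total instead, minibatches with lots of padding will look artificially "easier" than densely packed ones.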