Layer Normalization

thomas · January 5, 2023, 2:27pm

Hi,

I wanted to ask if there is currently any development in this direction.

github.com/deeplearning4j/deeplearning4j

More-useful alternative to Batch Norm

opened 08:24AM - 07 Jul 18 UTC

tom-adsfund

Enhancement DL4J

Recent work trying to understand why Batch Norm (BN) works shows (https://arxiv.…org/pdf/1805.11604.pdf) that there are other ways to achieve the same benefits using simpler methods. Importantly in practice, BN can't be used with dropout (unstable). And this prevents getting confidence ranges using dropout on networks using BN.

I emailed one of the lead authors of the paper above to learn how to implement the l_1 norm alternative to BN, and here is the strategy:

```
norm = compute_l1_norm(x_1,...,x_n)
mean = compute_mean(x_1,...,x_n)
xHat_i = (x_i - mean) / norm
```

I presume this would be pretty straight forward to implement, because it's simpler than BN. I'm just not familiar with how you'd actually set this up in DL4J.

Aha! Link: https://skymindai.aha.io/features/DL4J-87
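Read literally, the strategy quoted in the issue could be sketched in plain Java roughly as follows. This is only an illustration, not DL4J code: the class and method names are made up, and it assumes that x_1,...,x_n are the scalar activations being normalized together and that the l_1 norm is the sum of their absolute values, which the issue does not spell out.

```java
import java.util.Arrays;

final class L1NormSketch {

    /**
     * Literal reading of the quoted strategy (assumption: the l_1 norm is
     * taken over the raw inputs, not over the mean-centered values).
     * If all inputs are zero the division yields NaN; no guard is added
     * because the quoted strategy does not mention one.
     */
    static double[] normalize(double[] x) {
        double norm = 0.0;
        double mean = 0.0;
        for (double v : x) {
            norm += Math.abs(v); // l_1 norm: sum of absolute values
            mean += v;
        }
        mean /= x.length;

        double[] xHat = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            xHat[i] = (x[i] - mean) / norm;
        }
        return xHat;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {1.0, 2.0, 3.0})));
    }
}
```

Unlike Batch Norm or Layer Norm, this variant needs no square root, which is presumably what makes it "simpler" in the sense of the issue.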

Especially for recurrent and attention layers, it would be a nice-to-have option.

Best regards

Thomas

thomas · January 8, 2023, 8:59pm

OK, I tried a basic implementation as a SameDiffVertex without learnable parameters, with the help of:

A short test with SameDiff:

@Override
public SDVariable defineVertex(SameDiff sameDiff, VertexInputs inputs) {
    SDVariable x1 = inputs.getInput(0);

    // mean_i = sum(x_i[j] for j in range(k)) / k
    // Note: mean(0) reduces over dimension 0, which is the minibatch dimension in DL4J activations.
    SDVariable mean1 = x1.mean(0);
    // var_i = sum((x_i[j] - mean_i) ** 2 for j in range(k)) / k
    SDVariable var1 = x1.sub(mean1).pow(2).mean(0);
    // x_i_normalized = (x_i - mean_i) / sqrt(var_i + epsilon)
    SDVariable norm1 = x1.sub(mean1).div(sameDiff.math.sqrt(var1.add(1e-10)));

    return norm1;
}
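To make the formulas in the comments concrete, here is a minimal plain-Java sketch (no SameDiff) of what they describe: each example is normalized over its own k features. The class and method names are hypothetical. The sequence variant assumes the usual DL4J recurrent layout [minibatch, features, timeSteps] and normalizes the feature vector of every example at every time step independently.

```java
import java.util.Arrays;

final class LayerNormSketch {

    /** Normalizes one feature vector to zero mean and (approximately) unit variance. */
    static double[] normalize(double[] x, double eps) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;

        double var = 0.0;
        for (double v : x) {
            var += (v - mean) * (v - mean);
        }
        var /= x.length;

        double denom = Math.sqrt(var + eps);
        double[] out = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            out[j] = (x[j] - mean) / denom;
        }
        return out;
    }

    /** Applies normalize() per example and per time step; x is [minibatch][features][timeSteps]. */
    static double[][][] normalizeSequence(double[][][] x, double eps) {
        int n = x.length;
        int k = x[0].length;
        int t = x[0][0].length;
        double[][][] out = new double[n][k][t];
        for (int b = 0; b < n; b++) {
            for (int s = 0; s < t; s++) {
                // Gather the k features of example b at time step s.
                double[] features = new double[k];
                for (int j = 0; j < k; j++) {
                    features[j] = x[b][j][s];
                }
                double[] normed = normalize(features, eps);
                for (int j = 0; j < k; j++) {
                    out[b][j][s] = normed[j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] seq = {{{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}}};
        System.out.println(Arrays.deepToString(normalizeSequence(seq, 1e-10)));
    }
}
```

The point of the sketch is the reduction axis: the statistics are computed over the feature dimension, never over the minibatch or the time dimension.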

Can anybody give me a short hint on how this would map to an RNN input matrix? I know I get the base variable inside the defineVertex method with something like this:

inputs.getInput(0)

Or is it already enough if I use: x1 = inputs.getInput(0);

Appreciate any hint.

Best regards

thomas

treo · January 9, 2023, 7:49am

Layer Normalization is already implemented in SameDiff:

github.com

deeplearning4j/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/api/ops/impl/transforms/custom/LayerNorm.java#L41


      
import org.nd4j.linalg.api.ops.DynamicCustomOp;
import org.nd4j.shade.guava.primitives.Ints;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@NoArgsConstructor
public class LayerNorm extends DynamicCustomOp {

    private boolean noBias = false;
    private boolean channelsFirst;

    public LayerNorm(@NonNull SameDiff sameDiff, @NonNull SDVariable input, @NonNull SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions) {
        super(null, sameDiff, wrapFilterNull(input, gain, bias), false);
        this.noBias = bias == null;
        this.channelsFirst = channelsFirst;
        setDimensions(dimensions);
    }

You can easily use it with sd.nn.layerNorm (see also https://deeplearning4j.konduit.ai/samediff/reference/operation-namespaces/nn#layernorm)
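For intuition about the gain and bias inputs: the usual layer normalization definition is y = gain * standardize(x) + bias, applied elementwise over the normalized dimension. The plain-Java sketch below illustrates that definition for a single feature vector. It is not the DL4J implementation; the class name, the method, and the eps parameter are illustrative assumptions. A null bias mirrors the noBias case in the constructor excerpt above.

```java
import java.util.Arrays;

final class LayerNormAffineSketch {

    /**
     * y[j] = gain[j] * (x[j] - mean) / sqrt(var + eps) + bias[j]
     * bias may be null, in which case no offset is added.
     */
    static double[] layerNorm(double[] x, double[] gain, double[] bias, double eps) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;

        double var = 0.0;
        for (double v : x) {
            var += (v - mean) * (v - mean);
        }
        var /= x.length;

        double denom = Math.sqrt(var + eps);
        double[] y = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            double standardized = (x[j] - mean) / denom;
            y[j] = gain[j] * standardized + (bias == null ? 0.0 : bias[j]);
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] gain = {2.0, 2.0, 2.0};
        double[] bias = {1.0, 1.0, 1.0};
        System.out.println(Arrays.toString(layerNorm(x, gain, bias, 1e-10)));
        System.out.println(Arrays.toString(layerNorm(x, gain, null, 1e-10)));
    }
}
```

With gain fixed to ones and no bias, this reduces to the parameter-free normalization from the vertex earlier in the thread; making gain and bias trainable is what turns it into a full layer norm.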

thomas · January 9, 2023, 3:40pm

Thanks, I will try to integrate it into my layer.
