How to implement Deep Adaptive Input Normalization?

I want to implement DAIN, a learnable way to normalise the data fed to a network, as opposed to "fixed" normalisation methods such as MinMax scaling. I want to put it in front of an LSTM-based network that takes inputs of shape [mini-batch size, # features, sequence length].

What is the best approach to do this?

A Python implementation (with a link to the paper) is at GitHub - passalis/dain: Deep Adaptive Input Normalization for Time Series Forecasting

As an extension of this, I was also wondering about the best approach to implementing GitHub - Nicholas-Picini/Temporal-Attention-time-series-analysis: A Temporal Attention augmented Bilinear Network for Directional Prediction of Financial Time Series in DL4J.


The regular normalizer feature of DL4J will not be useful in this case.

I think the best way to approach it is to use a SameDiffLayer and specify the necessary computations in there.
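For reference, here is what those computations look like, sketched in plain numpy rather than DL4J, following the three stages described in the DAIN paper (adaptive shift, adaptive scale, adaptive gating). The function and parameter names are my own; in a SameDiffLayer, `W_shift`, `W_scale`, `W_gate`, and `b_gate` would be the layer's trainable parameters, and the numpy ops would become the corresponding SameDiff ops:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dain_forward(x, W_shift, W_scale, W_gate, b_gate, eps=1e-8):
    """Sketch of the DAIN forward pass.

    x: array of shape [batch, features, time], matching the
    [mini-batch size, # features, sequence length] layout above.
    W_shift, W_scale, W_gate: [features, features]; b_gate: [features].
    """
    # Stage 1: adaptive shift — a learned linear map of the
    # per-sample, per-feature mean over time is subtracted.
    avg = x.mean(axis=2)                      # [batch, features]
    shift = avg @ W_shift.T
    x = x - shift[:, :, None]

    # Stage 2: adaptive scale — a learned linear map of the
    # per-feature RMS over time divides the shifted signal.
    rms = np.sqrt((x ** 2).mean(axis=2))      # [batch, features]
    scale = rms @ W_scale.T
    x = x / (scale[:, :, None] + eps)

    # Stage 3: adaptive gating — a sigmoid gate computed from the
    # summary of the normalised signal suppresses features.
    summary = x.mean(axis=2)
    gate = sigmoid(summary @ W_gate.T + b_gate)
    return x * gate[:, :, None]
```

Initialising `W_shift` and `W_scale` to the identity and `W_gate`/`b_gate` to zero makes stages 1–2 start out as per-sample z-scoring (with a constant gate of 0.5), which is the initialization the reference implementation uses; training then adapts the normalisation per input.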