Attributes normalization

ramarro123 · May 5, 2021, 4:49pm

Hello,

A general question about attribute normalization.

I do something like this:

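import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);        // collect statistics from the training data only
normalizer.transform(trainingData);  // apply them to the training data
normalizer.transform(testData);      // apply the same statistics to the test data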

to normalize my dataset, but I am wondering whether this always has to be done, never, or only in specific cases.

Now, this is my scenario, used just as a test case:

temp,pressure,humidity,wind_speed,prec,prediction
29,1200,40,18,0,A
27,1200,23,14,1,B
21,1200,33,33,3,C

Since the data are “related” within each column, and the scale is very different from one column to another (the units of measure differ, as with pressure, temperature, and wind speed), should I normalize this dataset or convert it in some way? How and when should data normalization be used?

treo · May 6, 2021, 7:01am

Yes, normalizing the data will be necessary. That is because for most activation functions the sensitive region is roughly between -1 and 1. Inputs far beyond that range saturate the function, which makes training very hard, as you have almost no gradient to work with.

As for the specific normalization you want to use, that may depend on the data (see also Quickstart with Deeplearning4J – dubs·tech). With NormalizerStandardize, the statistics are calculated for each column, and each column is then normalized so that it has zero mean and unit variance (μ=0, σ=1).
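To make the per-column idea concrete, here is a minimal plain-Java sketch of that calculation on two columns from the question. It is only an illustration and not the ND4J implementation: the class and method names are made up, and it assumes the population standard deviation (dividing by n), which may differ in detail from what the library does.

```java
import java.util.Arrays;

// Plain-Java illustration only; this is not the ND4J implementation.
class StandardizeSketch {

    // Returns a copy of one column rescaled to zero mean and unit variance.
    // Assumption: population standard deviation (divide by n).
    static double[] standardize(double[] column) {
        double mean = Arrays.stream(column).average().orElse(0.0);
        double variance = Arrays.stream(column)
                .map(v -> (v - mean) * (v - mean))
                .average().orElse(0.0);
        double std = Math.sqrt(variance);
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // Guard for a constant column (std == 0): every value maps to 0.
            out[i] = std == 0.0 ? 0.0 : (column[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temp = {29, 27, 21};      // temp column from the question
        double[] windSpeed = {18, 14, 33}; // wind_speed column from the question
        System.out.println(Arrays.toString(standardize(temp)));
        System.out.println(Arrays.toString(standardize(windSpeed)));
    }
}
```

After this step both columns are on the same scale, regardless of their original units.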

That has an interesting side effect if a column always has the same value (as you may have in the pressure column). Because the mean is moved to zero, every value in that column becomes zero, so the feature is effectively dropped (see Methods for dropping out inputs (features) during post-fit Evaluation? - #8 by treo for the explanation of the math for zero features).
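As a rough illustration of that side effect, the sketch below centers the pressure column from the question in plain Java and multiplies each result by a weight. The class name, method name, and weight value are hypothetical and chosen only for the example; it does not use ND4J.

```java
// Plain-Java illustration only; names and the weight value are hypothetical.
class ConstantColumnSketch {

    // Subtracts the column mean from every value (the centering step of standardization).
    static double[] center(double[] column) {
        double sum = 0.0;
        for (double v : column) {
            sum += v;
        }
        double mean = sum / column.length;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = column[i] - mean;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pressure = {1200, 1200, 1200}; // pressure column from the question
        double[] centered = center(pressure);
        double weight = 0.73;                   // arbitrary weight, for illustration
        for (double x : centered) {
            // A zero input contributes nothing to the weighted sum, whatever the weight is.
            System.out.println(x + " * " + weight + " = " + (x * weight));
        }
    }
}
```

Since every centered value is zero, the product is zero for any weight, which is why the network cannot learn anything from that column.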

When you are running a classification, it will leave the labels alone. If you were running a regression, you’d also have to call fitLabel(true) on the normalizer so that the labels are normalized as well.
