I found some code in the BatchNormalization class that confuses me.

1: In the method backpropGradient:

//dL/dmu
val effectiveBatchSize = input.size(0) * input.size(hIdx) * input.size(wIdx);
INDArray dxmu1 = dxhat.sum(nonChDims).divi(std).negi();
INDArray dxmu2 = xMu.sum(nonChDims).muli(-2.0 / effectiveBatchSize).muli(dLdVar);
INDArray dLdmu = dxmu1.addi(dxmu2);
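For reference, as I understand it this corresponds to the standard batch-norm gradient from Ioffe & Szegedy (2015), where the second term is the dxmu2 part:

$$
\frac{\partial L}{\partial \mu} = \left(\sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\right)\cdot\frac{-1}{\sqrt{\sigma^2+\epsilon}} \;+\; \frac{\partial L}{\partial \sigma^2}\cdot\frac{-2}{m}\sum_{i=1}^{m}(x_i-\mu)
$$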

Why calculate dxmu2 at all? Since mu is the per-channel mean of the current minibatch, xMu.sum(nonChDims) is zero by construction, so dxmu2 is always zero (up to floating-point rounding).
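A minimal standalone check of that claim (my own sketch, not DL4J's actual code path; it uses simple 2d activations rather than the layer's NCHW layout):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class XMuSumCheck {
    public static void main(String[] args) {
        INDArray x = Nd4j.randn(32, 10);      // [minibatch, features]
        INDArray mean = x.mean(0);            // per-feature batch mean
        INDArray xMu = x.subRowVector(mean);  // x - mu, analogous to the layer's xMu
        // Summing the centered activations over the batch dimension gives
        // (numerically) zero for every feature:
        System.out.println(xMu.sum(0));
    }
}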

2: In the useLogStd comments in the method backpropGradient:

//Use log10(std) parameterization. This is more numerically stable for FP16 and better for distributed training
//First: we have log10(var[i]) from last iteration, hence can calculate var[i] and stdev[i]
//Need to calculate log10(std[i]) - log10(std[i+1]) as the "update"
//Note, var[i+1] = d*var[i] + (1-d)*batchVar

Shouldn't the line

//Need to calculate log10(std[i]) - log10(std[i+1]) as the "update"

be

//Need to calculate log(std[i]) - log(std[i+1])?
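For context, here is my own sketch (not DL4J's actual code) of the arithmetic I understand those comments to describe, with d as the decay and batchVar as the current minibatch variance:

public class LogStdUpdateSketch {
    // Given the stored log10(std[i]), compute the "update"
    // log10(std[i]) - log10(std[i+1]) that the comments describe.
    static double logStdUpdate(double logStdOld, double d, double batchVar) {
        double varOld = Math.pow(10.0, 2.0 * logStdOld);   // var[i] = (10^{log10(std[i])})^2
        double varNew = d * varOld + (1.0 - d) * batchVar; // var[i+1] = d*var[i] + (1-d)*batchVar
        double logStdNew = 0.5 * Math.log10(varNew);       // log10(std[i+1])
        return logStdOld - logStdNew;                      // the "update"
    }

    public static void main(String[] args) {
        // Example: std[i] = 1 (logStdOld = 0), decay 0.9, batch variance 0.5
        System.out.println(logStdUpdate(0.0, 0.9, 0.5));
    }
}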

Thanks to anyone who can help me!