I have tried a lot of different WeightInits and activation functions, but I cannot get the network to learn. In my opinion the gradients are very small right from initialization (1e-3 to 1e-5). The initial Q-values are okay, between +1 and -1, but during training they drift closer and closer to ±0 (1e-3 to 1e-5). The inputs to the network are 1-byte grayscale images, which are scaled from 0-255 down to 0-1.
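The scaling itself is nothing special, roughly like this (simplified sketch; pixelBytes is just a placeholder for the raw image bytes, CHANNELS is 1 for grayscale):

double[] pixels = new double[HEIGHT * WIDTH];
for (int i = 0; i < pixels.length; i++) {
    pixels[i] = (pixelBytes[i] & 0xFF) / 255.0; // unsigned byte 0-255 -> [0, 1]
}
INDArray frame = Nd4j.create(pixels).reshape(1, CHANNELS, HEIGHT, WIDTH); // NCHW layout for the conv layers

This is my network configuration: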
MultiLayerNetwork model = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
        .seed(seed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(4.0)
        .updater(new Adam(0.00025))
        .list()
        .layer(new ConvolutionLayer.Builder(8, 8)
                .stride(4, 4)
                .nIn(CHANNELS)
                .nOut(32)
                .weightInit(WeightInit.RELU_UNIFORM)
                .activation(Activation.LEAKYRELU)
                .build())
        // .layer(new BatchNormalization())
        .layer(new ConvolutionLayer.Builder(4, 4)
                .stride(2, 2)
                .nOut(64)
                .weightInit(WeightInit.RELU_UNIFORM)
                .activation(Activation.LEAKYRELU)
                .build())
        // .layer(new BatchNormalization())
        .layer(new ConvolutionLayer.Builder(3, 3)
                .stride(1, 1)
                .nOut(64)
                .weightInit(WeightInit.RELU_UNIFORM)
                .activation(Activation.LEAKYRELU)
                .build())
        // .layer(new BatchNormalization())
        .layer(new DenseLayer.Builder()
                .nOut(512)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.LEAKYRELU)
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .nOut(GameAction.values().length) // one output per GameAction (Q-values)
                .activation(Activation.IDENTITY)  // no transformation
                .weightInit(WeightInit.NORMAL)
                .build())
        .setInputType(InputType.convolutional(HEIGHT, WIDTH, CHANNELS)) // define the input shape
        .build());
model.init();
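For reference, this is roughly how I look at the gradient magnitudes (simplified sketch; batchFeatures/batchLabels stand for one sampled mini-batch):

model.setInput(batchFeatures);
model.setLabels(batchLabels);
model.computeGradientAndScore();
INDArray flatGradient = model.gradient().gradient(); // all parameter gradients flattened into one vector
System.out.println("mean |grad|: " + Transforms.abs(flatGradient).meanNumber());
System.out.println("max  |grad|: " + Transforms.abs(flatGradient).maxNumber());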
If I enable the commented-out BatchNormalization layers, the initial Q-values are much higher and they escalate after a couple of thousand iterations.
I discussed the network configuration with ChatGPT at length. The discussion went back and forth, and in the end I tried all of its suggestions, but nothing made it better.
I have already verified the Q-value calculation with smaller games like ConnectFour and TicTacToe.
Can you give me advice on how to optimize the network configuration?
And I have another question in that context.
For the Q-value update I first call output() on the current state and then output() on the next state, and take the max Q-value of the next state. Then I calculate the target value like this:
double target = transition.getReward() + q * getGamma();
Then I replace the Q-value of the taken action in the current state's output vector and train this vector back, just like it is in the literature.
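In code the update looks roughly like this (simplified sketch for a single transition instead of a batch; the transition getters are from my own replay-memory class):

INDArray currentQ = model.output(transition.getState());     // Q(s, ·)
INDArray nextQ    = model.output(transition.getNextState()); // Q(s', ·)

double q = nextQ.maxNumber().doubleValue();                  // max Q-value of the next state
double target = transition.getReward() + q * getGamma();     // (terminal-state handling omitted in this sketch)

INDArray labels = currentQ.dup();
labels.putScalar(transition.getAction(), target);            // only the taken action gets the new target

model.fit(transition.getState(), labels);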
But I wonder whether I could use the "labelMask" and only train back the changed Q-value. Then I would not need the output() call for the current state at all, because the value that gets trained back is overwritten anyway, and with the labelMask only that one entry is trained. ChatGPT said no, but the explanation was not really convincing. I have not tried it yet, because even without that optimization my network is not learning.
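What I have in mind is roughly this (untested sketch; whether DL4J applies a per-output label mask like this for MSE on a feed-forward OutputLayer is exactly what I am unsure about):

double q = model.output(transition.getNextState()).maxNumber().doubleValue();
double target = transition.getReward() + q * getGamma();

INDArray labels = Nd4j.zeros(1, GameAction.values().length);
labels.putScalar(transition.getAction(), target);

INDArray labelMask = Nd4j.zeros(1, GameAction.values().length);
labelMask.putScalar(transition.getAction(), 1.0);             // only this output should contribute to the loss

model.fit(transition.getState(), labels, null, labelMask);    // no output() for the current state needed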