Need a neural network configuration for pong / Atari games

I tried a lot of different WeightInits and Activation functions, but I don't get the network to learn. The gradients are, in my opinion, very small right from initialization (1e-3 to 1e-5). The starting Q-values are okay, between +1 and -1, but during training they get closer and closer to ±0 (1e-3 to 1e-5). The input to the network consists of grayscale 1-byte images, which are scaled from 0-255 down to 0-1 values.

            MultiLayerNetwork model = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
                    .seed(seed)
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(4.0)
                    .updater(new Adam(0.00025))
                    .list()
                    .layer(new ConvolutionLayer.Builder(8, 8)
                            .stride(4, 4)
                            .nIn(CHANNELS)
                            .nOut(32)
                            .weightInit(WeightInit.RELU_UNIFORM)
                            .activation(Activation.LEAKYRELU)
                            .build())
//                    .layer(new BatchNormalization())
                    .layer(new ConvolutionLayer.Builder(4, 4)
                            .stride(2, 2)
                            .nOut(64)
                            .weightInit(WeightInit.RELU_UNIFORM)
                            .activation(Activation.LEAKYRELU)
                            .build())
//                    .layer(new BatchNormalization())
                    .layer(new ConvolutionLayer.Builder(3, 3)
                            .stride(1, 1)
                            .nOut(64)
                            .weightInit(WeightInit.RELU_UNIFORM)
                            .activation(Activation.LEAKYRELU)
                            .build())
//                    .layer(new BatchNormalization())
                    .layer(new DenseLayer.Builder()
                            .nOut(512)
                            .weightInit(WeightInit.XAVIER)
                            .activation(Activation.LEAKYRELU)
                            .build())
                    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                            .nOut(GameAction.values().length) // output size (e.g. 10 classes)
                            .activation(Activation.IDENTITY) // No transformation
                            .weightInit(WeightInit.NORMAL)
                            .build())
                    .setInputType(InputType.convolutional(HEIGHT, WIDTH, CHANNELS)) // define the input shape
                    .build());
            model.init();

If I comment the BatchNormalization layers back in, the starting Q-values are much higher and escalate after a couple of thousand iterations.
I discussed the network configuration a lot with ChatGPT. The discussion went back and forth, and in the end I tried all of its suggestions, but nothing made it better.
I already verified the Q-value calculation with other small games like ConnectFour and TicTacToe.
Can you give me advice on how to optimize the network configuration?

And I have another question in that context.
For the Q-value update I first call output() with the current state and then output() with the next state, taking the max Q-value of the next state. Then I calculate the target value like this:

double target = transition.getReward() + q * getGamma();

Then I replace the Q-value of the taken action in the current state's output vector and train this vector back, just as it is described in the literature.
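In DL4J terms the whole update looks roughly like this (just a sketch of my code; toFeatures() and the transition accessors other than getReward() are stand-ins for my own helpers):

    // Q-values for the current state and the next state
    INDArray qCurrent = model.output(toFeatures(transition.getState()));   // shape [1, nActions]
    INDArray qNext    = model.output(toFeatures(transition.getNextState()));

    // Bootstrap target: reward + gamma * max_a Q(nextState, a)
    double maxQNext = qNext.maxNumber().doubleValue();
    double target = transition.getReward() + maxQNext * getGamma();
    // (a standard DQN update would drop the gamma term for terminal transitions)

    // Overwrite only the entry of the taken action, keep the rest as predicted
    INDArray labels = qCurrent.dup();
    labels.putScalar(0, transition.getAction(), target);

    model.fit(toFeatures(transition.getState()), labels);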
But I wonder if I can use the "labelMask" and only train back the changed Q-value. That way I might not need the output() call for the current state at all, because the entry I train back gets overridden anyway, and with the labelMask only that entry would be trained back. ChatGPT said no, but the explanation was not really convincing :frowning: . I haven't tried it yet, because even without that optimization my network is not learning :frowning:
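What I have in mind would be roughly this (untested sketch; it assumes DL4J's per-output label masking plays together with the MSE loss, and that features is the current-state input):

    // Labels: only the taken action carries the new target, the rest stays zero
    INDArray labels = Nd4j.zeros(1, GameAction.values().length);
    labels.putScalar(0, transition.getAction(), target);

    // Label mask: 1 for the taken action, 0 for all other outputs
    INDArray labelMask = Nd4j.zeros(1, GameAction.values().length);
    labelMask.putScalar(0, transition.getAction(), 1.0);

    // No features mask, only the label mask
    model.fit(new DataSet(features, labels, null, labelMask));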

Your gradients matter. Make sure the update steps aren't too big, or the network will just swing back and forth. If you need a breakdown, ask it about regularization, and make sure your input data is scaled in some way: either 0 to 1 or zero mean / unit variance. Numbers that are too big can also cause inconsistent learning.
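For example, in ND4J terms either of these would do (just a sketch; frame is assumed to be your grayscale frame as a float INDArray):

    // Option 1: scale byte values into [0, 1]
    INDArray scaled = frame.div(255.0);

    // Option 2: zero mean / unit variance per frame
    double mean = frame.meanNumber().doubleValue();
    double std  = frame.stdNumber().doubleValue();
    INDArray normalized = frame.sub(mean).div(std + 1e-8);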

I already noticed that the input data is relevant. I scaled it between 0 and 1 and the scoring output got better. Do you have a clue why my gradients are so small? As you wrote, gradients do matter. I already tried different weightInits (RELU, RELU_UNIFORM, XAVIER and so on), but it didn't really help.

I continued analyzing my code and the results. I pretty often get negative Q-values back. If the max Q-value is negative, then "gamma * max(q-value)" makes the value greater instead of smaller (so the future reward is not really discounted), which leads to wrong learning. How can I prevent getting negative values from output()? I tried Transforms.max(output, 0), but then the output contains only zeros and the maxIndex is always the first one. Is there a possibility to get better weights so that I don't receive negative values as Q-values?

You can use an activation function that only returns positive values, e.g. ReLU does that.

As you can see in my model above, I am using LEAKYRELU on all layers except for the last one, because I had the problem of dead neurons. LeakyReLU should also return just positive values (really small ones instead of zero; this can be adjusted via the alpha parameter, DEFAULT_ALPHA = 0.01). On the output layer I use IDENTITY because I want to feed the Q-values of the actions I did not take back unchanged. I tried LeakyReLU on this layer as well - I still get negative values.
I also think that this should not be the case. Can anybody tell me why that happens?

You were asking what you can do about making the output positive only: ReLU, being max(0, x), does exactly that. So if you want to enforce that on the output, you'll want to use ReLU as the activation function of your output layer.

The leaky variant will give you both positive and negative numbers: it is the identity for positive numbers (just like ReLU) and multiplies negative numbers by a small factor, but a negative value times a positive factor is still a negative number.
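In your configuration above that would just be a change to the output layer, roughly like this:

    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
            .nOut(GameAction.values().length)
            .activation(Activation.RELU)   // max(0, x): outputs can no longer go negative
            .weightInit(WeightInit.NORMAL)
            .build())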
