Not able to train my Connect4 Network

Hello everyone.

I am new here, new to neural networks and new to dl4j. I am interested in deep reinforcement learning, but the RL support that used to ship with dl4j is gone :frowning: . So I created my own framework, and to test it I implemented TicTacToe and Connect4. Besides that I have a Gymnasium (OpenAI Gym) setup and am trying to implement a network for Pong (the implementation is done, but the network plays too badly) :frowning: . All sources can be found here:

The project uses Spring Boot, and the functionality is accessible via URLs or test classes.

But back to my Connect4 problem: the game logic is located in the following class (including the network config).

Here is a Connect4 agent which plays the game okay, based on an alpha-beta-pruning minimax algorithm. The game can be accessed in two modes:

  1. To train the model
curl --location 'http://localhost:8081/connectfour' \
--header 'Content-Type: application/json' \
--data '{
    "startFresh": false,
    "saveModel": true,
    "saveInterval": null,
    "episodes": 3000
}'

The two playing agents can be changed in the code if you like; several alternative agents are commented out there (RandomAgent, NextFreeAgent, DoTheFollowingAgent).

  2. To play interactively against an agent there are two endpoints: one to get an initial board and a second to make a move. The server is stateless, so the state is returned and has to be pushed back to the server with every request, which can easily be done in Postman. The requests are below.
curl --location 'http://localhost:8081/connectfour/manual' \
--header 'Content-Type: application/json' \
--data '{
    "starter": "COMPUTER"
}'

starter can be COMPUTER or HUMAN. If COMPUTER is used, this endpoint returns the initial board including the first move; if HUMAN is used, the board is empty.
I use Postman to execute the requests, and if you show the response in hex mode, the game field is quite readable.

curl --location 'http://localhost:8081/connectfour/manual/actions/4' \
--header 'Content-Type: application/json' \
--data '{"py":0,"board":[[0,0,0,1,0,0,0],[0,0,0,1,0,0,0],[0,0,0,1,2,0,0],[0,0,0,2,2,0,0],[0,0,0,1,2,0,0],[0,0,0,1,2,0,0]],"done":true,"winner":"HUMAN"}'

The POST body of the actions request is the response of the previous request. With the following line in Postman's Tests tab, the response can be recorded automatically:

pm.globals.set("observation", JSON.stringify(pm.response.json()));

and the request body of the actions request can hold the following content:

{{observation}}
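
For reference, the observation JSON that is passed back and forth has roughly this shape (sketched here as a record; field names are taken from the response above, the details in the repository may differ):

// Rough shape of the exchanged observation; field names match the JSON above,
// the meaning of "py" and the player encoding are my reading of it.
public record Observation(
        int py,           // player/turn indicator
        int[][] board,    // 6 rows x 7 columns, 0 = empty, 1 and 2 = the two players' coins
        boolean done,     // true once the game is finished
        String winner) {  // "HUMAN" or "COMPUTER" when done
}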

If you don’t want to dig deep into my code: my network is configured as follows:

            int input = new GameState().getFlattenedObservation().length;
            int output = GameAction.values().length;
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
//                    .updater(new Adam(0.001))
                    .updater(RmsProp.builder().learningRate(0.00025).build())
                    .miniBatch(false)
                    .weightInit(WeightInit.XAVIER)
                    .seed(seed)
//                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(1.0)
                    .list()
                    .layer(new DenseLayer.Builder()
                            .nIn(input)
                            .nOut(128)
                            .activation(Activation.LEAKYRELU)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .layer(new DenseLayer.Builder()
                            .nIn(128)
                            .nOut(128)
                            .activation(Activation.LEAKYRELU)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .layer(new DenseLayer.Builder()
                            .nIn(128)
                            .nOut(64)
                            .activation(Activation.LEAKYRELU)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                            .nIn(64)
                            .nOut(output)
                            .activation(Activation.IDENTITY)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .build();
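
For completeness, the configuration is turned into a network and queried for Q-values roughly like this (simplified sketch, not the exact code from the repository; it assumes getFlattenedObservation() returns a double[]):

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

// Q-values for the current board: one value per column (GameAction)
double[] obs = new GameState().getFlattenedObservation();   // assumed to return double[]
INDArray observation = Nd4j.create(obs).reshape(1, input);
INDArray qValues = model.output(observation);
int bestAction = qValues.argMax(1).getInt(0);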

When I run the training, the QLearningAgent wins nearly every game towards the end, but when I play against the QLearning agent interactively, it makes pretty dumb mistakes, so I win the game after 5-7 coins from my side.
An EpsilonGreedPolicy, a RepeatBuffer and a lot of other stuff are in place or can be enabled.
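
For reference, the epsilon-greedy action selection works roughly like this (simplified sketch; the actual EpsilonGreedPolicy in the repository differs in the details). With epsilon set to 0, e.g. for interactive play, it becomes purely greedy:

// Simplified epsilon-greedy action selection over the valid columns.
int selectAction(MultiLayerNetwork model, INDArray observation,
                 double epsilon, List<Integer> validActions, Random random) {
    if (random.nextDouble() < epsilon) {
        // Explore: pick a random valid column
        return validActions.get(random.nextInt(validActions.size()));
    }
    // Exploit: pick the valid column with the highest predicted Q-value
    INDArray qValues = model.output(observation);
    int best = validActions.get(0);
    for (int action : validActions) {
        if (qValues.getDouble(0, action) > qValues.getDouble(0, best)) {
            best = action;
        }
    }
    return best;
}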
If you have any suggestions on how I can improve this, I would be glad :slight_smile: . If somebody already has experience with dl4j and Gym, I would also be pleased to connect.

BR Thomas

@thneeb yeah, unfortunately the framework’s main use case is training standard networks or running model inference within microservices. A lot of the other code, automl and the like, just didn’t have a big enough user base to justify continued maintenance.
For training, you should look into monitoring the network’s training over time. Some networks take a while to converge.
One thing I see you’re doing is turning minibatch mode off. For games you shouldn’t be doing that unless you’re fitting on the whole data set at once.
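
As a concrete starting point, attaching a ScoreIterationListener (or the training UI) and fitting on sampled minibatches rather than single transitions looks roughly like this (sketch; the replay buffer and batch size are placeholders for whatever you have in your code):

// Keep .miniBatch(true) (the default) in the configuration, then:
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100)); // prints the score (loss) every 100 iterations

// Per training step: sample a minibatch from the replay buffer and fit on it.
// "replayBuffer.sampleBatch(64)" is a placeholder that should return an
// org.nd4j.linalg.dataset.DataSet of (flattened board, target Q-values) pairs.
DataSet batch = replayBuffer.sampleBatch(64);
model.fit(batch);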