Hello everyone,
I am new here, new to neural networks, and new to dl4j. I am interested in Deep Reinforcement Learning, but the reinforcement learning part that used to come with dl4j is gone. So I created my own framework, and to test it I implemented TicTacToe and Connect4. Besides that I have a Gymnasium (OpenAI Gym) setup and am trying to implement a network for Pong (the implementation is done, but the network plays too poorly).
All sources can be found here:
The project uses Spring Boot, and the functionality is accessible via URLs or test classes.
But back to my Connect4 problem: the game logic is located in the following class (including the network config):
Here is a Connect4 agent which plays the game okay, based on an alpha-beta-pruning minimax algorithm (a simplified sketch of that search follows below, after the two modes). The game can be accessed in two modes:
- To train the model
curl --location 'http://localhost:8081/connectfour' \
--header 'Content-Type: application/json' \
--data '{
"startFresh": false,
"saveModel": true,
"saveInterval": null,
"episodes": 3000
}'
The two playing agents have to be changed in code if you want different ones. Several alternative agents are commented out in the code (RandomAgent, NextFreeAgent, DoTheFollowingAgent).
- To play interactively against an agent there are two endpoints: one to get an initial board and a second to make a move. The server is stateless, so the state is returned and pushed back to the server, which can easily be handled in Postman. Code is below.
curl --location 'http://localhost:8081/connectfour/manual' \
--header 'Content-Type: application/json' \
--data '{
"starter": "COMPUTER"
}'
starter can be COMPUTER or HUMAN. If COMPUTER is used, this endpoint returns the initial board including the first move. If HUMAN is used, the board is empty.
I use Postman to execute the requests, and if you show the response in hex mode, the game field is quite readable.
curl --location 'http://localhost:8081/connectfour/manual/actions/4' \
--header 'Content-Type: application/json' \
--data '{"py":0,"board":[[0,0,0,1,0,0,0],[0,0,0,1,0,0,0],[0,0,0,1,2,0,0],[0,0,0,2,2,0,0],[0,0,0,1,2,0,0],[0,0,0,1,2,0,0]],"done":true,"winner":"HUMAN"}'
The POST body of the actions request is the response of the previous request. With the following line under the request's test scripts in Postman, that response can be recorded automatically:
pm.globals.set("observation", JSON.stringify(pm.response.json()));
and the request body of the actions request can hold the following content:
{{observation}}
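Just to make clear what the minimax opponent does: the rough idea of the alpha-beta search is sketched below. This is heavily simplified, and the helper methods (legalMoves, applyMove, undoMove, evaluate, isTerminal) as well as the Board type are only for illustration, not the actual class from the repo.

// Simplified alpha-beta minimax sketch (illustrative only, not the repo class).
int alphaBeta(Board board, int depth, int alpha, int beta, boolean maximizing) {
    if (depth == 0 || board.isTerminal()) {
        return board.evaluate(); // heuristic score from the maximizing player's point of view
    }
    if (maximizing) {
        int best = Integer.MIN_VALUE;
        for (int column : board.legalMoves()) {
            board.applyMove(column, PLAYER_ONE);
            best = Math.max(best, alphaBeta(board, depth - 1, alpha, beta, false));
            board.undoMove(column);
            alpha = Math.max(alpha, best);
            if (beta <= alpha) break; // prune: the opponent will never allow this line
        }
        return best;
    } else {
        int best = Integer.MAX_VALUE;
        for (int column : board.legalMoves()) {
            board.applyMove(column, PLAYER_TWO);
            best = Math.min(best, alphaBeta(board, depth - 1, alpha, beta, true));
            board.undoMove(column);
            beta = Math.min(beta, best);
            if (beta <= alpha) break; // prune
        }
        return best;
    }
}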
If you don't want to dig deep into my code: I have my network configured as follows:
// input size = flattened board observation, output size = number of possible actions
int input = new GameState().getFlattenedObservation().length;
int output = GameAction.values().length;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        // .updater(new Adam(0.001))
        .updater(RmsProp.builder().learningRate(0.00025).build())
        .miniBatch(false)
        .weightInit(WeightInit.XAVIER)
        .seed(seed)
        // .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(1.0)
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(input)
                .nOut(128)
                .activation(Activation.LEAKYRELU)
                .weightInit(WeightInit.XAVIER)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(128)
                .nOut(128)
                .activation(Activation.LEAKYRELU)
                .weightInit(WeightInit.XAVIER)
                .build())
        .layer(new DenseLayer.Builder()
                .nIn(128)
                .nOut(64)
                .activation(Activation.LEAKYRELU)
                .weightInit(WeightInit.XAVIER)
                .build())
        // linear output layer with MSE loss: one regressed Q-value per action
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .nIn(64)
                .nOut(output)
                .activation(Activation.IDENTITY)
                .weightInit(WeightInit.XAVIER)
                .build())
        .build();
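The configuration is then turned into the actual network in the standard DL4J way (shown here just for completeness; the score listener and its interval are optional):

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// optional: log the training score every 100 iterations
model.setListeners(new ScoreIterationListener(100));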
When I run the training, the QLearningAgent wins nearly always in the end, but if I play interactively against the QLearning agent, it makes pretty dumb mistakes, so that I win the game within 5-7 coins from my side.
An EpsilonGreedPolicy, a RepeatBuffer and a lot of other things are in place or can be enabled.
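To give an idea of what the EpsilonGreedPolicy does during training, the action selection is essentially the following (simplified sketch; epsilon, validActions and the state encoding are placeholders, not the exact code from the repo):

// Simplified epsilon-greedy action selection (illustrative, not the exact repo code).
int chooseAction(MultiLayerNetwork model, INDArray state, List<Integer> validActions,
                 double epsilon, Random random) {
    if (random.nextDouble() < epsilon) {
        // explore: pick a random valid column
        return validActions.get(random.nextInt(validActions.size()));
    }
    // exploit: pick the valid action with the highest predicted Q-value
    INDArray qValues = model.output(state); // shape [1, numActions]
    int best = validActions.get(0);
    for (int action : validActions) {
        if (qValues.getDouble(0, action) > qValues.getDouble(0, best)) {
            best = action;
        }
    }
    return best;
}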
If you have any suggestions on how I can improve this, I would be glad. If somebody already has experience with dl4j and Gym, I would also be pleased to connect.
BR Thomas