Hi all, I’m getting to grips with the concept of neural nets and would like to apply RL4J to a game we have running in our lab. The 2D game is grid-based, with a server calling clients/players asynchronously through a method
UnitMove nextUnitMove(UnitMoveInput input). This method passes in the unit’s local environment, from which we compute a
UnitMove, which is then returned to the server so it can execute the move.
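To make the setup concrete, here is a stripped-down sketch of the client-side contract. UnitMove and UnitMoveInput stand in for our real game classes; their fields and the trivial decision logic are placeholders, not the actual implementation:

```java
// Simplified sketch of the client-side contract the server calls into.
// UnitMoveInput/UnitMove are stand-ins for our real game classes.
public class ClientSketch {
    record UnitMoveInput(int unitId, String[] localGrid) {}
    record UnitMove(int unitId, String direction) {}

    // Called asynchronously by the server whenever a unit needs its next move.
    static UnitMove nextUnitMove(UnitMoveInput input) {
        // Real decision logic would go here; this sketch always moves north.
        return new UnitMove(input.unitId(), "NORTH");
    }

    public static void main(String[] args) {
        UnitMove m = nextUnitMove(new UnitMoveInput(42, new String[]{"."}));
        System.out.println(m.unitId() + " " + m.direction()); // prints "42 NORTH"
    }
}
```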
I’m basing my implementation on this example: https://github.com/eclipse/deeplearning4j-examples/blob/master/rl4j-examples/src/main/java/org/deeplearning4j/rl4j/examples/advanced/cartpole/A3CCartpole.java . I understand that you have to implement a custom MDP, but what isn’t clear to me yet is how to integrate this with our current async setup. Ideally, the solution would return a
UnitMove on each call while training the network in the background.
I guess my problem lies in the fact that in
step(Integer action) I cannot execute the move and compare the ‘before’ and ‘after’ states of the world, as the action has to be returned to the server first. Only when the next call for that specific unit arrives do we know the impact of the action (the ‘after’ state). Is this a common problem for which a standard methodology exists, or do I have to rethink the training procedure?