A3C training on asynchronous game

Hi all, I’m getting to grips with neural nets and would like to apply RL4J to a game we have running in our lab. The 2D game is grid-based, with a server calling clients/players asynchronously through a method UnitMove nextUnitMove(UnitMoveInput input). This method passes in the unit’s local environment, from which we calculate a UnitMove; the move is returned to the server, which then executes it.
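For concreteness, the contract looks roughly like this (only the nextUnitMove signature is taken from our code; the interface name is just an illustrative placeholder):

// Sketch of the server-to-client callback described above.
// Only the nextUnitMove signature is real; the interface name is illustrative.
public interface UnitMoveClient {
    // Called by the server, asynchronously and per unit, whenever it needs the next move.
    UnitMove nextUnitMove(UnitMoveInput input);
}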

I’m basing my approach on this example: https://github.com/eclipse/deeplearning4j-examples/blob/master/rl4j-examples/src/main/java/org/deeplearning4j/rl4j/examples/advanced/cartpole/A3CCartpole.java. I understand that you have to implement a custom MDP, but what is not yet clear to me is how to integrate this into our current async setup. Ideally, the solution would return a UnitMove on each call while training the network.

I guess my problem lies in the fact that in step(Integer action) I cannot execute the move and compare the ‘before’ and ‘after’ states of the world, as the action first has to be returned to the server. Only when the next call for that specific unit arrives would we know the impact of the action (the ‘after’ state). Is this a common problem for which a standard methodology exists, or do I have to rethink the training procedure?
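To make that concrete, the only bookkeeping I can picture is something like the sketch below: remember the previous observation and action per unit, and only when the next call for that unit arrives score the action and hand the completed transition to the learner. All class and method names here are made up for illustration, and the reward function and observation encoding are game-specific placeholders:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative bookkeeping only: every name in this sketch is hypothetical.
public class DeferredTransitionTracker {

    // What we remember between two successive calls for the same unit.
    static class Pending {
        final double[] observation; // encoded 'before' state
        final int action;           // action we returned to the server
        Pending(double[] observation, int action) {
            this.observation = observation;
            this.action = action;
        }
    }

    private final Map<String, Pending> pendingByUnit = new ConcurrentHashMap<>();

    // Called from the server callback. Returns the action to play now and, as a
    // side effect, completes the transition started on the previous call for this unit.
    public int onUnitCall(String unitId, double[] observation, PolicyStub policy, LearnerStub learner) {
        Pending previous = pendingByUnit.get(unitId);
        if (previous != null) {
            // Only now do we see the 'after' state of the previous action: score it
            // and hand the completed (s, a, r, s') tuple to whatever does the learning.
            double reward = computeReward(previous.observation, observation);
            learner.observe(previous.observation, previous.action, reward, observation);
        }
        int action = policy.selectAction(observation);
        pendingByUnit.put(unitId, new Pending(observation, action));
        return action; // translated back into a UnitMove by the caller
    }

    private double computeReward(double[] before, double[] after) {
        return 0.0; // game-specific; placeholder
    }

    // Placeholders standing in for the real policy and learner.
    interface PolicyStub { int selectAction(double[] observation); }
    interface LearnerStub { void observe(double[] s, int a, double r, double[] sPrime); }
}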

I’m not sure I understand your situation.
A conventional MDP process, insanely simplified, in pseudocode:
loop {
    action = policy_network.output(old_obs)
    [new_obs, reward, done, info] = environment.step(action)
    old_obs = new_obs
}

Your policy is your client and your environment would be your server, I’m guessing?

Yes, policy@client and environment@server. The server calculates and approves moves. When the next move is required from the client, the server asks for it (and passes some UnitMoveInput context), after which the client returns the new move.

The loop is simple, as you say; perhaps I’m staring myself blind at the A3CDiscreteDense.train() method, which is hard to reconcile with my situation: train() drives the game loop, whereas in my case the server drives the loop.
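The only way I can see to reconcile the two is to run the learner loop on its own thread and bridge the two loops with blocking hand-offs: the server callback publishes the new observation and waits for an action, while the learner’s step() does the opposite. A minimal plain-Java sketch of that idea, independent of RL4J’s actual MDP interface (all names here are mine, not the library’s):

import java.util.concurrent.SynchronousQueue;

// Bridges a server-driven callback ("push") to a learner-driven step() loop ("pull").
// Plain Java sketch; the learner side would be wrapped in whatever MDP interface
// the RL library expects. All class and method names are illustrative.
public class AsyncEnvBridge {

    // Hand-off points between the server-callback thread and the learner thread.
    private final SynchronousQueue<double[]> observations = new SynchronousQueue<>();
    private final SynchronousQueue<Integer> actions = new SynchronousQueue<>();

    // Called on the server's thread each time it asks for a move:
    // publish the current observation, then block until the learner has picked an action.
    public int onServerCall(double[] observation) throws InterruptedException {
        observations.put(observation);
        return actions.take();
    }

    // Called on the learner's thread (e.g. from a custom MDP's step()):
    // publish the chosen action, then block until the server's next callback
    // delivers the resulting observation.
    public double[] step(int action) throws InterruptedException {
        actions.put(action);
        return observations.take();
    }

    // Called once on the learner's thread to get the very first observation
    // before any action has been chosen.
    public double[] awaitFirstObservation() throws InterruptedException {
        return observations.take();
    }
}

This assumes the server asks for one unit’s move at a time; with many units I’d need one bridge (plus the per-unit bookkeeping from the earlier sketch) per unit. The training call would then run on a background thread and never notice that the real loop is driven by the server.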

RL4J is written like production code and is rather dense for a newbie (which I am as well). My 2 cents: you’re going to have to write something yourself, and whatever you do write on the client side should integrate with standard RL environments like the Gym API so you can benchmark your results. If I’m not mistaken, A3C would take 40 million frames to converge on an Atari game. Not sure how complex yours is.

Here is my GitHub repo where I am working on a VPG and a DDQN, if you want to have a look. They’re not 100% done, but they may give you some insight.

Not as good as RL4J, but they are each a single file, so they’re much easier to understand.

All right, thanks! I’ll have a look at your code, I’ve got some learning to do :).