A3C training on asynchronous game

Hi all, I’m getting to grips with neural nets and would like to apply RL4J to a game we have running in our lab. The 2D game is grid-based, with a server calling clients/players asynchronously through a method UnitMove nextUnitMove(UnitMoveInput input). This method passes in the unit’s local environment, from which we calculate a UnitMove; the move is returned to the server, which then executes it.
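For concreteness, the contract looks roughly like this (only the nextUnitMove signature is taken from our code; the interface name is just an illustrative placeholder):

// Sketch of the server-to-client callback described above.
// Only the nextUnitMove signature is real; the interface name is illustrative.
public interface UnitMoveClient {
    // Called by the server, asynchronously and per unit, whenever it needs the next move.
    UnitMove nextUnitMove(UnitMoveInput input);
}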

I’m basing my approach on this example: https://github.com/eclipse/deeplearning4j-examples/blob/master/rl4j-examples/src/main/java/org/deeplearning4j/rl4j/examples/advanced/cartpole/A3CCartpole.java. I understand that you have to implement a custom MDP, but what is not yet clear to me is how to integrate this into our current async setup. Ideally, the solution would return a UnitMove on each call while training the network.

I guess my problem lies in the fact that in step(Integer action) I cannot execute the move and compare the ‘before’ and ‘after’ states of the world, as the action first has to be returned to the server. Only when the next call for that specific unit arrives would we know the impact of the action (the ‘after’ state). Is this a common problem for which a standard methodology exists, or do I have to rethink the training procedure?
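To make that concrete, the only bookkeeping I can picture is something like the sketch below: remember the previous observation and action per unit, and only when the next call for that unit arrives score the action and hand the completed transition to the learner. All class and method names here are made up for illustration, and the reward function and observation encoding are game-specific placeholders:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative bookkeeping only: every name in this sketch is hypothetical.
public class DeferredTransitionTracker {

    // What we remember between two successive calls for the same unit.
    static class Pending {
        final double[] observation; // encoded 'before' state
        final int action;           // action we returned to the server
        Pending(double[] observation, int action) {
            this.observation = observation;
            this.action = action;
        }
    }

    private final Map<String, Pending> pendingByUnit = new ConcurrentHashMap<>();

    // Called from the server callback. Returns the action to play now and, as a
    // side effect, completes the transition started on the previous call for this unit.
    public int onUnitCall(String unitId, double[] observation, PolicyStub policy, LearnerStub learner) {
        Pending previous = pendingByUnit.get(unitId);
        if (previous != null) {
            // Only now do we see the 'after' state of the previous action: score it
            // and hand the completed (s, a, r, s') tuple to whatever does the learning.
            double reward = computeReward(previous.observation, observation);
            learner.observe(previous.observation, previous.action, reward, observation);
        }
        int action = policy.selectAction(observation);
        pendingByUnit.put(unitId, new Pending(observation, action));
        return action; // translated back into a UnitMove by the caller
    }

    private double computeReward(double[] before, double[] after) {
        return 0.0; // game-specific; placeholder
    }

    // Placeholders standing in for the real policy and learner.
    interface PolicyStub { int selectAction(double[] observation); }
    interface LearnerStub { void observe(double[] s, int a, double r, double[] sPrime); }
}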

I’m not sure I understand your situation.
A conventional MDP process, insanely simplified, in pseudocode:
loop {
    action = policy_network.output(old_obs)
    [new_obs, reward, done, info] = environment.step(action)
    old_obs = new_obs
}

Your policy is your client and your environment would be your server, I’m guessing?

Yes, policy@client and environment@server. The server calculates and approves moves. When the next move is required from the client, the server asks for it (and passes some UnitMoveInput context), after which the client returns the new move.

The loop is simple, as you say; perhaps I’m staring myself blind at the A3CDiscreteDense.train() method, which is hard to reconcile with my situation: train() drives the game loop, whereas in my case the server drives the loop.
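The only way I can see to reconcile the two is to run the learner loop on its own thread and bridge the two loops with blocking hand-offs: the server callback publishes the new observation and waits for an action, while the learner’s step() does the opposite. A minimal plain-Java sketch of that idea, independent of RL4J’s actual MDP interface (all names here are mine, not the library’s):

import java.util.concurrent.SynchronousQueue;

// Bridges a server-driven callback ("push") to a learner-driven step() loop ("pull").
// Plain Java sketch; the learner side would be wrapped in whatever MDP interface
// the RL library expects. All class and method names are illustrative.
public class AsyncEnvBridge {

    // Hand-off points between the server-callback thread and the learner thread.
    private final SynchronousQueue<double[]> observations = new SynchronousQueue<>();
    private final SynchronousQueue<Integer> actions = new SynchronousQueue<>();

    // Called on the server's thread each time it asks for a move:
    // publish the current observation, then block until the learner has picked an action.
    public int onServerCall(double[] observation) throws InterruptedException {
        observations.put(observation);
        return actions.take();
    }

    // Called on the learner's thread (e.g. from a custom MDP's step()):
    // publish the chosen action, then block until the server's next callback
    // delivers the resulting observation.
    public double[] step(int action) throws InterruptedException {
        actions.put(action);
        return observations.take();
    }

    // Called once on the learner's thread to get the very first observation
    // before any action has been chosen.
    public double[] awaitFirstObservation() throws InterruptedException {
        return observations.take();
    }
}

This assumes the server asks for one unit’s move at a time; with many units I’d need one bridge (plus the per-unit bookkeeping from the earlier sketch) per unit. The training call would then run on a background thread and never notice that the real loop is driven by the server.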

RL4J is written like production code and is rather dense for a newbie (which I am as well). My 2 cents: you’re going to have to write something yourself, and whatever you do write on the client side should integrate with standard RL environments like the Gym API so you can benchmark your results. If I’m not mistaken, A3C would take 40 million frames to converge on an Atari game. Not sure how complex yours is.

Here is my GitHub repo where I am working on a VPG and a DDQN, if you want to have a look. They’re not 100% done, but they may give you some insight.

Not as good as RL4J, but they are each a single file, so they’re much easier to understand.

All right, thanks! I’ll have a look at your code, I’ve got some learning to do :).