My network isn't learning anything and A3C keeps crashing

I am trying to teach a network how to play Snake. I am using version 1.0.0-beta6. I have tried:

  • QLearning

  • Async QLearning

  • Actor-Critic

None of these approaches has worked. After millions of steps of Async QLearning my snake has not learned a single thing and still runs straight into walls.

Here is how my network is set up. I pass in a 256-element double array that represents my snake board:

  • 0 = free space

  • 0.25 = body

  • 0.3 + direction = head (where 0.1 <= direction <= 0.4)

I then have my neural network choose action 1, 2, or 3, meaning forward, left, or right.
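For reference, here is a minimal sketch of that encoding as I understand it, assuming a 16×16 grid (16 × 16 = 256); the `encodeBoard` helper and the `grid` representation are illustrative, not part of my actual code or RL4J:

```java
public class BoardEncoder {
    // Hypothetical encoder: grid[r][c] == 1 marks a body segment, 0 is free.
    // The head cell overwrites with 0.3 + direction (0.1 <= direction <= 0.4).
    public static double[] encodeBoard(int[][] grid, int headRow, int headCol, double direction) {
        double[] state = new double[256];
        for (int r = 0; r < 16; r++) {
            for (int c = 0; c < 16; c++) {
                if (grid[r][c] == 1) {
                    state[r * 16 + c] = 0.25;   // body segment
                }                                // free space stays 0.0
            }
        }
        state[headRow * 16 + headCol] = 0.3 + direction; // head encodes direction
        return state;
    }
}
```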

My reward function is:

 public double getReward() {
     if (!snake.inGame()) {
         return -1.0;  // died
     }
     if (snake.gotApple()) {
         return 1.0;   // got the apple
         // return 5.0 + 0.37 * snake.getLength();
     }
     return 0.0;       // neither died nor got an apple this move
 }

My async learning and network configurations are set up like so:

 public static AsyncNStepQLearningDiscrete.AsyncNStepQLConfiguration TOY_ASYNC_QL =
            new AsyncNStepQLearningDiscrete.AsyncNStepQLConfiguration(
                    123,        // random seed
                    400,        // max steps per epoch
                    5000000,    // max steps
                    16,         // number of threads
                    25,         // t_max
                    10,         // target update (hard)
                    10,         // num steps no-op warmup
                    0.01,       // reward scaling
                    0.98,       // gamma
                    10.0,       // td-error clipping
                    0.15f,      // min epsilon
                    30000       // num steps for epsilon-greedy annealing
            );

 public static DQNFactoryStdDense.Configuration MALMO_NET =
            DQNFactoryStdDense.Configuration.builder()
                    .l2(0.001)
                    .updater(new Adam(0.0025))
                    .numHiddenNodes(64)
                    .numLayer(3)
                    .build();

I can’t figure out how to get this thing to work. Is one of my settings wrong, or is something else going on?

Additionally, and probably more frustratingly, whenever I try to use A3C it crashes after maybe 10,000 steps saying:

Exception in thread "Thread-8" java.lang.RuntimeException: Output from network is not a probability distribution: [[         ?,         ?,         ?]]
at org.deeplearning4j.rl4j.policy.ACPolicy.nextAction(
at org.deeplearning4j.rl4j.policy.ACPolicy.nextAction(
at org.deeplearning4j.rl4j.learning.async.AsyncThreadDiscrete.trainSubEpoch(
at org.deeplearning4j.rl4j.learning.async.AsyncThread.handleTraining(

Is there any reason why this is happening? I have tried tweaking the network and my reward function to stop this error, to no avail.

Deep Learning seems like such a fun thing to explore but when I can’t even get the library to work it becomes very frustrating.

I would appreciate any help that y’all can offer.

Seems like a NaN problem.
I think the question marks (?) mean your output becomes too small at a certain point in the training. You need to make sure your net doesn’t run into arithmetic underflow.

Have you tried adjusting the regularization?
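One cheap way to confirm this is underflow is to sanity-check the raw output vector before an action is sampled from it. This is a hypothetical debugging helper, not an RL4J API:

```java
public class DistributionCheck {
    // Returns false if the vector contains NaN/Inf/negative entries, or if it
    // no longer sums to ~1 (a healthy softmax output should; underflow can
    // drive the whole vector toward 0).
    public static boolean isValidDistribution(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            if (Double.isNaN(p) || Double.isInfinite(p) || p < 0.0) {
                return false; // numeric blow-up
            }
            sum += p;
        }
        return Math.abs(sum - 1.0) < 1e-3;
    }
}
```

Logging the output whenever this returns false would tell you exactly when in training the values start degenerating.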


There are a few things that look pretty fishy to me:

Your epsilon-greedy annealing setting (30000 steps) means that over 30000 steps it will reduce the chance of acting randomly from 100% down to 15%. Given your configuration of 400 steps per epoch, that is at least 30000 / 400 = 75 rounds before it even gets down to 15%, and when your snake is close to the edge or close to itself, 15% random is still quite a lot and can easily send it crashing into the nearest wall, so it will likely take more than those 75 rounds in practice.

As @StoicProgrammer has pointed out, and as I’ve already told you on Stack Overflow, the question marks are NaNs. You should take a look at the training scores; if they get very close to 0 or very high, it is probably just a numeric under/overflow.