Encodable and ObservationSpace

Hi,

I am fairly new to RL4J and reinforcement learning in general. I tried to implement a very simple MDP (Snake) to gain more experience. However, since the documentation is still a work in progress, I have trouble understanding how exactly the ObservationSpace works.

I created an Encodable to represent the snake's playing field:

import org.deeplearning4j.rl4j.space.Encodable;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class SnakeObservationSpace implements Encodable {

    private final Snake snake;

    public SnakeObservationSpace(Snake snake) {
        this.snake = snake;
    }

    @Override
    public double[] toArray() {
        // Deprecated method, not implemented
        return null;
    }

    @Override
    public boolean isSkipped() {
        return false;
    }

    @Override
    public INDArray getData() {
        return Nd4j.createFromArray(snake.getField());
    }

    @Override
    public Encodable dup() {
        return new SnakeObservationSpace(snake);
    }
}

snake.getField() basically returns a two-dimensional array representing the field. But now I am stuck, since the MDP interface also wants me to return an ObservationSpace. In the examples I have seen the ArrayObservationSpace being used, but I am unsure what exactly I am supposed to do.

Can someone explain to me how these two elements work with each other and what exactly I am supposed to do? Thanks a lot in advance!

For a snake game that somewhat works, you can take a look at GitHub - treo/rl4j-snake: An *unfinished* RL4J snake playing example

I started that as an example for RL4J, but didn’t have the time to actually finish it.

Anyway, the ObservationSpace tells RL4J what shape the observations it is going to receive will have.

In that example I’m using a simple one-dimensional array for the observation, as I’m feeding a very specialized observation into a simple MLP-type neural network.

In principle the ObservationSpace is all about the shape of the possible observations. The Encodable is the observation, and therefore what you’ve implemented is not the ObservationSpace.

Your observation space would still be an ArrayObservationSpace, and the array specifies the shape, i.e. the size in each dimension, of your snake field.
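Concretely, the two meet in your MDP implementation. A rough sketch, with several assumptions: SnakeObservation stands in for your Encodable class, the 10x10 field size and four actions are placeholders, and the exact package names (e.g. for StepReply) can differ between RL4J versions:

```java
// Packages as of the RL4J betas; they may differ in your version.
import org.deeplearning4j.gym.StepReply;
import org.deeplearning4j.rl4j.mdp.MDP;
import org.deeplearning4j.rl4j.space.ArrayObservationSpace;
import org.deeplearning4j.rl4j.space.DiscreteSpace;
import org.deeplearning4j.rl4j.space.ObservationSpace;

public class SnakeMDP implements MDP<SnakeObservation, Integer, DiscreteSpace> {

    // The ObservationSpace only describes the shape of every observation:
    // here a (hypothetical) 10x10 field.
    private final ObservationSpace<SnakeObservation> observationSpace =
            new ArrayObservationSpace<>(new int[]{10, 10});

    // Four snake actions: up, down, left, right.
    private final DiscreteSpace actionSpace = new DiscreteSpace(4);

    @Override
    public ObservationSpace<SnakeObservation> getObservationSpace() {
        return observationSpace;
    }

    @Override
    public DiscreteSpace getActionSpace() {
        return actionSpace;
    }

    // The Encodable, by contrast, is an actual observation, returned on
    // every reset and step.
    @Override
    public SnakeObservation reset() {
        return null; // TODO: restart the game, return the first observation
    }

    @Override
    public StepReply<SnakeObservation> step(Integer action) {
        return null; // TODO: apply the action, return observation + reward
    }

    @Override
    public boolean isDone() {
        return false; // TODO: true once the snake dies
    }

    @Override
    public void close() { }

    @Override
    public MDP<SnakeObservation, Integer, DiscreteSpace> newInstance() {
        return new SnakeMDP();
    }
}
```

So the space is created once and just describes shapes, while your Encodable instances carry the actual field data every step.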

First of all, thank you for your fast response! Fun fact: I had already looked at your project before, but I wanted to try implementing it with the 2D matrix.

To create the Observation Space I tried this:

observationSpace = new ArrayObservationSpace<>(new int[]{10, 10});

But it gives me this exception:

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Input that is not a matrix; expected matrix (rank 2), got rank 3 array with shape [1, 10, 10]. Missing preprocessor or wrong input type? (layer name: layer0, layer index: 0, layer type: DenseLayer)

I also wondered if there is an advantage to encoding it as a 2D matrix, since I could also just transform it into a one-dimensional array?

You get that because the neural network you are training isn’t expecting a matrix. You are probably using the Dense builder there. In order for it to be able to deal with a matrix you will need the CNN variant.
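For illustration, a convolution-first network in plain DL4J looks roughly like this. All layer sizes and hyperparameters here are arbitrary placeholders, assuming a single-channel 10x10 field and 4 actions, so treat it as a sketch rather than a working snake network:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SnakeNetSketch {
    // A network whose first layer is convolutional, so it can accept the
    // 10x10 field as a 2D input instead of a flat vector.
    public static MultiLayerConfiguration convConfig() {
        return new NeuralNetConfiguration.Builder()
                .list()
                .layer(new ConvolutionLayer.Builder(3, 3) // 3x3 kernel
                        .nIn(1).nOut(16)                  // 1 input channel
                        .activation(Activation.RELU).build())
                .layer(new DenseLayer.Builder().nOut(64)
                        .activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .nOut(4)                          // one Q-value per action
                        .activation(Activation.IDENTITY).build())
                // tells DL4J to insert the right preprocessors between the
                // convolutional and dense parts for a 10x10x1 input
                .setInputType(InputType.convolutional(10, 10, 1))
                .build();
    }
}
```

The setInputType call is what spares you from wiring up the convolutional-to-dense preprocessors by hand.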

Is there an advantage to looking at an image as a rectangle over looking at the pixels as a single line?

Your matrix essentially represents a picture, so yes there can be an advantage.
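That said, if you'd rather stay with the MLP, flattening the field row by row is trivial. A minimal sketch (the int[][] field type and its contents here are assumptions about your Snake class):

```java
import java.util.Arrays;

public class FieldFlattener {
    // Flatten a 2D field row by row into the 1D double[] that an
    // MLP-style input layer expects.
    public static double[] flatten(int[][] field) {
        int rows = field.length;
        int cols = field[0].length;
        double[] out = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r * cols + c] = field[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] field = {{0, 1}, {2, 3}};
        System.out.println(Arrays.toString(flatten(field)));
        // prints [0.0, 1.0, 2.0, 3.0]
    }
}
```

The trade-off is exactly the one above: the flat vector is simpler to feed in, but the network then has to rediscover the spatial neighborhoods that a CNN gets for free.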

Yes, you are absolutely right. I will have a closer look at CNNs to fully understand how to use them.

I have one last question about the ObservationSpace: what exactly are high and low for? In the ArrayObservationSpace they are basically just an array with one column:

low = Nd4j.create(1);
high = Nd4j.create(1);

If I remember correctly, they are about the highest and lowest values that are valid for the observation space, but I may be wrong on this.

I’m not entirely sure if they are actually used somewhere or not.