Out of Memory Error

I am using RL4J QLearning and I constantly get an out-of-memory error. The leak doesn't seem to come from my own application: I've tried giving the process a ton of memory and experimenting with WorkspaceConfigurations, and by now I've run out of ideas. This is what my QLearning configuration looks like:

public static QLearning.QLConfiguration TETRIS_QL =
            new QLearning.QLConfiguration( 
                    123,    //Random seed
                    10000,    //Max step By epoch
                    50000000, //Max step
                    5000, //Max size of experience replay
                    32,     //size of batches
                    20,    //target update (hard)
                    10,     //num step noop warmup
                    0.1,   //reward scaling
                    0.99,   //gamma
                    1.0,    //td-error clipping
                    0.1f,   //min epsilon
                    1000,   //num step for eps greedy anneal
                    false    //double DQN
            );	
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new DoublePointer(14): totalBytes = 837K, physicalBytes = 8224M
	at org.bytedeco.javacpp.DoublePointer.<init>(DoublePointer.java:76)
	at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:705)
	at org.nd4j.linalg.api.buffer.DoubleBuffer.<init>(DoubleBuffer.java:48)
	at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.create(DefaultDataBufferFactory.java:288)
	at org.nd4j.linalg.factory.Nd4j.getDataBuffer(Nd4j.java:1492)
	at org.nd4j.linalg.factory.Nd4j.createTypedBuffer(Nd4j.java:1502)
	at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:344)
	at org.nd4j.linalg.factory.BaseNDArrayFactory.create(BaseNDArrayFactory.java:1350)
	at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3547)
	at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3391)
	at org.deeplearning4j.rl4j.util.LegacyMDPWrapper.getInput(LegacyMDPWrapper.java:106)
	at org.deeplearning4j.rl4j.util.LegacyMDPWrapper.step(LegacyMDPWrapper.java:68)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.discrete.QLearningDiscrete.trainStep(QLearningDiscrete.java:146)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.QLearning.trainEpoch(QLearning.java:124)
	at org.deeplearning4j.rl4j.learning.sync.SyncLearning.train(SyncLearning.java:96)
	at tetris.TetrisWindow.train(TetrisWindow.java:174)
	at tetris.TetrisWindow.init(TetrisWindow.java:101)
	at tetris.TetrisWindow.run(TetrisWindow.java:60)
	at tetris.TetrisWindow.<init>(TetrisWindow.java:56)
	at tetris.TetrisWindow.main(TetrisWindow.java:52)
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (8224M) > maxPhysicalBytes (8192M)
	at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:659)
	at org.bytedeco.javacpp.Pointer.init(Pointer.java:127)
	at org.bytedeco.javacpp.DoublePointer.allocateArray(Native Method)
	at org.bytedeco.javacpp.DoublePointer.<init>(DoublePointer.java:68)
	... 19 more
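
For completeness, the limit being hit here is JavaCPP's off-heap tracking (maxPhysicalBytes), not the JVM heap. "A ton of memory" means I have been launching with variations of the following flags (the values shown are just examples of what I tried):

java -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=6G -Dorg.bytedeco.javacpp.maxphysicalbytes=8G ...

No matter how high I set the limits, physical memory usage keeps growing during training until it hits them again.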

What does the network you've configured for it look like? The QLearning configuration on its own isn't enough to tell us anything.

Thanks for the quick reply! It's probably just an error I made, but I can't seem to find it; I'm pretty new to DL. Here is everything I use for my Q-learning neural network. I am using version 1.0.0-beta6 and OpenJDK 12.0.2.

public class TetrisEnvironment implements MDP<TetrisState, Integer, DiscreteSpace> {
	
	private TetrisWindow window;
	private DeferredRenderer renderer;
	
	private DiscreteSpace discreteSpace;
	private ObservationSpace<TetrisState> observationSpace;
	
	private TetrisGame tetrisGame;
	private List<Integer> actions = new ArrayList<Integer>();
	private boolean pause;
	
	public TetrisEnvironment(TetrisWindow window, DeferredRenderer renderer, boolean pause) {
		this.tetrisGame = new TetrisGame(window, renderer);
		this.window = window;
		this.renderer = renderer;
		this.pause = pause;
		
		if (renderer != null) {
			tetrisGame.initGUI(renderer.getGUIRootComponent());
		}
		
		// the different amount of outputs
		this.discreteSpace = new DiscreteSpace(4);
		this.observationSpace = new ArrayObservationSpace<TetrisState>(new int[] { 14 });
	}

	@Override
	public ObservationSpace<TetrisState> getObservationSpace() {
		return observationSpace;
	}

	@Override
	public DiscreteSpace getActionSpace() {
		return discreteSpace;
	}

	@Override
	public TetrisState reset() {
		tetrisGame.reset();
		actions.clear();
		
		// log native memory usage at the start of each episode
		System.out.println("free Memory: " + Pointer.formatBytes(Pointer.availablePhysicalBytes()) + "/"
				+ Pointer.formatBytes(Pointer.totalPhysicalBytes()));
		
		return new TetrisState(tetrisGame);
	}

	@Override
	public void close() {
	}

	@Override
	public StepReply<TetrisState> step(Integer action) {
		double reward = tetrisGame.makeAction(action);
		actions.add(action);
		StepReply<TetrisState> reply = new StepReply<TetrisState>(new TetrisState(tetrisGame), reward, tetrisGame.lost, null);
		
		if (renderer != null) {
			renderer.render();
			window.getWindow().swapBuffers();
			window.getWindow().update();
			window.getWindow().pollEvents();
			
			if (pause) {
				try {
					Thread.sleep(50);
				} catch (InterruptedException e) {
					e.printStackTrace();
				}
			}
		}
		
		return reply;
	}

	@Override
	public boolean isDone() {
		return tetrisGame.lost;
	}

	@Override
	public MDP<TetrisState, Integer, DiscreteSpace> newInstance() {
		return new TetrisEnvironment(window, renderer, pause);
	}
	
	public List<Integer> getActions() {
		return actions;
	}

}
public class TetrisState implements Encodable {
	
	private double[] data;
	
	public TetrisState(TetrisGame tetris) {
		int index = 0;
		
		for (int i = 0; i < tetris.tiles.length; i++) {
			if (tetris.tiles[i] == tetris.currentTile) {
				index = i;
			}
		}

		data = new double[4 + tetris.playfield.length];
		data[0] = index / (double) tetris.tiles.length;
		data[1] = tetris.tileVariation / 4.0;
		data[2] = tetris.tilePosX / (double) tetris.playfield.length;
		data[3] = tetris.tilePosY / (double) tetris.playfield[0].length;
		
		for (int x = 0; x < tetris.playfield.length; x++) {
			int height = tetris.playfield[x].length;
			
			// find the row index of the topmost occupied cell in this column
			// (assuming playfield holds int cell values where 0 means empty)
			for (int y = 0; y < tetris.playfield[x].length; y++) {
				if (tetris.playfield[x][y] != 0) {
					height = y;
					break;
				}
			}
			
			data[4 + x] = height / (double) tetris.playfield[0].length;
		}
	}

	@Override
	public double[] toArray() {
		return data;
	}
	
}

And this is how I train the network:

public static QLearning.QLConfiguration TETRIS_QL =
            new QLearning.QLConfiguration( 
                    123,    //Random seed
                    10000,    //Max step By epoch
                    50000000, //Max step
                    5000, //Max size of experience replay
                    32,     //size of batches
                    20,    //target update (hard)
                    10,     //num step noop warmup
                    0.1,   //reward scaling
                    0.99,   //gamma
                    1.0,    //td-error clipping
                    0.1f,   //min epsilon
                    1000,   //num step for eps greedy anneal
                    true    //double DQN
            );

	public static DQNFactoryStdDense.Configuration TETRIS_NET =
	        DQNFactoryStdDense.Configuration.builder()
	            .updater(new Adam(0.001)).numHiddenNodes(32).numLayer(9).build();

	private void train() {
		try {
			// define the mdp (the custom Tetris environment, pause disabled)
			TetrisEnvironment mdp = new TetrisEnvironment(this, renderer, false);
			// define the training
			QLearningDiscreteDense<TetrisState> dql = new QLearningDiscreteDense<TetrisState>(mdp, TETRIS_NET, TETRIS_QL);

			// train
			dql.train();

			// get the final policy
			DQNPolicy<TetrisState> pol = dql.getPolicy();

			// serialize and save (serialization showcase, but not required)
			pol.save("pol1");

			// close the mdp
			mdp.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
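
For completeness, this is roughly how I load the saved policy back to watch it play afterwards (playSaved is just a helper I added; as far as I understand, DQNPolicy.load and play are the standard RL4J calls for this):

	private void playSaved() {
		try {
			// rebuild the environment with the pause enabled so the episode runs at a watchable speed
			TetrisEnvironment mdp = new TetrisEnvironment(this, renderer, true);

			// load the policy saved by train() and let it play one episode
			DQNPolicy<TetrisState> pol = DQNPolicy.load("pol1");
			double reward = pol.play(mdp);
			System.out.println("episode reward: " + reward);

			mdp.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}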

How large is your playing field?

Other than that, I don’t see anything that looks like an obvious reason why it would use that much memory.

But you should try using the snapshot builds, as there has been quite a lot of work on RL4J since the last release, and maybe you are running into something that has already been fixed.
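
If you are on Maven, using snapshots roughly means adding the Sonatype snapshot repository and switching your DL4J/RL4J dependencies to the current snapshot version (1.0.0-SNAPSHOT at the time of writing), something along these lines:

<repositories>
    <repository>
        <id>sonatype-snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>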

It is 10, so the array's size is 14. I will give it a try and let you know if it worked. :)

It seems like the snapshot has fixed the issue, so I guess it was a memory leak of some kind beforehand. Thanks for the quick help! :)

Never mind, it only reduced the memory leak; it's not gone. It still throws the out-of-memory error after 1 hour of training.

Does it also happen with ALE and Cartpole? Or only with your custom environment?

I tested it with a completely empty environment and the leak was gone, so apparently I messed up some stuff in my OpenGL renderer: it allocated native memory that I forgot to free. My bad. Sorry for wasting your time, and thanks for the fast help anyway, much appreciated!
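
For reference, the "completely empty" environment I used to isolate it was essentially the following (a stripped-down sketch with made-up class names, using the same RL4J imports as the Tetris environment above; constant all-zero observations and a fixed episode length):

public class EmptyState implements Encodable {

	// same observation shape as TetrisState, but always all zeros
	@Override
	public double[] toArray() {
		return new double[14];
	}
}

public class EmptyEnvironment implements MDP<EmptyState, Integer, DiscreteSpace> {

	private final DiscreteSpace actionSpace = new DiscreteSpace(4);
	private final ObservationSpace<EmptyState> observationSpace =
			new ArrayObservationSpace<EmptyState>(new int[] { 14 });
	private int steps;

	@Override
	public ObservationSpace<EmptyState> getObservationSpace() {
		return observationSpace;
	}

	@Override
	public DiscreteSpace getActionSpace() {
		return actionSpace;
	}

	@Override
	public EmptyState reset() {
		steps = 0;
		return new EmptyState();
	}

	@Override
	public void close() {
	}

	@Override
	public StepReply<EmptyState> step(Integer action) {
		// no game logic and no rendering: constant reward and observation
		steps++;
		return new StepReply<EmptyState>(new EmptyState(), 0.0, isDone(), null);
	}

	@Override
	public boolean isDone() {
		return steps >= 1000;
	}

	@Override
	public MDP<EmptyState, Integer, DiscreteSpace> newInstance() {
		return new EmptyEnvironment();
	}
}

With this environment the memory usage stayed flat, which pointed me back at my own renderer code.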