Trying to use Q-learning in a custom MDP environment. Chooses action 0 every time, despite the heavy negative reward

Oh it worked! Thank you very much for all your help, I really appreciate it :smile:

BTW, what parameters/hyperparameters would you suggest for a “balanced” setup? Are the ones mentioned above (batch size, no-op warm-up and learning rate) the right ones to focus on when running training?

Just start with some sane defaults. I’d suggest a batch size of at least 64 and a learning rate of at most 10^-3, with an L2 (weight-decay) coefficient of at most one tenth of the learning rate.
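
For concreteness, here’s a minimal sketch of those defaults, assuming a PyTorch Q-network (the thread doesn’t say which framework you use, and names like `q_net` are purely illustrative):

```python
# Hedged sketch: sane default hyperparameters for a DQN-style setup.
# `q_net` is a hypothetical placeholder network, not the poster's model.
import torch

q_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),   # input size 4 is an arbitrary example
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),   # 2 actions, again just for illustration
)

batch_size = 64        # "at least 64"
learning_rate = 1e-3   # "at most 10^-3"
weight_decay = 1e-4    # L2 penalty, one order of magnitude below the LR

optimizer = torch.optim.Adam(
    q_net.parameters(),
    lr=learning_rate,
    weight_decay=weight_decay,  # Adam's weight_decay acts as an L2 penalty
)
```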

In your case, I don’t think it makes sense to have a warm-up phase at all, and you probably want to keep epsilon (the rate of random exploration) fixed at 0.1.
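
A fixed epsilon with no warm-up could look something like this sketch (plain Python, with `q_values` standing in for whatever your agent produces for the current state):

```python
# Hedged sketch: epsilon-greedy action selection with a constant epsilon
# of 0.1 and no warm-up or annealing schedule.
import random

EPSILON = 0.1  # constant exploration rate, as suggested above

def select_action(q_values):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```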


I understand, thank you again for your support!
All the best!