Oh, it worked! Thank you very much for all your help. I really appreciate it.
BTW, what parameters/hyperparameters would you suggest for getting a “balanced” observation? Are the ones mentioned above (batch size, no-op warmup, and learning rate) good ones to set for training and running?
Just start with some sane defaults. I guess a batch size of at least 64, a learning rate of at most 10^-3, and an L2 regularization strength no larger than one tenth of the learning rate (i.e. at least one order of magnitude smaller).
In your case, I don’t think a warm-up phase makes sense at all. And you probably want to keep epsilon (the rate of random exploration) at 0.1.
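To make those defaults concrete, here is a rough sketch in plain Python. It is not tied to whatever framework you are using: `AgentConfig`, its field names, and `select_action` are all made up for illustration, and the numbers are just the boundary values from above (batch size 64, learning rate 10^-3, L2 one order of magnitude lower, no warm-up, epsilon 0.1). The second part shows what a fixed epsilon means in practice, assuming a standard epsilon-greedy policy over a list of Q-values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the suggested starting values."""
    batch_size: int = 64          # "at least 64"
    learning_rate: float = 1e-3   # "at most 10^-3"
    l2_penalty: float = 1e-4      # one order of magnitude below the learning rate
    warmup_steps: int = 0         # no warm-up phase
    epsilon: float = 0.1          # rate of random exploration, kept constant


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


config = AgentConfig()
```

With `epsilon=0.1`, roughly one action in ten is chosen at random and the rest follow the current Q-value estimates. You would map the fields of `config` onto whatever arguments your own training code expects.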