I am working with an NStepQLearning model, but it seems that just saving the ComputationGraph it uses is not enough: when I restore the model it does not give the same results as before.
Do you have any idea how to do this properly? Or am I missing something?
@marcus.frex could you clarify your issue a bit? Also please note that rl4j was recently moved to a contrib module (in releases 1.0.0-M1.1 and newer) since it's not heavily maintained.
Generally saving the weights should be enough. It's hard to tell without more info, though.
@agibsonccc I'm probably missing something, but a DQNPolicy trained with a specific QLearning configuration does not give the same score after I save it and reload it after a given epoch.
I noticed that even if I use the same DQNPolicy and run it through the same MDP (environment) separately, it still does not return the same score. I even set the target update frequency to 1, but it still does not return the same score. I even saved the MultiLayerNetwork manually, but it does not get the same score when I run the same dataset again.
Have you ever experienced something like that? What do you think I am missing?
Ok, I found what I was missing. It seems that during training the policy can take random (exploratory) actions to improve the network's ability to learn, so training scores are stochastic. All epoch results should also be compared with scores on the actual data.
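For anyone hitting the same confusion: the non-repeatable scores come from epsilon-greedy exploration, not from broken serialization. Here is a minimal plain-Java sketch (not RL4J's actual API; the Q-values and method names are made up for illustration) showing why training-time action selection is stochastic while a pure greedy evaluation is deterministic:

```java
import java.util.Random;

public class EpsilonGreedyDemo {
    // Hypothetical Q-values for 3 actions in some state (illustration only).
    static final double[] Q = {0.1, 0.9, 0.3};

    // Epsilon-greedy: with probability eps pick a random action (exploration),
    // otherwise pick argmax Q (exploitation). The random branch is what makes
    // training-time episode scores differ between otherwise identical runs.
    static int epsilonGreedy(double[] q, double eps, Random rng) {
        if (rng.nextDouble() < eps) {
            return rng.nextInt(q.length); // random exploratory action
        }
        return argmax(q);
    }

    // Greedy: always argmax Q. Deterministic, so suitable for evaluating a
    // saved/restored policy when you want repeatable scores.
    static int argmax(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Greedy evaluation: same action every time for the same Q-values.
        System.out.println("greedy action: " + argmax(Q)); // always 1

        // Epsilon-greedy with eps = 0.5: repeated runs with different seeds
        // can pick different actions, hence different scores.
        Random rng = new Random();
        System.out.println("eps-greedy action: " + epsilonGreedy(Q, 0.5, rng));
    }
}
```

So when comparing a restored model against its pre-save scores, run it greedily (or with a fixed seed) rather than through the exploring training loop.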