Is there any example for doing policy gradient calculation with dl4j/rl4j?

I cannot find any. Since dl4j embeds the activation function in the layer, it seems difficult to calculate the gradient externally and feed it back into the dl4j network.

There is an example for external errors: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/externalerrors/MultiLayerNetworkExternalErrors.java

If you don’t want the activation function to modify the output of a layer, you can always use an identity activation.
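
For example, a minimal sketch (layer sizes and the updater here are placeholders, not from your setup):

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Adam;

    // Sketch of a network whose last layer uses an identity activation, so the
    // raw pre-activation values pass through unchanged and you can apply
    // whatever function you want externally. Sizes (4 -> 16 -> 2) are made up.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam(1e-3))
            .list()
            .layer(new DenseLayer.Builder().nIn(4).nOut(16)
                    .activation(Activation.RELU).build())
            .layer(new DenseLayer.Builder().nIn(16).nOut(2)
                    .activation(Activation.IDENTITY).build())
            .build();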

Yes, but I need an activation function such as softmax to get the probability of each action as the output of the policy network.

You can use external errors with softmax too; there is nothing to prevent you from using a Softmax activation on your last layer.
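
For instance, the output layer would just be (again a sketch; nIn and the action count are placeholders):

    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.nd4j.linalg.activations.Activation;

    int numActions = 2;  // hypothetical action count
    // Last layer of the policy network: softmax turns the logits into
    // action probabilities
    DenseLayer policyOut = new DenseLayer.Builder()
            .nIn(16)
            .nOut(numActions)
            .activation(Activation.SOFTMAX)
            .build();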

How should the external error be calculated when softmax is the last layer?
In a policy gradient algorithm, the gradient is calculated from the reward, e.g. reward * log(actionProb). I get the reward from the environment and the action probability from the dl4j network output. But to backprop through a network with softmax as the final output, I need the error with respect to the probabilities, right?
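
For plain one-step REINFORCE, for instance, my understanding is that the error would be something like this (just a sketch of what I mean, with placeholder values):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Placeholders for what the environment / forward pass would provide:
    INDArray probs = Nd4j.create(new double[][]{{0.7, 0.3}}); // softmax output, shape [1, numActions]
    int action = 0;       // index of the sampled action
    double reward = 1.0;  // return observed from the environment

    // With loss L = -reward * log(probs[action]), the error w.r.t. the softmax
    // output is -reward / probs[action] at the chosen action's index, 0 elsewhere;
    // backprop through the softmax itself is then left to the network.
    INDArray epsilon = Nd4j.zerosLike(probs);
    epsilon.putScalar(0, action, -reward / probs.getDouble(0, action));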

There are lots of different algorithms for that, so the exact details are going to be different depending on what exactly you want to do.

Thanks for the link. That’s true. My question is how to integrate any of these algorithms into a dl4j network. To be more specific, I think those algorithms all calculate the gradient of the parameters, while a dl4j network (with softmax as the output) requires the probability error to do the backprop.

If you take a closer look at the external errors example, you will find that it also shows you how to use external gradients. So you don’t even have to calculate an error; if you have the gradients already, you can pass them in directly.
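
Roughly, the update part of that example looks like this (a paraphrased sketch from memory, so check the linked file for the exact code; model, input and externalError are yours to provide):

    import org.deeplearning4j.nn.gradient.Gradient;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
    import org.nd4j.common.primitives.Pair;
    import org.nd4j.linalg.api.ndarray.INDArray;

    // 'model' is your MultiLayerNetwork; 'input' is the minibatch features;
    // 'externalError' must have the same shape as the network's output.
    model.setInput(input);
    model.feedForward(true, false);   // forward pass, keep activations for backprop

    // Backprop the external error to get the parameter gradients
    Pair<Gradient, INDArray> p = model.backpropGradient(externalError, null);
    Gradient gradient = p.getFirst();

    // Apply learning rate, momentum, etc. (modifies 'gradient' in place)
    int iteration = 0, epoch = 0, minibatchSize = 1; // bookkeeping placeholders
    model.getUpdater().update(model, gradient, iteration, epoch, minibatchSize,
            LayerWorkspaceMgr.noWorkspaces());

    // Apply the flattened gradient to the parameters
    model.params().subi(gradient.gradient());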

That would be cool, but I cannot figure out how to transform the gradient from the algorithm into the gradient dl4j accepts. In particular, it is difficult for me to get the required shape of the dl4j gradient. Is there any example?

@RuralHunter hello! Did you manage to create a working example that calculates the gradients of pi(s,a) and correctly updates the network with them? If yes, could you please share a simple example or a link to one? Thank you in advance.

@roman here’s what you’d do with external gradients:

    import java.util.HashMap;
    import java.util.Map;

    import org.nd4j.autodiff.samediff.SDVariable;
    import org.nd4j.autodiff.samediff.SameDiff;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    // plus ExternalErrorsFunction and SameDiffUtils from your nd4j version,
    // and assertEquals if you run this as a test

    // The gradient of the loss w.r.t. "out", supplied from outside the graph
    INDArray externalGrad = Nd4j.linspace(1, 12, 12).reshape(3, 4);

    SameDiff sd = SameDiff.create();
    SDVariable var = sd.var("var", externalGrad);
    SDVariable out = var.mul("out", 0.5);

    // Registers an external-errors function on "out"; this creates the
    // "out-grad" placeholder that the gradient is fed through below
    ExternalErrorsFunction fn = SameDiffUtils.externalErrors(sd, null, out);

    // Feed the external gradient in via the placeholder and backprop
    Map<String, INDArray> m = new HashMap<>();
    m.put("out-grad", externalGrad);
    Map<String, INDArray> grads = sd.calculateGradients(m, sd.getVariables().keySet());

    INDArray gradVar = grads.get(var.name());

    // out = var * 0.5, so the incoming gradient gets scaled by 0.5
    assertEquals(externalGrad.mul(0.5), gradVar);

    // Now, update the external gradient and execute again:
    externalGrad = Nd4j.linspace(1, 12, 12).reshape(3, 4).muli(10);

    m.put("out-grad", externalGrad);
    grads = sd.calculateGradients(m, sd.getVariables().keySet());

    gradVar = grads.get(var.name());
    assertEquals(externalGrad.mul(0.5), gradVar);

Thank you @agibsonccc for your reply. Yes, it is now clear how to get gradients from sd, but the question is how to apply this gradient (external gradient) to update (modify) the weights of the variable in that sd. E.g. I have the gradient for variable “var” from your example. I can multiply the gradient by the learning rate and then add it to the “var” weights to update them, as in the formula: theta = theta + learningRate * gradient.
It is not clear how to do this in sd. I would expect something like: var.weights.addi(gradVar.mul(learningRate));
Is it possible somehow?

For example, in TensorFlow:

    import tensorflow as tf

    a = tf.Variable([2.0, 3.0])
    print(a.value())  # prints: tf.Tensor([2. 3.], shape=(2,), dtype=float32)

    a.assign_add([1, 2])
    print(a.value())  # prints: tf.Tensor([3. 5.], shape=(2,), dtype=float32)

@roman do you want to do it declaratively or just flat out update the value?

Normally a samediff instance has a fit(…) function that you just call, and we do the updates internally. That includes tracking training iterations and epochs, iterating through the data for you, …
If you want to do it manually then you’d just call sd.math().add(gradientOfVariable)
or variable.add(gradient)
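
Or, if you just want to mutate the value directly, one option (an assumption on my part, not an official recipe) is to update the variable’s backing array in place, since getArr() returns the underlying INDArray:

    // Manual update implementing theta = theta + learningRate * gradient,
    // as in the formula above. 'var' and 'gradVar' are from the
    // external-gradients snippet; the learning rate value is a placeholder.
    double learningRate = 0.01;
    var.getArr().addi(gradVar.mul(learningRate));  // in-place, no graph rebuild needed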

@agibsonccc yes, but for fit(…) I need to provide labels, which I don’t have. I have only gradients, and I want to just update the weights using these gradients. I cannot find any method to do this iteratively in samediff. variable.add(…) just returns another SDVariable which is not in the original SameDiff graph. Does that mean I need to recreate the SameDiff graph for each iteration?

In your example above with external gradients, could you please explain which line performs the update of the variables’ weights by the gradient? Is it:

    sd.calculateGradients(m, sd.getVariables().keySet());

?

@roman see my response to your other question.
