Is there any example for doing policy gradient calculation with dl4j/rl4j?

I cannot find any. Since dl4j embeds the activation function in the layer, it seems difficult to calculate the gradient externally and feed it back into the dl4j network.

There is an example for external errors: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/externalerrors/MultiLayerNetworkExternalErrors.java

If you don’t want the activation function to modify the output of a layer, you can always use an identity activation.
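
For example, a minimal sketch (layer sizes and the updater here are placeholders, not from your setup):

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Adam;

    // Sketch of a network whose last layer uses an identity activation, so the
    // raw pre-activation values pass through unchanged and you can apply
    // whatever function you want externally. Sizes (4 -> 16 -> 2) are made up.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam(1e-3))
            .list()
            .layer(new DenseLayer.Builder().nIn(4).nOut(16)
                    .activation(Activation.RELU).build())
            .layer(new DenseLayer.Builder().nIn(16).nOut(2)
                    .activation(Activation.IDENTITY).build())
            .build();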

Yes, but I need an activation function such as softmax to get the probability of each action as the output of the policy network.

You can use external errors with softmax too; there is nothing to prevent you from using a Softmax activation on your last layer.
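
For instance, the output layer would just be (again a sketch; nIn and the action count are placeholders):

    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.nd4j.linalg.activations.Activation;

    int numActions = 2;  // hypothetical action count
    // Last layer of the policy network: softmax turns the logits into
    // action probabilities
    DenseLayer policyOut = new DenseLayer.Builder()
            .nIn(16)
            .nOut(numActions)
            .activation(Activation.SOFTMAX)
            .build();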

How should the external error be calculated when softmax is the last layer?
In a policy gradient algorithm, the gradient is calculated from the reward, e.g. reward * log(actionProb). I get the reward from the environment and the action probability from the dl4j network output. But to backprop through a network with softmax as the final output, I need the error with respect to the probabilities, right?
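
For plain one-step REINFORCE, for instance, my understanding is that the error would be something like this (just a sketch of what I mean, with placeholder values):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Placeholders for what the environment / forward pass would provide:
    INDArray probs = Nd4j.create(new double[][]{{0.7, 0.3}}); // softmax output, shape [1, numActions]
    int action = 0;       // index of the sampled action
    double reward = 1.0;  // return observed from the environment

    // With loss L = -reward * log(probs[action]), the error w.r.t. the softmax
    // output is -reward / probs[action] at the chosen action's index, 0 elsewhere;
    // backprop through the softmax itself is then left to the network.
    INDArray epsilon = Nd4j.zerosLike(probs);
    epsilon.putScalar(0, action, -reward / probs.getDouble(0, action));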

There are lots of different algorithms for that, so the exact details are going to be different depending on what exactly you want to do.

Thanks for the link. That’s true. My question is how to integrate any of these algorithms into a dl4j network. To be more specific, I think those algorithms all calculate the gradient of the parameters, while a dl4j network (with softmax as the output) requires the probability error to do the backprop.

If you take a closer look at the external errors example, you will find that it also shows you how to use external gradients. So you don’t even have to calculate an error; if you have the gradients already, you can pass them in directly.
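
Roughly, the update part of that example looks like this (a paraphrased sketch from memory, so check the linked file for the exact code; model, input and externalError are yours to provide):

    import org.deeplearning4j.nn.gradient.Gradient;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;
    import org.nd4j.common.primitives.Pair;
    import org.nd4j.linalg.api.ndarray.INDArray;

    // 'model' is your MultiLayerNetwork; 'input' is the minibatch features;
    // 'externalError' must have the same shape as the network's output.
    model.setInput(input);
    model.feedForward(true, false);   // forward pass, keep activations for backprop

    // Backprop the external error to get the parameter gradients
    Pair<Gradient, INDArray> p = model.backpropGradient(externalError, null);
    Gradient gradient = p.getFirst();

    // Apply learning rate, momentum, etc. (modifies 'gradient' in place)
    int iteration = 0, epoch = 0, minibatchSize = 1; // bookkeeping placeholders
    model.getUpdater().update(model, gradient, iteration, epoch, minibatchSize,
            LayerWorkspaceMgr.noWorkspaces());

    // Apply the flattened gradient to the parameters
    model.params().subi(gradient.gradient());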

That would be cool, but I cannot figure out how to transform the gradient from the algorithm into the gradient dl4j accepts. In particular, it is difficult for me to get the required shape of the dl4j gradient. Is there any example?

@RuralHunter hello! Did you manage to create a working example that calculates the gradients of pi(s,a) and correctly updates the network with them? If yes, could you please share a simple example or a link to one? Thank you in advance.

@roman here’s what you’d do with external gradients:

    import java.util.HashMap;
    import java.util.Map;

    import org.nd4j.autodiff.samediff.SDVariable;
    import org.nd4j.autodiff.samediff.SameDiff;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    // plus ExternalErrorsFunction and SameDiffUtils from your nd4j version,
    // and assertEquals if you run this as a test

    // The gradient of the loss w.r.t. "out", supplied from outside the graph
    INDArray externalGrad = Nd4j.linspace(1, 12, 12).reshape(3, 4);

    SameDiff sd = SameDiff.create();
    SDVariable var = sd.var("var", externalGrad);
    SDVariable out = var.mul("out", 0.5);

    // Registers an external-errors function on "out"; this creates the
    // "out-grad" placeholder that the gradient is fed through below
    ExternalErrorsFunction fn = SameDiffUtils.externalErrors(sd, null, out);

    // Feed the external gradient in via the placeholder and backprop
    Map<String, INDArray> m = new HashMap<>();
    m.put("out-grad", externalGrad);
    Map<String, INDArray> grads = sd.calculateGradients(m, sd.getVariables().keySet());

    INDArray gradVar = grads.get(var.name());

    // out = var * 0.5, so the incoming gradient gets scaled by 0.5
    assertEquals(externalGrad.mul(0.5), gradVar);

    // Now, update the external gradient and execute again:
    externalGrad = Nd4j.linspace(1, 12, 12).reshape(3, 4).muli(10);

    m.put("out-grad", externalGrad);
    grads = sd.calculateGradients(m, sd.getVariables().keySet());

    gradVar = grads.get(var.name());
    assertEquals(externalGrad.mul(0.5), gradVar);

Thank you @agibsonccc for your reply. Yes, it is now clear how to get gradients from sd, but the question is how to apply this gradient (external gradient) to update (modify) the weights of the variable in that sd. E.g. I have the gradient for variable “var” from your example. I can multiply the gradient by the learning rate and then add it to the “var” weights to update them, as in the formula: theta = theta + learningRate * gradient.
It is not clear how to do this in sd. I would expect something like: var.weights.addi(gradVar.mul(learningRate));
Is it possible somehow?

For example, in TensorFlow:

    import tensorflow as tf

    a = tf.Variable([2.0, 3.0])
    print(a.value())  # prints: tf.Tensor([2. 3.], shape=(2,), dtype=float32)

    a.assign_add([1, 2])
    print(a.value())  # prints: tf.Tensor([3. 5.], shape=(2,), dtype=float32)

@roman do you want to do it declaratively or just flat out update the value?

Normally a samediff instance has a fit(…) function that you just call, and we do the updates internally. That includes tracking training iterations and epochs, iterating through the data for you, …
If you want to do it manually then you’d just call sd.math().add(gradientOfVariable)
or variable.add(gradient)
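
Or, if you just want to mutate the value directly, one option (an assumption on my part, not an official recipe) is to update the variable’s backing array in place, since getArr() returns the underlying INDArray:

    // Manual update implementing theta = theta + learningRate * gradient,
    // as in the formula above. 'var' and 'gradVar' are from the
    // external-gradients snippet; the learning rate value is a placeholder.
    double learningRate = 0.01;
    var.getArr().addi(gradVar.mul(learningRate));  // in-place, no graph rebuild needed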

@agibsonccc yes, but for fit(…) I need to provide labels, which I don’t have. I have only gradients, and I want to just update the weights using these gradients. I cannot find any method to do this iteratively in samediff. variable.add(…) just returns another SDVariable which is not in the original SameDiff graph. Does that mean I need to recreate the SameDiff graph for each iteration?

In your example above with external gradients, could you please explain which line performs the update of the variables’ weights by the gradient? Is it:

    sd.calculateGradients(m, sd.getVariables().keySet());

?

@roman see my response to your other question.
