Using gradient as an intermediate SDVariable

How can a gradient be used as an intermediate variable in SameDiff?

Problem: I need a SameDiff layer that takes the gradient of previous layers as its input. Currently this does not seem to be possible due to:

Exception in thread "main" java.lang.IllegalStateException
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:253)
	at org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)
	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:85)
	at org.nd4j.linalg.api.ops.BaseReduceOp.<init>(BaseReduceOp.java:114)

Code sample of the issue:

package com.valb3r.idr.networks;

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.weightinit.impl.XavierInitScheme;

public class Issue {

    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        //Create input and label variables
        SDVariable sdfPoint = sd.placeHolder("point", DataType.FLOAT, -1, 3); //Shape: [?, 3]
        SDVariable ray = sd.placeHolder("ray", DataType.FLOAT, -1, 3); //Shape: [?, 3]
        SDVariable expectedColor = sd.placeHolder("expected-color", DataType.FLOAT, -1, 3); //Shape: [?, 3]

        SDVariable sdfInput = denseLayer(sd, 10, 3, sdfPoint);
        SDVariable sdf = denseLayer(sd, 3, 10, sdfInput);
        sdf.markAsLoss();

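        // d(sdf)/d(point) - this gradient is intended to feed the next (rendering) layer as an input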
        SDVariable idrRenderGradient = sd.grad(sdfPoint.name());
        SDVariable dotGrad = idrRenderGradient.dot(ray); // org.nd4j.autodiff.util.SameDiffUtils.validateDifferentialFunctionSameDiff(SameDiffUtils.java:134)

        sd.loss().meanSquaredError(expectedColor, dotGrad, null);
    }

    private static SDVariable denseLayer(SameDiff sd, int nOut, int nIn, SDVariable input) {
        SDVariable w = sd.var(input.name() + "-w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, nIn, nOut);
        SDVariable b = sd.zero(input.name() + "-b1", 1, nOut);
        SDVariable z = input.mmul(w).add(b);
        return sd.nn().tanh(z);
    }
}

Is this at all possible, or is using a detached gradient via calculateGradientsAndOutputs the only option?
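For reference, this is roughly what I mean by a detached gradient - a sketch only, assuming the calculateGradients(placeholders, variableNames) overload that returns plain INDArrays (once evaluated this way, the gradient is a concrete array and can no longer participate in further SameDiff ops):

import java.util.HashMap;
import java.util.Map;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// sd is the SameDiff instance from the sample above, built WITHOUT the failing grad/dot lines,
// so the graph simply ends at sdf.markAsLoss()
Map<String, INDArray> placeholders = new HashMap<>();
placeholders.put("point", Nd4j.rand(DataType.FLOAT, 4, 3));

// d(loss)/d(point) as a concrete INDArray - "detached" from the graph; any further math
// (e.g. the dot product with the ray) would have to happen in plain ND4J
Map<String, INDArray> grads = sd.calculateGradients(placeholders, "point");
INDArray dLossDPoint = grads.get("point");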

@valb3r that means that the variable you are passing in is not attached to the right samediff instance. Let’s focus on that first.

That doesn’t necessarily prevent anything; it just means you need to manage references to the variables you’re passing around.

Make sure to just use samediff here. Could you elaborate on what denseLayer is? Thanks!

@agibsonccc
Sorry, I’m not really sure I understand the question, as the denseLayer implementation is provided in the code block above.
Here is a self-contained repository that reproduces the issue:

I’m attempting to build the following feed-forward network:

1. Input (point, ....)
2. Dense layer
3. Dense layer -> also the output of the signed distance function, used as an interim output
(The layers below are not shown in the example)
4. Custom layer that transforms the gradient of layer (3) as well as its output
5. Dense layer
6. Dense layer
7. Actual Output

As far as I can understand, the call to sd.grad creates a new SameDiff instance for the idrRenderGradient variable inside SameDiff.createGradFunction at:

defineFunction:3987, SameDiff (org.nd4j.autodiff.samediff)
defineFunction:3975, SameDiff (org.nd4j.autodiff.samediff)
createGradFunction:4186, SameDiff (org.nd4j.autodiff.samediff)
createGradFunction:4093, SameDiff (org.nd4j.autodiff.samediff)
grad:3602, SameDiff (org.nd4j.autodiff.samediff)
main:21, Issue (com.valb3r.idr.networks)

@valb3r yes, gradients are stored in a sub-dictionary. A new samediff instance for computing gradients is stored, and that’s what is used to look up gradient states as well.
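Roughly, the separation looks like this (a sketch against the sd instance from the sample above; it assumes the gradient sub-graph is registered under the internal key "grad"):

SDVariable pointGrad = sd.grad("point");      // triggers createGradFunction() internally
SameDiff gradFn = sd.getFunction("grad");     // the separate samediff instance that holds the gradients

// pointGrad is owned by gradFn rather than by sd itself, which is why mixing it with sd's own
// variables (e.g. idrRenderGradient.dot(ray)) trips validateDifferentialFunctionSameDiff
System.out.println(pointGrad.getSameDiff() == gradFn);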

So in order to use that gradient I need to invoke it on the original SameDiff instance with invokeGraphOn?

@valb3r no that’s mainly meant to be internal. You should be able to just call grad(…) and we do the rest.

Internally, it calls doDiff on any relevant function which then extends the relevant graph.

If you need to access any gradient just call grad(…) on the samediff instance you created.

You might be able to also use some of the tests here for examples:

One other relevant concept here might be ExternalErrors, which allows you to compute gradients somewhere else and pass them into a samediff instance.
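A minimal sketch of that pattern (it assumes the SameDiffUtils.externalErrors(...) helper and the "<name>-grad" placeholder naming used in the SameDiff tests; exact signatures may differ slightly between versions):

import java.util.HashMap;
import java.util.Map;

import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.util.SameDiffUtils;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class ExternalErrorsSketch {

    public static void main(String[] args) {
        SameDiff sd = SameDiff.create();
        SDVariable input = sd.placeHolder("input", DataType.FLOAT, -1, 4);
        SDVariable w = sd.var("w", Nd4j.rand(DataType.FLOAT, 4, 3));
        SDVariable out = input.mmul("out", w);

        // Declare that d(loss)/d(out) will be supplied from outside this graph
        SameDiffUtils.externalErrors(sd, null, out);

        Map<String, INDArray> ph = new HashMap<>();
        ph.put("input", Nd4j.rand(DataType.FLOAT, 8, 4));
        ph.put("out-grad", Nd4j.rand(DataType.FLOAT, 8, 3)); // gradient computed elsewhere, e.g. in another graph

        // Gradients for this graph's own parameters, driven entirely by the external gradient
        Map<String, INDArray> grads = sd.calculateGradients(ph, "w");
        System.out.println(grads.get("w"));
    }
}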

Either way, this might help you figure out how to use the various custom gradient internals. Try not to focus too much on the internal details beyond understanding what kind of error might be caused (e.g. the internal structure of the gradients) for debugging purposes.

Anything else should be handled by the maintainers.


It seems like it is not working currently; should I create an issue on GitHub for that?

@valb3r yes please create an issue


Done, Support for using sd.grad output as an intermediate variable · Issue #9710 · eclipse/deeplearning4j · GitHub

@valb3r thanks. For now, try using separate samediff instances and external errors as a workaround. Continuing a samediff gradient calculation based on an intermediate result will take a bit of thinking in the meantime. If you want to save everything together, just call

samediff.putSubFunction("your_grad_intermediate", separateSamediffInstance);

Could you remind me of the use case here so I can think of how this might be used in other contexts and generalize it a bit? One immediate way I could think of doing this would be just doing sub-functions under the hood and making it seamless.

There’s a new Invoke op that might make that fairly easy. Still not sure yet though.

Could you remind me of the use case here so I can think of how this might be used in other contexts and generalize it a bit?

It is for so-called implicit differentiable rendering of 3D scenes: one overfits a neural network to a scene’s projections (images or RGB-D) in order to obtain an implicit 3D model baked inside the neural network. The following components act in this process:

  1. Neural network that represents the signed distance function (1)
  2. Special layer that makes the rendering process differentiable. As input, it takes the gradient of (1) and its output and generates the input to (2). It consists mostly of linear operations.
  3. Neural network that represents shape and texture (2) - takes input from the special layer and produces a color value.

For more details you can check Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance - it has a PyTorch implementation available.

@valb3r then I think external errors could work.

All you would have is one net per component, breaking the problem up so you don’t have the intermediate variable in the same graph but instead just consume it in another graph.

Combining the parts should achieve the same result.
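To make that concrete, a rough sketch of the two-graph wiring (placeholder names here are hypothetical; it assumes the output(...) and calculateGradients(...) overloads plus the external-errors "<name>-grad" convention shown earlier; sdA is the SDF network from the original sample, sdB is the render network with its own MSE loss):

// sdA: the SDF network (loss marked on "sdf"); sdB: the render network with placeholders
// "sdf-value", "sdf-grad", "ray" and "expected-color" (hypothetical names)
INDArray pointBatch = Nd4j.rand(DataType.FLOAT, 4, 3);
INDArray rayBatch = Nd4j.rand(DataType.FLOAT, 4, 3);
INDArray colorBatch = Nd4j.rand(DataType.FLOAT, 4, 3);

// 1) Run the SDF graph eagerly: its output and its gradient w.r.t. the input point
Map<String, INDArray> aPh = new HashMap<>();
aPh.put("point", pointBatch);
INDArray sdfOut = sdA.output(aPh, "sdf").get("sdf");
INDArray dSdfDPoint = sdA.calculateGradients(aPh, "point").get("point");

// 2) Feed both into the render graph as ordinary placeholders - no gradient needs to live
//    as an intermediate variable inside a single graph
Map<String, INDArray> bPh = new HashMap<>();
bPh.put("sdf-value", sdfOut);
bPh.put("sdf-grad", dSdfDPoint);
bPh.put("ray", rayBatch);
bPh.put("expected-color", colorBatch);

// 3) Train the render graph normally; to also train the SDF graph, take the render graph's
//    gradient w.r.t. "sdf-value" and push it back into sdA via external errors.
//    (The path back through "sdf-grad" is a second-order term - that is the part that
//    still takes a bit of thinking.)
Map<String, INDArray> bGrads = sdB.calculateGradients(bPh, "sdf-value", "sdf-grad");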