Cannot update a model using a Gradient if computeGradientAndScore() hasn't been called on it

val model = new MultiLayerNetwork(generateConf())
model.init()

// Clone the main model and run one forward/backward pass on the clone
val temp = model.clone()

val x = Nd4j.create(Array(0.5, 0.5), Array(1, 2))
val y = Nd4j.create(Array(2.0), Array(1, 1))
temp.feedForward(x, true)
temp.setLabels(y)
temp.computeGradientAndScore()

val grad = temp.gradient()

println(grad)

// Try to apply the clone's gradient to the main model
println(model.params())
model.update(grad) // NullPointerException thrown here
println(model.params())

def generateConf(): MultiLayerConfiguration = {
val numNodes = List(2, 3, 1)

new NeuralNetConfiguration.Builder()
  .seed(42)
  .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
  .updater(new Sgd(0.001)) // learning rate (alpha)
  .list()
  // input of the raw features
  .layer(0, new DenseLayer.Builder().nIn(numNodes.head).nOut(numNodes(1))
    .weightInit(WeightInit.XAVIER)
    .activation(Activation.TANH)
    .build())
  .layer(1, new OutputLayer.Builder(LossFunction.MSE)
    .weightInit(WeightInit.XAVIER)
    .activation(Activation.IDENTITY)
    .nIn(numNodes(1)).nOut(numNodes(2)).build())
  .build()
}

An error occurs on the model.update(grad) line, and the output is shown below:

DefaultGradient{gradients={
  1_W=[[-1.8363],
       [-2.7604],
       [ 2.5442]],
  1_b=[[-5.4909]],
  0_b=[[ 4.3251,  0.0419, -4.1294]],
  0_W=[[ 2.1625,  0.0210, -2.0647],
       [ 2.1625,  0.0210, -2.0647]]}}
[[ 0.6854,  0.0102,  0.1997,  0.9062, -0.8818, -0.1214,  0,  0,  0, -0.8869, -0.0102,  0.9577,  0]]
Exception in thread "main" java.lang.NullPointerException
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.update(MultiLayerNetwork.java:3064)
    at Testing$.main(Testing.scala:34)
    at Testing.main(Testing.scala)

Essentially, the gradient field of model is null. Is there a way to initialize the gradient to some default value, like zero for each layer? I could theoretically set the input to zeros and the label to zero and then call computeGradientAndScore(), but that seems roundabout.

Any suggestions would be great. I'm trying to have cloned copies of the main model gather the gradients and then pass them back to the main model for updating.
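
For context, here is a sketch of the step I'm ultimately after, assuming the flattened gradient() and params() views share the same length and ordering; the manual SGD step and the lr value are my own, not a built-in DL4J path:

// Sketch (my assumption, not a DL4J API): apply the clone's
// flattened gradient to the main model as a plain SGD step
val lr = 0.001 // should match the Sgd learning rate in generateConf()
val flatGrad = temp.gradient().gradient() // full gradient as one flat vector
model.setParams(model.params().sub(flatGrad.mul(lr))) // w <- w - lr * g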

@sachag678 try calling: deeplearning4j/MultiLayerNetwork.java at df0d5083c33d49f3cbe6663d3c5102cf983a63fc · KonduitAI/deeplearning4j · GitHub

Unfortunately that doesn’t work. I took a look at the function and it doesn’t seem to set the gradient variable to anything.

My proposed method of setting the input to zero and the label to zero and then calling computeGradientAndScore() works, but it's super hacky.

@sachag678
Here’s a workaround after I downloaded your code and played with it:

    val model = new MultiLayerNetwork(generateConf())
    model.init()

    // Grab the private "gradient" field on MultiLayerNetwork via reflection
    val field = classOf[MultiLayerNetwork].getDeclaredField("gradient")
    field.setAccessible(true)

    // Build a zero-valued gradient with one entry per parameter,
    // matching each parameter's shape and ordering
    val gradient = new DefaultGradient()
    model.paramTable().entrySet.forEach(entry => {
      gradient.setGradientFor(entry.getKey, Nd4j.zeros(entry.getValue.shape(), entry.getValue.ordering()))
    })

    // Inject the zero gradient into the network
    ReflectionUtils.setField(field, model, gradient)
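
With the zero gradient injected, the update call from your original snippet should go through instead of hitting the NPE. A quick check (grad here being the gradient computed on your cloned model):

    // Quick check: the internal gradient field is now non-null,
    // so update() can accumulate into it
    model.update(grad)
    println(model.params())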

Imports

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.conf.{MultiLayerConfiguration, NeuralNetConfiguration}
import org.deeplearning4j.nn.gradient.DefaultGradient
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.common.io.ReflectionUtils
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.learning.config.Sgd
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

This is in Scala.

  val model = new MultiLayerNetwork(Util.generateConf(0.00001, inputSize))
  model.init()

  // Dummy forward/backward pass on an all-zero input and label, just to
  // populate the network's internal gradient field
  val tempx = Nd4j.create(Array.fill(inputSize)(0.0), Array(1, inputSize))
  val tempy = Nd4j.create(Array(0.0), Array(1, 1))
  model.feedForward(tempx, true)
  model.setLabels(tempy)
  model.computeGradientAndScore()

The above is my current solution. It's a bit hackier than yours, but it works, and I like yours too. I guess there is no use case here for having this functionality built in. I may open a feature request on GitHub. Thanks for your help!
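
For anyone who lands here later, the dummy-pass trick wraps up into a small helper; initGradient and its parameters are my own naming, not anything from the library:

  // Hypothetical helper wrapping the dummy-pass workaround: one zero-valued
  // forward/backward pass populates the internal gradient field
  def initGradient(net: MultiLayerNetwork, nIn: Int, nOut: Int): Unit = {
    net.feedForward(Nd4j.zeros(1, nIn), true)
    net.setLabels(Nd4j.zeros(1, nOut))
    net.computeGradientAndScore()
  }

  // e.g. initGradient(model, inputSize, 1) before calling model.update(...)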