Upgraded from 1.0.0-beta7 to 1.0.0-M1 - getting exceptions in code that previously worked

Hello,

I upgraded versions, and models that used to work in the prior version are now throwing exceptions like the following:

Exception in thread "main" java.lang.IllegalStateException: Layer 0 returned null activations
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.ffToLayerActivationsInWs(MultiLayerNetwork.java:1154)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2781)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2739)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1750)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1671)

This is being thrown while trying to fit a DataSetIterator. My layer 0 is pretty simple with only one input feature:

.layer(0, new LSTM.Builder().activation(Activation.TANH).nIn(1).nOut(50).build())

Is there some new configuration that needs to be added with the latest version?

Additionally, I am attaching a snapshot of the debugger showing there is data going in.

I followed this through as far as I could with the debugger and found that after line 157 in the LSTM class, the downstream code expects a value for fwdPassOutput, but for some reason LSTMHelper.activateHelper is returning a fwd object whose attributes are all null.

Thank you

@Anrix thanks a lot for telling us. Could you DM me your model and something I can just import into IntelliJ? The less time I spend reproducing this, the quicker I can get a fix out. Worst-case scenario, it’s very easy for us to publish a quick follow-up release (which we intended to do anyway to cover other bugs, performance enhancements, etc.)

Thanks!

@agibsonccc in case you are not notified of the private message: I sent you a link to a slimmed-down version of the project that reproduces this exception, along with the data files and instructions on how to get it working locally.

Thanks for looking into this.

So, looking into this a bit: if the difference is the helper, it probably means the helper wasn’t being used before. Could you give me sample inputs that work on both M1 and beta7 so I can quickly verify the difference?
Sorry to be a hassle, but exact code with a main method I could copy/paste would be preferable. Something like:

public class Example {
    public static void main(String... args) {
        INDArray myInput = ...;
        MultiLayerNetwork network = ...;
        network.output(myInput);
    }
}

Something minimal would be great, even if it’s just your expected output layer plus the LSTM.
I appreciate you meeting me in the middle here; given the number of issues I have to look at, even a little bit of saved time matters right now.

Hello Adam, sorry for the delay in getting back to you. I will have to play around a bit, because I was unable to get it working with M1, but I can try to put something simple together and get it to you by the end of the weekend.

Thank you

I can’t even get the simplest code to run in M1. What triggers the use of the helper? I see a good number of conditionals involving the Workspace. Is this what’s preventing my code from running in M1?

public static void main(String... args) {
    // Single time series: 10 input steps mapped to 10 output steps
    double[] inputData = new double[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    double[] outputData = new double[]{10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
    // RNN arrays are rank 3: [miniBatchSize, featureCount, timeSeriesLength]
    INDArray features = Nd4j.create(new DoubleBuffer(inputData), 1, 1, inputData.length);
    INDArray labels = Nd4j.create(new DoubleBuffer(outputData), 1, 1, outputData.length);

    MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .seed(10)
            .weightInit(WeightInit.XAVIER)
            .updater(new RmsProp())
            .list()
            .layer(0, new LSTM.Builder()
                    .activation(Activation.RELU)
                    .nIn(1)
                    .nOut(5)
                    .build())
            .layer(1, new LSTM.Builder()
                    .activation(Activation.RELU)
                    .nIn(5)
                    .nOut(5)
                    .build())
            .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                    .activation(Activation.RELU)
                    .nIn(5)
                    .nOut(1)
                    .build())
            .build();
    configuration.setDataType(DataType.DOUBLE);

    MultiLayerNetwork network = new MultiLayerNetwork(configuration);
    network.init();
    // Fitting here is what triggers the "Layer 0 returned null activations" exception
    for (int i = 0; i < 200; i++) {
        network.fit(features, labels);
    }

    INDArray predictions = network.output(features);
    for (int i = 0; i < predictions.length(); i++) {
        System.out.print(predictions.data().getDouble(i) + ", ");
    }
    System.out.println();
}

@Anrix thanks for getting back to me. I reproduced your issue thanks to this, and it is indeed helper-related. The mkldnn bindings aren’t assigning the output to the result, which causes the null values to show up.

We’ll get a quick fix out early next week and a quick follow-up release in the next week or so. The bulk of the work went into our CI migration and getting out a new release, due to the changes over the last year plus the cutting of certain modules. Thanks for helping us get a lot of this looked at.
The goal of the milestones is to checkpoint bug fixes and improvements towards a final 1.0 release.

Regarding your question about what triggers the helpers, you can see that here:

The use of the helper is optional. For now, if you want, you can comment out that line in LSTM, run mvn clean install on just deeplearning4j-nn, and it will run just fine (roughly the steps sketched below).
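A rough sketch of that workaround, assuming a local checkout of the deeplearning4j monorepo at the 1.0.0-M1 tag (the module path here is from memory and may differ in your checkout):

# after commenting out the helper invocation in the LSTM layer class
cd deeplearning4j/deeplearning4j-nn
mvn clean install -DskipTests

The install goes into your local ~/.m2 repository under the same 1.0.0-M1 version, so a project depending on that version should pick up the patched jar automatically.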

We’ll also work on making that easier to control via a system property, so people can simply turn the helpers off if they cause issues.
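Purely as an illustration of what such a toggle might look like (the property name org.deeplearning4j.helpers.disable is hypothetical, not a released API):

// Hypothetical helper kill switch; the property name is an assumption,
// not an actual DL4J system property at the time of writing.
boolean helpersDisabled = Boolean.getBoolean("org.deeplearning4j.helpers.disable");
if (!helpersDisabled) {
    // attempt to construct the platform helper (e.g. the oneDNN LSTM helper)
} else {
    // skip the helper entirely and fall back to the pure-Java implementation
}

Running the JVM with -Dorg.deeplearning4j.helpers.disable=true would then force the built-in path.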

Beyond that, they are also accessible via new classifiers on nd4j-native (which just means our C++ libnd4j code base, built with certain compiler flags and packaged as a jar file). One of those combinations includes onednn-* in our classifiers here: Central Repository: org/nd4j/nd4j-native/1.0.0-M1
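For example, pulling in a oneDNN build would look roughly like the following in a pom.xml; the exact classifier string used here (linux-x86_64-onednn) is an assumption, so check the Central listing above for the classifiers that were actually published:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-M1</version>
    <!-- example classifier; see the Central listing for the published ones -->
    <classifier>linux-x86_64-onednn</classifier>
</dependency>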

Now that we’re able to release more often and have a lot less to maintain, it will be more straightforward to get things like this ironed out and polished. Community participation is really helping us get the next version out quickly. Thanks again!