Assign value to a placeholder

Hi,

This may be nonsensical and show that I don’t really understand how samediff works.

Could I assign a value to a placeholder? How?

I have searched through the code and documentation but could not find a way to do that (if it makes sense and is possible).

Thanks

@adonnini You pass placeholder values in with a map to fit(…) or output(…). In order to use fit(…) you need to set a training configuration and specify the feature and label mappings, which map a DataSet’s features and labels to variable names. From there it’s done for you.
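Roughly, it looks like this (a minimal sketch; featuresArray and dataSet are stand-ins for your own array and iterator output, not names from your code):

    // Option 1: for output(...), pass the placeholder values in a map
    Map<String, INDArray> placeholders = new HashMap<>();
    placeholders.put("input", featuresArray);                       // value for the "input" placeholder
    Map<String, INDArray> result = sd.output(placeholders, "out");  // run the graph up to "out"

    // Option 2: for fit(...), the DataSet is mapped onto the placeholders for you
    TrainingConfig config = new TrainingConfig.Builder()
            .updater(new Adam(1e-3))
            .dataSetFeatureMapping("input")   // DataSet features -> placeholder "input"
            .dataSetLabelMapping("label")     // DataSet labels   -> placeholder "label"
            .build();
    sd.setTrainingConfig(config);
    sd.fit(dataSet);                          // placeholders get their values from the DataSet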

Thanks. I did not ask my question clearly enough. I have been using placeholders for the input and labels.

Now, I am trying to use a placeholder for another training variable whose content changes with each iteration.

I would like to assign a value to this variable at the start of each
iteration. (How) can I do that?

Thanks

@adonnini usually you just specify placeholders each iteration. Can you specify what you’re trying to do? You can set it manually but usually the placeholder is updated each time you call fit or output.
Placeholders by their nature change every time and are inherently dynamic.
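If you do want to set one manually, something like this should work (a sketch; I believe associateArrayWithVariable is the relevant call, but double-check it against your version):

    // Manually associate an array with the "input" placeholder before execution.
    // Normally you just pass a fresh map to fit/output each iteration instead.
    INDArray features = t.getFeatures();
    sd.associateArrayWithVariable(features, "input");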

I am trying to assign an array with a varying shape to a weight variable.

I thought that by defining the weight variable as a placeholder, I would be able to do that.

In my case, the shape of the input and labels changes with each iteration. The weights need to change accordingly.

I tried this


        SDVariable w1FromArray = sd.placeHolder("w1FromArray", DataType.FLOAT, miniBatchSize, t.getFeatures().size(2), t.getFeatures().size(2));

        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1FromArray).add(b1));

Not surprisingly, execution failed with the following error:

An input placeholder “input” is required to calculate the requested outputs, but a placeholder value was not provided

In my code I do have an “input” placeholder


        SDVariable input = sd.placeHolder("input", DataType.FLOAT, miniBatchSize, nIn, t.getFeatures().size(2));

and the following TrainingConfig definition:


        TrainingConfig config = new TrainingConfig.Builder()
                .l2(1e-4)                               //L2 regularization
                .updater(new Adam(learningRate))        //Adam optimizer with specified learning rate
                .dataSetFeatureMapping("input")         //DataSet features array should be associated with variable "input"
                .dataSetLabelMapping("label")           //DataSet label array should be associated with variable "label"
                .build();

@adonnini can you post full code? The training config looks right. I don’t see how you’re attempting to execute something.

.output or .fit do everything for you. I don’t see why you need to do any of this.
Even as the author of the library I don’t need to do whatever it is you’re doing here. I just call fit or output and it works fine.

Below you will find my code.

In the code below, when

//        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1).add(b1));
        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1FromArray).add(b1));

the error message is
“An input placeholder “input” is required to calculate the requested outputs, but a placeholder value was not provided”

when

        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1).add(b1));
//        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1FromArray).add(b1));

the code runs through the first iteration, then fails while attempting to execute the second because the number of timesteps has changed. The error is the following:
"ShapeUtils::evalShapeForMatmul static method: input shapes are inconsistent: xDim 14 != yDim 33 "



    private static int lastTrainCount = 0;
    private static int lastTestCount = 0;

    private static SequenceRecordReader trainFeatures;
    private static SequenceRecordReader trainLabels;
    private static DataSetIterator trainData;
    private static SequenceRecordReader testFeatures;
    private static SequenceRecordReader testLabels;
    private static DataSetIterator testData;
    private static NormalizerStandardize normalizer;

    //Properties for dataset:
    private static int nIn = 6;
    private static int nOut = 2;

    private static int miniBatchSize = 32;
    private static int numLabelClasses = -1;

    private static SameDiff sd = SameDiff.create();

    private static long dim0 = 0L;
    private static long  dim1 = 0L;
    private static long dim2 = 0L;

    private static DataSet t;

    private static INDArray w1Array;

    public static void sameDiff3() throws IOException, InterruptedException
    {

        trainFeatures = new CSVSequenceRecordReader();
        trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, lastTrainCount));
        trainLabels = new CSVSequenceRecordReader();
        trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, lastTrainCount));

        trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, miniBatchSize, numLabelClasses,
                true, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);


        testFeatures = new CSVSequenceRecordReader();
        testFeatures.initialize(new NumberedFileInputSplit(featuresDirTest.getAbsolutePath() + "/%d.csv", 0, lastTestCount));
        testLabels = new CSVSequenceRecordReader();
        testLabels.initialize(new NumberedFileInputSplit(labelsDirTest.getAbsolutePath() + "/%d.csv", 0, lastTestCount));

        testData = new SequenceRecordReaderDataSetIterator(testFeatures, testLabels, miniBatchSize, numLabelClasses,
                true, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

        normalizer = new NormalizerStandardize();
        normalizer.fitLabel(true);
        normalizer.fit(trainData);           //Collect the statistics (mean/stdev) from the training data. This does not modify the input data
        trainData.reset();

        while(trainData.hasNext()) {
            normalizer.transform(trainData.next());     //Apply normalization to the training data
        }

        while(testData.hasNext()) {
            normalizer.transform(testData.next());         //Apply normalization to the test data. This is using statistics calculated from the *training* set
        }

        trainData.reset();
        testData.reset();

        trainData.setPreProcessor(normalizer);
        testData.setPreProcessor(normalizer);

        System.out.println(" Printing traindata dataset shape - 1");
        DataSet data = trainData.next();
        System.out.println(Arrays.toString(data.getFeatures().shape()));

        System.out.println(" Printing testdata dataset shape - 1");
        DataSet data2 = testData.next();
        System.out.println(Arrays.toString(data2.getFeatures().shape()));

        trainData.reset();
        testData.reset();

        UIServer uiServer = UIServer.getInstance();
        StatsStorage statsStorage = new InMemoryStatsStorage();         //Alternative: new FileStatsStorage(File), for saving and loading later
        uiServer.attach(statsStorage);
        int listenerFrequency = 1;
        sd.setListeners(new ScoreListener());


        t = trainData.next();
        dim0 = t.getFeatures().size(0);
        dim1 = t.getFeatures().size(1);
        dim2 = t.getFeatures().size(2);
        trainData.reset();

        getConfiguration();

        int whileLoopIndex = 0;
        Map<String,INDArray> placeholderData = new HashMap<>();

        whileLoopIndex = -1;
        trainData.reset();
        while(trainData.hasNext()) {
            ++whileLoopIndex;
            placeholderData = new HashMap<>();
            t = trainData.next();
            System.out.println(" ======================================================= - ");
            System.out.println(" Printing traindata feature and label dataset shape");
            System.out.println(Arrays.toString(t.getFeatures().shape()));
            System.out.println(Arrays.toString(t.getLabels().shape()));
            System.out.println(" ======================================================= - ");

            INDArray features = t.getFeatures();
            INDArray labels = t.getLabels();
            placeholderData.put("input", features);
            placeholderData.put("label", labels);

            dim0 = t.getFeatures().size(0);
            dim1 = t.getFeatures().size(1);
            dim2 = t.getFeatures().size(2);

            System.out.println(" features - dim0 - "+dim0);
            System.out.println(" features - dim1 - "+dim1);
            System.out.println(" features - dim2 - "+dim2);

            History history = sd.fit(t);

            System.out.println(" Completed training run --- ");

        }

        System.out.println(" Starting test data evaluation --- ");

        String outputVariable = "out";
        Evaluation evaluation = new Evaluation();
        sd.evaluate(testData, outputVariable, evaluation);

        System.out.println(" evaluation.stats() - "+evaluation.stats());


        String pathToSavedNetwork = "src/main/assets/location_next_neural_network_v6_07.zip";
        File savedNetwork = new File(pathToSavedNetwork);

        sd.save(savedNetwork, true);
//        ModelSerializer.addNormalizerToModel(savedNetwork, normalizer);

        System.out.println("----- Example Complete -----");

        File saveFileForInference = new File("src/main/assets/sameDiffExampleInference.fb");

        try {
            sd.asFlatFile(saveFileForInference);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

    private static void getConfiguration()
    {

        SDVariable input = sd.placeHolder("input", DataType.FLOAT, miniBatchSize, nIn, t.getFeatures().size(2));
        SDVariable label = sd.placeHolder("label", DataType.FLOAT, miniBatchSize, nOut, t.getLabels().size(2));

        LSTMLayerConfig mLSTMConfiguration = LSTMLayerConfig.builder()
                .lstmdataformat(LSTMDataFormat.NST)
//                .lstmdataformat(LSTMDataFormat.NTS)
                .directionMode(LSTMDirectionMode.FWD)
//                .directionMode(LSTMDirectionMode.BIDIR_CONCAT)
                .gateAct(LSTMActivations.SIGMOID)
                .cellAct(LSTMActivations.SOFTPLUS)
                .outAct(LSTMActivations.SOFTPLUS)
                .retFullSequence(true)
                .retLastC(false)
                .retLastH(false)
                .build();

        LSTMLayerOutputs outputs = new LSTMLayerOutputs(sd.rnn.lstmLayer(
                input,
                LSTMLayerWeights.builder()
                        .weights(sd.var("weights", Nd4j.rand(DataType.FLOAT, nIn, 4 * nOut)))
                        .rWeights(sd.var("rWeights", Nd4j.rand(DataType.FLOAT, nOut, 4 * nOut)))
                        .bias(sd.var("bias", Nd4j.rand(DataType.FLOAT, 4 * nOut)))
                        .build(),
                mLSTMConfiguration), mLSTMConfiguration);

        System.out.println(" input.getShape()[0] - "+input.getShape()[0]);
        System.out.println(" input.getShape()[1] - "+input.getShape()[1]);
        System.out.println(" input.getShape()[2] - "+input.getShape()[2]);

        SDVariable layer0 = outputs.getOutput();

//            SDVariable layer1 = layer0.mean(1);

        SDVariable w1 = sd.var("w1", new XavierInitScheme('c', nIn, nOut), DataType.FLOAT, miniBatchSize, t.getFeatures().size(2), t.getFeatures().size(2));
        SDVariable b1 = sd.var("b1", Nd4j.rand(DataType.FLOAT, miniBatchSize, nOut, t.getFeatures().size(2)));

        XavierInitScheme xavierInitScheme = new XavierInitScheme('c', nIn, nOut);
        w1Array = xavierInitScheme.create(DataType.FLOAT, nIn, miniBatchSize, t.getFeatures().size(2), t.getFeatures().size(2));
        SDVariable w1FromArray = sd.placeHolder("w1FromArray", DataType.FLOAT, miniBatchSize, t.getFeatures().size(2), t.getFeatures().size(2));

//        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1).add(b1));
        SDVariable out = sd.nn.softmax("out", layer0.mmul(w1FromArray).add(b1));

        SDVariable loss = sd.loss.logLoss("loss", label, out);

        sd.setLossVariables("loss");

        double learningRate = 1e-3;
        TrainingConfig config = new TrainingConfig.Builder()
                .l2(1e-4)                               //L2 regularization
                .updater(new Adam(learningRate))        //Adam optimizer with specified learning rate
                .dataSetFeatureMapping("input")         //DataSet features array should be associated with variable "input"
                .dataSetLabelMapping("label")           //DataSet label array should be associated with variable "label"
                .build();

        sd.setTrainingConfig(config);

        System.out.println(" Printing sd information");
//            System.out.println(sd.toString());
        System.out.println(sd.summary());

    }

@adonnini I still don’t get it even from your code. Samediff is declarative. You set your graph up and then you call fit and output. That’s literally it. I’m seriously not sure what you’re missing here. There’s nothing else to it.

For your iterator just use it as is. That is what the dataset mapping is for.

I understand. Yet, execution fails because of the change in #timeSteps from one iteration to the next. You saw the error message.

I thought that defining the weight variable as a placeholder would resolve the problem. However, I don’t know how to assign a value to a placeholder variable.

I thought you suggested taking the approach of defining the weights as a placeholder in this thread:

@adonnini sorry, thanks for posting that for context. Paul was helping you before. Weights usually shouldn’t be a placeholder since they are a fixed variable. Let’s ignore the previous thread for a second.

The way to handle dynamic shapes is typically masking. When you define an lstm layer even in dl4j weights are fixed.
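Roughly, the training side of that can look like this (a sketch; the mask placeholder names are mine, and I believe TrainingConfig has dataSetFeatureMaskMapping/dataSetLabelMaskMapping for the mask arrays that ALIGN_END padding produces, so verify against your version):

    // Placeholders with dynamic (-1) batch and timestep dimensions; the weights stay fixed.
    SDVariable input       = sd.placeHolder("input", DataType.FLOAT, -1, nIn, -1);   // [batch, nIn, timesteps]
    SDVariable label       = sd.placeHolder("label", DataType.FLOAT, -1, nOut, -1);  // [batch, nOut, timesteps]
    SDVariable featureMask = sd.placeHolder("featureMask", DataType.FLOAT, -1, -1);  // [batch, timesteps]
    SDVariable labelMask   = sd.placeHolder("labelMask", DataType.FLOAT, -1, -1);    // [batch, timesteps]

    TrainingConfig config = new TrainingConfig.Builder()
            .l2(1e-4)
            .updater(new Adam(1e-3))
            .dataSetFeatureMapping("input")
            .dataSetLabelMapping("label")
            .dataSetFeatureMaskMapping("featureMask")   // mask arrays from the padded sequences
            .dataSetLabelMaskMapping("labelMask")
            .build();
    sd.setTrainingConfig(config);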

Good on you for doing your research, and apologies for the confusion. I might have proposed that at the time as a potential solution, but I don’t believe in workarounds here.

I think we already talked about this. Weights are fixed. Dynamic time series lengths are handled with masking, not with different weights per minibatch. That would make no sense and that’s not how anyone does this.

Yes. You have been very clear about the fact that weights are fixed. Previously I raised the possibility of using masking to address the fact that #timeSteps changes with every iteration.

I don’t quite know how to apply masking so that the fixed shape of the weights is compatible with the varying shape of the feature files. I assume that masking is to be applied to the feature datasets. It’s not clear to me what the fixed shape of the weights should be set to.

In the documentation I found guidelines for setting the shape of LSTMLayerWeights in

I don’t quite understand what the shape of w1, the other weights variable in my code, should be set to.

Could you point me to an example or a test for setting weight shapes and masking to have shapes in input files and weights match?

The problem you run into is due to another misunderstanding of what you are doing here.

Setting .retFullSequence(true) forces the LSTMLayer to produce a sequence output. That means it will produce an output of the shape [batchSize, nOut, timesteps].

Next you take that and try to do a matrix multiplication (.mmul) with that sequence.

And there are multiple problems there too:

  1. The shape of w1 is off. It should be [nOut, labelCount] so that it projects the LSTM output into the correct shape (see the sketch after this list).
  2. The shape of b1 is off too; it should just be [labelCount].
  3. Matrix multiplication has very specific rules about the shapes it supports (see Matrix multiplication - Wikipedia).
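In concrete terms, something like this (a sketch; labelCount is just my name for the number of label classes, which in your data is 2, i.e. nOut):

    int labelCount = 2;  // number of label classes (nOut in your setup)
    // w1 projects from the LSTM output size (nOut) to the label count
    SDVariable w1 = sd.var("w1", new XavierInitScheme('c', nOut, labelCount), DataType.FLOAT, nOut, labelCount); // [nOut, labelCount]
    SDVariable b1 = sd.var("b1", Nd4j.zeros(DataType.FLOAT, labelCount));                                        // [labelCount]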

As you can see here we can apply a .mmul on a mini batch directly:

In that example the inputs to the multiplication have the shape [minibatchSize, nIn] and our weights have the shape [nIn, nOut].

With that setup a regular matrix multiplication then produces the shape [minibatchSize, nOut].
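For example, with plain ND4J arrays:

    INDArray x = Nd4j.rand(DataType.FLOAT, 32, 6);   // [minibatchSize, nIn]
    INDArray w = Nd4j.rand(DataType.FLOAT, 6, 2);    // [nIn, nOut]
    INDArray y = x.mmul(w);                          // [minibatchSize, nOut]
    System.out.println(Arrays.toString(y.shape()));  // prints [32, 2]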

I suppose that you want to apply it independently to every step-wise output and every entry in the mini batch.

One way of doing this is to .permute your lstm output and go from [batchSize, nOut, timesteps] to [batchSize, timesteps, nOut], then you .reshape to [batchSize * timesteps, nOut], then you .mmul and undo the transformations by .reshape-ing to [batchSize, timesteps, labelCount] first and then .permute to [batchSize, labelCount, timesteps] again.
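As a rough sketch (assuming w1 already has the [nOut, labelCount] shape from above, b1 has shape [labelCount], the batch dimension is fixed at miniBatchSize, and that sd.nn.softmax accepts a dimension argument in your version):

    SDVariable seq       = layer0;                                           // [batchSize, nOut, timesteps]
    SDVariable permuted  = seq.permute(0, 2, 1);                             // [batchSize, timesteps, nOut]
    SDVariable flat      = permuted.reshape(-1, nOut);                       // [batchSize * timesteps, nOut]
    SDVariable projected = flat.mmul(w1).add(b1);                            // [batchSize * timesteps, labelCount]
    SDVariable unflat    = projected.reshape(miniBatchSize, -1, labelCount); // [batchSize, timesteps, labelCount]
    SDVariable out       = sd.nn.softmax("out", unflat.permute(0, 2, 1), 1); // [batchSize, labelCount, timesteps], softmax over the label axis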

Alternatively you may try using tensorMmul, which can in principle do that multiplication, but I always get lost in the dimension specification on that.

Thanks. This is very helpful.

Two quick questions:

  1. I don’t recall seeing the term labelCount used before. What does it refer to exactly?

  2. By setting retFullSequence to false will I produce an output with shape [minibatchSize, nOut, 1]?

Thanks

label count is literally what it says: the label count. If you’ve got 2 labels then your label count is 2.

If you set .retFullSequence(false), you’ll have to set one of the other .ret* things to true.

If you set .retLastH(true) it will return the output for the last timestep only and that should have a shape of [batchSize, nOut].
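Roughly, the config change looks like this (a sketch; I believe the last hidden state is then read via LSTMLayerOutputs.getLastOutput(), but check the accessor name in your version):

    LSTMLayerConfig lastStepConfig = LSTMLayerConfig.builder()
            .lstmdataformat(LSTMDataFormat.NST)
            .directionMode(LSTMDirectionMode.FWD)
            .gateAct(LSTMActivations.SIGMOID)
            .cellAct(LSTMActivations.SOFTPLUS)
            .outAct(LSTMActivations.SOFTPLUS)
            .retFullSequence(false)   // no [batchSize, nOut, timesteps] sequence output
            .retLastH(true)           // last hidden state only -> shape [batchSize, nOut]
            .retLastC(false)
            .build();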

OK. That’s clear.

If the rank of w1 is set to 2, won’t that cause a problem when matrix multiplying with input which has rank of 3?

I’m not sure I get your question.

If w1 is a matrix of shape [k, l] and you want to multiply it with a tensor of shape [m, k, o] to get an output of shape [m, l, o], then you’ll need to apply the permutation and reshaping that I’ve explained previously.

Ok. It’s the permutation and reshaping that I did not understand was necessary. Thanks

Sorry about this question. I cannot understand why the print statement in the code below produces a null result. I need the lstmLayer shape information to see what I am doing wrong. I am not asking for the value, just the shape information, which is defined when I create the variable. What am I missing?

Thanks

        LSTMLayerOutputs outputs = new LSTMLayerOutputs(sd.rnn.lstmLayer(
                input,
                LSTMLayerWeights.builder()
                        .weights(sd.var("weights", Nd4j.rand(DataType.FLOAT, nIn, 4 * nOut)))
                        .rWeights(sd.var("rWeights", Nd4j.rand(DataType.FLOAT, nOut, 4 * nOut)))
                        .bias(sd.var("bias", Nd4j.rand(DataType.FLOAT, 4 * nOut)))
                        .build(),
                mLSTMConfiguration), mLSTMConfiguration);

        SDVariable layer0 = outputs.getOutput();

         System.out.println(" ======================================================= - ");
         System.out.println(" Arrays.toString(layer0.getShape()) - 0 - "+ Arrays.toString(layer0.getShape()));

That is because in order to get the definitive shape, it needs to execute the calculation up to this point.

As the output shape here depends on the shape of its inputs (because it supports arbitrarily many timesteps), it is therefore impossible to tell you the shape without executing it.
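A sketch of how to get the concrete shape by executing up to that variable with real placeholder values (t is one DataSet from your iterator; I believe SDVariable.eval(...) accepts a placeholder map, but verify against your version):

    Map<String, INDArray> placeholderData = new HashMap<>();
    placeholderData.put("input", t.getFeatures());
    placeholderData.put("label", t.getLabels());
    INDArray layer0Value = layer0.eval(placeholderData);       // executes the graph up to layer0
    System.out.println(Arrays.toString(layer0Value.shape()));  // concrete shape for this minibatch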