SameDiff optimizer

Hello. I am trying to port working TensorFlow code to SameDiff.
So, the TensorFlow code:

    obs_ph = tf.placeholder(shape=(None, obs_dim[0]), dtype=tf.float32, name='obs')
    act_ph = tf.placeholder(shape=(None, ), dtype=tf.int32, name='act')
    ret_ph = tf.placeholder(shape=(None, ), dtype=tf.float32, name='ret')

    x = tf.layers.dense(obs_ph, units=32, activation=tf.tanh)
    p_logits = tf.layers.dense(x, units=2, activation=None)
    actions_mask = tf.one_hot(act_ph, depth=act_dim)
    p_log = tf.reduce_sum(actions_mask * tf.nn.log_softmax(p_logits), axis=1)
    p_loss = -tf.reduce_mean(p_log * ret_ph)
    p_opt = tf.train.AdamOptimizer(lr).minimize(p_loss)

My SameDiff variant:

        SameDiff sd = SameDiff.create();
        // -1 = variable batch dimension, like None in the TF placeholders
        var obsPh = sd.placeHolder("obs", DataType.FLOAT, -1, 2);
        var actPh = sd.placeHolder("act", DataType.INT32, -1);
        var retPh = sd.placeHolder("ret", DataType.FLOAT, -1);

        SDVariable w0 = sd.var("w0", new XavierInitScheme('c', 2, 8), DataType.FLOAT, 2, 8);
        SDVariable b0 = sd.var("b0", 1, 8);
        SDVariable out0 = sd.nn().tanh(obsPh.mmul(w0).add(b0));
        SDVariable w1 = sd.var("w1", new XavierInitScheme('c', 8, 2), DataType.FLOAT, 8, 2);
        SDVariable b1 = sd.zero("b1", 1, 2);
        // no activation on the logits, matching tf.layers.dense(..., activation=None)
        SDVariable pLogits = out0.mmul(w1).add(b1);

        SDVariable actSoftmax = sd.nn().softmax(pLogits);

        SDVariable actionMasks = sd.oneHot(actPh, 2);
        // element-wise multiply (not mmul), like actions_mask * ... and p_log * ret_ph in TF
        SDVariable pLog = sd.sum(actionMasks.mul(sd.nn().logSoftmax(pLogits)), 1);
        SDVariable pLoss = sd.mean(pLog.mul(retPh)).mul(-1);
        sd.setLossVariables(pLoss);

Is there anything similar to “p_opt = tf.train.AdamOptimizer(lr).minimize(p_loss)” in SameDiff (given that this is for RL4J and there are no labels)?

@roman yes, use a training config.
Something like this:


            SameDiff sd = SameDiff.create();

            SDVariable in = sd.placeHolder("input", DataType.DOUBLE, -1, 4);
            SDVariable label = sd.placeHolder("label", DataType.DOUBLE, -1, 3);

            SDVariable w0 = sd.var("w0", new XavierInitScheme('c', 4, 10), DataType.DOUBLE, 4, 10);
            SDVariable b0 = sd.var("b0", Nd4j.create(DataType.DOUBLE, 1, 10));

            SDVariable w1 = sd.var("w1", new XavierInitScheme('c', 10, 3), DataType.DOUBLE, 10, 3);
            SDVariable b1 = sd.var("b1", Nd4j.create(DataType.DOUBLE,  1, 3));

            SDVariable z0 = in.mmul(w0).add(b0);
            SDVariable a0 = sd.nn().tanh(z0);
            SDVariable z1 = a0.mmul(w1).add("prediction", b1);
            SDVariable a1 = sd.nn().softmax("softmax", z1);

            SDVariable diff = sd.math().squaredDifference(a1, label);
            SDVariable lossMse = diff.mean();
            lossMse.markAsLoss();

            // "u" selects which updater to use; l1Val, l2Val and wdVal below are
            // regularization coefficients - all assumed to be defined elsewhere
            IUpdater updater;
            double lr;
            switch (u) {
                case "sgd":
                    lr = 3e-1;
                    updater = new Sgd(lr);
                    break;
                case "adam":
                    lr = 1e-2;
                    updater = new Adam(lr);
                    break;
                case "nesterov":
                    lr = 1e-1;
                    updater = new Nesterovs(lr);
                    break;
                case "adamax":
                    lr = 1e-2;
                    updater = new AdaMax(lr);
                    break;
                case "amsgrad":
                    lr = 1e-2;
                    updater = new AMSGrad(lr);
                    break;
                default:
                    throw new RuntimeException();
            }

            List<Regularization> r = new ArrayList<>();
            if(l2Val > 0){
                r.add(new L2Regularization(l2Val));
            }
            if(l1Val > 0){
                r.add(new L1Regularization(l1Val));
            }
            if(wdVal > 0){
                r.add(new WeightDecay(wdVal, true));
            }
            TrainingConfig conf = new TrainingConfig.Builder()
                    .updater(updater)
                    .regularization(r)
                    .dataSetFeatureMapping("input")
                    .dataSetLabelMapping("label")
                    .build();
            sd.setTrainingConfig(conf);

Thank you @agibsonccc for your reply. If I had labels, it would be more or less clear how to use a training config. In my example I have pLoss, which is not calculated from labels, and I just need to minimize it using gradients. In the TensorFlow example, the line tf.train.AdamOptimizer(learningRate).minimize(pLoss) uses gradients under the hood, as I understand it.

@roman you’d declare your loss variables and call calculateGradients manually, then apply the updates. The gradients will automatically be calculated with respect to whatever loss variables you configured. You can also pass parameters into calculateGradients.
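Roughly like this (an untested sketch - “input” is assumed to be the name of your feature placeholder, and w, b, featureBatch and learningRate are assumed to be your trainable variables, observation batch and learning rate):

    // Rough sketch: gradients of the configured loss variable(s), then a plain manual update
    Map<String, INDArray> grads = sd.calculateGradients(
            Map.of("input", featureBatch),      // placeholder values (names are assumptions)
            w.name(), b.name());
    w.getArr().subi(grads.get(w.name()).mul(learningRate));
    b.getArr().subi(grads.get(b.name()).mul(learningRate));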

@agibsonccc I am sorry, but I cannot understand what to do with the gradients. Usually gradients are used for updating the weights of the model; I tried to do this, but it is not clear how to do it exactly with SameDiff, and I cannot find any practical example. My case: I don’t have labels, only features. It means I cannot use a training config, because TrainingConfig.Builder.build() requires features and labels:

java.lang.IllegalStateException: No DataSet label mapping has been provided. A mapping between DataSet array positions and variables/placeholders must be provided - use dataSetLabelMapping(...) to set this, or use markLabelsUnused() to mark labels as unused (for example, for unsupervised learning)

I can only define a loss function which I want to minimize. Is there any working example for my case? I am just trying to implement the REINFORCE algorithm (RL4J policy gradient), which means I don’t have labels - only rewards which I can use for gradient calculation. After the gradients are calculated, it is not clear how to use them for updating the weights of the SameDiff variables, because I need to update them iteratively. Perhaps you can give a link to a practical example for SameDiff and the RL4J policy gradient method?

I also ran into a problem with multiple fit(…) invocations. The idea is to collect some outputs (which are essentially trajectories) from the network, then calculate gradients and update the network, then collect outputs again, calculate gradients on the updated network, update it again, and so on. So we don’t have an initial dataset with features; we collect such a dataset epoch by epoch. The problem is that when I execute fit(…) for the second time, I get an error related to closed placeholders:

java.lang.IllegalStateException: Op.X argument was closed before call

Looks like I need to somehow refresh the training session, right? How can I do it?

@roman When you calculate gradients, you calculate them manually and then apply them. Something like this:

    TrainingConfig config = TrainingConfig.builder()
            .updater(new Adam(learningRate))
            .dataSetFeatureMapping("inputVar")
            .markLabelsUnused()
            .lossVariables("lossVar")
            .build();

    sd.setTrainingConfig(config);
    sd.setLossVariables("lossVar");

    Map<String, INDArray> placeholders = new HashMap<>();
    placeholders.put("inputVar", currentStateBatch); // currentStateBatch is the batch of observations
    sd.execBackwards(placeholders);

    // After backprop, gradients are available internally.
    // Get updaters and apply them:
    GradientUpdater updater = sd.getUpdaterMap().get(sd);
    updater.update(sd.getVariables());

Basically you can mark the training config as not having any labels, and you can use gradient updaters to do manual gradient updates for each variable, depending on how you want to apply the gradients.
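For a single variable, that could look roughly like this (an untested sketch based on ND4J’s IUpdater/GradientUpdater classes; w, learningRate and the grads map from calculateGradients are assumed to exist already, and iteration/epoch are counters you increment on every update):

    // Rough sketch: apply Adam manually to one variable's gradient.
    // Adam keeps two state values (m and v) per parameter, hence the 2 * n state array.
    long n = w.getArr().length();
    GradientUpdater adam = new Adam(learningRate)
            .instantiate(Nd4j.create(DataType.FLOAT, 1, 2 * n), true);

    // applyUpdater rewrites the gradient in place as the Adam-scaled update
    INDArray update = grads.get(w.name()).reshape(1, n);
    adam.applyUpdater(update, iteration, epoch);
    w.getArr().subi(update.reshape(w.getArr().shape()));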

@agibsonccc thank you! I really appreciate your help and the time you spend helping me.
Well, sd.execBackwards(placeholders) - I couldn’t find such a function… I tried sd.calculateGradients(placeholders, …) and then sd.getUpdaterMap() - it’s empty. I searched a bit and found that the updaterMap is only populated when you invoke fit(…). So it seems I need to use fit(…) anyway.

But with fit(…), as I wrote above, I have a problem. My current code is:

        var trainingConfig = new TrainingConfig.Builder()
            .updater(new Adam(learningRate))
            .dataSetFeatureMapping("input")
            .markLabelsUnused()
            .minimize("mse")
            .build();
        sd.setTrainingConfig(trainingConfig);

//        var listener = new HistoryListener(trainingConfig);
        DataSet ds = new DataSet();
        ds.setFeatures(inputs);
        var iterator = new SingletonDataSetIterator(ds);
        // Here I emulate the situation when I iteratively will improve weights in network
        for (int i = 0; i < 100; i++) {
            // Error on the second iteration here
            sd.fit().epochs(10).train(iterator).exec(); // Here I have an error with closed placeholder 
        }

The problem I am struggling with now is:

java.lang.IllegalStateException: Op.X argument was closed before call

Could you please tell me how fit(…) can be invoked several times on the same “sd”?

Finally I managed to get a simple working example which, I suppose, can be useful for the RL policy gradient method.

    @Test
    public void testLinearRegression() {
        /**
         * Test of linear regression: train the model using a cost function
         * without labels.
         * (The goal is to find a way to use this in the
         * reinforcement learning policy gradient method, where
         * you need to calculate the gradients of the network and multiply
         * them by trajectory rewards in order to converge.)
         */

        // Generate test data by model: y = kx + b
        double k = nextDouble(0, 1.0);
        double b = nextDouble(0, 1.0);
        int numIn = 1;
        double learningRate = 0.01;
        int epochNumber = 5000;

        int testDataLength = 10000;
        double[][] x = new double[testDataLength][numIn];
        for (int i = 0; i < testDataLength; i++) {
            x[i][0] = nextDouble(0, 1.0);
        }
        INDArray inputs = Nd4j.create(x);

        // Create very simple strategy
        SameDiff sd = SameDiff.create();
        SDVariable xPh = sd.placeHolder("input", DataType.FLOAT, 1);
        SDVariable w0 = sd.var("w0", new XavierInitScheme('c', 1, 1), DataType.FLOAT, 1, 1);
        SDVariable b0 = sd.var("b0", 1);
        SDVariable out = xPh.mmul(w0).add(b0);

        // Here the cost which we want to minimize.
        SDVariable exp = xPh.mul(k).add(b);
        SDVariable mse = sd.mean("mse", out.squaredDifference(exp));
        mse.markAsLoss();

        for (int i = 0; i < epochNumber; i++) {
            // calculate gradients in order to minimize mse
            var grads = sd.calculateGradients(Map.of("input", inputs), w0.name(), b0.name());
            // Update weights of layers
            var w0Update = grads.get(w0.name()).mul(learningRate);
            var b0Update = grads.get(b0.name()).mul(learningRate);
            w0.setArray(w0.getArr().sub(w0Update));
            b0.setArray(b0.getArr().sub(b0Update));
        }

        // Some simple tests
        for (float xt = 0.0f; xt < 1.0; xt += 0.1f) {
            INDArray h = out.eval(Map.of("input", Nd4j.create(new float[]{xt})));
            double yt = k * xt + b;
            log.info("y={}, h={}", yt, h);
        }

        // Just to compare original model and trained weights
        log.info("y={}*x + {}, w0:{} b0:{}", k, b, w0.getArr(), b0.getArr());
    }
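
To connect this back to the policy gradient case: the same manual-update loop should work with the reward-weighted loss from my first post instead of the MSE. A rough, untested sketch (network size, dummy trajectory data and the learning rate are just placeholders):

    int obsDim = 4, actDim = 2;
    double learningRate = 0.01;

    SameDiff sd = SameDiff.create();
    SDVariable obs = sd.placeHolder("obs", DataType.FLOAT, -1, obsDim);
    SDVariable act = sd.placeHolder("act", DataType.INT32, -1);
    SDVariable ret = sd.placeHolder("ret", DataType.FLOAT, -1);

    SDVariable w0 = sd.var("w0", new XavierInitScheme('c', obsDim, 32), DataType.FLOAT, obsDim, 32);
    SDVariable b0 = sd.var("b0", Nd4j.zeros(DataType.FLOAT, 1, 32));
    SDVariable h = sd.nn().tanh(obs.mmul(w0).add(b0));
    SDVariable w1 = sd.var("w1", new XavierInitScheme('c', 32, actDim), DataType.FLOAT, 32, actDim);
    SDVariable b1 = sd.var("b1", Nd4j.zeros(DataType.FLOAT, 1, actDim));
    SDVariable logits = h.mmul(w1).add(b1);

    // REINFORCE loss: -mean(log pi(a|s) * return), no labels involved
    SDVariable mask = sd.oneHot(act, actDim);
    SDVariable logP = sd.sum(mask.mul(sd.nn().logSoftmax(logits)), 1);
    SDVariable pLoss = sd.mean(logP.mul(ret)).mul(-1);
    pLoss.markAsLoss();

    // Dummy "trajectories" just so the sketch runs; in RL these come from rollouts
    int batch = 16;
    INDArray obsBatch = Nd4j.rand(DataType.FLOAT, batch, obsDim);
    INDArray actBatch = Nd4j.createFromArray(new int[batch]);
    INDArray retBatch = Nd4j.ones(DataType.FLOAT, batch);

    // One policy gradient step: gradients of the reward-weighted loss, plain SGD update
    var grads = sd.calculateGradients(
            Map.of("obs", obsBatch, "act", actBatch, "ret", retBatch),
            w0.name(), b0.name(), w1.name(), b1.name());
    for (SDVariable p : List.of(w0, b0, w1, b1)) {
        p.getArr().subi(grads.get(p.name()).mul(learningRate));
    }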

@roman great work! Ping me with an at (so I get a notification) if you have any other issues!

Thank you @agibsonccc again for your help
