Retraining an LSTM model

Expected {0.1, 0.11, 0.12, 0.13} but got {0.087, 0.0777, 0.067, 0.0577} ???
Why?

@Test
public void test3() throws InterruptedException {
    MultiLayerNetwork net = createNeuralNet();
    double[] v = new double[]{0.01, 0.02, 0.03, 0.04, 0.05, 0.06};
    fitValues(net, v);
    extractedOut(net, v); // expect {0.07, 0.08, 0.09, 0.1, 0.11, 0.12}, get {0.069, 0.079, 0.088, 0.097, 0.10, 0.11, 0.117} - it's OK
    double[] vn = new double[]{0.05, 0.06};
    extractedOut(net, vn); // expect {0.07, 0.08, 0.09}, get {0.0299, 0.0398, 0.0496} ???
    v = new double[]{0.07, 0.08, 0.09}; // retrain: fit the new data into the existing model, not a new one
    fitValues(net, v);
    extractedOut(net, v); // expect {0.1, 0.11, 0.12, 0.13}, get {0.087, 0.0777, 0.067, 0.0577} ???
}

// Prints the prediction for the last input step, then feeds the
// predictions back in once to generate the following values.
private void extractedOut(MultiLayerNetwork net, double[] v) {
    double[] p = nextValues(net, v);
    System.out.print(p[p.length - 1]);
    Arrays.stream(nextValues(net, p)).forEach(e -> System.out.print(", " + e));
    net.rnnClearPreviousState();
    System.out.println();
}

// Trains the net to predict the next value: the input sequence is
// v[0..n-2] and the label is the same series shifted by one, v[1..n-1].
// Both are shaped [miniBatch=1, features=1, timeSteps=n-1].
public void fitValues(MultiLayerNetwork net, double... v) {
    double[] firstPeriod = Arrays.copyOf(v, v.length - 1);
    INDArray data1 = Nd4j.create(new double[][][]{{firstPeriod}});

    double[] firstShiftPeriod = Arrays.copyOfRange(v, 1, v.length);
    INDArray data1s = Nd4j.create(new double[][][]{{firstShiftPeriod}});

    long t = System.currentTimeMillis();
    for (int epoch = 0; epoch < 1024; epoch++) {
        net.fit(data1, data1s);
    }
    t = System.currentTimeMillis() - t;
    System.out.println("Time: " + (t / 1000) + " sec. and " + (t % 1000) + " msec.");
}

// Runs the series through rnnTimeStep (which keeps the LSTM's internal
// state between calls) and returns the output for every time step.
public double[] nextValues(MultiLayerNetwork net, double... v) {
    double[] firstPeriod = Arrays.copyOf(v, v.length);
    INDArray data1 = Nd4j.create(new double[][][]{{firstPeriod}});
    INDArray out = net.rnnTimeStep(data1);
    return out.get(NDArrayIndex.indexesFor(0L, 0L)).toDoubleVector();
}

// Three stacked LSTM layers (1 -> 100 -> 100 -> 100) followed by a
// single-output RNN regression layer trained with MSE.
public MultiLayerNetwork createNeuralNet() {
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .seed(12345)
            .weightInit(WeightInit.XAVIER)
            .updater(new AdaGrad(0.005))
            //.updater(new Nesterovs(MathFunctionsModel.learningRate, 0.9))
            .list()
            .layer(0, new LSTM.Builder()
                    .activation(Activation.TANH)
                    .nIn(1)
                    .nOut(100)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(10)
                    .build())
            .layer(1, new LSTM.Builder()
                    .activation(Activation.TANH)
                    .nIn(100)
                    .nOut(100)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(10)
                    .build())
            .layer(2, new LSTM.Builder()
                    .activation(Activation.TANH)
                    .nIn(100)
                    .nOut(100)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(10)
                    .build())
            .layer(3, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                    .activation(Activation.TANH)
                    .nIn(100)
                    .nOut(1)
                    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                    .gradientNormalizationThreshold(10)
                    .build())
            .backpropType(BackpropType.TruncatedBPTT)
            .tBPTTLength(100)
            .build();
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();
    return net;
}

@Vlad-Karpov could you elaborate on what you are asking exactly? Some code and a two-word sentence doesn't really give us much to go on. I can guess that you're asking why those are the expected values?

Why do you expect those values specifically? Where is this code from?

I create an LSTM network and train it on a time series.
Then I predict the next time series values, and that works correctly. When new time series data arrives, I need to add it to the network, and I do that, but the predictions after this update are wrong. I wrote this code as my little test case.

What you are seeing is something that is known as catastrophic forgetting.

Since you fit only on the new data for another 1024 epochs, you are essentially overfitting to that data.

You will need to include the old data in your training data set to mitigate that issue.
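In terms of the test above, that could look something like the sketch below (untested, reusing the fitValues/extractedOut helpers from the original post; the values are just the ones from the test):

double[] oldData = {0.01, 0.02, 0.03, 0.04, 0.05, 0.06};
double[] newData = {0.07, 0.08, 0.09};
// Rehearsal: retrain on the concatenation of old and new data instead of
// the new points alone, so the old pattern keeps being reinforced.
double[] combined = new double[oldData.length + newData.length];
System.arraycopy(oldData, 0, combined, 0, oldData.length);
System.arraycopy(newData, 0, combined, oldData.length, newData.length);
fitValues(net, combined);
extractedOut(net, combined);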

Really?

You want to say that if my model has been holding weights in the links between nodes, and states in its LSTM recurrent nodes, for 5 years, and I get new data for yesterday, I have to teach my model from scratch?

Are you serious?

This is not magic, it is plain math.

You are changing the weights when you train the model.

When you are training it only with new data, it will change the weights to fit just the new data. It effectively treats the existing weights as just initialization.
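You can see this directly with the helpers from your test (a sketch, not run):

double[] oldSeries = {0.01, 0.02, 0.03, 0.04, 0.05, 0.06};
double[] newSeries = {0.07, 0.08, 0.09};
double[] before = nextValues(net, oldSeries); // predictions on the old series
net.rnnClearPreviousState();
fitValues(net, newSeries);                    // 1024 epochs on the new data only
double[] after = nextValues(net, oldSeries);  // same query after retraining
net.rnnClearPreviousState();
// 'after' will generally have drifted away from 'before': the weights
// were moved to fit only the three new points.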

@Vlad-Karpov please try to keep it professional. You have someone giving their time at no cost to you to answer your question. If I see condescending responses again, I’ll just ban you from the forums.

Paul was describing a real concept that’s already in the literature: [1612.00796] Overcoming catastrophic forgetting in neural networks
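For reference, the core idea of that paper (elastic weight consolidation) is to add a penalty that discourages weights that were important for the old data from moving, roughly

L(\theta) = L_{\text{new}}(\theta) + \sum_i \frac{\lambda}{2} F_i (\theta_i - \theta_i^{*})^2

where \theta_i^{*} are the weights after training on the old data, F_i is the diagonal Fisher information estimated on the old data (a measure of how important weight i was), and \lambda sets how strongly the old behaviour is protected.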

Try to give someone who’s donating their time to help you a bit of credit and reply with an open mind. It will make the community as a whole better.

Thanks for the answer.
And sorry for the "condescending" tone; my English is poor. I did not mean to offend anybody.

@Vlad-Karpov thanks for following up! I just wanted to make sure we cleared that up. Sometimes folks come in and think there’s some underlying sarcasm when we are just trying to answer their question.

@treo was explaining a mechanic to you. Usually concepts like that have specific names that come from papers like the one I linked.

Try to ask what something is if it's not clear. When we answer questions, we sometimes assume the person will either google a term or already know it, and that can prevent the intended answer from getting through.

Thanks for being understanding! I appreciate you being receptive to feedback.