Prediction of Disk Space

Hello everyone

I am new to DL4J and am working on a first small project whose main goal is to predict when a disk/computer could run into a “No space left on disk” problem.

I would like to do this with an LSTM (time series prediction). The input data looks like the following:
timestamp; freeSpaceInMegaBytes

My problem: I am struggling to predict the time step values that come after the test data, once the network has been trained and the test data has been predicted.

Training Code:

        for (int i = 0; i < nEpochs; i++) {
            net.fit(trainData);
            LOGGER.info("Epoch " + i + " complete. Time series evaluation:");

            //Run regression evaluation on our single output column
            RegressionEvaluation evaluation = new RegressionEvaluation(1);
            INDArray features = testData.getFeatures();

            INDArray labels = testData.getLabels();
            INDArray predicted = net.output(features, false);

            evaluation.evalTimeSeries(labels, predicted);

            //Just do sout here since the logger would shift the columns of the stats
            System.out.println(evaluation.stats());
        }

Prediction of test data:

        //Init the rnnTimeStep state with the train data and predict the test data
        net.rnnTimeStep(trainData.getFeatures());
        INDArray predicted = net.rnnTimeStep(testData.getFeatures());

Goal:

Code that predicts the next 50 time steps after the prediction of the test data.

Is such a use case even possible to implement with DL4J?

Thanks in advance

First of all: Make sure to normalize your input data. If you put in raw data, your output is going to be pretty useless.

You’ve got two problems:

  • Timestamps are usually huge numbers, so they will probably fall outside the sensitive range of floats and outside the active range of your activation function (unless you use ReLU, but that may not be the best activation for your problem)
  • Free space in MB will usually also be a rather large number, so the problems you have with timestamps will likely apply there as well

Ideally, your input and output data should be in a range between -1 and 1. Instead of free space in MB, try relative free space (i.e. 0 = disk full and 1 = disk entirely empty).

If timestamps are your only input data, you are also likely to get useless output, because those raw numbers just don’t mean anything to the neural network, and getting them into the -1 to 1 range isn’t really viable. If you know that your data has some kind of pattern (e.g. hourly, daily, monthly, yearly), then make the data needed to see that pattern more accessible: turn your timestamp into multiple features (e.g. day of year, day of week, week number, month, seconds, minutes, hours) and scale those into the given range.
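
For illustration only, here is a minimal sketch of what such a split could look like, assuming an epoch-millisecond timestamp and a known total disk size (the helper name and the totalSpaceMb parameter are just placeholders, not something from your data):

        import java.time.Instant;
        import java.time.ZoneOffset;
        import java.time.ZonedDateTime;

        // Sketch only: turn an epoch-millisecond timestamp into several small
        // features scaled roughly into [0, 1]. The chosen features and the
        // total disk size are illustrative assumptions.
        static double[] toTimeFeatures(long epochMillis, double freeSpaceMb, double totalSpaceMb) {
            ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
            double hourOfDay    = t.getHour() / 23.0;                      // 0..1
            double dayOfWeek    = (t.getDayOfWeek().getValue() - 1) / 6.0; // 0..1
            double dayOfYear    = (t.getDayOfYear() - 1) / 365.0;          // 0..1
            double relativeFree = freeSpaceMb / totalSpaceMb;              // 0 = full, 1 = empty
            return new double[] { hourOfDay, dayOfWeek, dayOfYear, relativeFree };
        }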

The more the network gets as input to work with, the more it can learn based on that. Ask yourself: given just those inputs, can I predict a valid output? If not, you will need to find better inputs, especially while you are still trying to learn something.

As for your goal: you haven’t told us anything about what you tried, what didn’t work, or what particular problems you ran into, so I can’t help you any more than what I’ve told you about your data.

Thanks for your answer.

First of all: Make sure to normalize your input data. If you put in raw data, your output is going to be pretty useless.

The input data is already normalized with:

        NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
        normalizer.fitLabel(true);
        normalizer.fit(trainData);              
        normalizer.transform(trainData);
        normalizer.transform(testData);

The more the network gets as input to work with, the more it can learn based on that. Ask yourself: given just those inputs, can I predict a valid output? If not, you will need to find better inputs, especially while you are still trying to learn something.

I think the input is already in good shape, since the predictions on the test data are pretty good:

My main problem is: how can I predict the time steps that follow the test data prediction (see diagram above)? I would like to predict the steps after 2479 on the x-axis, so I am looking for a method that returns the next time steps after 2479.

That probably works well enough for the free memory size, but a monotonically increasing value like a timestamp will just continue to increase anyway, and your model will perform worse and worse.

I’m not entirely sure I get the problem. You just feed net.rnnTimeStep your input and it will continue the sequence with an output. Just make sure it is normalized in the same way as the training data was.

I’m not entirely sure I get the problem. You just feed net.rnnTimeStep your input and it will continue the sequence with an output. Just make sure it is normalized in the same way as the training data was.

Just for my understanding:
I have trained the model with the data from weeks 1 and 2, and I would like to predict the values for week 4. In code I would do the following:

INDArray predicted = net.rnnTimeStep(week3Data.getFeatures());

And I will receive the predictions for week 4 in the variable predicted.
Is this correct?

You can also get your predictions one day (or hour or whatever your step size is) at a time, or you can give it the entire sequence.

Just take a look at the JavaDoc of rnnTimeStep, it tells you quite a lot:
https://javadoc.io/doc/org.deeplearning4j/deeplearning4j-nn/latest/org/deeplearning4j/nn/multilayer/MultiLayerNetwork.html#rnnTimeStep-org.nd4j.linalg.api.ndarray.INDArray-
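
If you want to continue past the end of your known data (e.g. the 50 extra steps from your first post), you can also feed each prediction back in as the next input, one step at a time. This is only a rough sketch, assuming a single input feature that is also the label, your already-trained net, and the normalizer you fitted on the training data:

        // Sketch only: autoregressive forecast, feeding predictions back in.
        // Assumes feature and label are the same single quantity, so the
        // normalized output can be reused directly as the next input.
        net.rnnClearPreviousState();
        net.rnnTimeStep(trainData.getFeatures());   // prime the RNN state with the known history
        INDArray lastOutput = net.rnnTimeStep(testData.getFeatures());

        // start from the prediction for the very last known time step
        int lastStep = (int) lastOutput.size(2) - 1;
        double next = lastOutput.getDouble(0, 0, lastStep);

        int futureSteps = 50;
        double[] forecast = new double[futureSteps];
        for (int i = 0; i < futureSteps; i++) {
            // one time step of shape [miniBatch=1, features=1, timeSteps=1]
            INDArray input = Nd4j.create(new double[] { next }, new int[] { 1, 1, 1 }, 'c');
            next = net.rnnTimeStep(input).getDouble(0);   // state is kept between calls
            forecast[i] = next;
        }

        // the forecast is still in normalized units; revert it like the labels
        INDArray forecastArr = Nd4j.create(forecast, new int[] { 1, 1, futureSteps }, 'c');
        normalizer.revertLabels(forecastArr);
        System.out.println(forecastArr);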

Thanks.

So the following code would deliver the next 10 time steps based on the 10 input values defined in the flat array:

        for (int i = 0; i < nEpochs; i++) {
            net.fit(trainData);
            LOGGER.info("Epoch " + i + " complete. Time series evaluation:");

            //Run regression evaluation on our single output column
            RegressionEvaluation evaluation = new RegressionEvaluation(1);
            INDArray features = testData.getFeatures();

            INDArray labels = testData.getLabels();
            INDArray predicted = net.output(features, false);

            evaluation.evalTimeSeries(labels, predicted);

            //Just do sout here since the logger would shift the columns of the stats
            System.out.println(evaluation.stats());
        }

        net.rnnClearPreviousState();

        // Predict the next 10 values based on 10 input values
        double[] flat = ArrayUtil.flattenDoubleArray(
                new double[] { 31, 40, 29, 30, 43, 45, 49, 44, 40, 35 });
        int[] shape = {1, 1, 10};    // shape: [miniBatch=1, features=1, timeSteps=10]
        INDArray ndArrayWith10Steps = Nd4j.create(flat, shape, 'c');

        normalizer.transform(ndArrayWith10Steps);

        INDArray predicted = net.rnnTimeStep(ndArrayWith10Steps);
        normalizer.revertLabels(predicted);

        System.out.println(predicted);

Correct?

I’m not a compiler. Why don’t you run it and see what the output is?

From the general shape of it, it looks about right.

Great - thanks.

Could you recommend a good example of time series prediction in DeepLearning4J?