I have used two fit functions of MultiLayerNetwork, one in MLPMnistSingleLayerExample and one in a program of my own that is otherwise similar to it:
(1) fit(org.nd4j.linalg.dataset.api.DataSet data), in the program similar to MLPMnistSingleLayerExample
(2) fit(DataSetIterator iterator, int numEpochs), in MLPMnistSingleLayerExample
The model and settings are the same in both programs: a single layer, the same seed, batch size, number of iterations, and so on.
However, the results are slightly different:
(a) with (1)
Accuracy: 0.9332
Precision: 0.9324
Recall: 0.9325
F1 Score: 0.9323
(b) with (2)
Accuracy: 0.9318
Precision: 0.9312
Recall: 0.9310
F1 Score: 0.9309
The difference is small, but not negligible. I would expect both fit functions to be functionally equivalent. Is there some small difference in how these two functions are designed?
MLPMnistSingleLayerExample is the same as in DL4J, except that the number of epochs is set to 2.
Program (1) is shown below.
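For reference, the two calls being compared look roughly like this (a minimal illustration using the variable names from the example, not the full training code):

// variant (1): drive the loop yourself, fitting one DataSet (mini-batch) at a time
while (mnistTrain.hasNext()) {
    model.fit(mnistTrain.next());
}

// variant (2): hand the iterator to the network and let it run the epochs
model.fit(mnistTrain, numEpochs);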
If everything else is identical, those two methods should produce the same result. My guess is that there is some small difference between the two runs of your programs.
In any case, the second option, i.e. fit(DataSetIterator, int), is usually preferred, as it provides more than just async prefetching.
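Conceptually, the convenience overload amounts to a loop over epochs that resets the iterator and fits one mini-batch at a time. The following is only a rough sketch of that idea, not the actual DL4J implementation (which additionally wraps the iterator for async prefetching and tracks epoch counts):

// Rough sketch only: what fit(DataSetIterator, int numEpochs) amounts to conceptually.
for (int epoch = 0; epoch < numEpochs; epoch++) {
    if (iterator.resetSupported()) {
        iterator.reset();               // rewind at the start of each epoch
    }
    while (iterator.hasNext()) {
        model.fit(iterator.next());     // one parameter update per mini-batch
    }
}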
I copied code from MultiLayerNetwork into my program (1), as shown below, but the result still differs from the one obtained with fit(org.nd4j.linalg.dataset.api.DataSet data). So I am convinced that the two fit functions are not identical.
try {
    mnistTrain = new MnistDataSetIterator(batchSize, true, 123);
    mnistTest = new MnistDataSetIterator(batchSize, false, 123);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

log.info("Build model....");
// print the score with every 1 iteration
model.setListeners(new ScoreIterationListener(1));

log.info("Train model....");
model.fit(mnistTrain, 2);

log.info("Evaluate model....");
Evaluation eval = model.evaluate(mnistTest);
log.info(eval.stats());
I wrote a program, test.OnebatchFitVSOnepochFit, following the DL4J examples to reproduce the problem above. It tests fit(DataSetIterator iterator, int numEpochs) against fit(org.nd4j.linalg.dataset.api.DataSet data). The results differ, and the difference is reproducible.
===================fit(DataSetIterator iterator, int numEpochs) ===================
09:21:20,463 INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener.iterationDone(ScoreIterationListener.java:53) ~ Score at iteration 399 is 0.3696645966271476
=====================Evaluation Metrics=====================
# of classes: 10
Accuracy: 0.9145
Precision: 0.9135
Recall: 0.9137
F1 Score: 0.9134
=====================fit(org.nd4j.linalg.dataset.api.DataSet data) ==================
09:32:31,827 INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener.iterationDone(ScoreIterationListener.java:53) ~ Score at iteration 399 is 0.3796121184546202
======================Evaluation Metrics======================
# of classes: 10
Accuracy: 0.9140
Precision: 0.9133
Recall: 0.9129
F1 Score: 0.9127
// =============== below is the program ===============
package test;
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.AsyncDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
public class OnebatchFitVSOnepochFit {

    public static void test_onebatch_fit() throws Exception {
        int numRows = 28;
        int numColumns = 28;
        int outputNum = 10;   // number of output classes
        int batchSize = 150;  // batch size for each epoch
        int rngSeed = 123;    // random number seed for reproducibility
        int iteration = 400;

        DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
        DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
        DataSetIterator train;

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(rngSeed)
                .updater(new Nesterovs(0.006, 0.9))
                .l2(1e-4)
                .list()
                .layer(new DenseLayer.Builder() // create the first, input layer with xavier initialization
                        .nIn(numRows * numColumns)
                        .nOut(1000)
                        .activation(Activation.RELU)
                        .weightInit(WeightInit.XAVIER)
                        .build())
                .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(1000)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .weightInit(WeightInit.XAVIER)
                        .build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.setListeners(new ScoreIterationListener(1));

        train = new MnistDataSetIterator(batchSize, true, rngSeed);
        if (train.asyncSupported()) {
            mnistTrain = new AsyncDataSetIterator(train, Math.min(Nd4j.getAffinityManager().getNumberOfDevices() * 2, 2), true);
        } else {
            mnistTrain = train;
        }
        mnistTrain.reset();

        for (int i = 0; i < iteration; i++) {
            if (mnistTrain.hasNext()) {
                model.fit(mnistTrain.next());
            } else {
                if (train.asyncSupported()) {
                    mnistTrain = new AsyncDataSetIterator(train, Math.min(Nd4j.getAffinityManager().getNumberOfDevices() * 2, 2), true);
                } else {
                    mnistTrain = train;
                }
            }
            model.score();
        }

        Evaluation eval = model.evaluate(mnistTest);
        System.out.println(eval.stats());
    }

    public static void test_onepoch_fit() throws Exception {
        int numRows = 28;
        int numColumns = 28;
        int outputNum = 10;   // number of output classes
        int batchSize = 150;  // batch size for each epoch
        int rngSeed = 123;    // random number seed for reproducibility
        int numEpochs = 1;    // number of epochs to perform

        DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
        DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(rngSeed)
                .updater(new Nesterovs(0.006, 0.9))
                .l2(1e-4)
                .list()
                .layer(new DenseLayer.Builder() // create the first, input layer with xavier initialization
                        .nIn(numRows * numColumns)
                        .nOut(1000)
                        .activation(Activation.RELU)
                        .weightInit(WeightInit.XAVIER)
                        .build())
                .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(1000)
                        .nOut(outputNum)
                        .activation(Activation.SOFTMAX)
                        .weightInit(WeightInit.XAVIER)
                        .build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.setListeners(new ScoreIterationListener(1));

        model.fit(mnistTrain, numEpochs);

        Evaluation eval = model.evaluate(mnistTest);
        System.out.println(eval.stats());
    }

    public static void main(String[] args) throws Exception {
        test_onebatch_fit();
    }
}
If you haven’t been resetting your dataset iterator, your training didn’t actually see the same data in both runs. So a difference isn’t unexpected behavior in that case.
You can also see now why we suggest using the fit(DataSetIterator, int) signature instead of rolling your own loop: it is easy to forget about things like this.
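For illustration, here is a minimal sketch of how the manual loop in test_onebatch_fit could be adjusted so that the iterator is rewound once it runs out of data (assuming the same mnistTrain, model and iteration variables as in the program above):

for (int i = 0; i < iteration; i++) {
    if (!mnistTrain.hasNext()) {
        // the epoch is over: reset the iterator before asking for the next mini-batch
        mnistTrain.reset();
    }
    // fit on exactly one mini-batch per loop iteration
    model.fit(mnistTrain.next());
}

With the reset in place, the manual loop should see the same data per epoch as the fit(DataSetIterator, int) overload, rather than silently skipping fits once the iterator is exhausted.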