Thanks for your answer, that helps! AsyncDataSetIterator caches batches, which could be disastrous when I overwrite the underlying DataBuffer with “put”. My question was more about the model, though: does it keep any reference (pointer) to the data that I then overwrite in the subsequent next() call?
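To make the caching concern concrete, here is a plain-Java sketch with no ND4J involved (ReusingProducer and Prefetcher are hypothetical names). If next() fills and returns the same preallocated array on every call, anything that prefetches several batches ahead ends up holding several references to one array, all showing the last batch written. In the iterator below, allIn/allOut/allMaskIn/allMaskOut are not allocated inside next(), so they appear to be reused in exactly this way. DataSetIterator also declares asyncSupported(); if I read the DL4J API right, returning false there keeps fit() from wrapping the iterator in an AsyncDataSetIterator (worth verifying for the version in use).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical producer that, like an iterator reusing preallocated arrays,
// fills one shared array in place and returns it on every call.
final class ReusingProducer {
    private final double[] shared = new double[3];
    private int step = 0;

    double[] next() {
        Arrays.fill(shared, step++); // overwrite the same memory on each call
        return shared;
    }
}

// Hypothetical prefetcher: pulls `depth` batches ahead, as an async wrapper would.
final class Prefetcher {
    static List<double[]> prefetch(ReusingProducer producer, int depth, boolean copy) {
        List<double[]> queue = new ArrayList<>();
        for (int k = 0; k < depth; k++) {
            double[] batch = producer.next();
            // Without a copy, every queued entry points at the same shared array
            queue.add(copy ? batch.clone() : batch);
        }
        return queue;
    }
}
```

With copy = false, prefetching three batches yields three references to one array that all hold the last batch (2.0); with copy = true each entry keeps its own values (0.0, 1.0, 2.0).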
As for the general context, I am happy to share some details and get ideas on how to improve what I am doing:
The iterator is derived from its RNN version and flattens n time steps into features.
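As a rough illustration of "flattening n time steps into features", the sketch below copies the feature columns of windowSize consecutive rows of a row-major [features | label] matrix into one vector, time step by time step. WindowFlattener is a hypothetical name, and the time-major ordering (all features of t0, then all of t1, ...) is an assumption; the real iterator may order them differently.

```java
// Hypothetical helper: flattens a window of rows into a single feature vector.
final class WindowFlattener {
    /**
     * Copies the first numInputFeatures columns of rows [start, start + windowSize)
     * from a row-major matrix with `cols` columns into a vector of length
     * windowSize * numInputFeatures, ordered time step by time step.
     */
    static double[] flatten(double[] data, int cols, int numInputFeatures,
                            long start, int windowSize) {
        if (numInputFeatures > cols) {
            throw new IllegalArgumentException("numInputFeatures must not exceed cols");
        }
        double[] out = new double[windowSize * numInputFeatures];
        for (int t = 0; t < windowSize; t++) {
            long rowOffset = (start + t) * cols; // row-major: row r begins at r * cols
            System.arraycopy(data, Math.toIntExact(rowOffset),
                             out, t * numInputFeatures, numInputFeatures);
        }
        return out;
    }
}
```

For example, with 3 columns (2 features + 1 label) and rows [0, 1, 2], [10, 11, 12], [20, 21, 22], a window of 2 rows starting at row 1 flattens to [10, 11, 20, 21].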
As data I feed a matrix of [ features | label ] covering the past, say 1 million rows (~2 years of stock prices, indicators, etc.). I did notice a big memory issue, and using the GC to clean it up has a very negative impact on performance.
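One common way to cut per-batch allocation (and thus GC pressure) without running into the aliasing problem is a small ring of preallocated batch buffers: each next() fills the following slot, so a slot is only overwritten after slotCount further calls. As long as the number of batches prefetched ahead plus the one currently in use stays below slotCount, no queued batch gets clobbered. This is a minimal plain-Java sketch under that assumption; BatchRing is a hypothetical name, and mapping each slot onto its own set of ND4J arrays (allIn/allOut/masks per slot) is left as an assumption too.

```java
// Hypothetical ring of preallocated batch buffers, reused round-robin.
final class BatchRing {
    private final double[][] slots;
    private int next = 0;

    BatchRing(int slotCount, int batchLength) {
        if (slotCount < 2) {
            throw new IllegalArgumentException("need at least 2 slots to avoid immediate reuse");
        }
        slots = new double[slotCount][batchLength]; // allocated once, up front
    }

    /** Returns the next slot to fill; the same slot comes back only after slotCount calls. */
    double[] acquire() {
        double[] slot = slots[next];
        next = (next + 1) % slots.length;
        return slot;
    }
}
```

With three slots, the first three acquire() calls return distinct arrays and the fourth returns the first one again, so a prefetch depth of one batch ahead would be safe.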
Below is the RNN version of the iterator (only the relevant next() part).
P.S.: The baseline I am comparing against is SequenceRecordReaderDataSetIterator from DataVec. But again, performance was the reason I wrote my own iterator.
```java
import org.nd4j.linalg.api.buffer.DataBuffer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

// Excerpt: data, maskIn, maskOut, allIn, allOut, allMaskIn, allMaskOut, cursor, cols,
// windowSize, numInputFeatures and miniBatchSize are fields of the iterator class.
@Override
public DataSet next(int num) {
    if (num != miniBatchSize) {
        throw new IllegalArgumentException("num must equal batchSize");
    }
    for (int i = 0; i < num; i++) {
        // Row-major [features | label] matrix: row r starts at r * cols (long avoids int overflow)
        long dataOffset = (long) (i + cursor) * cols;
        // View over windowSize consecutive rows, backed by data's storage
        DataBuffer buff = Nd4j.createBuffer(data.data(), dataOffset, (long) cols * windowSize);
        long maskOffset = i + cursor; // masks have just 1 column
        DataBuffer maskBuffIn = Nd4j.createBuffer(maskIn.data(), maskOffset, windowSize);
        DataBuffer maskBuffOut = Nd4j.createBuffer(maskOut.data(), maskOffset, windowSize);
        // iteration contains features as well as labels
        INDArray iteration = Nd4j.create(buff, new long[]{windowSize, cols});
        INDArray maskIterationIn = Nd4j.create(maskBuffIn, new long[]{windowSize});
        INDArray maskIterationOut = Nd4j.create(maskBuffOut, new long[]{windowSize});

        // Features: columns [0, numInputFeatures) of the window
        INDArray in = iteration.get(NDArrayIndex.all(), NDArrayIndex.interval(0, numInputFeatures));
        in.setOrder('f');
        allIn.tensorAlongDimension(i, 1, 2).permutei(1, 0).assign(in);
        if (num == 1) {
            allMaskIn.tensorAlongDimension(i, 1).assign(maskIterationIn);
        } else {
            allMaskIn.tensorAlongDimension(i, 1).permutei(0).assign(maskIterationIn);
        }

        // Labels: columns [numInputFeatures, cols) of the window
        INDArray out = iteration.get(NDArrayIndex.all(), NDArrayIndex.interval(numInputFeatures, cols));
        out.setOrder('f');
        allOut.tensorAlongDimension(i, 1, 2).permutei(1, 0).assign(out);
        if (num == 1) {
            allMaskOut.tensorAlongDimension(i, 1).assign(maskIterationOut);
        } else {
            allMaskOut.tensorAlongDimension(i, 1).permutei(0).assign(maskIterationOut);
        }
    }
    // allIn/allOut/allMaskIn/allMaskOut are not reallocated here, so the next call
    // to next() overwrites the arrays wrapped by this DataSet
    DataSet dataSet = new DataSet(allIn, allOut, allMaskIn, allMaskOut);
    cursor += num; // move on to the next time window
    return dataSet;
}
```
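The index arithmetic in the loop can be isolated and checked on its own: the data window for sample i starts at (cursor + i) * cols in the flat row-major buffer, while each mask has a single column, so its window starts at cursor + i. Computing the product in long with Math.multiplyExact guards against silent overflow with ~1 million rows times cols. This is a hedged sketch; WindowOffsets and totalRows are hypothetical names, and the bounds check is my assumption about what hasNext() would need.

```java
// Hypothetical helper isolating the offset math used in next(int).
final class WindowOffsets {
    /** Start of sample i's window in the flattened row-major data buffer. */
    static long dataOffset(long cursor, int i, int cols) {
        return Math.multiplyExact(cursor + i, (long) cols); // throws instead of wrapping
    }

    /** Start of sample i's window in a single-column mask buffer. */
    static long maskOffset(long cursor, int i) {
        return cursor + i;
    }

    /** True if a window of windowSize rows starting at row (cursor + i) stays inside totalRows. */
    static boolean fits(long cursor, int i, int windowSize, long totalRows) {
        return cursor + i + windowSize <= totalRows;
    }
}
```

A hasNext() could, for instance, check fits(cursor, miniBatchSize - 1, windowSize, totalRows) so that the last sample of the next batch still has a complete window.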