I wanted a way to split a DataSetIterator and couldn't find one. Did I miss something? As I understand it, the split is assumed to happen earlier in the pipeline, so the API doesn't support it.
Any thoughts on my approach below? I implemented a DataSetIterator wrapper that returns only part of each DataSet. This cuts off the last portion of my sequence data. So the iterator still returns every file, but each file is split at a fraction given by a double.
If I wanted to perform a regular file-level split, I suppose I would have to create a similar wrapper and change the reset logic so that it only iterates through a portion of the files. As far as I can see, there is no way to reach the iterator's list of record readers and change the file list itself.
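For comparison, the regular file split mentioned above could also be done before any iterator exists, which avoids needing access to the iterator's internals. Here is a minimal sketch using only the standard library. The class name `FileListSplit`, the method `split`, and the `seed` parameter are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: divides a list of file paths into a training part
// and a test part before any record reader or iterator is built.
final class FileListSplit {

    private FileListSplit() {}

    /**
     * Returns a two-element list: index 0 holds the training files,
     * index 1 holds the remaining files.
     *
     * @param files         paths of all input files
     * @param trainFraction share of the files for training, in [0, 1]
     * @param seed          seed for the shuffle, so the split is repeatable
     */
    static List<List<String>> split(List<String> files, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<String> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled, new Random(seed));

        // Number of files that go to the training part.
        int cut = (int) Math.round(shuffled.size() * trainFraction);

        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }
}
```

Calling `FileListSplit.split(files, 0.8, 42L)` on five paths gives four training files and one test file. Each sublist would then be handed to its own record reader and iterator; that wiring is left out here because it depends on how the readers are set up.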
public class SplitIterator implements DataSetIterator {
private double split; // share of each DataSet that is used for training
boolean training;
DataSetIterator iterator;
public SplitIterator(DataSetIterator iter, double split, boolean training)
{
iterator = iter;
this.split = split;
this.training = training;
}
// The normal next() method is what gets called during training, so the training portion is returned from there
@Override
public boolean hasNext() {
return iterator.hasNext();
}
@Override
public DataSet next() {
// Index 0 is the training portion, index 1 is the remainder
int part = training ? 0 : 1;
return FormattedDataRequester.splitDataSet(iterator.next(), split).get(part);
}
@Override
public void reset() {
iterator.reset();
}