Creating a DataIterator for the Google Quick, Draw! data set

Hi there, I’m just getting started with dl4j. After going through some of the examples I thought I would have a go at implementing a simple convolutional network to classify a subset of images from the Google Quick, Draw! Dataset. The DataIterators I have been using from the examples are customised for specific datasets (MNIST, CIFAR etc.).

How would I create a custom DataIterator that converts the .npy files into a format I can load into my network? More importantly, is this the correct approach?

I don't know of a .npy or JSON iterator that comes with the library. A quick internet search suggests there may be some options, but they are slim. If you do end up making one, please post it here: I didn't know about the dataset you posted, but I may play with it myself. And yes, you're on the right path IMO.
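For what it's worth, the .npy format itself is simple: a magic string, a version, a short text header describing dtype and shape, then the raw bytes. Below is a minimal sketch of a reader in plain Java. It assumes the Quick, Draw! bitmap files are 2-D uint8 arrays stored in C order (one flattened 28x28 drawing per row). `NpyUint8Reader` and its methods are hypothetical names of my own, not part of DL4J, and any other layout is simply rejected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads a 2-D uint8 .npy file into float[rows][cols],
// keeping the raw 0..255 pixel values.
final class NpyUint8Reader {
    private static final Pattern SHAPE =
            Pattern.compile("'shape':\\s*\\((\\d+),\\s*(\\d+)\\s*\\)");

    static float[][] read(Path file, int maxRows) throws IOException {
        // Reads the whole file into memory: fine for a sketch, wasteful for huge files
        return parse(Files.readAllBytes(file), maxRows);
    }

    static float[][] parse(byte[] bytes, int maxRows) {
        // Magic string: the byte 0x93 followed by ASCII "NUMPY"
        if (bytes.length < 10 || (bytes[0] & 0xFF) != 0x93
                || !new String(bytes, 1, 5, StandardCharsets.US_ASCII).equals("NUMPY")) {
            throw new IllegalArgumentException("not a .npy file");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(8);
        // Format version 1.x stores the header length in 2 bytes, later versions in 4 bytes
        int headerLen = bytes[6] == 1 ? (buf.getShort() & 0xFFFF) : buf.getInt();
        int headerStart = buf.position();
        String header = new String(bytes, headerStart, headerLen, StandardCharsets.US_ASCII);
        if (!header.contains("'|u1'") || !header.contains("'fortran_order': False")) {
            throw new IllegalArgumentException("expected a C-ordered uint8 array: " + header);
        }
        Matcher m = SHAPE.matcher(header);
        if (!m.find()) {
            throw new IllegalArgumentException("expected a 2-D shape: " + header);
        }
        int rows = Integer.parseInt(m.group(1));
        int cols = Integer.parseInt(m.group(2));
        int n = Math.min(rows, maxRows);
        int dataStart = headerStart + headerLen;
        if (bytes.length < dataStart + (long) n * cols) {
            throw new IllegalArgumentException("file is shorter than its header claims");
        }
        float[][] out = new float[n][cols];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < cols; c++) {
                // Mask with 0xFF because Java bytes are signed while uint8 pixels are 0..255
                out[r][c] = bytes[dataStart + r * cols + c] & 0xFF;
            }
        }
        return out;
    }
}
```

Usage would be something like `float[][] cats = NpyUint8Reader.read(Paths.get("cat.npy"), 5000);` (the file name is only an example). Capping the number of rows keeps memory in check, since a single category file can hold a lot of drawings.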

If you do try to build it, here are two things that might help.

First, here is a rather simple DataSetIterator for images. I use it for all kinds of image datasets instead of the dataset-specific iterators you mentioned above.

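import java.io.File;

import org.datavec.api.split.FileSplit;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;

// height, width, source (a directory of images) and minibatch are fields of the enclosing class
public DataSetIterator getIter()
{
    try {
        // Reads every image under `source`, resized to height x width with 3 channels.
        // For classification, pass a label generator as a 4th argument,
        // e.g. new ParentPathLabelGenerator() to use each image's folder name as its label.
        ImageRecordReader recordReader = new ImageRecordReader(height, width, 3);
        recordReader.initialize(new FileSplit(new File(source)));

        DataSetIterator dataSetIterator = new RecordReaderDataSetIterator(recordReader, minibatch);

        // Scale 8-bit pixel values into the range [0, 1]
        DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
        scaler.fit(dataSetIterator);
        dataSetIterator.setPreProcessor(scaler);

        return dataSetIterator;
    } catch (Exception e) {
        // Fail loudly instead of returning null, which would only show up later as a NullPointerException
        throw new RuntimeException("Could not create image iterator for " + source, e);
    }
}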
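If you load the .npy bitmaps directly instead of going through image files, you also skip ImagePreProcessingScaler, so you have to scale the pixels yourself. As far as I understand it, `ImagePreProcessingScaler(0, 1)` divides 8-bit pixel values by 255 and then maps them into [min, max]. Here is a plain-Java sketch of that mapping under that assumption; `PixelScaler` is a hypothetical name.

```java
// Hypothetical helper that mirrors (as I understand it) ImagePreProcessingScaler(min, max)
// for 8-bit pixels: x -> min + (x / 255) * (max - min), applied in place.
final class PixelScaler {
    static void scaleInPlace(float[][] pixels, float min, float max) {
        for (float[] row : pixels) {
            for (int i = 0; i < row.length; i++) {
                row[i] = min + (row[i] / 255f) * (max - min);
            }
        }
    }
}
```

Scaling in place avoids allocating a second copy of what can be a fairly large array.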

Second, here is a really simple class that acts as an iterator. I use it to provide random arrays of a fixed shape, but you could write a similar class that converts the .npy or JSON data and returns it from next(). If the samples are labelled, you could return a DataSet object instead of an array and use DataSet.setFeatures() and DataSet.setLabels().

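import java.util.Iterator;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Inner class: minibatch is a field of the enclosing class.
// The type parameter is INDArray[] so that it matches what next() returns.
private class GaussianIterator implements Iterator<INDArray[]> {

    int width;
    int height;

    public GaussianIterator(int w, int h){
        width = w;
        height = h;
    }

    public void setWidth(int w){
        width = w;
    }

    public void setHeight(int h)
    {
        height = h;
    }

    // Endless generator: a new random batch is always available
    @Override
    public boolean hasNext() {
        return true;
    }

    @Override
    public INDArray[] next() {
        return new INDArray[] {getArr()};
    }

    private INDArray getArr(){
        // Nd4j.rand draws uniform values between 0 and 1; use Nd4j.randn for Gaussian samples.
        // width and height are not used here: the shape is fixed at [minibatch, 512, 1, 1].
        INDArray ind = Nd4j.rand(minibatch, 512, 1, 1);
        //ind = ind.mul(255);
        return ind;
    }
}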
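A labelled version along those lines differs from GaussianIterator in two ways: hasNext() must eventually return false so that an epoch ends, and each batch carries one-hot labels next to the features. The sketch below is plain Java, with a hypothetical `Batch` class standing in for a DL4J `DataSet`. In DL4J you would instead wrap the two arrays, e.g. `new DataSet(Nd4j.create(features), Nd4j.create(labels))`, and implement `DataSetIterator`, which has more methods to fill in.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a DL4J DataSet: features plus one-hot labels.
final class Batch {
    final float[][] features;
    final float[][] labels;

    Batch(float[][] features, float[][] labels) {
        this.features = features;
        this.labels = labels;
    }
}

// Hypothetical finite iterator over pre-loaded samples, in fixed-size mini-batches.
final class LabeledBatchIterator implements Iterator<Batch> {
    private final float[][] samples;  // one flattened image per row
    private final int[] classIndex;   // class of each row, 0..numClasses-1
    private final int numClasses;
    private final int batchSize;
    private int cursor = 0;

    LabeledBatchIterator(float[][] samples, int[] classIndex, int numClasses, int batchSize) {
        if (samples.length != classIndex.length) {
            throw new IllegalArgumentException("need exactly one label per sample");
        }
        this.samples = samples;
        this.classIndex = classIndex;
        this.numClasses = numClasses;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < samples.length;
    }

    @Override
    public Batch next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The last batch may be smaller than batchSize
        int n = Math.min(batchSize, samples.length - cursor);
        float[][] features = new float[n][];
        float[][] labels = new float[n][numClasses];
        for (int i = 0; i < n; i++) {
            features[i] = samples[cursor + i];
            labels[i][classIndex[cursor + i]] = 1f;  // one-hot encoding of the class
        }
        cursor += n;
        return new Batch(features, labels);
    }

    // Rewind to the first sample, e.g. at the start of the next epoch
    void reset() {
        cursor = 0;
    }
}
```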

Just a note: I also answered this over here:

In summary: focus on using the create-from-npy call in ND4J and build a DataSetIterator around that.
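Since each Quick, Draw! .npy file holds a single category, one extra step before building the iterator is to merge the per-category arrays into one list with class indices and shuffle it, so that a mini-batch doesn't contain only one category. A plain-Java sketch, assuming each category has already been loaded into a float[][] with one drawing per row; `MergedSamples`, the per-class cap and the fixed seed are my own illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: merges one array per class (e.g. one .npy file per category)
// into a single shuffled sample list with matching class indices.
final class MergedSamples {
    final float[][] samples;
    final int[] classIndex;

    MergedSamples(List<float[][]> perClass, int maxPerClass, long seed) {
        List<int[]> refs = new ArrayList<>();  // (class, row) pairs
        for (int c = 0; c < perClass.size(); c++) {
            // Cap each class so one large category doesn't dominate training
            int rows = Math.min(perClass.get(c).length, maxPerClass);
            for (int r = 0; r < rows; r++) {
                refs.add(new int[] {c, r});
            }
        }
        // Seeded shuffle: mixes categories and stays reproducible between runs
        Collections.shuffle(refs, new Random(seed));
        samples = new float[refs.size()][];
        classIndex = new int[refs.size()];
        for (int i = 0; i < refs.size(); i++) {
            int[] ref = refs.get(i);
            samples[i] = perClass.get(ref[0])[ref[1]];
            classIndex[i] = ref[0];
        }
    }
}
```

The resulting `samples` and `classIndex` arrays can then be cut into mini-batches with one-hot labels, with the class index matching the order in which the files were loaded.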