Retrieving specific columns

I need to read multiple CSV files but use only specific columns from each. Please look at my buildDataSet() method.

Is my current approach a normal way to use the API?

Is there an alternative approach that is better than this?

Right now the DataSetIterator returns false from hasNext(), and I don't know how best to get the data out of it. The DataSetIterator class hides what is going on. How can I debug it better and find out what is happening inside?

java.lang.NumberFormatException: For input string: "510800.0
112.110001"
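
The input string in that message has a line break inside it, so the last value of one row and the first value of the next row reached the parser as a single field. A minimal sketch in plain Java, assuming purely numeric comma-separated records with no quoted fields, of a check that lists every field that is not a parseable number. CsvFieldCheck and findBadFields are hypothetical names, not part of DataVec, and the sample values other than the two from the exception are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataVec.
class CsvFieldCheck {

    /** Returns one message per field that Double.parseDouble rejects. */
    static List<String> findBadFields(List<String> records) {
        List<String> problems = new ArrayList<>();
        for (int r = 0; r < records.size(); r++) {
            // limit -1 keeps trailing empty fields so they are reported too
            String[] fields = records.get(r).split(",", -1);
            for (int c = 0; c < fields.length; c++) {
                try {
                    Double.parseDouble(fields[c]);
                } catch (NumberFormatException e) {
                    // make an embedded line break visible in the report
                    String shown = fields[c].replace("\n", "\\n");
                    problems.add("record " + r + ", field " + c + ": \"" + shown + "\"");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up record in which two rows were fused with a line break.
        List<String> records = Arrays.asList(
                "1.0,2.0,3.0,510800.0\n112.110001,2.0,3.0,4.0");
        for (String problem : findBadFields(records)) {
            System.out.println(problem);
        }
    }
}
```

Running such a check on the exact strings that are about to be handed to a record reader narrows the problem down to a record and field position before any iterator is involved.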



public Classifier()
{

    try{

        ArrayList<DataSetIterator> dataIters = getDataSets();

        ArrayList<DataNormalization> normies = new ArrayList<>();

        for (DataSetIterator di :
                dataIters) {

            DataNormalization norm = new NormalizerStandardize();
            norm.fit(di);
            di.setPreProcessor(norm);
            normies.add(norm);
            di.reset();

        }


        MultiLayerNetwork net = buildNetwork(dataIters.get(0).inputColumns());

        for (int i = 0; i < dataIters.size(); i++) {

            while (dataIters.get(i).hasNext()){

                //INDArray out = dataIters.get(i).next().getLabels();
                //INDArray outt = dataIters.get(i).next().getLabels();
               //net.fit(out, outt);
            }



        }

        //Get top half of encoder
        TransferLearning.Builder builder = new TransferLearning.Builder(net)
                .removeLayersFromOutput(3);
        MultiLayerNetwork newNet = builder.build();

    }catch (Exception e){
        e.printStackTrace();
    }

}


private ArrayList<DataSetIterator> getDataSets() throws Exception
{

    //ArrayList<String> names = FileManager.getStocks();
    ArrayList<String> names = new ArrayList<>();names.add("zbra"); //tester

    ArrayList<DataSetIterator> out = new ArrayList<>();

    for (String s :
            names) {
        out.add(buildDataSet(Constants.dataPath + s + ".csv"));
    }

    return out;
}


private DataSetIterator buildDataSet(String path) throws Exception
{

    //year, month, day, high, low, close, adjclose, volume

    Schema inputSchema = new Schema.Builder()
            .addColumnInteger("year")
            .addColumnInteger("month")
            .addColumnInteger("day")
            .addColumnDouble("high")
            .addColumnDouble("low")
            .addColumnDouble("close")
            .addColumnDouble("adjclose")
            .addColumnDouble("volume")
            .addColumnString("nothing")
            .build();

    TransformProcess tp = new TransformProcess.Builder(inputSchema)
            .removeColumns("year", "month", "day", "adjclose", "nothing")
            .build();

    List<String> in = FileManager.retreiveFullCSV(path);
    List<List<Writable>> almost = new ArrayList<>();
    StringToWritablesFunction wt = new StringToWritablesFunction(new CSVRecordReader());

    for (String s : in) {
        almost.add(wt.apply(s));
    }

    List<List<Writable>> processedData = LocalTransformExecutor.execute(almost, tp);

    StringBuilder sb = new StringBuilder();

    for (List ll : processedData) {
        sb.append(new WritablesToStringFunction(",").apply(ll));
        sb.append(",");
    }
    sb.setLength(sb.length() - 1);

    StringSplit sp = new StringSplit(sb.toString());

    RecordReader recordReader = new CSVRecordReader();
    recordReader.initialize(sp);

    DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, 64);

    return iterator;
}

It is usually better not to build a DataSetIterator on your own; it is very easy to get something wrong there.

If you want to merge data from multiple sources, it is usually easiest to build a MultiDataSet using a RecordReaderMultiDataSetIterator, which simply reads from multiple record readers.

Another option is to join the data. There is an example of joins in JoinExample.java in the eclipse/deeplearning4j-examples repository on GitHub.
Note that the example uses some Spark-specific features; you would have to rework it a bit to use the local executor and run it without Spark.
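
To see what the selected columns look like before they reach any record reader, the selection can be checked on its own. A minimal sketch in plain Java, assuming the nine-column layout from the schema in the question (high, low, close and volume at zero-based positions 3, 4, 5 and 7) and simple comma-separated lines with no quoted fields. ColumnSelect and selectColumns are hypothetical names, not DataVec API, and the sample row is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataVec.
class ColumnSelect {

    /** Keeps only the given zero-based columns of each comma-separated line. */
    static List<String> selectColumns(List<String> lines, int... keep) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < keep.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                // a row that is too short fails here with ArrayIndexOutOfBoundsException
                sb.append(fields[keep[i]].trim());
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up row in the order: year, month, day, high, low, close, adjclose, volume, nothing
        List<String> lines = Arrays.asList("2020,1,2,112.110001,110.5,111.0,111.0,510800.0,x");
        List<String> selected = selectColumns(lines, 3, 4, 5, 7);
        // One record per line: rows are joined with a newline, not a comma.
        System.out.println(String.join("\n", selected));
    }
}
```

The rows are kept as separate strings and only joined with a newline at the very end, so each output line still corresponds to exactly one input row.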

I tried to follow the examples as closely as possible, but I get an error when I call iterator.inputColumns(), which I use to size the LSTM's first layer. If I remove that call, I get the same error on fit(). However, iterator.reset() works just fine.

My feature input is a CSV file with many rows of 4-column data.
My label file has the same number of rows, with a single label column.
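
Those two assumptions (equal row counts, 4 feature columns, 1 label column) can be verified directly on the file contents before any reader is created. A minimal sketch in plain Java, assuming simple comma-separated lines with no quoted fields and no header row. ShapeCheck and check are hypothetical names, and the sample rows are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of DataVec.
class ShapeCheck {

    /** Returns a description of the first mismatch, or empty if the shapes are as expected. */
    static Optional<String> check(List<String> featureLines, List<String> labelLines,
                                  int featureCols, int labelCols) {
        if (featureLines.size() != labelLines.size()) {
            return Optional.of("row count differs: " + featureLines.size()
                    + " feature rows vs " + labelLines.size() + " label rows");
        }
        for (int i = 0; i < featureLines.size(); i++) {
            int f = featureLines.get(i).split(",", -1).length;
            int l = labelLines.get(i).split(",", -1).length;
            if (f != featureCols) {
                return Optional.of("feature row " + i + " has " + f + " columns");
            }
            if (l != labelCols) {
                return Optional.of("label row " + i + " has " + l + " columns");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up rows: two feature rows with 4 columns, two label rows with 1 column.
        List<String> features = Arrays.asList("1.0,2.0,3.0,4.0", "5.0,6.0,7.0,8.0");
        List<String> labels = Arrays.asList("0", "1");
        System.out.println(check(features, labels, 4, 1).orElse("shapes match"));
    }
}
```

With real files, the two lists could come from java.nio.file.Files.readAllLines on the feature and label paths.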

Cannot reset iterator - reset not supported (resetSupported() == false): one or more underlying (sequence) record readers do not support resetting

    String fromPath = Constants.dataPath + ticker + ".csv";
    String featureToPath = Constants.featurePath + ticker + ".csv";
    String labelToPath = Constants.labelPath + ticker + "_label" + ".csv";

    createSimpleDataSet(fromPath, featureToPath, deleteRow.last);
    createSeqSingleLabelDataSet(fromPath, labelToPath, deleteRow.first);

    FileSplit feat = new FileSplit(new File(featureToPath));
    FileSplit label = new FileSplit(new File(labelToPath));

    SequenceRecordReader featureReader = new CSVSequenceRecordReader();
    featureReader.initialize(feat);

    SequenceRecordReader labelReader = new CSVSequenceRecordReader();
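    featureReader.initialize(label); // note: this initializes featureReader a second time; labelReader is never initialized (probably meant labelReader.initialize(label))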

    DataSetIterator it = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, batch, 1);

I made an idiotic mistake. Wasted a day on it.