Retrieving specific columns

cagneymoreau · July 8, 2020, 1:03am

I need to access multiple csv files but only specific columns. Please look at my buildDataSet() method.

Is my current approach considered normal according to the api?

Is there a better alternative approach?

Right now the DataSetIterator returns false from hasNext(), and I don't know how best to get the data out of it. The DataSetIterator class hides what is going on. How can I debug it better and find out what is happening inside?

java.lang.NumberFormatException: For input string: "510800.0
112.110001"



public Classifier()
{
    try {
        ArrayList<DataSetIterator> dataIters = getDataSets();
        ArrayList<DataNormalization> normies = new ArrayList<>();

        for (DataSetIterator di : dataIters) {
            DataNormalization norm = new NormalizerStandardize();
            norm.fit(di);
            di.setPreProcessor(norm);
            normies.add(norm);
            di.reset();
        }

        MultiLayerNetwork net = buildNetwork(dataIters.get(0).inputColumns());

        for (int i = 0; i < dataIters.size(); i++) {
            while (dataIters.get(i).hasNext()) {
                //INDArray out = dataIters.get(i).next().getLabels();
                //INDArray outt = dataIters.get(i).next().getLabels();
                //net.fit(out, outt);
            }
        }

        //Get top half of encoder
        TransferLearning.Builder builder = new TransferLearning.Builder(net)
                .removeLayersFromOutput(3);
        MultiLayerNetwork newNet = builder.build();

    } catch (Exception e) {
        e.printStackTrace();
    }
}


private ArrayList<DataSetIterator> getDataSets() throws Exception
{
    //ArrayList<String> names = FileManager.getStocks();
    ArrayList<String> names = new ArrayList<>();
    names.add("zbra"); //tester

    ArrayList<DataSetIterator> out = new ArrayList<>();

    for (String s : names) {
        out.add(buildDataSet(Constants.dataPath + s + ".csv"));
    }

    return out;
}


private DataSetIterator buildDataSet(String path) throws Exception
{
    //year, month, day, high, low, close, adjclose, volume
    Schema inputSchema = new Schema.Builder()
            .addColumnInteger("year")
            .addColumnInteger("month")
            .addColumnInteger("day")
            .addColumnDouble("high")
            .addColumnDouble("low")
            .addColumnDouble("close")
            .addColumnDouble("adjclose")
            .addColumnDouble("volume")
            .addColumnString("nothing")
            .build();

    TransformProcess tp = new TransformProcess.Builder(inputSchema)
            .removeColumns("year", "month", "day", "adjclose", "nothing")
            .build();

    List<String> in = FileManager.retreiveFullCSV(path);
    List<List<Writable>> almost = new ArrayList<>();
    StringToWritablesFunction wt = new StringToWritablesFunction(new CSVRecordReader());

    for (String s : in) {
        almost.add(wt.apply(s));
    }

    List<List<Writable>> processedData = LocalTransformExecutor.execute(almost, tp);

    StringBuilder sb = new StringBuilder();
    for (List ll : processedData) {
        sb.append(new WritablesToStringFunction(",").apply(ll));
        sb.append(",");
    }
    sb.setLength(sb.length() - 1);

    StringSplit sp = new StringSplit(sb.toString());

    RecordReader recordReader = new CSVRecordReader();
    recordReader.initialize(sp);

    DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, 64);

    return iterator;
}
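
The NumberFormatException above shows a line break inside a single numeric token ("510800.0" followed by "112.110001"), which suggests that the last field of one row and the first field of the next were read as one value. As a plain-Java illustration, independent of DataVec, here is a minimal sketch of keeping only some columns while leaving each record on its own line. The class name CsvColumnSelector and its method names are made up for illustration, and the naive split on "," assumes the file has no quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DL4J or DataVec.
final class CsvColumnSelector {

    /** Keeps only the fields at the given zero-based indices, in the given order. */
    static String selectColumns(String line, int[] keep) {
        // trim() drops a trailing line break so it cannot end up inside the last field
        String[] fields = line.trim().split(",");
        List<String> kept = new ArrayList<>();
        for (int idx : keep) {
            kept.add(fields[idx].trim());
        }
        return String.join(",", kept);
    }

    /** Selects columns from every non-blank row and joins the rows with a newline. */
    static String selectAll(List<String> lines, int[] keep) {
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(selectColumns(line, keep));
            }
        }
        // One record per line: fields are separated by ",", records by "\n"
        return String.join("\n", rows);
    }
}
```

For the column order given in the comment (year, month, day, high, low, close, adjclose, volume), calling selectAll with the indices 3, 4, 5, 7 would keep high, low, close and volume.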

treo · July 8, 2020, 8:25am

It is usually better not to build a DataSetIterator on your own; it is very easy to get something wrong there.

If you want to merge data from multiple sources, it is usually the easiest to build a MultiDataSet using a RecordReaderMultiDataSetIterator that just reads from multiple record readers.

Another option is to Join the data. There is an example for joins here: JoinExample.java in the eclipse/deeplearning4j-examples repository on GitHub.
Note that the example uses some spark specific features, you would have to rework this a bit to use the local executor to run it without spark.

cagneymoreau · July 9, 2020, 3:38pm

I tried following the examples as closely as possible, but I get an error when I call iterator.inputColumns(), which is used to help initialize the LSTM's first layer. If I remove that call, I get the same error on fit(). However, iterator.reset() works just fine.

My features input is a CSV file with many rows of 4-column data.
My label file matches it in length, with a single label column.

Cannot reset iterator - reset not supported (resetSupported() == false): one or more underlying (sequence) record readers do not support resetting

    String fromPath = Constants.dataPath + ticker + ".csv";
    String featureToPath = Constants.featurePath + ticker + ".csv";
    String labelToPath = Constants.labelPath + ticker + "_label" + ".csv";

    createSimpleDataSet(fromPath, featureToPath, deleteRow.last);
    createSeqSingleLabelDataSet(fromPath, labelToPath, deleteRow.first);

    FileSplit feat = new FileSplit(new File(featureToPath));
    FileSplit label = new FileSplit(new File(labelToPath));

    SequenceRecordReader featureReader = new CSVSequenceRecordReader();
    featureReader.initialize(feat);

    SequenceRecordReader labelReader = new CSVSequenceRecordReader();
    featureReader.initialize(label);

    DataSetIterator it = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, batch, 1);
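
Before handing the two files to the readers, the layout described above (four columns per feature row, one column per label row, the same number of rows in both) can be checked with plain Java. This is only a rough sketch: the class name CsvLayoutCheck and its methods are hypothetical, it works on lines already read into memory, and it assumes comma-separated files without quoted fields.

```java
import java.util.List;

// Hypothetical helper, not part of DL4J or DataVec.
final class CsvLayoutCheck {

    /** Returns the index of the first non-blank row with the wrong field count, or -1 if none. */
    static int firstBadRow(List<String> lines, int expectedColumns) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            // limit -1 keeps trailing empty fields, so "1,2,3,4," counts as 5 fields
            if (line.split(",", -1).length != expectedColumns) {
                return i;
            }
        }
        return -1;
    }

    /** Counts the non-blank rows. */
    static int rowCount(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    /** True when both files have the expected column counts and the same number of rows. */
    static boolean layoutMatches(List<String> features, int featureColumns,
                                 List<String> labels, int labelColumns) {
        return firstBadRow(features, featureColumns) == -1
                && firstBadRow(labels, labelColumns) == -1
                && rowCount(features) == rowCount(labels);
    }
}
```

For the files described here, the call would be layoutMatches(featureLines, 4, labelLines, 1). Separately from the file layout, the snippet above calls featureReader.initialize twice (once with feat, once with label), so labelReader is never initialized; the thread does not say whether this is the mistake referred to in the next post.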

cagneymoreau · July 10, 2020, 8:51pm

Made an idiot mistake. Wasted a day.
