"Correct" way to feed dataset to neural network for training

EquanimeAugello · June 5, 2023, 11:04pm

Hello,
the neural network I’m implementing takes a 2D (square) integer array as input (to which convolution is applied) and produces a double as output.
My dataset is currently a CSV file in which each line is composed of n integers and 1 double (i.e., each line contains the flattened version of the input array followed by the desired output as a double). The dataset is expected to grow quite large.
What would be a good way to feed the data into the network to train it?
As far as I understand, I should try to use a RecordReader, but I’m not sure how to combine convolution with the CSVRecordReader, given the intermediate flat-to-2D array transformation. Moreover, should I treat those integers as doubles so that the ND4J types are uniform?
Thank you kindly in advance.

agibsonccc · June 6, 2023, 8:42am

@EquanimeAugello you would use the ImageRecordReader plus RecordReaderDataSetIterator. Here are some tests covering different cases: deeplearning4j/TestImageRecordReader.java at 1b595d363685d979096d0ffff42f5b36d3ef6c24 · deeplearning4j/deeplearning4j · GitHub

EquanimeAugello · June 6, 2023, 12:31pm

Thank you very much.
However, it’s still not clear to me how I should feed my data to ImageRecordReader, since my data isn’t images but a single CSV file in which each line constitutes one input-output pair. The constructors don’t seem to offer this option, unless I’m mistaken.
Thank you

agibsonccc · June 6, 2023, 8:56pm

@EquanimeAugello sorry, I’m not clear on what exactly you’re trying to do. Rather than me piecing everything together, could you describe your problem and input data?
What do you have: CSV? Images?

EquanimeAugello · June 7, 2023, 9:41am

Thank you. Yes, I’ll try to be clearer: my input is a CSV like this:
int1,int2,int3,…,intN,double
int1,int2,int3,…,intN,double
…
Each line of the CSV is a complete example. The first N integers are the flattened version of a 2D square matrix, which represents a single input to the first convolutional layer of the network. The final double on each line is the desired output for that matrix. What I actually want to do, as usual, is to give batches of (lines of) data to the network during training, i.e. an array of matrices as input and an array of doubles as output, constructed from the CSV file. I had actually implemented a custom DataSetIterator that managed to do this, but I’ve seen that this is not considered good practice(?).
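
For illustration, the per-line transformation described above could be sketched in plain Java as follows. This is only a minimal sketch under assumptions: the class name FlatCsvRow, the Example holder, and the parse method are hypothetical names invented here, the matrix is assumed to be flattened row by row, and the integers are widened to double so that input and label share one numeric type. Wrapping the result into an ND4J array is deliberately left out.

```java
import java.util.Arrays;

public class FlatCsvRow {

    /** One parsed example: a side x side matrix and its regression target. */
    public static final class Example {
        public final double[][] matrix;
        public final double label;

        Example(double[][] matrix, double label) {
            this.matrix = matrix;
            this.label = label;
        }
    }

    /**
     * Parses "int1,...,intN,double" where N = side * side.
     * Assumes the matrix was flattened row by row (row-major order).
     */
    public static Example parse(String line, int side) {
        String[] tokens = line.trim().split(",");
        int n = side * side;
        if (tokens.length != n + 1) {
            throw new IllegalArgumentException(
                "Expected " + (n + 1) + " values but found " + tokens.length);
        }
        double[][] matrix = new double[side][side];
        for (int i = 0; i < n; i++) {
            // Flat index i maps to row i / side and column i % side.
            matrix[i / side][i % side] = Integer.parseInt(tokens[i].trim());
        }
        double label = Double.parseDouble(tokens[n].trim());
        return new Example(matrix, label);
    }

    public static void main(String[] args) {
        Example e = parse("1,2,3,4,0.5", 2);
        // prints [[1.0, 2.0], [3.0, 4.0]] -> 0.5
        System.out.println(Arrays.deepToString(e.matrix) + " -> " + e.label);
    }
}
```

The length check is there so that a malformed line fails loudly instead of silently producing a shifted matrix.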

agibsonccc · June 7, 2023, 10:10am

@EquanimeAugello not normally, but your case is fairly unusual, so it’s advisable. The only thing I’d say is that you could also implement a custom record reader instead. That would allow you to leverage most of the extra infrastructure we wrote in that iterator.
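
As a rough, library-free illustration of the batching logic that such a custom iterator or record reader would need, here is a sketch in plain Java. It is not DL4J’s DataSetIterator or RecordReader interface: the class CsvBatchIterator, its Batch holder, and the [batch][1][side][side] layout (one channel per matrix) are assumptions made for this example, as is the choice of float for both features and labels. It reads the file lazily, line by line, so a large CSV does not need to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CsvBatchIterator implements Iterator<CsvBatchIterator.Batch> {

    /** One mini-batch: features [batch][1][side][side], labels [batch][1]. */
    public static final class Batch {
        public final float[][][][] features;
        public final float[][] labels;

        Batch(float[][][][] features, float[][] labels) {
            this.features = features;
            this.labels = labels;
        }
    }

    private final BufferedReader reader;
    private final int side;
    private final int batchSize;
    private String nextLine; // next non-blank line, or null at end of input

    public CsvBatchIterator(BufferedReader reader, int side, int batchSize) {
        this.reader = reader;
        this.side = side;
        this.batchSize = batchSize;
        advance();
    }

    /** Moves to the next non-blank line, so only one line is buffered. */
    private void advance() {
        try {
            do {
                nextLine = reader.readLine();
            } while (nextLine != null && nextLine.trim().isEmpty());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public Batch next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int n = side * side;
        List<float[][]> matrices = new ArrayList<>();
        List<Float> targets = new ArrayList<>();
        while (nextLine != null && matrices.size() < batchSize) {
            String[] tokens = nextLine.trim().split(",");
            if (tokens.length != n + 1) {
                throw new IllegalArgumentException(
                    "Expected " + (n + 1) + " values but found " + tokens.length);
            }
            float[][] matrix = new float[side][side];
            for (int i = 0; i < n; i++) {
                matrix[i / side][i % side] = Integer.parseInt(tokens[i].trim());
            }
            matrices.add(matrix);
            targets.add(Float.parseFloat(tokens[n].trim()));
            advance();
        }
        // The last batch may be smaller than batchSize.
        float[][][][] features = new float[matrices.size()][1][][];
        float[][] labels = new float[matrices.size()][1];
        for (int i = 0; i < matrices.size(); i++) {
            features[i][0] = matrices.get(i);
            labels[i][0] = targets.get(i);
        }
        return new Batch(features, labels);
    }

    public static void main(String[] args) {
        String csv = "1,2,3,4,0.5\n5,6,7,8,1.5\n9,10,11,12,2.5\n";
        CsvBatchIterator it =
            new CsvBatchIterator(new BufferedReader(new StringReader(csv)), 2, 2);
        while (it.hasNext()) {
            System.out.println("batch of " + it.next().features.length);
        }
    }
}
```

With a real file, the StringReader would be replaced by a FileReader on the CSV path. Each Batch would then still have to be converted into whatever array and dataset types the training code expects.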

EquanimeAugello · June 22, 2023, 6:21pm

Sorry for replying so late: thank you very much.
