I have CSV data with null values in both the training and test sets. The data therefore needs imputation, after which I would like to remove outliers from the continuous columns of the training set using z-scores.
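To make the intended preprocessing concrete, here is a minimal plain-Java sketch of the two steps for a single continuous column: mean imputation of nulls (represented as `NaN`), then dropping values whose absolute z-score exceeds a threshold. This is only an illustration of the logic, not DataVec or Spark API; the class and method names are hypothetical, and it uses the population standard deviation for simplicity.

```java
import java.util.Arrays;

public class ZScoreFilter {

    // Hypothetical helper: mean-impute NaNs in one column, then keep only
    // the values whose |z-score| is within the given threshold.
    public static double[] imputeAndFilter(double[] col, double zThreshold) {
        // Mean imputation: replace each NaN with the mean of observed values.
        double sum = 0;
        int observed = 0;
        for (double v : col) {
            if (!Double.isNaN(v)) { sum += v; observed++; }
        }
        double fillValue = sum / observed;
        double[] imputed = new double[col.length];
        for (int i = 0; i < col.length; i++) {
            imputed[i] = Double.isNaN(col[i]) ? fillValue : col[i];
        }

        // Z-score: (x - mean) / std over the imputed column
        // (population std, i.e. dividing the variance by n).
        double mean = Arrays.stream(imputed).average().orElse(0);
        double variance = Arrays.stream(imputed)
                                .map(v -> (v - mean) * (v - mean))
                                .average().orElse(0);
        double std = Math.sqrt(variance);

        // Drop rows outside the threshold; keep everything if std is zero.
        return Arrays.stream(imputed)
                     .filter(v -> std == 0 || Math.abs((v - mean) / std) <= zThreshold)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] col = {1.0, 2.0, Double.NaN, 3.0, 100.0};
        // NaN is imputed to 26.5 (mean of observed values); 100.0 is dropped
        // as an outlier at this threshold.
        System.out.println(Arrays.toString(imputeAndFilter(col, 1.5)));
    }
}
```

Note the low threshold in the example: with only five rows, the maximum attainable |z-score| is (n-1)/sqrt(n) ≈ 1.79, so the common 2- or 3-sigma cutoffs would never fire on such a small sample.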
The first question is whether DataVec is mature enough to handle this kind of processing.
The alternative is Spark DataFrames, which certainly support these tasks. The challenge there is converting the result into a data format DL4J understands, such as a DataSet.
Please suggest a way to handle this kind of data preprocessing in a DL4J application.
A workaround for the second approach would be to write the Spark DataFrame out to CSV and load it back into DL4J via a CSVRecordReader, but that breaks the flow of data through the application. Please also advise whether this is how a data science application would handle data in production.
Thanks and Regards