Load Data Set Asynchronously from a Database

Hi,
I am trying to find a solution to what I think is a common problem:

My training data is stored in a database, and I want to avoid loading the full data set into memory. Also, the data no longer needs preprocessing, so I don't really need DataVec.

My approach would therefore be to implement an AsyncDataSetIterator. However, that is apparently not the way to go, as Adam Gibson pointed out here:

What is the right approach here, or is a custom implementation of AsyncDataSetIterator indeed the best option?

In principle, all you need to do is find or implement an appropriate RecordReader. DataSetIterators built from record readers load the data asynchronously by default.
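"Asynchronously" here means that a background thread prefetches the next few items into a bounded buffer while the consumer (the training loop) works on the current one. The following is a minimal, stdlib-only sketch of that pattern, purely for illustration. `PrefetchingIterator` is a hypothetical name, not a DL4J class, and this is not how DL4J implements it. With record-reader-based iterators you get this behaviour without writing any code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of asynchronous prefetching; not DL4J's AsyncDataSetIterator.
final class PrefetchingIterator<T> implements Iterator<T> {
    // Marker the producer enqueues once the source is exhausted (or has failed).
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private volatile RuntimeException failure; // error raised by the source, if any
    private Object pending;                     // item taken from the queue but not yet returned

    PrefetchingIterator(Iterator<T> source, int prefetch) {
        this.queue = new ArrayBlockingQueue<>(prefetch);
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        // Blocks once 'prefetch' items are buffered, which bounds memory use.
                        queue.put(source.next());
                    }
                } catch (RuntimeException e) {
                    failure = e; // rethrown to the consumer when it reaches END
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-producer");
        producer.setDaemon(true); // do not keep the JVM alive if the consumer stops early
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = queue.take(); // waits only if the producer is behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for data", e);
            }
        }
        if (pending == END) {
            RuntimeException error = failure;
            if (error != null) {
                throw error;
            }
            return false;
        }
        return true;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) pending;
        pending = null;
        return item;
    }
}
```

The bounded queue is the important design choice: the producer can run ahead of training by at most `prefetch` items, so loading overlaps with computation without ever pulling the whole data set into memory.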

As someone who has implemented several custom data set iterators, I can tell you that there are very few reasons to actually write one, and plenty of ways to get it wrong when you do.

Since your data is in a database, you can presumably connect to it via JDBC. In that case you don't even need a custom record reader, because a JDBC record reader already exists:

It isn't particularly well documented, but the tests contain a few examples of how to use it:
https://github.com/eclipse/deeplearning4j/blob/master/datavec/datavec-jdbc/src/test/java/org/datavec/api/records/reader/impl/JDBCRecordReaderTest.java#L307-L311
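To see why a JDBC-backed reader avoids loading the whole table into memory, here is a stdlib-only sketch of the core idea: walking a `ResultSet` cursor one row at a time and handing out each row as a record. This is not the JDBCRecordReader source; `ResultSetRows` is a hypothetical name, and the real reader additionally converts values into DataVec types. It is shown only to explain the mechanism, not as something you need to write.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: exposes an open ResultSet as an iterator of rows, so only
// the rows the driver has currently fetched need to be held in memory.
final class ResultSetRows implements Iterator<List<Object>> {
    private final ResultSet rs;
    private final int columns;
    private Boolean hasRow; // null means "cursor not advanced since the last next()"

    ResultSetRows(ResultSet rs) {
        this.rs = rs;
        this.columns = columnCount(rs);
    }

    private static int columnCount(ResultSet rs) {
        try {
            return rs.getMetaData().getColumnCount();
        } catch (SQLException e) {
            throw new IllegalStateException("cannot read result set metadata", e);
        }
    }

    @Override
    public boolean hasNext() {
        if (hasRow == null) {
            try {
                hasRow = rs.next(); // advances the cursor by one row
            } catch (SQLException e) {
                throw new IllegalStateException("failed to advance cursor", e);
            }
        }
        return hasRow;
    }

    @Override
    public List<Object> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<Object> row = new ArrayList<>(columns);
        try {
            for (int c = 1; c <= columns; c++) { // JDBC column indices are 1-based
                row.add(rs.getObject(c));
            }
        } catch (SQLException e) {
            throw new IllegalStateException("failed to read row", e);
        }
        hasRow = null; // force the next hasNext() to move the cursor
        return row;
    }
}
```

Whether rows are really streamed is up to the JDBC driver: `Statement.setFetchSize` is only a hint, and some drivers buffer the entire result set unless configured otherwise.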

To create a DataSetIterator from it, use the same RecordReaderDataSetIterator as with any other record reader, and you'll get a well-working iterator without the headache.
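For a classification setup, the iterator's job is roughly this: take records (rows of values), split off the label column, one-hot encode it, and group rows into minibatches. The stdlib-only sketch below illustrates that step conceptually with plain `double` arrays. `RecordBatcher` and `Batch` are hypothetical names, and it assumes every value is numeric and the label column holds an integer class index.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// One minibatch: a feature row and a one-hot label row per example.
final class Batch {
    final double[][] features;
    final double[][] labels;

    Batch(double[][] features, double[][] labels) {
        this.features = features;
        this.labels = labels;
    }
}

// Hypothetical sketch of classification-style batching over a stream of records.
final class RecordBatcher implements Iterator<Batch> {
    private final Iterator<List<Double>> records;
    private final int batchSize, labelIndex, numClasses;

    RecordBatcher(Iterator<List<Double>> records, int batchSize, int labelIndex, int numClasses) {
        this.records = records;
        this.batchSize = batchSize;
        this.labelIndex = labelIndex;
        this.numClasses = numClasses;
    }

    @Override
    public boolean hasNext() {
        return records.hasNext();
    }

    @Override
    public Batch next() {
        if (!records.hasNext()) {
            throw new NoSuchElementException();
        }
        List<double[]> feats = new ArrayList<>();
        List<double[]> labs = new ArrayList<>();
        // The last batch may be smaller than batchSize.
        while (feats.size() < batchSize && records.hasNext()) {
            List<Double> rec = records.next();
            double[] f = new double[rec.size() - 1];
            int j = 0;
            for (int i = 0; i < rec.size(); i++) {
                if (i != labelIndex) {
                    f[j++] = rec.get(i); // every non-label column is a feature
                }
            }
            int cls = (int) Math.round(rec.get(labelIndex));
            if (cls < 0 || cls >= numClasses) {
                throw new IllegalArgumentException("label out of range: " + cls);
            }
            double[] oneHot = new double[numClasses];
            oneHot[cls] = 1.0;
            feats.add(f);
            labs.add(oneHot);
        }
        return new Batch(feats.toArray(new double[0][]), labs.toArray(new double[0][]));
    }
}
```

In DL4J you would not write this yourself: `new RecordReaderDataSetIterator(reader, batchSize, labelIndex, numPossibleLabels)` performs the equivalent batching and label encoding for any RecordReader, including the JDBC one.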