Does anybody have any idea of what sort of classification speed one can expect using DL4J?
I have a single-hidden-LSTM-layer RNN doing sentiment analysis of tweets (heavily inspired by this example), running on the CUDA 10.1 backend (without cuDNN; I'm working on getting that installed, but I have limited privileges on the machine) with two Tesla P100 16 GB GPUs. Classifying with `net.output()`, I get a throughput of about 100 tweets per second. This is far lower than I was hoping for: last semester I achieved 15k tweets per second with a CPU-based Naive Bayes implementation.
Why am I using `net.output()` instead of a DataSetIterator, you might ask: I'm using the network in a streaming context, not on a static dataset.
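For context, here is roughly the pattern I'm considering to reduce per-call overhead: micro-batching incoming tweets and running one `net.output()` call per batch instead of per tweet, since kernel launches and host-to-device transfers tend to dominate at batch size 1. This is only a sketch; `MicroBatcher` and `classifyBatch` are hypothetical names, and the actual DL4J featurization and `net.output(features)` call are stubbed out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of micro-batching for a streaming classifier.
// classifyBatch is a stub: in the real pipeline it would build one
// feature INDArray for the whole batch and make a single net.output() call.
public class MicroBatcher {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushedBatches = new ArrayList<>();

    public MicroBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Accumulate tweets; classify once a full batch is available.
    public void accept(String tweet) {
        buffer.add(tweet);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Flush any remainder (e.g. on a timer, so latency stays bounded).
    public void flush() {
        if (!buffer.isEmpty()) {
            classifyBatch(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Stub standing in for featurization + net.output(features).
    private void classifyBatch(List<String> batch) {
        flushedBatches.add(batch);
    }

    public List<List<String>> getFlushedBatches() {
        return flushedBatches;
    }

    public static void main(String[] args) {
        MicroBatcher mb = new MicroBatcher(32);
        for (int i = 0; i < 100; i++) {
            mb.accept("tweet " + i);
        }
        mb.flush();
        // 100 tweets at batch size 32 -> batches of 32, 32, 32, 4
        System.out.println(mb.getFlushedBatches().size()); // prints 4
    }
}
```

The trade-off is latency: a partial batch sits in the buffer until it fills, so in practice you'd also flush on a short timer.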
Does anybody have experience making NNs faster and more scalable? I would greatly appreciate any nudge in the right direction.