Upcoming removal of modules and roadmap changes

Hello all,

Just a warning ahead of the upcoming release: many modules will be removed because they are no longer maintained.
The high-level modules being removed include:

  1. Arbiter
  2. Jumpy
  3. Many datavec modules that didn’t really have users outside of the computer vision module.
  4. Deeplearning4j’s built-in tokenizers for different languages. These contained a lot of third-party code that wasn’t really being maintained. We will address this problem via other libraries while still providing the same standard interfaces.
  5. Nd4s: We weren’t really maintaining it and we saw very sparse interest. Scala has also declined in recent years. However, we are heavily interested in Kotlin as a much better and simpler way of adding syntactic sugar to the framework.

Documentation:
Many new and improved documents/tutorials based on what users care about will be in the works soon. We’re aware the documentation needs improvement (especially for basic features like samediff and some of the lesser-known modules).

New CI Infrastructure:
We spent the last few weeks migrating from our own custom Jenkins to GitHub Actions. We’re still working through the kinks. The good news is that snapshots are now being published from there. Please reach out if there are any issues with the binaries before the release. One community member already gave me some great feedback.

The framework’s focus is narrowing from broad coverage down to a few use cases people actually have:

  1. Model import and retraining. Major focus will be on TF, onnx and keras.
  2. Easier deployment of applications. This will include integration with graalvm allowing self contained binaries for ease of deployment.
  3. Spark. Spark 3 will be coming, but it’s not as big of a priority right now. If anyone has an immediate use case, feel free to reach out to me.
  4. Mobile/embedded: Right now, we can run on mobile but it’s been pointed out that binary sizes are too large. We would like to fix that. We have a minifier available that, given a samediff graph, can output a binary with just the ops needed rather than everything. Unfortunately it isn’t very friendly or well documented. We’ll work on improving this (and other ways of addressing the problem) during the next release.
  5. Easy python execution: With python4j and javacpp we’ll make it easy to run python applications in a java environment.
  6. Onnx runtime runner: Under nd4j-onnxruntime you will be able to use nd4j arrays for data while onnxruntime handles execution.
  7. TVM: Thanks to @saudet we now support running TVM as well. This will allow people to leverage DL compilers. This feature is not fully baked yet, but please do ask about it if you have a specific use case. This is for people who want faster performance.
  8. TF runner: Many people don’t know this exists so I’ll highlight it here. We also have a TF runner allowing the usage of TF java with nd4j arrays. This is for cases where model import may have an edge case, but you still want ndarray usage.

Re namespacing:
We will be moving to the org.eclipse namespace soon. However, for backwards compatibility, this release will publish the org.datavec, org.deeplearning4j, and org.nd4j modules one final time. Afterwards, we will encourage people to migrate to the newer namespace.
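
For anyone who wants to gauge how much of their code the namespace change will touch, here is a minimal sketch that lists the import lines in a source file that use one of the legacy prefixes (org.datavec, org.deeplearning4j, org.nd4j). The class and method names are hypothetical and not part of the framework, and nothing here assumes what the final org.eclipse package names will look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LegacyNamespaceScan {
    // Legacy package prefixes named in the announcement.
    static final List<String> LEGACY_PREFIXES =
            Arrays.asList("org.datavec.", "org.deeplearning4j.", "org.nd4j.");

    /** Returns the import lines in the given source text that use a legacy prefix. */
    static List<String> findLegacyImports(String source) {
        List<String> hits = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("import ")) {
                continue;
            }
            // Drop "import" and an optional "static" to get the qualified name.
            String name = trimmed.substring("import ".length()).trim();
            if (name.startsWith("static ")) {
                name = name.substring("static ".length()).trim();
            }
            for (String prefix : LEGACY_PREFIXES) {
                if (name.startsWith(prefix)) {
                    hits.add(trimmed);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "package com.example;",
                "import org.nd4j.linalg.factory.Nd4j;",
                "import java.util.List;",
                "import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;");
        // Prints the two imports that would need attention after the move.
        findLegacyImports(sample).forEach(System.out::println);
    }
}
```

This only looks at import statements, so fully qualified names used inline in code would need a separate search.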

Next release expected features highlights:

  1. Onnx import
  2. A rewrite of the model import framework for TF (also extended to onnx)
  3. CTC loss for audio use cases
  4. Many performance improvements
  5. Inclusion of armcompute for faster routines on ARM processors
  6. nd4j-tvm: Again thanks to @saudet for adding a way of running tvm from java.

Why this didn’t go out sooner:
Mostly we spent the time regrouping. The project recently lost resources and had to be stripped down to the customer use cases we had. Those are the ones you see before you here. The project sprawl was also fairly large due to years of adding features. In lieu of the 1.0 release, time was better spent on cutting things down and serving customers.

As mentioned, documentation is to come and I’m happy to take requests from stakeholders on what people find interesting. We’re happy to welcome contributions and input from various companies…

More will be released later, but since I’m seeing questions about the upcoming release I figured I would address at least some of the concerns and open the floor to questions.

Minor update. We managed to find a good way to deal with the avx classifiers. We now publish classifiers that allow optional usage of the various cnn optimized libraries for each platform in the form of:
helper name - optimization type

This will allow people to pick whether they want cudnn optimized routines or mkldnn as well as whether they want certain other optimizations.
Please find the snapshots here for this:
https://oss.sonatype.org/content/repositories/snapshots/org/nd4j/nd4j-native/1.0.0-SNAPSHOT/

Scroll down to the bottom for the latest snapshots.
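
As a rough illustration of the naming scheme, here is a small sketch that composes a classifier string from a platform, a helper name, and an optimization type. The platform prefix and the example values ("linux-x86_64", "mkldnn", "avx2") are assumptions for illustration only, and the class and method are hypothetical; check the snapshot listing above for the classifier names that are actually published.

```java
public class ClassifierSketch {
    /**
     * Joins the parts as "<platform>-<helper>-<optimization>",
     * skipping any helper or optimization that is null or empty.
     */
    static String classifier(String platform, String helper, String optimization) {
        StringBuilder sb = new StringBuilder(platform);
        if (helper != null && !helper.isEmpty()) {
            sb.append('-').append(helper);
        }
        if (optimization != null && !optimization.isEmpty()) {
            sb.append('-').append(optimization);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only, not a list of published classifiers.
        System.out.println(classifier("linux-x86_64", "mkldnn", "avx2"));
        System.out.println(classifier("linux-x86_64", null, "avx2"));
    }
}
```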

Is this in version M2.1 and if so, does it use Microsoft onnxruntime by default or do I need to specify it?

@craig88 No, this is just an extra module that allows people to use onnxruntime with nd4j ndarrays as a data transfer mechanism (similar to numpy in Python land).
We have similar modules for TVM and TensorFlow 1.x.