I just wondered, given Apple's announced massive push toward M4 silicon chips with faster and more numerous neural cores, whether there are any plans to support the M1–M4 Apple Neural Engine? Is that in fact possible? I presume it entails writing an ND4J backend.
@kgoderis Yes, more or less. What you'd probably want to do is add C++ kernels with a platform specification, like we do with cuDNN. Integrating native routines for specific platforms would be the easiest way to handle this.
I intend to do this at scale with some of the more cutting-edge transformer kernels as well, to avoid reinventing the wheel.