INDArray in ND4J 1.0.0-M2.1 slower on Android than 1.0.0-beta6

Hello. I wrote a neural network class, SCN (Stochastic Configuration Network), in Kotlin inside an Android application for research purposes. The n-dimensional arrays from ND4J were very useful to me. Initially I imported DL4J (and, with it, ND4J) version 1.0.0-beta6 into the project and trained the network on the MNIST dataset (60k images of 28x28 pixels). Then I updated the library to 1.0.0-M2.1, and training began to take almost twice as long. This is not a quirk of my particular phone: repeated experiments on a smartphone and on an emulator both confirm a roughly twofold slowdown in training speed. What could be causing this? I want to use the latest version because it provides functions such as QR decomposition and triangular solve, which are not present in 1.0.0-beta6.

class SCN(  
    private val lambdas: DoubleArray,  
    private val max_neurons: Int,  
    private val reconfig_number: Int  
) {  
  
  
    fun forward(X: INDArray, Y: INDArray): LearningResult {  
  
        val d = X.columns()  
        var e = Y.dup()  
  
        val H = Nd4j.zeros(X.rows(), max_neurons)  
        val W = Nd4j.zeros(d, max_neurons)  
        val b = Nd4j.zeros(max_neurons)  
        var beta = Nd4j.zeros(max_neurons)  
  
        val W_random_lst = Array<INDArray?>(lambdas.size) {null}  
        val b_random_lst = Array<INDArray?>(lambdas.size) {null}  
  
        for (k in 0..<max_neurons) {  
  
            for ((index, L) in lambdas.withIndex()) {  
                val WL = (Nd4j.rand(d, reconfig_number).muli(2).subi(1)).muli(L)  
                val bL = (Nd4j.rand(1, reconfig_number).muli(2).subi(1)).muli(L)  
                W_random_lst[index] = WL  
                b_random_lst[index] = bL  
            }  
            val W_random = Nd4j.hstack(*W_random_lst)  
            val b_random = Nd4j.hstack(*b_random_lst)  
  
            val h = Transforms.sigmoid(X.mmul(W_random).addiRowVector(b_random))  
  
            // mul (not muli) so that h is not squared in place before its column is stored in H  
            val v_values = Transforms.pow((e.transposei().mmul(h)), 2).diviRowVector(h.mul(h).sum(0))  
            val v_values_tmp = v_values.mean(0)  
            val best_idx = v_values_tmp.argMax().getLong(0)  
            val h_c = h.getColumn(best_idx)  
            H.putColumn(k,h_c)  
  
            val W_c = W_random.getColumn(best_idx)  
            val b_c = b_random.getScalar(best_idx).getFloat(0)  
            W.putColumn(k, W_c)  
            b.putScalar(k.toLong(), b_c)  
  
            val H_k = H.getColumns(*IntArray(k + 1) { it })  
            beta = InvertMatrix.invert((H_k.transpose().mmul(H_k)), true).mmul(H_k.transpose()).mmul(Y)  
            val y = H_k.mmul(beta)  
            e = Y.sub(y)  
            val rmse = kotlin.math.sqrt((e.mul(e).meanNumber().toDouble()))  
            Log.d("SCN_TEST", "$k $rmse")  
        }  
  
        return LearningResult(W, b, beta)  
    }  
  
}

data class LearningResult(  
    val W: INDArray,  
    val b: INDArray,  
    val beta: INDArray  
)

My build.gradle file with the new library version:

plugins {  
  alias(libs.plugins.android.application)  
    alias(libs.plugins.kotlin.android)  
}  
  
android {  
  packagingOptions {  
  exclude 'META-INF/native-image/**/**.json'  
  exclude 'META-INF/native-image/*.json'  
  pickFirst 'nd4j-native.properties'  
  }  
  
  namespace 'com.example.scndl4j'  
  compileSdk 35  
  
  defaultConfig {  
  applicationId "com.example.scndl4j"  
  minSdk 24  
  targetSdk 35  
  versionCode 1  
  versionName "1.0"  
  
  testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"  
  }  
  
  buildTypes {  
  release {  
  minifyEnabled false  
  proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'  
  }  
  }  
  compileOptions {  
  sourceCompatibility JavaVersion.VERSION_1_8  
        targetCompatibility JavaVersion.VERSION_1_8  
    }  
  kotlinOptions {  
  jvmTarget = '1.8'  
  }  
  
  packagingOptions {  
  exclude 'META-INF/native-image/**/**.json'  
  exclude 'META-INF/native-image/*.json'  
  pickFirst 'nd4j-native.properties'  
  }  
}  
  
configurations {  
  javacpp  
}  
  
task javacppExtract(type: Copy) {  
  dependsOn configurations.javacpp  
  
    from { configurations.javacpp.collect { zipTree(it) } }  
  include "lib/**"  
  into "$buildDir/javacpp/"  
  android.sourceSets.main.jniLibs.srcDirs += ["$buildDir/javacpp/lib/"]  
  
    tasks.getByName('preBuild').dependsOn javacppExtract  
}  
  
dependencies {  
  
  def dl4jVersion = '1.0.0-M2'  
  def openblasVersion = '0.3.19-1.5.7'  
  def opencvVersion = '4.5.5-1.5.7'  
  def leptonicaVersion = '1.82.0-1.5.7'  
  
  implementation fileTree(dir: 'libs', include: ['*.jar'])  
  
    implementation(group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: dl4jVersion) {  
  exclude group: 'org.bytedeco', module: 'opencv-platform'  
  exclude group: 'org.bytedeco', module: 'leptonica-platform'  
  exclude group: 'org.bytedeco', module: 'hdf5-platform'  
  exclude group: 'org.nd4j', module: 'nd4j-base64'  
  exclude group: 'org.nd4j', module: 'nd4j-api'  
  }  
  
  implementation  group: 'org.nd4j', name: 'nd4j-native', version: dl4jVersion  
    implementation  group: 'org.nd4j', name: 'nd4j-native', version:  dl4jVersion, classifier: "android-arm"  
  implementation  group: 'org.nd4j', name: 'nd4j-native', version:  dl4jVersion, classifier: "android-arm64"  
  implementation  group: 'org.nd4j', name: 'nd4j-native', version:  dl4jVersion, classifier: "android-x86"  
  implementation  group: 'org.nd4j', name: 'nd4j-native', version:  dl4jVersion, classifier: "android-x86_64"  
  implementation  group: 'org.bytedeco', name: 'openblas', version: openblasVersion  
    implementation  group: 'org.bytedeco', name: 'openblas', version: openblasVersion, classifier: "android-arm"  
  implementation  group: 'org.bytedeco', name: 'openblas', version: openblasVersion, classifier: "android-arm64"  
  implementation  group: 'org.bytedeco', name: 'openblas', version: openblasVersion, classifier: "android-x86"  
  implementation  group: 'org.bytedeco', name: 'openblas', version: openblasVersion, classifier: "android-x86_64"  
  implementation  group: 'org.bytedeco', name: 'opencv', version: opencvVersion  
    implementation  group: 'org.bytedeco', name: 'opencv', version: opencvVersion, classifier: "android-arm"  
  implementation  group: 'org.bytedeco', name: 'opencv', version: opencvVersion, classifier: "android-arm64"  
  implementation  group: 'org.bytedeco', name: 'opencv', version: opencvVersion, classifier: "android-x86"  
  implementation  group: 'org.bytedeco', name: 'opencv', version: opencvVersion, classifier: "android-x86_64"  
  implementation  group: 'org.bytedeco', name: 'leptonica', version: leptonicaVersion  
    implementation  group: 'org.bytedeco', name: 'leptonica', version: leptonicaVersion, classifier: "android-arm"  
  implementation  group: 'org.bytedeco', name: 'leptonica', version: leptonicaVersion, classifier: "android-arm64"  
  implementation  group: 'org.bytedeco', name: 'leptonica', version: leptonicaVersion, classifier: "android-x86"  
  implementation  group: 'org.bytedeco', name: 'leptonica', version: leptonicaVersion, classifier: "android-x86_64"  
  
  annotationProcessor group: 'org.projectlombok', name: 'lombok', version: '1.18.4'  
  
  
  implementation(platform(libs.okhttp.bom))  
    implementation(libs.okhttp)  
    implementation(libs.logging.interceptor)  
  
    implementation libs.kotlinx.coroutines.android  
    implementation libs.androidx.core.ktx  
    implementation libs.androidx.appcompat  
    implementation libs.material  
    implementation libs.androidx.activity  
    implementation libs.androidx.constraintlayout  
}

I also set android:extractNativeLibs="true" in the manifest file. Without it, the new version does not work at all.

@panassevich do you have any profiler output we can look at?
Your Gradle build also shows 1.0.0-M2, not M2.1.

I can’t really just magically guess what could be the issue. The code base is huge and could (just like any other software) be slow for a myriad of reasons.
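Short of a full profiler trace, one way to narrow this down is to time each stage of the training loop separately and compare the totals between the two library versions. Below is a minimal sketch of such a timer. `StageTimer` is a hypothetical helper written for this thread, not part of ND4J, and the stage names are arbitrary labels. It assumes the CPU backend runs each operation synchronously, so the wall-clock time around a call reflects the cost of that call.

```kotlin
// Hypothetical helper (not an ND4J class): accumulates wall-clock time per named stage.
class StageTimer {
    private val totalsNs = LinkedHashMap<String, Long>()
    private val callCounts = LinkedHashMap<String, Int>()

    // Runs block, adds its elapsed time to the given stage, and returns the block's result.
    fun <T> time(stage: String, block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            val elapsed = System.nanoTime() - start
            totalsNs[stage] = (totalsNs[stage] ?: 0L) + elapsed
            callCounts[stage] = (callCounts[stage] ?: 0) + 1
        }
    }

    // Number of times the stage was timed (0 if never).
    fun callCount(stage: String): Int = callCounts[stage] ?: 0

    // Total time spent in the stage, in milliseconds (0.0 if never timed).
    fun totalMs(stage: String): Double = (totalsNs[stage] ?: 0L) / 1e6

    // One line per stage, in the order the stages were first seen.
    fun report(): String =
        totalsNs.keys.joinToString("\n") { stage ->
            val ms = "%.1f".format(totalMs(stage))
            "$stage: ${callCount(stage)} calls, $ms ms total"
        }
}

// Intended use inside forward(), wrapping the expressions from the posted code:
//   val timer = StageTimer()
//   val h = timer.time("sigmoid") { Transforms.sigmoid(X.mmul(W_random).addiRowVector(b_random)) }
//   beta = timer.time("solve") {
//       InvertMatrix.invert(H_k.transpose().mmul(H_k), true).mmul(H_k.transpose()).mmul(Y)
//   }
//   ... after the loop:
//   Log.d("SCN_TEST", timer.report())
```

Running the same build once with 1.0.0-beta6 and once with the newer version, then posting both reports, would show whether the slowdown is concentrated in one stage (for example the matrix products or the inversion) or spread evenly across all of them.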