Importing a PyTorch model

Great, thank you!
I know it may be a frustrating question, but how long do you expect until the next release?

@mdebeer probably a few weeks at most. You can use snapshots once this gets merged, though.

Excellent, thank you. I’ve subscribed to that PR to get updates, so will try the snapshot when it’s merged.

@mdebeer great, will ping you if I merge that work without doing this. It will depend on how big adding the next few target ops is. (There’s around 5 to 7 I wanted to get done) Thanks for your patience!

Hi Adam, I see that the PR was merged to master.

I’ve tried to set up my SBT to pull the snapshots (using this page as guide). It’s not clear what the correct, latest version number should be (after 1.0.0-M1.1), but I presume it’s 1.0.0-SNAPSHOT.

With this version + sonatype repo, I can resolve "deeplearning4j-core" and "deeplearning4j-modelimport", but I can’t seem to resolve "org.nd4j" % "samediff-import-onnx".

Do you have any advice for how I can test the model import with the latest snapshot?

You should be able to resolve it, as it does exist:
https://oss.sonatype.org/content/repositories/snapshots/org/nd4j/samediff-import-onnx/1.0.0-SNAPSHOT/

What exactly does sbt tell you?

Sometimes sbt can be a bit of a pain when trying to actually pull snapshot dependencies. In that case, the easiest workaround is to create a dummy maven project, set up all the dependencies that you are missing there, and run mvn dependency:go-offline. That will download everything to your local repository, and then sbt can usually resolve everything just fine.

Aah, interesting –

So my build.sbt contains the following:

resolvers += "Snapshot Repository" at "https://oss.sonatype.org/content/repositories/snapshots",
libraryDependencies ++= Seq(
      "org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-SNAPSHOT",
      "org.deeplearning4j" % "deeplearning4j-modelimport" % "1.0.0-SNAPSHOT",
      "org.nd4j" % "samediff-import-onnx" % "1.0.0-M1.1",
...)

If I change samediff-import-onnx to also pull "1.0.0-SNAPSHOT", then I get an XML parsing error …

 org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.

Which doesn’t seem to be SBT related. When I search the error with ‘SBT’, I don’t get any helpful results. It seems unable to parse some XML result… which, heck if I know. :man_shrugging:

But your idea to use a dummy maven project to pull the dependencies does indeed seem like a good approach, so I’ll give that a try.

@mdebeer worst case, just installing the Java dependencies from source should be fine as well.
You only need to install nd4j-api and samediff-import-onnx as the main modules for this.

The converter mainly parses protobuf, converts it to a SameDiff model, and then saves it. It’s a worst-case scenario for you, but it shouldn’t be nearly as bad as installing the C++ dependencies. Depending on what route you go, we can help you out though.

Please try Paul’s advice first though.

This is quite a tricky one to figure out …

I’ve created the dummy maven project, defined all the dependencies, and running mvn dependency:go-offline completes successfully and I can find the downloaded dependencies in ~/.m2/repository.

However, adding my local maven repo as a resolver (resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository") still does not allow SBT to compile … because it appears coursier tries to validate the xml for samediff-import-onnx and fails specifically for that one (snapshot works for org.deeplearning4j.* and org.nd4j.nd4j-native-platform).
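As an aside on the resolver line quoted above: sbt also ships a predefined resolver for the local Maven repository, which avoids assembling the file:// URL by hand. A minimal sketch of a build.sbt fragment, assuming the default ~/.m2/repository location. It only changes where artifacts are looked up, so it would not be expected to affect an error that occurs while parsing what was found.

```scala
// build.sbt fragment (sketch): Resolver.mavenLocal is sbt's built-in
// resolver for the local Maven repository (by default ~/.m2/repository).
resolvers += Resolver.mavenLocal
```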

I’ve looked at the maven-metadata.xml, and it’s perfectly valid. I’ve compared the .pom files as well, for a working example and the samediff-import-onnx one, and the only difference is indentation depth and some extra whitespace, but it’s still valid xml … (the SBT error appears to occur because of the following stackoverflow explanation).

Anyway – not clear how to proceed, apart from possibly taking the maven dummy project and packaging our own artifact and building from that. But this would just be for intermediate testing before the next version release, so waiting is also an option. Thought I’d at least share the symptoms / potential bug with samediff-import-onnx's xml, but maybe it’s just our SBT config?
I can post our sample build.sbt if you’re interested in debugging …

You don’t need to do that. It should automatically check the local repository as that is where it caches everything.

But resolving the artifact is probably not your actual problem, because…

This looks like it does indeed resolve it correctly, but then has issues with processing what it finds.

Can you share more of that stacktrace? In particular, I’d like to know what sub-system is throwing this, and what file it is trying to read.

@treo Thank you for the feedback.

Unfortunately, the stacktrace wasn’t very helpful to me when I worked through it:

[error] org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.
[error]         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
[error]         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
[error]         at lmcoursier.internal.shaded.coursier.core.compatibility.package$.xmlParseSax(package.scala:116)
[error]         at lmcoursier.internal.shaded.coursier.maven.MavenRepository$.parseRawPomSax(MavenRepository.scala:74)
[error]         at lmcoursier.internal.shaded.coursier.maven.MavenRepository.$anonfun$findVersioning$1(MavenRepository.scala:369)
[error]         at lmcoursier.internal.shaded.coursier.util.EitherT.$anonfun$flatMap$1(EitherT.scala:18)
[error]         at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
[error]         at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
[error]         at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:84)
[error]         at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
[error]         at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
[error]         at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
[error]         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]         at java.lang.Thread.run(Thread.java:748)
[error] (update) org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.

because I could find no reference to a file name or input that is causing the error.
It does seem to be a pom file, based on .parseRawPomSax(MavenRepository.scala:74), but together with a couple other scala devs here, we couldn’t really make sense of it (in ~20 minutes or so).

Maybe we’re missing something… so here’s a sample build.sbt if you want to try to replicate it (sbt 1.5.5):

scalaVersion := "2.13.6"

//val dl4jVersion = "1.0.0-M1.1"
val dl4jVersion = "1.0.0-SNAPSHOT"

//resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
resolvers += "Snapshot Repository" at "https://oss.sonatype.org/content/repositories/snapshots"
libraryDependencies ++= Seq(
    "org.slf4j" % "slf4j-api" % "1.7.31",
    "org.deeplearning4j" % "deeplearning4j-core" % dl4jVersion,
    "org.deeplearning4j" % "deeplearning4j-modelimport" % dl4jVersion,
    "org.deeplearning4j" % "deeplearning4j-nn" % dl4jVersion,
    "org.deeplearning4j" % "deeplearning4j-zoo" % dl4jVersion,
    "org.nd4j" % "nd4j-native-platform" % dl4jVersion,
    "org.nd4j" % "samediff-import-onnx" % "1.0.0-M1.1"  // <-- Works on -M1.1, errors on SNAPSHOT
)

Before I go ahead and try to reproduce this, can you do a quick sanity check please:
Run mvn dependency:purge-local-repository. That will remove all of the project’s dependencies from the local repository, and then you can try to redownload them (see Apache Maven Dependency Plugin – Purging project dependencies).

If it is due to some kind of corruption in your local repository, that may already resolve the issue.

If not, it would be nice if you could put up a simple demo git repository that reproduces the problem, so we can try to debug what exactly is going on.

Right - I’ve purged the repo and tried again. Unfortunately, no dice.
As a further sanity check, we’ve replicated the aforementioned error on Ubuntu as well as Mac.

Running latest SBT (1.5.5), all that needs to be done to replicate the error is create the above build.sbt file in a test folder, and in that folder run sbt and update.
It should succeed with samediff-import-onnx set to version M1.1, but fails when set to snapshot.

Thanks for the sanity check.

I’ve got a SBT debugging setup working now, so I can put a breakpoint into it and figure out what the problem is.

Turns out it is a stray space character in one of the dependencies of samediff-import-onnx.

Once this PR is merged and the snapshots are rebuilt, it should start working again.

Thank you very much for providing enough detail for me to be able to debug it :slight_smile:
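For readers who run into the same message: the thread does not say exactly where the stray space was, but the parser message itself is easy to reproduce. The JDK's SAX parser reports it whenever an XML declaration is not the very first thing in the input. A minimal sketch, assuming a made-up one-line document in place of a real POM or maven-metadata.xml (the object name and sample XML are hypothetical):

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object XmlDeclarationDemo {
  // Hypothetical minimal document standing in for a POM or metadata file.
  val wellFormed: String = """<?xml version="1.0" encoding="UTF-8"?><project/>"""

  // Parses the text with the JDK's SAX parser and returns the parse error, if any.
  def parse(xml: String): Either[SAXParseException, Unit] =
    try {
      val parser = SAXParserFactory.newInstance().newSAXParser()
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      Right(())
    } catch {
      case e: SAXParseException => Left(e)
    }

  def main(args: Array[String]): Unit = {
    println(parse(wellFormed))       // declaration at the very start: accepted
    println(parse(" " + wellFormed)) // one leading space: rejected by the parser
  }
}
```

The second call fails because, once anything precedes it, `<?xml ...?>` is no longer treated as the XML declaration but as an ordinary processing instruction, and the target name "xml" is reserved. A quick way to check a suspicious file is therefore to look at its very first bytes rather than at its overall structure.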

@treo @mdebeer ditto for being patient there. I’ve merged Paul’s PR.

Excellent, I’m glad you were able to spot it! @treo

Next up, I’ll give the model import a test with the new snapshot tomorrow :slight_smile:

So, progress. I could now pull the latest snapshots.

Aaand when I test importing of the Detectron2 onnx model, I get the following NullPointerException:

Exception in thread "main" java.lang.NullPointerException
	at org.nd4j.samediff.frameworkimport.onnx.ir.OnnxIRGraph.nodeList(OnnxIRGraph.kt:135)
	at org.nd4j.samediff.frameworkimport.onnx.ir.OnnxIRGraph.<init>(OnnxIRGraph.kt:69)
	at org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter.runImport(OnnxFrameworkImporter.kt:50)
	at TestImportONNX$.delayedEndpoint$TestImportONNX$1(TestImportONNX.scala:13)
...

The code I am running to test simply looks like:

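import java.io.File
import java.util.Collections

import org.nd4j.autodiff.samediff.SameDiff
import org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter

object TestImportONNX extends App {
  val onnxImport = new OnnxFrameworkImporter()

  val onnxFile: File  = new File("model.onnx")
  val graph: SameDiff = onnxImport.runImport(onnxFile.getAbsolutePath, Collections.emptyMap())
}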

At first it threw an error that there is no variable “Placeholder”, but then I updated the .pb and .pbtxt files that I added to resources to match that in the master branch here, and then I get the NullPointer.

For interest, if I delete the resources (assuming they are now included in snapshot), I get:

Exception in thread "main" java.lang.IllegalArgumentException: Rule listnumbertolistnumber for framework onnx with input framework name perm and framework op name Transpose does not accept output type [INPUT_TENSOR] for attribute name permuteDims and mapping process for op transpose
	at org.nd4j.samediff.frameworkimport.process.AbstractMappingProcess.<init>(AbstractMappingProcess.kt:107)
	at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcess.<init>(OnnxMappingProcess.kt:48)
	at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcess.<init>(OnnxMappingProcess.kt:47)
	at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcessLoader.instantiateMappingProcess(OnnxMappingProcessLoader.kt:50)
	at org.nd4j.samediff.frameworkimport.process.AbstractMappingProcessLoader.createProcess(AbstractMappingProcessLoader.kt:155)
	at org.nd4j.samediff.frameworkimport.registry.OpMappingRegistry.loadFromDefinitions(OpMappingRegistry.kt:180)
	at org.nd4j.samediff.frameworkimport.onnx.opdefs.OnnxOpDescriptorLoader.createOpMappingRegistry(OnnxOpDescriptorLoader.kt:104)
	at org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter.<init>(OnnxFrameworkImporter.kt:40)
	at TestImportONNX$.delayedEndpoint$TestImportONNX$1(TestImportONNX.scala:10)
...

@agibsonccc Please let me know if there’s something I am missing or if I can try something further. Could you get the import to work with the model that I sent you?

@mdebeer the error you’re seeing is unrelated, but JFYI I haven’t implemented AliasWithName yet. It’s still on my list. We had internal use cases to prioritize first. I’ll let you know when I get around to it.

The issue you’re seeing there comes from mismatched definitions between the generated op descriptors and the model import. You can try to work through that if you want, but you’ll still need to add the definition for AliasWithName as well.

@agibsonccc Aah okay, thank you.
That’s alright then for the time being. Importing the model is a priority for us so we can move away from Python, but not too urgent. So I’ll check in again soon :slight_smile: Thanks for your time!

@mdebeer I’ll let you know when we add it. On average it has taken me a day or so to add an op. Our internal use cases just had me implementing a lot more of ONNX than I thought. I wouldn’t overthink it as an indefinite priority. Watch this space for the next week or so. I’m about caught up and will ping you with the related op. I’ll upload the converted model for you as a test case as well.
