Great, thank you!
I know it may be a frustrating question, but how long do you expect until the next release?
@mdebeer probably a few weeks at the latest. You can use snapshots when this gets merged though.
Excellent, thank you. I've subscribed to that PR to get updates, so will try the snapshot when it's merged.
@mdebeer great, will ping you if I merge that work without doing this. It will depend on how big adding the next few target ops is. (There's around 5 to 7 I wanted to get done) Thanks for your patience!
Hi Adam, I see that the PR was merged to master.
I've tried to set up my SBT to pull the snapshots (using this page as guide). It's not clear what the correct, latest version number should be (after 1.0.0-M1.1), but I presume it's 1.0.0-SNAPSHOT.
With this version + sonatype repo, I can resolve "deeplearning4j-core" and "deeplearning4j-modelimport", but I can't seem to resolve "org.nd4j" % "samediff-import-onnx".
Do you have any advice for how I can test the model import with the latest snapshot?
You should be able to resolve it, as it does exist:
https://oss.sonatype.org/content/repositories/snapshots/org/nd4j/samediff-import-onnx/1.0.0-SNAPSHOT/
What exactly does sbt tell you?
Sometimes sbt can be a bit of a pain when trying to actually pull snapshot dependencies. In that case, the easiest workaround is to create a dummy maven project, set up all the dependencies that you are missing there, and run mvn dependency:go-offline. That will download everything to your local repository, and then sbt can usually resolve everything just fine.
Aah, interesting.
So my build.sbt contains the following:
resolvers += "Snapshot Repository" at "https://oss.sonatype.org/content/repositories/snapshots",
libraryDependencies ++= Seq(
"org.deeplearning4j" % "deeplearning4j-core" % "1.0.0-SNAPSHOT",
"org.deeplearning4j" % "deeplearning4j-modelimport" % "1.0.0-SNAPSHOT",
"org.nd4j" % "samediff-import-onnx" % "1.0.0-M1.1",
...)
If I change samediff-import-onnx to also pull "1.0.0-SNAPSHOT", then I get an XML parsing error ...
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.
Which doesn't seem to be SBT related. When I search the error with "SBT", I don't get any helpful results. It seems unable to parse some XML result... which, heck if I know.
But your idea to use a dummy maven project to pull the dependencies does indeed seem like a good approach, so I'll give that a try.
@mdebeer worst case scenario, just installing the java dependencies from source should be fine as well.
You only need to install nd4j-api and samediff-import-onnx as the main modules for this.
The converter mainly parses protobuf and converts it to a samediff model, then saves it. It's a worst case scenario for you, but shouldn't be nearly as bad as installing the c++ dependencies. Depending on what route you go, we can help you out though.
Please try Paulās advice first though.
This is quite a tricky one to figure out ...
I've created the dummy maven project and defined all the dependencies. Running mvn dependency:go-offline completes successfully, and I can find the downloaded dependencies in ~/.m2/repository.
However, adding my local maven repo as a resolver (resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository") still does not allow SBT to compile ... because it appears coursier tries to validate the xml for samediff-import-onnx and fails specifically for that one (snapshot works for org.deeplearning4j.* and org.nd4j.nd4j-native-platform).
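For reference, sbt ships a predefined resolver for the local Maven repository, so the hand-built file:// URL can be written more compactly. The fragment below is only a sketch of the equivalent build.sbt setting, assuming the default ~/.m2/repository location; it is a shorthand for the resolver above and is not expected to change the XML parsing failure by itself.

```scala
// build.sbt fragment (sketch): sbt's predefined resolver for ~/.m2/repository
resolvers += Resolver.mavenLocal
// Snapshot repository, as used elsewhere in this thread
resolvers += "Snapshot Repository" at "https://oss.sonatype.org/content/repositories/snapshots"
```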
I've looked at the maven-metadata.xml, and it's perfectly valid. I've compared the .pom files as well, for a working example and the samediff-import-onnx one, and the only difference is indentation depth and some extra whitespace, but it's still valid xml ... (the SBT error appears to occur for the reason given in the following stackoverflow explanation).
Anyway, it's not clear how to proceed, apart from possibly taking the maven dummy project and packaging our own artifact and building from that. But this would just be for intermediate testing before the next version release, so waiting is also an option. Thought I'd at least share the symptoms / potential bug with samediff-import-onnx's xml, but maybe it's just our SBT config?
I can post our sample build.sbt if you're interested in debugging ...
You don't need to do that. It should automatically check the local repository as that is where it caches everything.
But resolving the artifact is probably not your actual problem, because...
This looks like it does indeed resolve it correctly, but then has issues with processing what it finds.
Can you share more of that stacktrace? In particular, I'd like to know what sub-system is throwing this, and what file it is trying to read.
@treo Thank you for the feedback.
Unfortunately, the stacktrace wasn't very helpful to me when I worked through it:
[error] org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.
[error] at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
[error] at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
[error] at lmcoursier.internal.shaded.coursier.core.compatibility.package$.xmlParseSax(package.scala:116)
[error] at lmcoursier.internal.shaded.coursier.maven.MavenRepository$.parseRawPomSax(MavenRepository.scala:74)
[error] at lmcoursier.internal.shaded.coursier.maven.MavenRepository.$anonfun$findVersioning$1(MavenRepository.scala:369)
[error] at lmcoursier.internal.shaded.coursier.util.EitherT.$anonfun$flatMap$1(EitherT.scala:18)
[error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
[error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
[error] at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:84)
[error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
[error] at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
[error] at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
[error] at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] (update) org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 7; The processing instruction target matching "[xX][mM][lL]" is not allowed.
because I could find no reference to a file name or input that is causing the error.
It does seem to be a pom file, based on .parseRawPomSax(MavenRepository.scala:74), but together with a couple of other scala devs here, we couldn't really make sense of it (in ~20 minutes or so).
Maybe we're missing something... so here's a sample build.sbt if you want to try to replicate it (sbt 1.5.5):
scalaVersion := "2.13.6"
//val dl4jVersion = "1.0.0-M1.1"
val dl4jVersion = "1.0.0-SNAPSHOT"
//resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
resolvers += "Snapshot Repository" at "https://oss.sonatype.org/content/repositories/snapshots"
libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.31",
"org.deeplearning4j" % "deeplearning4j-core" % dl4jVersion,
"org.deeplearning4j" % "deeplearning4j-modelimport" % dl4jVersion,
"org.deeplearning4j" % "deeplearning4j-nn" % dl4jVersion,
"org.deeplearning4j" % "deeplearning4j-zoo" % dl4jVersion,
"org.nd4j" % "nd4j-native-platform" % dl4jVersion,
"org.nd4j" % "samediff-import-onnx" % "1.0.0-M1.1" // <-- Works on -M1.1, errors on SNAPSHOT
)
Before I go ahead and try to reproduce this, can you do a quick sanity check please:
Run mvn dependency:purge-local-repository. That will remove all of the project's dependencies from the local repository, and then you can try to redownload them (see Apache Maven Dependency Plugin - Purging project dependencies).
If it is due to some kind of corruption in your local repository, that may already resolve the issue.
If not, it would be nice if you could put up a simple demo git repository that reproduces the problem, so we can try to debug what exactly is going on.
Right - I've purged the repo and tried again. Unfortunately, no dice.
As a further sanity check, we've replicated the aforementioned error on Ubuntu as well as Mac.
Running the latest SBT (1.5.5), all that needs to be done to replicate the error is to create the above build.sbt file in a test folder, and in that folder run sbt and then update.
It should succeed with samediff-import-onnx set to version M1.1, but fails when set to snapshot.
Thanks for the sanity check.
I've got an SBT debugging setup working now, so I can put a breakpoint into it and figure out what the problem is.
Turns out it is a stray space character in one of the dependencies of samediff-import-onnx.
Once this PR is merged and the snapshots are rebuilt, it should start working again.
Thank you very much for providing enough detail for me to be able to debug it.
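A stray space fits the reported symptom: an XML declaration is only allowed at the very start of a document, so a parser that meets <?xml after any leading character treats it as a processing instruction with a reserved target name. The sketch below illustrates this with the JDK's SAX parser. The POM string is a hypothetical minimal document, not the actual samediff-import-onnx dependency, and the object and method names are made up for the example.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

object StraySpaceDemo {
  // Hypothetical, minimal POM-like document used only for illustration.
  val pom: String =
    """<?xml version="1.0" encoding="UTF-8"?><project><modelVersion>4.0.0</modelVersion></project>"""

  // Parse with the JDK's SAX parser; a malformed document yields a Failure.
  def parse(xml: String): Try[Unit] = Try {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    val bytes = xml.getBytes(StandardCharsets.UTF_8)
    parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler)
  }

  def main(args: Array[String]): Unit = {
    println(parse(pom))       // declaration at offset 0
    println(parse(" " + pom)) // one stray leading space before the declaration
  }
}
```

The first call should parse cleanly, while the second is expected to fail with a SAXParseException about the processing instruction target, which would explain why a pom that looks valid in an editor can still be rejected.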
Excellent, I'm glad you were able to spot it! @treo
Next up, I'll give the model import a test with the new snapshot tomorrow.
So, progress. I could now pull the latest snapshots.
Aaand when I test importing of the Detectron2 onnx model, I get the following NullPointerException:
Exception in thread "main" java.lang.NullPointerException
at org.nd4j.samediff.frameworkimport.onnx.ir.OnnxIRGraph.nodeList(OnnxIRGraph.kt:135)
at org.nd4j.samediff.frameworkimport.onnx.ir.OnnxIRGraph.<init>(OnnxIRGraph.kt:69)
at org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter.runImport(OnnxFrameworkImporter.kt:50)
at TestImportONNX$.delayedEndpoint$TestImportONNX$1(TestImportONNX.scala:13)
...
The code I am running to test simply looks like:
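import java.io.File
import java.util.Collections
import org.nd4j.autodiff.samediff.SameDiff
import org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter

object TestImportONNX extends App {
  val onnxImport = new OnnxFrameworkImporter()
  val onnxFile: File = new File("model.onnx")
  val graph: SameDiff = onnxImport.runImport(onnxFile.getAbsolutePath, Collections.emptyMap())
}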
At first it threw an error that there is no variable "Placeholder", but then I updated the .pb and .pbtxt files that I added to resources to match those in the master branch here, and then I get the NullPointer.
For interest, if I delete the resources (assuming they are now included in snapshot), I get:
Exception in thread "main" java.lang.IllegalArgumentException: Rule listnumbertolistnumber for framework onnx with input framework name perm and framework op name Transpose does not accept output type [INPUT_TENSOR] for attribute name permuteDims and mapping process for op transpose
at org.nd4j.samediff.frameworkimport.process.AbstractMappingProcess.<init>(AbstractMappingProcess.kt:107)
at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcess.<init>(OnnxMappingProcess.kt:48)
at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcess.<init>(OnnxMappingProcess.kt:47)
at org.nd4j.samediff.frameworkimport.onnx.process.OnnxMappingProcessLoader.instantiateMappingProcess(OnnxMappingProcessLoader.kt:50)
at org.nd4j.samediff.frameworkimport.process.AbstractMappingProcessLoader.createProcess(AbstractMappingProcessLoader.kt:155)
at org.nd4j.samediff.frameworkimport.registry.OpMappingRegistry.loadFromDefinitions(OpMappingRegistry.kt:180)
at org.nd4j.samediff.frameworkimport.onnx.opdefs.OnnxOpDescriptorLoader.createOpMappingRegistry(OnnxOpDescriptorLoader.kt:104)
at org.nd4j.samediff.frameworkimport.onnx.importer.OnnxFrameworkImporter.<init>(OnnxFrameworkImporter.kt:40)
at TestImportONNX$.delayedEndpoint$TestImportONNX$1(TestImportONNX.scala:10)
...
@agibsonccc Please let me know if there's something I am missing or if I can try something further. Could you get the import to work with the model that I sent you?
@mdebeer the error you're seeing is unrelated, but JFYI I haven't implemented AliasWithName yet. It's still on my list. We had internal use cases to prioritize first. I'll let you know when I get around to it.
The issue you're seeing there is a mismatch between the definitions in the generated op descriptors and the model import. If you want to try to work through that you can, but then you'll still need to add the definition for AliasWithName as well.
@agibsonccc Aah okay, thank you.
That's alright then for the time being. Importing the model is a priority for us so we can move away from Python, but not too urgent. So I'll check in again soon. Thanks for your time!
@mdebeer I'll let you know when we add it. It's on average taken me 1 day or so to add ops. Our internal use cases just had me implementing a lot more of onnx than I thought. I wouldn't overthink it as an indefinite priority. Watch this space for the next week or so. I'm about caught up and will ping you with the related op. I'll upload the converted model for you as a test case as well.