TinyYolo training goes into NaN with CPU Backend

You can remove the snapshot repository from your pom.xml again, and then it will use what is cached in your local maven repository.

So far we’ve only been able to reproduce this with cuDNN, so we have to assume that cuDNN is the cause. Numerical overflow/underflow issues do happen with specific hardware implementations. For example, on physical ARM hardware we sometimes run into those issues as well, while on a virtualized ARM system we don’t get those problems.
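To illustrate how an overflow turns into NaN, here is a minimal plain-Java sketch. It does not use any DL4J or ND4J API; the class `NanCheck` and the helper `firstNonFinite` are hypothetical names made up for this example, and it assumes you have already copied the values you want to inspect into a `float[]`.

```java
public final class NanCheck {

    /**
     * Returns the index of the first NaN or infinite entry,
     * or -1 if every entry is finite.
     */
    public static int firstNonFinite(float[] values) {
        for (int i = 0; i < values.length; i++) {
            if (!Float.isFinite(values[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float overflow = Float.MAX_VALUE * 2f; // exceeds the float range, becomes Infinity
        float nan = overflow - overflow;       // Infinity - Infinity is NaN

        // Stand-in for values taken from a training step (illustrative data only).
        float[] values = {0.5f, 1.0f, overflow, nan};

        System.out.println("overflow is infinite: " + Float.isInfinite(overflow));
        System.out.println("difference is NaN: " + Float.isNaN(nan));
        System.out.println("first non-finite index: " + firstNonFinite(values));
    }
}
```

Running a check like this on the values after each iteration would at least show at which step they stop being finite, which narrows down where to look.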

Our YOLO implementation appears to work fine in our test cases, so we have to assume that it is working as intended on CPU. Hunting a bug that is hard to pinpoint requires a lot of resources, and ours are still limited.

If you want to help with that, we will answer any questions that come up, but unless we find a better way to pinpoint the issue, we will have to prioritize the bugs that we have already confirmed to be fixable on our side.