TinyYolo training goes into NaN with CPU Backend

treo · April 5, 2020, 10:16am

You can remove the snapshot repository from your pom.xml again, and then it will use what is cached in your local maven repository.

At the moment we’ve only been able to reproduce this with cuDNN, so we have to assume that cuDNN is the cause. Numerical overflow/underflow issues do happen with specific hardware implementations. For example, on physical ARM hardware we sometimes run into those issues as well, while on a virtualized ARM system we don’t get those problems.
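To make the overflow point concrete, here is a minimal sketch in plain Java of how a single overflow in 32-bit float arithmetic turns into NaN. It is only an illustration and is not taken from the DL4J or cuDNN code; the class name `NanDemo` and its methods are made up for this example, and softmax is used just because it is a familiar place where this happens.

```java
public class NanDemo {
    // Naive softmax in 32-bit floats: exp() of a large logit overflows to
    // Infinity, and Infinity / Infinity evaluates to NaN.
    public static float[] naiveSoftmax(float[] logits) {
        float[] out = new float[logits.length];
        float sum = 0f;
        for (int i = 0; i < logits.length; i++) {
            out[i] = (float) Math.exp(logits[i]);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Same computation, but the maximum is subtracted first so that every
    // exponent is <= 0 and exp() cannot overflow.
    public static float[] stableSoftmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        float[] out = new float[logits.length];
        float sum = 0f;
        for (int i = 0; i < logits.length; i++) {
            out[i] = (float) Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // True if any element is NaN or +/-Infinity.
    public static boolean hasNonFinite(float[] values) {
        for (float v : values) {
            if (!Float.isFinite(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        float[] logits = {100f, 90f, 0f}; // hypothetical values chosen to overflow float
        System.out.println("naive has NaN/Inf:  " + hasNonFinite(naiveSoftmax(logits)));
        System.out.println("stable has NaN/Inf: " + hasNonFinite(stableSoftmax(logits)));
    }
}
```

Whether an intermediate value crosses that threshold can depend on how a particular backend orders and rounds its operations, which is why the same network can behave differently from one implementation to another.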

Our YOLO implementation appears to work fine in our test cases, so we have to assume that it is working as intended on CPU. Hunting a bug that is hard to pinpoint requires a lot of resources, and ours are still limited.

If you want to help with that, we will answer any questions that come up, but unless we find a better way to pinpoint the issue, we will have to prioritize the bugs that we have already confirmed to be fixable from our side.
