I found a strange thing. When training TinyYOLO with pretrained weights, everything is fine as long as all data (all batches) have the same dimensions. But if each batch has different data dimensions (the same dimensions within a batch, just different between batches), the training score goes to NaN on the 5th iteration, no matter what the batch size is.
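To make the setup concrete, here is a minimal NumPy sketch (hypothetical shapes and batch size, not my actual data) of the layout I mean: every image inside one batch shares the same height/width, but the height/width changes from batch to batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-batch spatial dimensions (NCHW); uniform within a
# batch, different between batches.
batch_dims = [(416, 416), (352, 352), (480, 480)]

batches = [
    rng.standard_normal((8, 3, h, w)).astype(np.float32)
    for (h, w) in batch_dims
]

for b in batches:
    print(b.shape)  # one shape per batch, e.g. (8, 3, 416, 416)
```

With fixed dimensions (all batches shaped like the first one) training behaves; with the varying shapes above, it breaks only on the CUDA + cuDNN combination.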
I tested it on the CPU backend -> works fine, no NaN. CUDA backend (without cuDNN) -> works fine too. But CUDA with cuDNN does not work.
What do you think about this?