I have a rather newbie question about training (see the picture).
Is it normal for a model to diverge after epoch N (say N=100), or is it a sign of a poorly designed model/dataset?
What could cause the score to collapse at iteration #5000? (see the green curve in the picture)
The score also seems to be affected by some bias (see the yellow curve in the picture). Can we deduce anything from such behaviour?
UPDATE: after reducing the learning rate, the model converges well.
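For anyone hitting the same issue: a toy illustration (plain Python, no framework; the function and values are made up, not from my actual model) of why lowering the learning rate can turn divergence into convergence. On a simple quadratic f(w) = w², the gradient step w ← w − lr·2w diverges whenever lr > 1 and converges whenever lr < 1; a too-large learning rate on a real loss surface can blow up the same way:

```python
def run(lr, steps=50, w=1.0):
    """Gradient descent on f(w) = w**2; gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)

# With lr > 1 each step multiplies |w| by |1 - 2*lr| > 1, so it diverges;
# with a small lr, |w| shrinks toward the minimum at 0.
print(run(lr=1.1))  # grows without bound
print(run(lr=0.1))  # shrinks toward 0
```

Real training losses are not quadratics, but the mechanism is the same, which is why the fix in the update (reducing the learning rate) worked.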