Differences in Shakespeare generator example

Hello, I’m a new visitor to the deep learning community, trying to find my way around after reading the O’Reilly book about DeepLearning4J. I tried to adapt the example from the book (GravesLSTMCharModellingExample), only to be told by Adam Gibson on Gitter that the example has already been ported here: deeplearning4j-examples/GenerateTxtModel.java at master · eclipse/deeplearning4j-examples (github.com). My attempt at the adaptation is here:

Graves LSTM example adapted for DL4J 1.0 (github.com)

Now, the one thing I noticed is that while my adapted example performs very badly (the network hardly converges), the official one works very well (even the first set of samples is already pretty good). I’m wondering what the major differences would be. Is it the choice of updater? Was the regularization rate too high in the original example (0.01 vs. 0.001)? Or did I make a configuration error somewhere that I didn’t notice?
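To make the question concrete, here is a sketch of the two configuration knobs I mean. This is not code copied from either repository; the layer sizes, learning rates, and overall builder chain are illustrative placeholders, and only the L2 values (0.01 vs. 0.001) come from my question above:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.learning.config.RmsProp;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class ConfigDiffSketch {

    // Builds one of two hypothetical configurations that differ only in the
    // two settings under discussion: the updater and the L2 rate.
    static MultiLayerConfiguration build(boolean likeMyAdaptation, int nIn, int nOut) {
        return new NeuralNetConfiguration.Builder()
                // Updater choice: RmsProp vs. Adam (learning rates are made up here).
                .updater(likeMyAdaptation ? new RmsProp(0.1) : new Adam(0.005))
                // Regularization rate: 0.01 in my adaptation vs. 0.001.
                .l2(likeMyAdaptation ? 0.01 : 0.001)
                .list()
                .layer(new LSTM.Builder().nIn(nIn).nOut(200)
                        .activation(Activation.TANH).build())
                .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX).nIn(200).nOut(nOut).build())
                .build();
    }
}
```

In other words: is a delta of this size (updater family plus a 10x difference in L2) enough on its own to explain "hardly converges" vs. "good samples from the first epoch"?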

And BTW, the ScoreIterationListener doesn’t seem to print anything to the console in this case - is this expected behavior?
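For reference, this is how I understand the listener is meant to be attached (a sketch, not my exact code; the print frequency of 10 is just an example value, and `conf` stands for whatever MultiLayerConfiguration the network was built with):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

public class ListenerSketch {
    static MultiLayerNetwork withScoreListener(MultiLayerConfiguration conf) {
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        // My expectation: this logs the current score every 10 training iterations.
        net.setListeners(new ScoreIterationListener(10));
        return net;
    }
}
```

If the listener is attached like this and still prints nothing, is that a logging configuration issue (e.g., no SLF4J backend on the classpath) rather than a DL4J one?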