Parallel training on multicore single CPU

I would like to understand how parallel training on a multicore single CPU works. I plan to write my own EvaluativeListener so that I can control when to stop training. When I looked at the original implementation here, I noticed that the iteration counter uses ThreadLocal. My question is: what kind of parallel training approach is used by default? I am working in a multicore, single-CPU environment. Does each thread use a portion of the training data to run the optimizer and update the weights synchronously? Or does the parallel training happen in some other way? Thanks for the info.
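For context on the ThreadLocal detail mentioned above: a ThreadLocal counter gives each thread its own independent value, so two threads calling the same listener do not interfere with each other's iteration counts. This is a minimal standalone sketch of that pattern (the class and method names here are made up for illustration, not the actual listener code):

```java
import java.util.concurrent.*;

public class ThreadLocalCounterDemo {
    // Each thread sees its own copy of the counter, which is why a
    // listener can use ThreadLocal safely from multiple training threads.
    private static final ThreadLocal<Integer> iterCount =
            ThreadLocal.withInitial(() -> 0);

    static int step() {
        int next = iterCount.get() + 1;
        iterCount.set(next);
        return next;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Integer> worker = () -> {
            int last = 0;
            for (int i = 0; i < 5; i++) last = step();
            return last; // each thread counts to 5 independently
        };
        Future<Integer> a = pool.submit(worker);
        Future<Integer> b = pool.submit(worker);
        System.out.println(a.get() + " " + b.get()); // prints "5 5"
        pool.shutdown();
    }
}
```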

That is what you would typically use the early stopping functionality for.
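The core of early stopping is simple enough to sketch in a few lines: track the best validation score seen so far and stop once it has failed to improve for a given number of epochs ("patience"). This is a generic illustration of the idea, not the library's early stopping API; the hard-coded loss values stand in for scores your listener would compute on a held-out set:

```java
import java.util.List;

public class EarlyStopSketch {
    // Hypothetical per-epoch validation losses; in real training these
    // would come from evaluating the model after each epoch.
    static final List<Double> valLoss = List.of(0.9, 0.7, 0.6, 0.61, 0.62, 0.63);

    public static int trainWithEarlyStopping(int patience) {
        double best = Double.MAX_VALUE;
        int badEpochs = 0, epoch = 0;
        for (double loss : valLoss) {
            epoch++;
            if (loss < best) {
                best = loss;       // improvement: reset the patience counter
                badEpochs = 0;
            } else if (++badEpochs >= patience) {
                break;             // no improvement for `patience` epochs: stop
            }
        }
        return epoch; // epoch at which training stopped
    }

    public static void main(String[] args) {
        System.out.println(trainWithEarlyStopping(2)); // prints "5"
    }
}
```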

That is because the listener can be used in a multithreaded environment, e.g. when multiple GPUs or CPUs are used. When training in a single-CPU multicore environment, usually only the actual math is parallelized, as that can still be done efficiently enough.
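To make "only the math is parallelized" concrete: the training loop itself runs on one thread, but an individual operation such as a matrix-vector product can be split across cores. A rough stdlib sketch of that intra-op parallelism (not how the library actually implements its math kernels):

```java
import java.util.stream.IntStream;

public class ParallelMathDemo {
    // One logical operation (matrix-vector product) whose rows are
    // computed in parallel on the common fork-join pool, while the
    // caller (the "training loop") remains single-threaded.
    static double[] matVec(double[][] m, double[] v) {
        return IntStream.range(0, m.length)
                .parallel()
                .mapToDouble(r -> {
                    double sum = 0;
                    for (int c = 0; c < v.length; c++) {
                        sum += m[r][c] * v[c];
                    }
                    return sum;
                }).toArray();
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[] v = {1, 1};
        double[] out = matVec(m, v);
        System.out.println(out[0] + " " + out[1]); // prints "3.0 7.0"
    }
}
```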

Makes sense. Thanks.