Intel efficiency cores massively increase training time

I’ve been training networks on Intel 12700K (8 performance + 4 efficiency cores) and 13900K (8 performance + 16 efficiency cores) CPUs in Linux. I see massive increases in training time if the efficiency cores are turned on. For example, an 8-minute test with the efficiency cores turned off takes 13 minutes with them turned on for the 12700K. The 13900K performs even worse with the efficiency cores on, taking about 39 minutes.

I don’t know if this problem is limited to Linux and caused by poor thread management in the kernel (Linux kernel 6.1.9-060109-generic x86_64). My guess is that there is a misaligned distribution of performance/efficiency cores assigned to the linear algebra and BLAS work.

My work-around is to disable the efficiency cores when I’m training, which isn’t ideal because part of my CPU hardware sits unused. Does anyone have a better solution?
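A softer alternative to disabling the E-cores in firmware is to pin the training process to the P-cores only. A minimal sketch using Python’s `os.sched_setaffinity` (Linux-only); the assumption that P-core hyperthreads are logical CPUs 0–15 on a 12700K is illustrative — check `lscpu --extended` on your own machine:

```python
import os

# Assumption: on Alder Lake under Linux, P-core logical CPUs are typically
# enumerated first (0-15 on a 12700K), with E-cores after (16-19).
# Verify with `lscpu --extended` before relying on these IDs.
P_CORE_CPUS = set(range(16))

def pin_to_p_cores(pid: int = 0) -> set:
    """Restrict `pid` (0 = current process) to the presumed P-core set,
    intersected with the CPUs actually available to the process."""
    available = os.sched_getaffinity(pid)
    # Fall back to the current mask if there is no overlap (e.g. a
    # machine with a different topology), so we never set an empty mask.
    target = (P_CORE_CPUS & available) or available
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

Running `pin_to_p_cores()` at the top of a training script (before the BLAS thread pool spins up) keeps worker threads off the E-cores; the same effect can be had externally with `taskset -c 0-15 python train.py`.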

The problem is that in those calculations BLAS expects every piece of work to take about the same amount of time.

As the efficiency cores are slower, giving them work essentially makes everything else wait for them.

For other tasks like Cinebench, every piece of work is independent of every other piece, which is why the extra cores do improve speed there.
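A toy model makes the straggler effect concrete. Assume a fork-join BLAS kernel splits work evenly across threads and the join waits for the slowest one; the relative core speeds below are illustrative assumptions, not measurements:

```python
# Toy model: work is split evenly across cores, and the fork-join
# barrier waits for the slowest core to finish its chunk.

def fork_join_time(total_work: float, core_speeds: list) -> float:
    """Completion time for an even split, gated by the slowest core."""
    chunk = total_work / len(core_speeds)
    return max(chunk / speed for speed in core_speeds)

# Assumption: E-cores at roughly half the per-core throughput of P-cores.
p_only = fork_join_time(800, [1.0] * 8)               # 8 P-cores
mixed = fork_join_time(800, [1.0] * 8 + [0.5] * 4)    # + 4 E-cores
```

With these numbers, adding the four slower cores *increases* the wall time (each chunk gets smaller, but the slowest core still gates the barrier), which matches the slowdown reported above. An independent-work load like Cinebench would instead sum throughputs and get faster.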

@daviddbal this actually might be allocation related, like I mentioned in the other thread. Could you show me profiling output from jvisualvm or a similar tool to clarify?