I’ve been training networks on Intel 12700K (8 performance, 4 efficiency cores) and 13900K (8 performance, 16 efficiency cores) CPUs under Linux. Training takes much longer when the efficiency cores are enabled. For example, on the 12700K, a test that takes 8 minutes with the efficiency cores disabled takes 13 minutes with them enabled. The 13900K does even worse with the efficiency cores on, taking about 39 minutes.
I don’t know whether this problem is limited to Linux, or whether it is caused by poor thread management in the kernel (Linux kernel 6.1.9-060109-generic x86_64). My guess is that the linear algebra and BLAS work ends up spread across a poorly matched mix of performance and efficiency cores.
My workaround is to disable the efficiency cores while I’m training, which isn’t ideal because part of my CPU goes unused. Does anyone have a better solution?
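
One alternative to disabling the efficiency cores outright might be to pin only the training process to the performance cores and leave the efficiency cores free for everything else. Below is a minimal sketch of that idea, not something I can confirm fixes the slowdown. It assumes the kernel exposes the performance-core list at `/sys/devices/cpu_core/cpus` (this may be missing on some kernels or non-hybrid CPUs, in which case the function does nothing). The helper names `parse_cpu_list` and `pin_to_performance_cores` are made up for illustration.

```python
import os
from pathlib import Path

# Assumption: on hybrid Intel CPUs the kernel lists the P-core logical CPUs here.
# If the file does not exist, the script leaves the affinity untouched.
P_CORE_LIST = Path("/sys/devices/cpu_core/cpus")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-15' or '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_performance_cores(path=P_CORE_LIST):
    """Restrict the current process to the P-cores.

    Returns the set of CPUs used, or None if the list is unavailable.
    """
    if not hasattr(os, "sched_setaffinity") or not Path(path).exists():
        return None
    cpus = parse_cpu_list(Path(path).read_text())
    if not cpus:
        return None
    os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
    return cpus


if __name__ == "__main__":
    # Call this before importing the training framework so that the worker
    # threads it creates inherit the restricted affinity.
    print(pin_to_performance_cores())
```

If this is used, it probably also makes sense to cap the number of BLAS/OpenMP threads (for example through the `OMP_NUM_THREADS` environment variable) so it does not exceed the number of pinned logical CPUs. Otherwise the threads would just oversubscribe the performance cores.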