I started to set up a longer test, but something strange is happening on my current Linux system: the NUMA version of Texel is roughly 70% faster than the non-NUMA version. The only thing I have changed since the last measurement is the Fedora version; I am now running Fedora 24 instead of Fedora 19.

petero2 wrote:
syzygy wrote:
In this thread Peter Österlund reports some interesting results for Texel on a 2-node PC with 2 x 8 = 16 threads:

```
Auto balancing  NUMA awareness  Mean (Mn/s)  Std dev (Mn/s)
no              no              16.44        1.55
yes             no              15.22        1.67
no              yes             18.16        0.37
yes             yes             17.88        0.39
```

So Linux's automatic NUMA balancing feature hurts rather than helps, and Texel's own NUMA awareness increases speed by over 10%. (He wrote 14%, but if I take 18.16 vs 16.44 it is 10.46%.)

I took the average improvement for auto on and auto off, that is:
(18.16/16.44 + 17.88/15.22) / 2 = 1.1397
My assumption was that there is no real difference between auto on and auto off for a chess program, so taking the average should give a better estimate. I don't know if that assumption is correct though. More measurements would be required to find out.
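The two ways of summarizing the quoted table (a single ratio for auto balancing off, versus the average of the auto-off and auto-on ratios) can be reproduced with a few lines of Python. This is only a sketch: the numbers are copied from the quoted table, and the helper name `awareness_gain` is made up for illustration.

```python
# Mean speeds in Mn/s from the quoted table, keyed by
# (automatic NUMA balancing, Texel NUMA awareness).
speeds = {
    ("no", "no"): 16.44,
    ("yes", "no"): 15.22,
    ("no", "yes"): 18.16,
    ("yes", "yes"): 17.88,
}


def awareness_gain(auto: str) -> float:
    """Speed ratio of NUMA-aware vs. non-aware Texel for one auto-balancing setting."""
    return speeds[(auto, "yes")] / speeds[(auto, "no")]


# Single ratio: auto balancing off only (18.16 / 16.44).
single = awareness_gain("no")
# Averaged ratio: mean of the auto-off and auto-on ratios.
averaged = (awareness_gain("no") + awareness_gain("yes")) / 2

print(f"auto off only: {single:.4f}")    # auto off only: 1.1046
print(f"averaged:      {averaged:.4f}")  # averaged:      1.1397
```

Averaging the two ratios weights the auto-on and auto-off runs equally, which only makes sense under the stated assumption that auto balancing makes no real difference for a chess program.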
The NUMA version runs at the expected speed, but the non-NUMA version runs a lot slower than expected (based on the speeds I got before I upgraded to Fedora 24). Running "numatop" shows that RMA/LMA (number of remote memory accesses divided by the number of local memory accesses) varies a lot when the non-NUMA version is running.
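To put a number on how much RMA/LMA "varies a lot", one could note the remote and local access counts from several numatop refreshes and summarize the per-sample ratio. A minimal sketch, assuming the (remote, local) pairs are copied by hand, since no particular numatop output format is assumed here; the function names are hypothetical. A ratio near 0 means the process mostly touches memory on its own node, and values at or above 1 mean at least as many accesses go to the other node as stay local.

```python
from statistics import mean, pstdev


def rma_lma_ratios(samples):
    """Return RMA/LMA for each (remote, local) sample, skipping samples with no local accesses."""
    return [rma / lma for rma, lma in samples if lma > 0]


def summarize(samples):
    """Min, max, mean and population standard deviation of the per-sample RMA/LMA ratios."""
    ratios = rma_lma_ratios(samples)
    if not ratios:
        raise ValueError("no samples with local accesses")
    return {
        "min": min(ratios),
        "max": max(ratios),
        "mean": mean(ratios),
        "stdev": pstdev(ratios),
    }
```

A large spread between `min` and `max` across refreshes would match the unstable behavior described for the non-NUMA version, while a NUMA-aware run would be expected to stay consistently low.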
It seems like something in the kernel scheduler is broken with respect to NUMA. However, another odd thing is that when I run Cfish and compare NUMA and non-NUMA speeds, the difference is only around 10%.
Automatic NUMA balancing is disabled and transparent huge pages are enabled. The kernel version is:

Linux version 4.7.2-201.fc24.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 6.1.1 20160621 (Red Hat 6.1.1-3) (GCC) ) #1 SMP Fri Aug 26 15:58:40 UTC 2016
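Both settings can be checked through standard kernel interfaces: /proc/sys/kernel/numa_balancing (0 means automatic NUMA balancing is disabled) and /sys/kernel/mm/transparent_hugepage/enabled (the active mode is shown in brackets, e.g. `[always] madvise never`). The following is a minimal Python sketch assuming those usual paths; the helper names are illustrative and not part of any existing tool. The numa_balancing file may be missing on kernels built without automatic NUMA balancing support.

```python
from pathlib import Path

# Usual locations of the two settings on Linux.
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def numa_balancing_on(text: str) -> bool:
    """The sysctl reads 0 when automatic NUMA balancing is disabled."""
    return int(text.strip()) != 0


def thp_mode(text: str) -> str:
    """Return the active THP mode, i.e. the word shown in brackets."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode in {text!r}")


if __name__ == "__main__":
    if NUMA_BALANCING.exists():
        on = numa_balancing_on(NUMA_BALANCING.read_text())
        print("automatic NUMA balancing:", "enabled" if on else "disabled")
    else:
        print("automatic NUMA balancing: not available on this kernel")
    if THP_ENABLED.exists():
        print("transparent huge pages:", thp_mode(THP_ENABLED.read_text()))
```

On the system described above, this would be expected to report balancing as disabled and THP as `always` (or `madvise`, depending on how "enabled" was configured).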