NUMA-awareness

zullil
Location: PA USA
Full name: Louis Zulli

Post by zullil

In general terms, how would an engine be coded to take advantage of the following NUMA set-up (assuming the engine will run 16 search threads)? What methods might be used to minimize threads running on one node needing to access memory on the other node? Thanks.

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 15992 MB
node 0 free: 14202 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 16126 MB
node 1 free: 14711 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10

jdart
Location: http://www.arasanchess.org

Post by jdart

Most modern multi-CPU systems are NUMA.

The general rule is to keep memory access local to the NUMA node whenever possible. There are tools that can measure this for you (Intel VTune, TAO, etc.).

That means, among other things, generally avoiding access to globals that are shared across all threads. For the main hash table you can't avoid that, but for anything else, don't do it, not even one byte. If you need a small amount of global data, such as option settings, and it does not change during the search, consider caching a copy per thread. If you need to update a global counter or something similar, do it less often if possible, for example by counting locally and adding to the shared total only every so often.
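A minimal C sketch of the last two suggestions, assuming POSIX threads and C11 atomics. The Options struct, the ThreadData layout and the FLUSH_INTERVAL of 1024 are hypothetical illustrations, not taken from any particular engine: each thread copies the option settings once at startup and touches the shared node counter only once per 1024 nodes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical option block, written once before the search starts. */
typedef struct {
    int contempt;
    int multipv;
} Options;

static Options g_options = { 0, 1 };      /* shared, read-only during search */
static atomic_uint_least64_t g_nodes;     /* shared node counter */

#define FLUSH_INTERVAL 1024               /* hypothetical batching interval */

typedef struct {
    Options opts;           /* this thread's private copy of the options */
    uint64_t local_nodes;   /* nodes not yet added to g_nodes */
    uint64_t total_nodes;   /* all nodes searched by this thread */
} ThreadData;

static void thread_init(ThreadData *td) {
    assert(td != NULL);
    td->opts = g_options;   /* copy once; the search reads td->opts from here on */
    td->local_nodes = 0;
    td->total_nodes = 0;
}

/* Called once per searched node; touches the shared counter only
   once every FLUSH_INTERVAL calls. */
static void count_node(ThreadData *td) {
    td->total_nodes++;
    if (++td->local_nodes == FLUSH_INTERVAL) {
        atomic_fetch_add_explicit(&g_nodes, td->local_nodes, memory_order_relaxed);
        td->local_nodes = 0;
    }
}

/* Adds whatever is left over when the thread stops searching. */
static void thread_finish(ThreadData *td) {
    atomic_fetch_add_explicit(&g_nodes, td->local_nodes, memory_order_relaxed);
    td->local_nodes = 0;
}

static void *search_thread(void *arg) {
    ThreadData *td = arg;
    thread_init(td);
    for (int i = 0; i < 100000; i++)   /* stand-in for the real search */
        count_node(td);
    thread_finish(td);
    return NULL;
}
```

A larger interval means less cross-node cache-line traffic on the counter but a less up-to-date global total while the search runs; flushing the remainder when the thread finishes keeps the final count exact.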

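Apply the "first touch" rule: make sure memory that is used exclusively by a thread is initialized by that thread. The operating system typically places each physical page on the node of the thread that first writes to it, so this means moving the initialization of per-thread data into the thread procedure.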
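A sketch of the first-touch rule in C with POSIX threads. SearchTables, Worker, worker_main and the 512 KiB per-thread pawn hash are hypothetical names and sizes chosen for illustration. The only thing that matters here is where the memset happens: in the worker thread, not in the thread that creates the workers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAWN_HASH_ENTRIES (1u << 16)   /* hypothetical: 64K entries = 512 KiB */

/* Hypothetical per-thread tables that only the owning thread ever uses. */
typedef struct {
    uint64_t pawn_hash[PAWN_HASH_ENTRIES];
    int history[2][64][64];
} SearchTables;

typedef struct {
    int id;
    SearchTables *tables;   /* allocated and first written by the worker itself */
} Worker;

static void *worker_main(void *arg) {
    Worker *w = arg;
    assert(w != NULL);
    /* A large malloc typically only reserves address space. With the usual
       first-touch policy, each page is placed on the NUMA node of the thread
       that first writes it, so the memset must happen here, in the worker,
       rather than in the thread that created the worker. */
    w->tables = malloc(sizeof *w->tables);
    if (w->tables == NULL)
        return NULL;
    memset(w->tables, 0, sizeof *w->tables);

    /* ... the search would run here, reading and writing only w->tables ... */
    w->tables->history[0][12][28] = w->id;   /* placeholder for real work */
    return w;
}
```

If the main thread allocated and zeroed every worker's tables before starting the workers, those pages would typically all end up on the main thread's node. This also only pays off if the worker keeps running on the node where its pages were placed.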

Specifically for NUMA systems: you can use NUMA APIs to pin threads so that they will not migrate to another node. Both the Linux and Windows schedulers can otherwise move a thread to another node, and when this happens the thread's memory is not migrated with it. Most engines that do this also let you set a node index, so you can run one engine on nodes 1..2 and another on nodes 3..4 if you want.
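A Linux-only sketch of pinning, using the glibc extension pthread_setaffinity_np. It hard-codes the layout from the numactl output in the question (8 CPUs per node, numbered contiguously), which is an assumption; real code should query the topology instead, for example with libnuma. node_for_thread, node_cpu_set, pin_current_thread_to_node and the first_node setting are hypothetical names. first_node is 0-based here, so "nodes 1..2" and "nodes 3..4" correspond to first_node 0 and 2 with num_nodes 2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Assumption: node n owns CPUs n*CPUS_PER_NODE .. n*CPUS_PER_NODE + 7,
   as in the numactl output above. Do not hard-code this in real code. */
#define CPUS_PER_NODE 8

/* Node for a given search thread: fill one node's CPUs, then the next.
   first_node is a hypothetical option so two engines can use disjoint nodes. */
static int node_for_thread(int thread_index, int first_node, int num_nodes) {
    assert(num_nodes > 0);
    return first_node + (thread_index / CPUS_PER_NODE) % num_nodes;
}

/* Builds the set of all CPUs belonging to one node. */
static void node_cpu_set(int node, cpu_set_t *set) {
    CPU_ZERO(set);
    for (int cpu = node * CPUS_PER_NODE; cpu < (node + 1) * CPUS_PER_NODE; cpu++)
        CPU_SET(cpu, set);
}

/* Restricts the calling thread to the CPUs of one node. The scheduler can
   still move it between those CPUs, but not to another node.
   Returns 0 on success or an error number, as pthread_setaffinity_np does. */
static int pin_current_thread_to_node(int node) {
    cpu_set_t set;
    node_cpu_set(node, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```

Each search thread would call pin_current_thread_to_node(node_for_thread(i, first_node, num_nodes)) as the first thing in its thread procedure, before initializing its own data, so that first-touch placement lands on the node it will keep running on. On Windows the equivalent goes through different calls (for example SetThreadGroupAffinity).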

--Jon