NUMA in a YBWC implementation

Edsel Apostol

Post by Edsel Apostol » Wed Jul 20, 2016 9:37 am

The trend, now and for the future, is two or more CPUs in a NUMA configuration. I would like to implement NUMA awareness in a YBWC parallel search. Are there any tips from our SMP experts here besides pinning the threads to a fixed CPU core?

Post by jdart » Wed Jul 20, 2016 3:21 pm

Pinning is not so easy, because different OSes have different APIs for this, and there is also hyperthreading: you want to pin to a physical core, not a virtual one. I am using the hwloc library to facilitate doing this:
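To make the pinning step concrete, here is a minimal sketch. It is not jdart's code and it does not use hwloc: it assumes Linux with glibc and calls `sched_setaffinity` / `sched_getaffinity` directly. The helper names `pin_current_thread` and `allowed_cpus` are made up for illustration. Note that this only deals in logical CPU numbers; it cannot tell a physical core from a hyperthread sibling, which is the part a topology library such as hwloc is meant to handle.

```cpp
#include <sched.h>   // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)
#include <vector>

// Hypothetical helper: pin the calling thread to one logical CPU.
// On Linux, pid 0 means "the calling thread". Returns true on success.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: list the logical CPUs the calling thread may run on.
// Returns an empty vector if the query fails.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set)) cpus.push_back(c);
    return cpus;
}
```

In an engine, each search thread would call `pin_current_thread` once at startup with a CPU chosen for it, before it allocates or touches its per-thread data.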

On a NUMA system you also have to pay extra attention to shared memory access. The hash table is necessarily shared memory, although you can take care not to map the whole table into a single node's memory (I think Crafty does some partial initialization of the table in each thread). Otherwise you should try to reduce global memory access. I copy some info, such as search options, into each thread's stack so that the threads aren't all trying to hit a single copy. For things like performance counters, I also accumulate them in the threads and only periodically update a global copy. I would recommend running a profiler such as oprofile or VTune to find hot spots.

There are a lot of resources on this on the Web. See for example: ... Slides.pdf ... ticore.pdf ... r_NUMA.pdf


Dann Corbit
Post by Dann Corbit » Wed Jul 20, 2016 8:52 pm

I guess that you have already looked at Daniel Shawul's Scorpio implementation, which allows simultaneous NUMA and SMP searching?

I think his search is YBW (IIRC).

