NUMA in a YBWC implementation

Posted: Wed Jul 20, 2016 11:37 am
by Edsel Apostol
The trend now and going forward is two or more CPUs in a NUMA configuration. I would like to implement NUMA awareness in a YBWC parallel search. Are there any tips from our SMP experts here besides pinning the threads to a fixed CPU core?

Re: NUMA in a YBWC implementation

Posted: Wed Jul 20, 2016 5:21 pm
by jdart
Pinning is not so easy because different OSes have different APIs for this, and there is also hyperthreading: you want to pin to a physical core, not a virtual one. I am using the hwloc library to facilitate doing this: https://www.open-mpi.org/projects/hwloc/.
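
For reference, here is a minimal Linux-only sketch of what pinning looks like without hwloc. It uses `sched_setaffinity`, which works on logical CPU ids and knows nothing about hyperthread siblings, so it does not solve the physical-core problem described above; hwloc or the OS topology information would still be needed to decide which ids to use. The helper names (`allowed_cpus`, `pin_current_thread`, `run_pinned_workers`) are made up for illustration and are not taken from any engine.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros and sched_setaffinity
#endif
#include <sched.h>
#include <cassert>
#include <thread>
#include <vector>

// Logical CPU ids this process is currently allowed to run on.
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    return cpus;
}

// Pin the calling thread to one logical CPU (pid 0 means "this thread").
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical driver: start n workers and pin worker i to the i-th allowed
// CPU, wrapping around. Returns the CPU id each worker was pinned to,
// or -1 where pinning failed.
std::vector<int> run_pinned_workers(int n) {
    assert(n >= 0);
    const std::vector<int> cpus = allowed_cpus();
    std::vector<int> pinned_to(n, -1);
    if (cpus.empty()) return pinned_to;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            const int target = cpus[i % cpus.size()];
            if (pin_current_thread(target))
                pinned_to[i] = target;   // the search loop would start here
        });
    }
    for (auto& w : workers) w.join();
    return pinned_to;
}
```

Pinning from inside the worker, before it allocates or touches its own data, means that whatever the thread allocates afterwards is requested from the CPU it will keep running on.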

You also have to pay extra attention on a NUMA system to shared memory access. The hash table is necessarily shared memory, although you can take care not to map the whole table into a single node's local memory (I think Crafty does some partial initialization of the table in each thread). Otherwise you should try to reduce global memory access. I copy some info like search options into each thread's stack so that all the threads aren't trying to hit a single copy. For things like performance counters I also accumulate them in the threads and only periodically update a global copy. I would recommend you run a profiler like oprofile or VTune to find hot spots.
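
A minimal sketch of the last two ideas, assuming a made-up `SearchOptions` struct and a single node counter: each worker copies the options onto its own stack and counts nodes locally, and only adds to the shared atomic every `flush_every` nodes. All names and the flush interval of 4096 are illustrative assumptions, not taken from any particular engine.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical options block; a real engine would have many more fields.
struct SearchOptions {
    std::uint64_t flush_every = 4096;   // assumed flush interval
};

// Global totals, written only occasionally by each thread.
struct GlobalStats {
    std::atomic<std::uint64_t> nodes{0};
};

// Per-thread tally that lives on the worker's own stack.
struct LocalStats {
    std::uint64_t pending = 0;   // nodes counted but not yet published

    void flush(GlobalStats& g) {
        g.nodes.fetch_add(pending, std::memory_order_relaxed);
        pending = 0;
    }
    void count_node(GlobalStats& g, std::uint64_t flush_every) {
        if (++pending >= flush_every) flush(g);
    }
};

// Hypothetical driver: each worker "visits" nodes_per_thread nodes.
std::uint64_t search_all(int num_threads, std::uint64_t nodes_per_thread,
                         const SearchOptions& shared_opts = SearchOptions{}) {
    assert(num_threads >= 0);
    GlobalStats global;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            const SearchOptions opts = shared_opts;  // private copy on this stack
            LocalStats mine;                         // private counter
            for (std::uint64_t i = 0; i < nodes_per_thread; ++i)
                mine.count_node(global, opts.flush_every);
            mine.flush(global);   // publish the remainder before exiting
        });
    }
    for (auto& w : workers) w.join();
    return global.nodes.load();
}
```

With these defaults a thread touches the shared cache line roughly once per 4096 nodes instead of once per node; the price is that the global count lags slightly behind while the search is running.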

There are a lot of resources on this on the Web. See for example:

http://ircc.fiu.edu/download/sc13/Pract ... Slides.pdf

https://blogs.fau.de/hager/files/2010/0 ... ticore.pdf

http://www.cs.utexas.edu/~skeckler/pubs/ISPASS_2011.pdf

https://software.intel.com/sites/defaul ... r_NUMA.pdf

--Jon

Re: NUMA in a YBWC implementation

Posted: Wed Jul 20, 2016 10:52 pm
by Dann Corbit
I guess that you have already looked at Daniel Shawul's Scorpio implementation, which allows simultaneous NUMA and SMP searching?

I think his search is YBW (IIRC).