NUMA 101
Re: NUMA 101
FWIW: on my Celeron laptop I see a speed improvement of roughly 2% by using "huge pages". Well, not that huge; the CPU in my laptop can only do 2 MB pages.
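For readers who want to try this, here is a minimal sketch of one way to request huge pages on Linux. It is not taken from any engine in this thread. It assumes transparent huge pages are available and uses `madvise` with `MADV_HUGEPAGE`, which is only a hint the kernel may ignore. The helper name `alloc_table_hugepages` and the constant `HUGE_PAGE_BYTES` are made up for illustration.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_PAGE_BYTES ((size_t)2 << 20)  /* 2 MB, the page size mentioned above */

/* Hypothetical helper: allocate a zeroed table aligned to 2 MB and ask the
   kernel to back it with transparent huge pages. The hint is best-effort,
   so the return value of madvise() is deliberately ignored. */
void *alloc_table_hugepages(size_t bytes) {
    void *p = NULL;
    /* Round up so the region covers whole 2 MB pages. */
    size_t rounded = (bytes + HUGE_PAGE_BYTES - 1) & ~(HUGE_PAGE_BYTES - 1);
    if (rounded == 0 || posix_memalign(&p, HUGE_PAGE_BYTES, rounded) != 0)
        return NULL;
#ifdef MADV_HUGEPAGE
    (void)madvise(p, rounded, MADV_HUGEPAGE);
#endif
    memset(p, 0, rounded);  /* touch the pages after the hint has been given */
    return p;
}
```

The returned pointer is released with `free()`. Whether this gives any speedup depends on the machine and on how the kernel's transparent huge page support is configured.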
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: NUMA 101
bob wrote: First, since not all machines support NUMA, I have a -DNUMA Makefile option that turns this on. Leave -DNUMA off, and it doesn't do any NUMA-related tricks at all.
zullil wrote: Including -DNUMA leads to the following error. This is on a Linux system:
Code:
    $ make profile
    make -j unix-gcc-profile
    make[1]: Entering directory `/home/louis/Documents/Chess/Crafty'
    make -j target=UNIX \
        CC=gcc-5 CXX=g++-5 \
        opt='-DTEST -DINLINEASM -DPOPCNT -DCPUS=20 -DAFFINITY -DNUMA' \
        CFLAGS='-Wall -Wno-array-bounds -pipe -O3 -march=native -fprofile-arcs \
        -pthread' \
        CXFLAGS='-Wall -Wno-array-bounds -pipe -O3 -march=native -fprofile-arcs \
        -pthread' \
        LDFLAGS=' -fprofile-arcs -pthread -lstdc++ ' \
        crafty-make
    make[2]: Entering directory `/home/louis/Documents/Chess/Crafty'
    make[3]: Entering directory `/home/louis/Documents/Chess/Crafty'
    gcc-5 -Wall -Wno-array-bounds -pipe -O3 -march=native -fprofile-arcs \
        -pthread -DTEST -DINLINEASM -DPOPCNT -DCPUS=20 -DAFFINITY -DNUMA -DUNIX -c crafty.c
    g++-5 -c -Wall -Wno-array-bounds -pipe -O3 -march=native -fprofile-arcs \
        -pthread -DTEST -DINLINEASM -DPOPCNT -DCPUS=20 -DAFFINITY -DNUMA -DUNIX egtb.cpp
    In file included from crafty.c:45:0:
    main.c: In function ‘main’:
    main.c:4309:26: warning: passing argument 2 of ‘numa_node_to_cpus’ from incompatible pointer type [-Wincompatible-pointer-types]
         numa_node_to_cpus(0, cpus, 64);
                              ^
    In file included from main.c:9:0,
                     from crafty.c:45:
    /usr/include/numa.h:283:5: note: expected ‘struct bitmask *’ but argument is of type ‘long unsigned int *’
     int numa_node_to_cpus(int, struct bitmask *);
         ^
    In file included from crafty.c:45:0:
    main.c:4309:5: error: too many arguments to function ‘numa_node_to_cpus’
         numa_node_to_cpus(0, cpus, 64);
         ^
    In file included from main.c:9:0,
                     from crafty.c:45:
    /usr/include/numa.h:283:5: note: declared here
     int numa_node_to_cpus(int, struct bitmask *);
         ^
    make[3]: *** [crafty.o] Error 1
    make[3]: *** Waiting for unfinished jobs....
    make[3]: Leaving directory `/home/louis/Documents/Chess/Crafty'
    make[2]: *** [crafty-make] Error 2
    make[2]: Leaving directory `/home/louis/Documents/Chess/Crafty'
    make[1]: *** [unix-gcc-profile] Error 2
    make[1]: Leaving directory `/home/louis/Documents/Chess/Crafty'
    make: *** [profile] Error 2
bob wrote: That library call was changed. I don't really use any of those any longer; that one was used strictly to print the greeting noting that this is a NUMA box. Is this the latest 25.0 or the one I sent you a while back? I did fix that to use a different NUMA library routine, and thought it was in 25.0 as released.
zullil wrote: Bob, the code that causes this error is in the released 25.0.
OK, it is fixed in 25.1, which will be out pretty soon. Some more NUMA updates that I was using privately, but am now ready to include since I am sure they work as expected, will also be in this next version.
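The log shows the mismatch: the installed header declares `int numa_node_to_cpus(int, struct bitmask *)`, while main.c passes three arguments. For a greeting that only needs to know which CPUs sit on node 0, one alternative that avoids libnuma altogether is to read the Linux sysfs file `/sys/devices/system/node/node0/cpulist`. The sketch below is that alternative, not the fix that went into Crafty 25.1. The helper names `count_cpus_in_list` and `cpus_on_node` are invented here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs named in a sysfs-style list such as "0-9,20-29".
   Returns -1 on malformed input. */
int count_cpus_in_list(const char *list) {
    int count = 0;
    const char *p = list;
    while (*p && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0) return -1;
        long hi = lo;
        if (*end == '-') {               /* a range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo) return -1;
        }
        count += (int)(hi - lo + 1);
        if (*end == ',') end++;          /* step over the separator */
        else if (*end && *end != '\n') return -1;
        p = end;
    }
    return count;
}

/* Number of CPUs on a node, or -1 if the kernel does not expose the file. */
int cpus_on_node(int node) {
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/sys/devices/system/node/node%d/cpulist", node);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? count_cpus_in_list(buf) : -1;
}
```

A return of -1 from `cpus_on_node(0)` just means the information is not available (for example on a non-Linux system), in which case the greeting can be skipped.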
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: NUMA 101
Thanks for your very interesting post, it is very clear and complete.
A whole unknown world has been opened to me!
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: NUMA 101
stegemma wrote: Thanks for your very interesting post, it is very clear and complete. A whole unknown world has been opened to me!
There is a lot to think about. For example, when you use magics, you can end up with ALL of the magic lookup tables on a single node. Good, bad or indifferent? I haven't addressed this yet, but one idea is that such data can easily be duplicated, i.e. generate the original, and then let each thread (on each NUMA node) copy it to a private copy. I have not done this because I am doing this per-thread at the moment, since it is not easy to figure out how many actual NUMA nodes you have and how cache is configured and shared between nodes. The same sort of idea applies to the actual program executable code. Plenty to think about for the next year or two.
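The duplication idea can be sketched as follows. This is an illustration under stated assumptions, not Crafty code. It assumes the Linux default policy of placing a page on the node of the CPU that first touches it, and that the allocation returns fresh pages (which is usual for large blocks, but not guaranteed by `malloc`). The type `table_view` and the helper `clone_table_local` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a shared, read-only lookup table built once at startup. */
typedef struct {
    const uint64_t *entries;
    size_t count;
} table_view;

/* Hypothetical helper: make a private copy of a read-only table.
   Intended to be called by a worker thread after it has been pinned to its
   CPU, so that the thread doing the memcpy is the first to touch the copy. */
table_view clone_table_local(table_view master) {
    table_view local = { NULL, 0 };
    if (master.count == 0 || master.entries == NULL)
        return local;
    uint64_t *copy = malloc(master.count * sizeof *copy);
    if (copy != NULL) {
        memcpy(copy, master.entries, master.count * sizeof *copy);
        local.entries = copy;
        local.count = master.count;
    }
    return local;
}
```

Each thread would then index its own `local.entries` instead of the shared original. The cost is one copy of the table per thread, which is why a per-node copy would be preferable once the node layout is known.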
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: NUMA 101
stegemma wrote: Thanks for your very interesting post, it is very clear and complete. A whole unknown world has been opened to me!
bob wrote: There is a lot to think about. [...] Plenty to think about for the next year or two.
This multi-processor + multi-threading stuff seems to me to be very important for the future of programming. We are talking about games, but I'm now applying the multithreading environment written for Satana to a web-application server. Having to handle multiple clients is in fact somewhat similar to splitting the search across multiple threads, and the database looks like the hash table (but more complex, for some reason). The most important aspect is that to reach a true AI we should push multithreading to the extreme... because the only real AI unit is a brain, with a hundred billion simple processing units. Getting all these processing units to work together, sharing memory and other resources, is more than a hard challenge today.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: NUMA 101
stegemma wrote: Thanks for your very interesting post, it is very clear and complete. A whole unknown world has been opened to me!
bob wrote: There is a lot to think about. For example, when you use magics, you can end up with ALL of the magic lookup tables on a single node. Good, bad or indifferent? I haven't addressed this yet, but one idea is that such data can easily be duplicated, i.e. generate the original, and then let each thread (on each NUMA node) copy it to a private copy. I have not done this because I am doing this per-thread at the moment, since it is not easy to figure out how many actual NUMA nodes you have and how cache is configured and shared between nodes. The same sort of idea applies to the actual program executable code. Plenty to think about for the next year or two.
Read-only data that stays in a cache (e.g. L2 or L1 cache) shouldn't be much of a problem. As long as nobody writes to those cache lines, they can be stored in multiple caches at once.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: NUMA 101
stegemma wrote: Thanks for your very interesting post, it is very clear and complete. A whole unknown world has been opened to me!
bob wrote: There is a lot to think about. For example, when you use magics, you can end up with ALL of the magic lookup tables on a single node. Good, bad or indifferent? I haven't addressed this yet, but one idea is that such data can easily be duplicated, i.e. generate the original, and then let each thread (on each NUMA node) copy it to a private copy. I have not done this because I am doing this per-thread at the moment, since it is not easy to figure out how many actual NUMA nodes you have and how cache is configured and shared between nodes. The same sort of idea applies to the actual program executable code. Plenty to think about for the next year or two.
wgarvin wrote: Read-only data that stays in a cache (e.g. L2 or L1 cache) shouldn't be much of a problem. As long as nobody writes to those cache lines, they can be stored in multiple caches at once.
Some of it is big, however. For example, the rook magics are in the range of 800 KB, which blows L1 and L2 (typically the processor-local caches).
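The 800 KB figure is consistent with one common table layout, assumed here rather than stated in the post: each square gets exactly 2^bits entries of 8 bytes, where bits is the number of relevant occupancy squares for a rook on that square. The function names below are made up for this worked example.

```c
#include <stddef.h>
#include <stdint.h>

/* Relevant occupancy bits for a rook on (file, rank), both 0..7: the squares
   along its rank and file, excluding its own square and the far edge
   squares. That is 6 along a line when the rook stands on that line's edge,
   otherwise 5. */
int rook_relevant_bits(int file, int rank) {
    int along_rank = (file == 0 || file == 7) ? 6 : 5;
    int along_file = (rank == 0 || rank == 7) ? 6 : 5;
    return along_rank + along_file;
}

/* Total entries when every square gets exactly 2^bits slots. */
size_t rook_table_entries(void) {
    size_t total = 0;
    for (int sq = 0; sq < 64; sq++)
        total += (size_t)1 << rook_relevant_bits(sq & 7, sq >> 3);
    return total;
}

/* Table size in bytes with one 64-bit attack set per entry. */
size_t rook_table_bytes(void) {
    return rook_table_entries() * sizeof(uint64_t);
}
```

Under these assumptions the count is 4 x 4096 (corners) + 24 x 2048 (other edge squares) + 36 x 1024 (interior) = 102,400 entries, or 819,200 bytes, which is exactly 800 KB. A layout that gives every square a fixed 4096 entries would instead need 2 MB.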