Concerning the following structures of your chess engine:
- Main Hash Table
- Pawn Hash Table
- Killers array
- History array
How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.
Thanks,
Questions on SMP search
Moderators: hgm, Rebel, chrisw
-
- Posts: 482
- Joined: Thu Oct 16, 2008 4:23 am
- Location: Milky Way
Questions on SMP search
Ben-Hur Carlos Langoni Junior
http://sourceforge.net/projects/redqueenchess/
http://sourceforge.net/projects/redqueenchess/
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Questions on SMP search
I have almost everything thread global except for the killer arrays, they are maintained locally.
For the hash table(s) I use lockless (Hyatt style) hashing, locking each TT access is way to expensive.
For the history array I don't do anything special, because in my engine this is an atomic operation.
Recently I tried to change the history array from thread global to thread local. After doing al lot of tests it seems this performes worse, probably because there is less information per thread.
For the hash table(s) I use lockless (Hyatt style) hashing, locking each TT access is way to expensive.
For the history array I don't do anything special, because in my engine this is an atomic operation.
Recently I tried to change the history array from thread global to thread local. After doing al lot of tests it seems this performes worse, probably because there is less information per thread.
Re: Questions on SMP search
Robert Hyatt has a short article here : A lockless transposition table implementation for parallel search.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Questions on SMP search
You have to share the main one for SMP. Sharing that table is so important for alpha-beta , distributed hash tables are sometimes used even on clusters.
I currently do not share them there since it seems a lot of work for a 15% speed up. I do not share the others even on SMP systems. I do not "lockless hash" but test for the legality of the move instead.
I currently do not share them there since it seems a lot of work for a 15% speed up. I do not share the others even on SMP systems. I do not "lockless hash" but test for the legality of the move instead.
-
- Posts: 348
- Joined: Sat Feb 27, 2010 12:21 am
Re: Questions on SMP search
I have everything separated by address space.bhlangonijr wrote:Concerning the following structures of your chess engine:
- Main Hash Table
- Pawn Hash Table
- Killers array
- History array
How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.
Thanks,
Only the transposition table is in shared memory.
I don't use locking or 'lockless' tricks but just accept the
race conditions and make sure the engine doesn't
choke on reading an invalid move from the table but
silently discards it.
I might speed up communication speed in a later version
also through shared memory.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
I share hash and phash. I have local killers and don't do history at all in Crafty. I don't think shared killers make much sense as they are pretty much a "local issue."marcelk wrote:I have everything separated by address space.bhlangonijr wrote:Concerning the following structures of your chess engine:
- Main Hash Table
- Pawn Hash Table
- Killers array
- History array
How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.
Thanks,
Only the transposition table is in shared memory.
I don't use locking or 'lockless' tricks but just accept the
race conditions and make sure the engine doesn't
choke on reading an invalid move from the table but
silently discards it.
I might speed up communication speed in a later version
also through shared memory.
+
-
- Posts: 481
- Joined: Thu Apr 16, 2009 12:00 pm
- Location: Slovakia, EU
Re: Questions on SMP search
My implementation (in Critter):
- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.
- Pawn hash and evaluation cache are local to thread.
- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.
- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.
- Pawn hash and evaluation cache are local to thread.
- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.
- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Questions on SMP search
In Stockfish is 100% the same.rvida wrote:My implementation (in Critter):
- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.
- Pawn hash and evaluation cache are local to thread.
- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.
- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
The only difference is that we don't have an evaluation cache but we use the main hash table to store evaluation together with the usual data.
-
- Posts: 482
- Joined: Thu Oct 16, 2008 4:23 am
- Location: Milky Way
Re: Questions on SMP search
Interesting. I understand Bob's position that killers are a "local issue", although in the non-smp version of the search implementation we usually don't keep many killers arrays per sub-tree, for example. I guess we end up creating a somewhat different move ordering scheme, when implementing the SMP version of the search. Is this a problem when testing new changes - not related to the SMP code?rvida wrote:My implementation (in Critter):
- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.
- Pawn hash and evaluation cache are local to thread.
- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.
- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
Ben-Hur Carlos Langoni Junior
http://sourceforge.net/projects/redqueenchess/
http://sourceforge.net/projects/redqueenchess/
-
- Posts: 482
- Joined: Thu Oct 16, 2008 4:23 am
- Location: Milky Way
Re: Questions on SMP search
Thanks all you guys for the answers.bob wrote:
I share hash and phash. I have local killers and don't do history at all in Crafty. I don't think shared killers make much sense as they are pretty much a "local issue."
+
Another important question is, how you do the load balance when distributing the work among the available threads?
In my second attempt to implement SMP (the first one failed miserably), I have a SMP search method with a thread-safe access to the move stack, control variables, hash, etc. All the available threads are booked and then call simultaneously the SMP search method, passing as parameter the same move stack, control variables, etc. The master is defined as the thread which called the "split" method. If some slave finishes its work it will become available for helping in a further split call. But when the master finishes its work it will stay in a busy loop until all slaves get their work done.
With limited testing, I got in average 1.6 speed up when using 2 cores and about 2.9 when using up to 6 cores (Considering NPS). Also, it appears that the speed up gained when using 2 cores doesn't help at all, in Elo terms...
What are your numbers? Mine are too bad?
Thanks,
Ben-Hur Carlos Langoni Junior
http://sourceforge.net/projects/redqueenchess/
http://sourceforge.net/projects/redqueenchess/