Questions on SMP search

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 2:23 am
Location: Milky Way
Contact:

Questions on SMP search

Post by bhlangonijr » Thu Apr 21, 2011 5:27 am

Concerning the following structures of your chess engine:

- Main Hash Table
- Pawn Hash Table
- Killers array
- History array

How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.

Thanks,

Joost Buijs
Posts: 986
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Questions on SMP search

Post by Joost Buijs » Thu Apr 21, 2011 5:52 am

I have almost everything thread global except for the killer arrays, they are maintained locally.

For the hash table(s) I use lockless (Hyatt style) hashing, locking each TT access is way to expensive.

For the history array I don't do anything special, because in my engine this is an atomic operation.

Recently I tried to change the history array from thread global to thread local. After doing al lot of tests it seems this performes worse, probably because there is less information per thread.

ldesnogu

Re: Questions on SMP search

Post by ldesnogu » Thu Apr 21, 2011 8:47 am


Daniel Shawul
Posts: 3757
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: Questions on SMP search

Post by Daniel Shawul » Thu Apr 21, 2011 2:33 pm

You have to share the main one for SMP. Sharing that table is so important for alpha-beta , distributed hash tables are sometimes used even on clusters.
I currently do not share them there since it seems a lot of work for a 15% speed up. I do not share the others even on SMP systems. I do not "lockless hash" but test for the legality of the move instead.

User avatar
marcelk
Posts: 348
Joined: Fri Feb 26, 2010 11:21 pm
Contact:

Re: Questions on SMP search

Post by marcelk » Thu Apr 21, 2011 6:11 pm

bhlangonijr wrote:Concerning the following structures of your chess engine:

- Main Hash Table
- Pawn Hash Table
- Killers array
- History array

How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.

Thanks,
I have everything separated by address space.
Only the transposition table is in shared memory.

I don't use locking or 'lockless' tricks but just accept the
race conditions and make sure the engine doesn't
choke on reading an invalid move from the table but
silently discards it.

I might speed up communication speed in a later version
also through shared memory.

bob
Posts: 20556
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob » Thu Apr 21, 2011 6:45 pm

marcelk wrote:
bhlangonijr wrote:Concerning the following structures of your chess engine:

- Main Hash Table
- Pawn Hash Table
- Killers array
- History array

How you deal with these structures, when implementing a SMP search? To be more specific:
- For what structures, you keep a separate instance for each thread and what structures share a single instance with all threads;
- How you handle the concurrent access - if applicable - for each of these structures;
- How to maintain the killers/history arrays.

Thanks,
I have everything separated by address space.
Only the transposition table is in shared memory.

I don't use locking or 'lockless' tricks but just accept the
race conditions and make sure the engine doesn't
choke on reading an invalid move from the table but
silently discards it.

I might speed up communication speed in a later version
also through shared memory.
I share hash and phash. I have local killers and don't do history at all in Crafty. I don't think shared killers make much sense as they are pretty much a "local issue."

+

User avatar
rvida
Posts: 481
Joined: Thu Apr 16, 2009 10:00 am
Location: Slovakia, EU

Re: Questions on SMP search

Post by rvida » Thu Apr 21, 2011 6:56 pm

My implementation (in Critter):

- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.

- Pawn hash and evaluation cache are local to thread.

- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.

- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Questions on SMP search

Post by mcostalba » Fri Apr 22, 2011 5:47 am

rvida wrote:My implementation (in Critter):

- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.

- Pawn hash and evaluation cache are local to thread.

- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.

- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
In Stockfish is 100% the same.

The only difference is that we don't have an evaluation cache but we use the main hash table to store evaluation together with the usual data.

bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 2:23 am
Location: Milky Way
Contact:

Re: Questions on SMP search

Post by bhlangonijr » Fri Apr 22, 2011 1:46 pm

rvida wrote:My implementation (in Critter):

- Main hash is shared. No locking nor lockless xoring is done. Moves from TT are checked for legality instead.

- Pawn hash and evaluation cache are local to thread.

- History array is shared. Since writes here are atomic there is no need to care about conflicts. History tends to be very noisy anyway with higher depths. There might be some speed penalty due to cache coherency traffic / snooping.

- With killer moves it is a bit more complicated. While each thread maintains its private killer array, during split() operation a few plies worth of killer moves are copied over.
Interesting. I understand Bob's position that killers are a "local issue", although in the non-smp version of the search implementation we usually don't keep many killers arrays per sub-tree, for example. I guess we end up creating a somewhat different move ordering scheme, when implementing the SMP version of the search. Is this a problem when testing new changes - not related to the SMP code?

bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 2:23 am
Location: Milky Way
Contact:

Re: Questions on SMP search

Post by bhlangonijr » Fri Apr 22, 2011 2:37 pm

bob wrote:
I share hash and phash. I have local killers and don't do history at all in Crafty. I don't think shared killers make much sense as they are pretty much a "local issue."

+
Thanks all you guys for the answers. :)

Another important question is, how you do the load balance when distributing the work among the available threads?

In my second attempt to implement SMP (the first one failed miserably), I have a SMP search method with a thread-safe access to the move stack, control variables, hash, etc. All the available threads are booked and then call simultaneously the SMP search method, passing as parameter the same move stack, control variables, etc. The master is defined as the thread which called the "split" method. If some slave finishes its work it will become available for helping in a further split call. But when the master finishes its work it will stay in a busy loop until all slaves get their work done.

With limited testing, I got in average 1.6 speed up when using 2 cores and about 2.9 when using up to 6 cores (Considering NPS). Also, it appears that the speed up gained when using 2 cores doesn't help at all, in Elo terms...

What are your numbers? Mine are too bad? :)

Thanks,

Post Reply