Parallel search/LazySMP question

jdart · Post by **jdart** » Sun Dec 17, 2017 9:55 pm

So, I understand some engines are basically using the ABDADA approach, where a bit in the hash table signifies that a thread is searching that node, so that other parallel threads will not also try to visit it (although they may help searching some of its successor nodes).

But it seems that engines using (or saying they use) LazySMP do not necessarily do this. A few I have looked at are: Texel, Stockfish, Demolito. I do not claim to understand everything I see there. But one thing I don't see is a mechanism to prevent redundant searches. It looks like each worker has its own move generator, and if started at the same depth, they will produce the same moves, and search in parallel the same move list, which means that some moves will be searched twice. I assume I am missing something?
(My engine, which uses YBWC, prevents this because the same MoveGenerator instance is shared by all parallel threads at the same depth (it is thread-safe), and it will not deliver the same move twice).

--Jon

Pio · Post by **Pio** » Sun Dec 17, 2017 10:02 pm

They will search different moves due to things like the history table and other structures shared by the threads that make difference in the move ordering and LMR

another thing that will prevent duplicate searches is to launch the threads with different depths

jdart · Post by **jdart** » Sun Dec 17, 2017 10:12 pm

History etc. may make the order of the moves different but don't change the fact that the overall set of moves is the same.

And some threads are being launched with the same depth.

--Jon

hgm · Post by **hgm** » Sun Dec 17, 2017 10:20 pm

LazySMP is not ABDADA. It is a lot lazier.

Pio · Post by **Pio** » Sun Dec 17, 2017 10:24 pm

The thing is that the depths will be different because the move order might change and that LMR is a function of move order. This in some way prevents the threads to search the same tree

CheckersGuy · Post by **CheckersGuy** » Sun Dec 17, 2017 10:57 pm

Pio wrote:The thing is that the depths will be different because the move order might change and that LMR is a function of move order. This in some way prevents the threads to search the same tree

On top of that I would add that they will search different moves because of thread scheduling and information shared with the transposition table.

D Sceviour · Post by **D Sceviour** » Mon Dec 18, 2017 12:32 am

jdart wrote: It looks like each worker has its own move generator,...

Almost. There are no slave/worker threads. Each thread acts independently with a shared hash table.

and if started at the same depth, they will produce the same moves, and search in parallel the same move list,...

Each thread might produce different moves and values due to different hash reads at different times. Peter Osterlund accepts the first PV returned and rejects all others.

which means that some moves will be searched twice. I assume I am missing something?

The search tries to avoid this, but the odd move still gets through at the root due to overlapping and racing conditions. A problem that is it is unclear what root aspiration values each thread should get, especially if racing conditions get two or more threads searching the same move.

sedicla · Post by **sedicla** » Mon Dec 18, 2017 1:05 am

I just released my engine with parallel search. I looked mostly at stockfish and demolito code.
My implementation uses shared hash table. Each thread starts at the same time, and for even threads (thread id 2, 4, 6...) I change the depth to be searched so it creates different "search spots".
The threads only shares the hash table, each one has its own search tables and move generation.
I think this is a very simple implementation and I want to do more research. It looks to me that in general there is not much cooperation between threads and there's something to gain in the area.

jdart · Post by **jdart** » Mon Dec 18, 2017 1:10 am

I think the YBWC improvements in recent Crafty (25.2) are quite interesting. Basically threads enter an idle loop where they scan a list of available split points and select the most promising one to join. Threads can create split points even pre-emptively when no other threads are available to service them at the moment.

The overall implementation is designed to boost efficiency by locking less and distributing the work of splitting a thread. (It also has some optional NUMA optimizations).

I have seen Crafty hit 40-50M nps even on fairly common hardware. Part of the speed is that it has a simpler search than most and does futility cutoffs based only on material, not on a full eval. But part of the speed is also the parallel search efficiency.

--Jon

sedicla · Post by **sedicla** » Mon Dec 18, 2017 1:25 am

Crafty approach seems interesting, I will take a look, specially how it defines promising split points.

Parallel search/LazySMP question

Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question

Re: Parallel search/LazySMP question