Parallel iterative search function

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Parallel iterative search function

Post by Fabio Gobbato »

The DTS algorithm use an iterative search function where each thread can replace the others.
This is good because there never be a thread waiting others to finish his splitpoint.
It needs also more time to copy in the splitpoint all the stack.
Has anyone tried to use a parallel iterative search function in his engine?
Could be a good idea to try it?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Parallel iterative search function

Post by Daniel Shawul »

I use parallel iterative search and am able to do certain things that you wouldn't be able to with a recursive search. For example a thread that started a split does not have to be the one that returns the result back to the parent. It may or may not be work the effort but you will learn something from implementing parallel search in a not so common way.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Parallel iterative search function

Post by bob »

Fabio Gobbato wrote:The DTS algorithm use an iterative search function where each thread can replace the others.
This is good because there never be a thread waiting others to finish his splitpoint.
It needs also more time to copy in the splitpoint all the stack.
Has anyone tried to use a parallel iterative search function in his engine?
Could be a good idea to try it?
Several have done it. Cray Blitz. Zappa. At least a couple of others but I don't recall specific names. The recursive negamax search is really nice and clean, but the iterative approach is actually faster. DTS doesn't have to copy any more or less than the normal recursive search when done in parallel. You simply have to copy the local state from the parent when a split is done.
User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Parallel iterative search function

Post by Fabio Gobbato »

Maybe I could be wrong but I think you should copy all the stack of the search because the last thread that close the splitpoint is the one that have to resume the search after the split call.
In a recursive search the same thread open and close the splitpoint.
User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Parallel iterative search function

Post by Fabio Gobbato »

Have you ever compared your engine with 1 thread and 4 thread?
What is the difference in ELO points?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Parallel iterative search function

Post by bob »

Fabio Gobbato wrote:Maybe I could be wrong but I think you should copy all the stack of the search because the last thread that close the splitpoint is the one that have to resume the search after the split call.
In a recursive search the same thread open and close the splitpoint.
In an iterated search there is no stack to copy. All of the various data items are simply defined as alpha[ply] rather than just alpha, and so forth. To back up one ply, just ply = ply -1 and you are there, to go to the next ply ply = ply + 1. When you split you have to copy the position so that it can be searched in parallel, but that is also done with an array.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Parallel iterative search function

Post by Dann Corbit »

Fabio Gobbato wrote:Have you ever compared your engine with 1 thread and 4 thread?
What is the difference in ELO points?
From CCRL 40/40:

Scorpio 2.7.6 64-bit 4CPU 2898 +32 -32 49.5% +3.8 43.3% 298
Scorpio 2.7.6 64-bit 2783 +18 -17 49.7% +2.1 36.2% 1069

diff = 2898 - 2783 = 115 Elo
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Parallel iterative search function

Post by Dann Corbit »

Dann Corbit wrote:
Fabio Gobbato wrote:Have you ever compared your engine with 1 thread and 4 thread?
What is the difference in ELO points?
From CCRL 40/40:

Scorpio 2.7.6 64-bit 4CPU 2898 +32 -32 49.5% +3.8 43.3% 298
Scorpio 2.7.6 64-bit 2783 +18 -17 49.7% +2.1 36.2% 1069

diff = 2898 - 2783 = 115 Elo
For comparison, Stockfish 6:

Stockfish 6 64-bit 4CPU 3311 +20 −20 69.6% −120.4 54.2% 784
Stockfish 6 64-bit 3225 +28 −27 72.2% −139.9 48.1% 422
3311 - 3225 = 86

Zappa Mexico II (known to scale particularly well):
Zappa Mexico II 64-bit 4CPU 2980 +10 −10 48.7% +7.4 48.1% 3198
Zappa Mexico II 64-bit 2866 +10 −10 47.9% +14.3 41.6% 3251

2980 - 2866 = 114

Komodo 8 (also known to scale particularly well):
Komodo 8 64-bit 4CPU 3303 +16 −16 70.1% −129.2 49.0% 1286
Komodo 8 64-bit 3220 +20 −19 69.9% −125.0 50.2% 821
3303 - 3220 = 83
User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Parallel iterative search function

Post by Fabio Gobbato »

Thank you, I'll try it
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Parallel iterative search function

Post by Adam Hair »

Dann Corbit wrote:
Dann Corbit wrote:
Fabio Gobbato wrote:Have you ever compared your engine with 1 thread and 4 thread?
What is the difference in ELO points?
From CCRL 40/40:

Scorpio 2.7.6 64-bit 4CPU 2898 +32 -32 49.5% +3.8 43.3% 298
Scorpio 2.7.6 64-bit 2783 +18 -17 49.7% +2.1 36.2% 1069

diff = 2898 - 2783 = 115 Elo
For comparison, Stockfish 6:

Stockfish 6 64-bit 4CPU 3311 +20 −20 69.6% −120.4 54.2% 784
Stockfish 6 64-bit 3225 +28 −27 72.2% −139.9 48.1% 422
3311 - 3225 = 86

Zappa Mexico II (known to scale particularly well):
Zappa Mexico II 64-bit 4CPU 2980 +10 −10 48.7% +7.4 48.1% 3198
Zappa Mexico II 64-bit 2866 +10 −10 47.9% +14.3 41.6% 3251

2980 - 2866 = 114

Komodo 8 (also known to scale particularly well):
Komodo 8 64-bit 4CPU 3303 +16 −16 70.1% −129.2 49.0% 1286
Komodo 8 64-bit 3220 +20 −19 69.9% −125.0 50.2% 821
3303 - 3220 = 83
Those Elo differences are not directly comparable because of the differing draw rates. They all probably scale to 4 CPUs roughly the same if we take into account the draw rates.