Lazy SMP in Cheng

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

mar
Posts: 1830
Joined: Fri Nov 26, 2010 1:00 pm

Lazy SMP in Cheng

Post by mar » Mon Feb 02, 2015 1:14 pm

Because some may find it useful, I will attempt to describe how I do lazy smp in cheng:

(actually I just found a bug where minimum qs depth is not copied to helpers before the start of new iteration so right now min qs depth is for helpers now actually depends on maximum depth reached in previous search, I'm not sure if/what impact this may have)

Code: Select all

IterativeDeepening:
    synchronize smp threads (copy age, board, history, repetition list, multipv => helpers)
    depth 1 with full width window on 1 thread
    loop (depth=2 .. max)
        AspirationLoop:
            (as usual)
            start helper threads( depth, alpha, beta )
            root( depth, alpha, beta)
            stop helper threads
            (rest as usual)
        end aspiration loop
    end id loop

Code: Select all

starting helper threads:
    clear smp abort flag
    for each helper thread:
        copy rootmoves and minimum qs depth => helper
        signal helper to start root search at current depth (add 1 for each even helper (assuming 0-based indexing) with aspiration alpha, beta bounds and wait until helper starts searching
note: helper threads run in infinite mode => no need to check for timeout

Code: Select all

aborting helper threads:
    set abort flag for each helper and wait for each to stop searching
The search (including root handling) is the same for master and helpers (no code duplication just a couple of extra trivial conditions).
helpers hold ref pointer to master (if it's 0 it's master) - actually a pointer to abort smp flag would do.
At the end of root search helpers simply set a special flag (abort smp flag) in master.
When master detects abort (smp) flag coming from the helpers (different from typical abort flag),
it scans through helpers and copies score/bestmove/PV from the first helper that finished root.

Pondering doesn't require any special handling (handled by master search thread).

I hope I didn't forget something important, perhaps it even makes sense :)

zullil
Posts: 4878
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA

Re: Lazy SMP in Cheng

Post by zullil » Mon Feb 02, 2015 2:39 pm

Maybe someone can implement this in Stockfish, since the current implementation seems to scale badly from 8 to 16 threads. Just a thought, for any talented folks who have a lot of free time ...

elcabesa
Posts: 638
Joined: Sun May 23, 2010 11:32 am
Contact:

Re: Lazy SMP in Cheng

Post by elcabesa » Mon Feb 02, 2015 7:53 pm

I haven't never studied your smp code.
Reading your description I can't understood if the root code simply wait for the other to end is search and save the result of the first one who end the search.
Am' i wrong?

mar
Posts: 1830
Joined: Fri Nov 26, 2010 1:00 pm

Re: Lazy SMP in Cheng

Post by mar » Mon Feb 02, 2015 8:02 pm

It doesn't wait. Root position is searched by all helper threads at once (including master), if any of the helpers finishes root before master, it notifies the master to abort search.
After master completes root, if it was signalled it grabs the result from the helper that finished (otherwise it keeps its own).
Then all helpers are aborted and you continue standard aspiration (and ID) loop.

elcabesa
Posts: 638
Joined: Sun May 23, 2010 11:32 am
Contact:

Re: Lazy SMP in Cheng

Post by elcabesa » Mon Feb 02, 2015 8:11 pm

ok, now I understand

mar
Posts: 1830
Joined: Fri Nov 26, 2010 1:00 pm

Re: Lazy SMP in Cheng

Post by mar » Mon Feb 02, 2015 8:34 pm

I'm not sure it would scale better, but certainly it would be interesting to compare.

Joerg Oster
Posts: 609
Joined: Fri Mar 10, 2006 3:29 pm
Location: Germany

Re: Lazy SMP in Cheng

Post by Joerg Oster » Mon Feb 02, 2015 9:42 pm

mar wrote:I'm not sure it would scale better, but certainly it would be interesting to compare.
At least lazy smp wouldn't suffer from 2 limitations as the current implementation.
1. takes some time to fully kick in because of the min split depth parameter
2. not enough workload especially for threads >= 8 due to heavy pruning and reductions
Jörg Oster

petero2
Posts: 558
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: Lazy SMP in Cheng

Post by petero2 » Mon Feb 02, 2015 10:56 pm

Joerg Oster wrote:
mar wrote:I'm not sure it would scale better, but certainly it would be interesting to compare.
At least lazy smp wouldn't suffer from 2 limitations as the current implementation.
1. takes some time to fully kick in because of the min split depth parameter
2. not enough workload especially for threads >= 8 due to heavy pruning and reductions
It seems 1 is not a big problem for YBWC either. I ran some tests with stockfish 6 at time control 1s+0.08s/move, 4 cores vs 1 core:

Code: Select all

Rank Name     Elo    +    - games score oppo. draws 
   1 sf6_4c    70    7    7  2282   73%   -70   46% 
   2 sf6      -70    7    7  2282   27%    70   46% 

bob
Posts: 20340
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Lazy SMP in Cheng

Post by bob » Mon Feb 02, 2015 11:00 pm

Joerg Oster wrote:
mar wrote:I'm not sure it would scale better, but certainly it would be interesting to compare.
At least lazy smp wouldn't suffer from 2 limitations as the current implementation.
1. takes some time to fully kick in because of the min split depth parameter
2. not enough workload especially for threads >= 8 due to heavy pruning and reductions
Nothing is free. Overhead climbs quickly because you are searching many more nodes at a particular depth than a sequential search would examine. YBW is the correct way to do parallel search and minimize overhead. I've yet to see any non-YBW-based algorithm produce even a 4x speedup at 8 cores (over 1 core to same depth). There's a reason many of us have invested all of the time we have spent in doing a traditional splitting algorithm - performance.

I've not had any problems at all playing bullet games with 12 cores. And the key point is that whatever the 12 cores do is generally important work. It is easy to keep all 12 cores busy, but if they are doing redundant work, what's the point?

Joerg Oster
Posts: 609
Joined: Fri Mar 10, 2006 3:29 pm
Location: Germany

Re: Lazy SMP in Cheng

Post by Joerg Oster » Mon Feb 02, 2015 11:15 pm

petero2 wrote:
Joerg Oster wrote:
mar wrote:I'm not sure it would scale better, but certainly it would be interesting to compare.
At least lazy smp wouldn't suffer from 2 limitations as the current implementation.
1. takes some time to fully kick in because of the min split depth parameter
2. not enough workload especially for threads >= 8 due to heavy pruning and reductions
It seems 1 is not a big problem for YBWC either. I ran some tests with stockfish 6 at time control 1s+0.08s/move, 4 cores vs 1 core:

Code: Select all

Rank Name     Elo    +    - games score oppo. draws 
   1 sf6_4c    70    7    7  2282   73%   -70   46% 
   2 sf6      -70    7    7  2282   27%    70   46% 
Thank you, Peter.
Yes, 4-core performance is very good, but how does it look like with 8 and 16 cores?
Jörg Oster

Post Reply