Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

fastgm
Posts: 405
Joined: Mon Aug 19, 2013 4:57 pm

Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by fastgm » Fri May 05, 2017 8:28 am

I ran some new tests on symmetric multiprocessing (SMP) scaling.
Here are the results, also available as a PDF file:
http://www.fastgm.de/schach/SMP-scaling.pdf


Symmetric multiprocessing (SMP) scaling
Stockfish 8 and Komodo 10.4 under Windows & Linux


Windows: Windows 10 Professional 64-Bit, Dual AMD Opteron 6376 @ 2.3 GHz
Linux: Ubuntu Server 16.04 LTS (HVM) 64-Bit, Amazon EC2 Instance, m4.16xlarge, Intel Xeon E5-2686v4 @ 2.3 GHz

Engines: default settings, 128 MB Hash
Cutechess-cli: no draw or resign rules, no pondering, no learning, no tablebases, 1500 different opening positions, alternating colors
TC = time control, T = number of threads, Elostat Start Elo = 3000
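
To sanity-check the tables below, a score percentage can be converted into an approximate Elo difference with the standard logistic formula. This is only a minimal sketch: Elostat's own calculation may differ slightly, so the result is not expected to match the tables exactly, and the function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(score):
    """Approximate Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic model. Elostat may compute this differently."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Example: Stockfish 8 T2 vs T1 on Windows scored 58.7 %.
print(round(elo_diff(0.587)))  # 61
```

That is close to the 62-point gap (3031 vs 2969) reported in the first Stockfish table.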


Windows – 1 thread vs 2 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth  
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T2  : 3031   6   6   3000    58.7 %   73.7 %  21.59         1 Komodo 10.4 T2  : 3044   8   8   3000    62.3 %   59.1 %  18.48
  2 Stockfish 8 T1  : 2969   6   6   3000    41.3 %   73.7 %  20.33         2 Komodo 10.4 T1  : 2956   8   8   3000    37.7 %   59.1 %  17.44
 
  Result     : 1761.0/3000 (+655,=2212,-133)                                Result     : 1868.5/3000 (+982,=1773,-245)
  Perf.      : 58.7 %                                                       Perf.      : 62.3 %
  Elo        : 3061                                                         Elo        : 3087
 
----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 1 thread vs 2 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T2  : 3039   7   7   3000    61.0 %   68.1 %  19.84         1 Komodo 10.4 T2  : 3044   8   8   3000    62.3 %   55.2 %  17.02
  2 Stockfish 8 T1  : 2961   7   7   3000    39.0 %   68.1 %  18.73         2 Komodo 10.4 T1  : 2956   8   8   3000    37.7 %   55.2 %  16.06
 
  Result     : 1830.0/3000 (+809,=2042,-149)                                Result     : 1868.5/3000 (+1040,=1657,-303)
  Perf.      : 61.0 %                                                       Perf.      : 62.3 %
  Elo        : 3078                                                         Elo        : 3087

---------------------------------------------------------------------------------------------------------------------------------------------- 
 
Windows – 1 thread vs 4 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth 
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T4  : 3056   7   7   3000    65.6 %   65.7 %  23.08         1 Komodo 10.4 T4  : 3076   8   8   3000    70.6 %   51.5 %  20.04
  2 Stockfish 8 T1  : 2944   7   7   3000    34.4 %   65.7 %  20.66         2 Komodo 10.4 T1  : 2924   8   8   3000    29.4 %   51.5 %  17.71

  Result     : 1968.0/3000 (+982,=1972,-46)                                 Result     : 2118.0/3000 (+1345,=1546,-109)
  Perf.      : 65.6 %                                                       Perf.      : 70.6 % 
  Elo        : 3112                                                         Elo        : 3152

---------------------------------------------------------------------------------------------------------------------------------------------- 
 
Linux – 1 thread vs 4 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T4  : 3064   7   7   3000    67.6 %   61.2 %  21.41         1 Komodo 10.4 T4  : 3084   9   9   3000    72.4 %   47.4 %  18.51
  2 Stockfish 8 T1  : 2936   7   7   3000    32.4 %   61.2 %  19.39         2 Komodo 10.4 T1  : 2916   9   9   3000    27.7 %   47.4 %  16.29

  Result     : 2028.5/3000 (+1111,=1835,-54)                                Result     : 2170.5/3000 (+1459,=1423,-118)
  Perf.      : 67.6 %                                                       Perf.      : 72.4 %
  Elo        : 3128                                                         Elo        : 3167

---------------------------------------------------------------------------------------------------------------------------------------------- 
  
Windows – 1 thread vs 8 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth 
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T8  : 3079   8   8   3000    71.2 %   56.4 %  23.87         1 Komodo 10.4 T8  : 3104   9   9   3000    76.8 %   42.7 %  21.19
  2 Stockfish 8 T1  : 2921   8   8   3000    28.8 %   56.4 %  20.40         2 Komodo 10.4 T1  : 2896   9   9   3000    23.2 %   42.7 %  17.82
 
  Result     : 2135.5/3000 (+1289,=1693,-18)                                Result     : 2305.5/3000 (+1665,=1281,-54)
  Perf.      : 71.2 %                                                       Perf.      : 76.8 %
  Elo        : 3157                                                         Elo        : 3208

---------------------------------------------------------------------------------------------------------------------------------------------- 
 
Linux – 1 thread vs 8 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T8  : 3093   8   8   3000    74.4 %   50.4 %  22.35         1 Komodo 10.4 T8  : 3121   10  10  3000    80.1 %   35.8 %  19.47
  2 Stockfish 8 T1  : 2907   8   8   3000    25.6 %   50.4 %  19.38         2 Komodo 10.4 T1  : 2879   10  10  3000    19.9 %   35.8 %  16.25
 
  Result     : 2232.5/3000 (+1477,=1511,-12)                                Result     : 2404.0/3000 (+1867,=1074,-59)
  Perf.      : 74.4 %                                                       Perf.      : 80.1 %
  Elo        : 3185                                                         Elo        : 3242

----------------------------------------------------------------------------------------------------------------------------------------------
 
Windows – 1 thread vs 16 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth 
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T16 : 3101   9   9   3000    76.1 %   47.4 %  24.93         1 Komodo 10.4 T16 : 3140   11  11  3000    83.3 %   32.0 %  21.31
  2 Stockfish 8 T1  : 2899   9   9   3000    23.9 %   47.4 %  20.16         2 Komodo 10.4 T1  : 2860   11  11  3000    16.7 %   32.0 %  17.38

  Result     : 2283.0/3000 (+1572,=1422,-6)                                 Result     : 2499.5/3000 (+2020,=959,-21)
  Perf.      : 76.1 %                                                       Perf.      : 83.3 %
  Elo        : 3201                                                         Elo        : 3279

----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 1 thread vs 16 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T16 : 3113   13  13  1516    78.7 %   42.0 %  23.33         1 Komodo 10.4 T16 : 3163   18  17  1514    86.7 %   24.8 %  20.42
  2 Stockfish 8 T1  : 2887   13  13  1516    21.3 %   42.0 %  19.33         2 Komodo 10.4 T1  : 2837   17  18  1514    13.3 %   24.8 %  16.12

  Result     : 1192.5/1516 (+874,=637,-5)                                   Result     : 1313.0/1514 (+1125,=376,-13)
  Perf.      : 78.7 %                                                       Perf.      : 86.7 %
  Elo        : 3227                                                         Elo        : 3326

----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 1 thread vs 32 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T32 : 3115   13  13  1502    79.0 %   41.5 %  23.03         1 Komodo 10.4 T32 : 3172   18  18  1502    87.9 %   23.4 %  20.94
  2 Stockfish 8 T1  : 2885   13  13  1502    21.0 %   41.5 %  19.45         2 Komodo 10.4 T1  : 2828   18  18  1502    12.1 %   23.4 %  16.22

  Result     : 1186.0/1502 (+874,=624,-4)                                   Result     : 1320.5/1502 (+1145,=351,-6)
  Perf.      : 79.0 %                                                       Perf.      : 87.9 %
  Elo        : 3230                                                         Elo        : 3345

----------------------------------------------------------------------------------------------------------------------------------------------
 
Windows – 2 threads vs 4 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth 
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T4  : 3027   6   6   3000    57.7 %   76.5 %  22.95         1 Komodo 10.4 T4  : 3043   8   8   3000    62.2 %   60.7 %  19.82
  2 Stockfish 8 T2  : 2973   6   6   3000    42.3 %   76.5 %  21.61         2 Komodo 10.4 T2  : 2957   8   8   3000    37.8 %   60.7 %  18.23

  Result     : 1732.0/3000 (+584,=2296,-120)                                Result     : 1867.0/3000 (+956,=1822,-222)
  Perf.      : 57.7 %                                                       Perf.      : 62.2 %
  Elo        : 3054                                                         Elo        : 3087

----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 2 threads vs 4 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T4  : 3032   9   9   1570    59.2 %   71.6 %  21.32         1 Komodo 10.4 T4  : 3033   11  11  1512    59.5 %   56.9 %  18.21
  2 Stockfish 8 T2  : 2968   9   9   1570    40.8 %   71.6 %  20.28         2 Komodo 10.4 T2  : 2967   11  11  1512    40.5 %   56.9 %  17.08

  Result     : 929.0/1570 (+367,=1124,-79)                                  Result     : 900.0/1512 (+470,=860,-182)
  Perf.      : 59.2 %                                                       Perf.      : 59.5 %
  Elo        : 3064                                                         Elo        : 3067

----------------------------------------------------------------------------------------------------------------------------------------------
 
Windows – 4 threads vs 8 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth 
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T8  : 3024   6   6   3000    56.9 %   78.5 %  24.39         1 Komodo 10.4 T8  : 3037   7   7   3000    60.4 %   64.4 %  20.70
  2 Stockfish 8 T4  : 2976   6   6   3000    43.1 %   78.5 %  23.11         2 Komodo 10.4 T4  : 2963   7   7   3000    39.6 %   64.4 %  19.59

  Result     : 1707.0/3000 (+529,=2356,-115)                                Result     : 1812.5/3000 (+847,=1931,-222)
  Perf.      : 56.9 %                                                       Perf.      : 60.4 %
  Elo        : 3048                                                         Elo        : 3073

----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 4 threads vs 8 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T8  : 3023   8   8   1510    56.6 %   76.8 %  22.87         1 Komodo 10.4 T8  : 3029   10  10  1510    58.3 %   64.0 %  19.49
  2 Stockfish 8 T4  : 2977   8   8   1510    43.4 %   76.8 %  21.96         2 Komodo 10.4 T4  : 2971   10  10  1510    41.7 %   64.0 %  18.78

  Result     : 854.5/1510 (+275,=1159,-76)                                  Result     : 881.0/1510 (+398,=966,-146)
  Perf.      : 56.6 %                                                       Perf.      : 58.3 %
  Elo        : 3046                                                         Elo        : 3059

----------------------------------------------------------------------------------------------------------------------------------------------
 
Windows – 8 threads vs 16 threads
 
TC = 60" + 0.05"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T16 : 3023   5   5   3000    56.5 %   80.0 %  25.68         1 Komodo 10.4 T16 : 3033   7   7   3000    59.2 %   65.6 %  21.22
  2 Stockfish 8 T8  : 2977   5   5   3000    43.5 %   80.0 %  24.31         2 Komodo 10.4 T8  : 2967   7   7   3000    40.8 %   65.6 %  20.47

  Result     : 1694.0/3000 (+494,=2400,-106)                                Result     : 1777.5/3000 (+793,=1969,-238)
  Perf.      : 56.5 %                                                       Perf.      : 59.2 %
  Elo        : 3045                                                         Elo        : 3065

----------------------------------------------------------------------------------------------------------------------------------------------

Linux – 8 threads vs 16 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T16 : 3016   8   8   1514    54.5 %   79.4 %  24.20         1 Komodo 10.4 T16 : 3022   10  10  1512    56.3 %   65.3 %  20.56
  2 Stockfish 8 T8  : 2984   8   8   1514    45.5 %   79.4 %  23.73         2 Komodo 10.4 T8  : 2978   10  10  1512    43.7 %   65.3 %  19.88

  Result     : 825.0/1514 (+224,=1202,-88)                                  Result     : 851.5/1512 (+358,=987,-167)
  Perf.      : 54.5 %                                                       Perf.      : 56.3 %
  Elo        : 3031                                                         Elo        : 3044

----------------------------------------------------------------------------------------------------------------------------------------------
 
Linux – 16 threads vs 32 threads
 
TC = 10" + 0.1"
 
    Program           Elo    +   -   Games   Score    Draws   Depth           Program           Elo    +   -   Games   Score    Draws   Depth
 -------------------------------------------------------------------       -------------------------------------------------------------------
  1 Stockfish 8 T32 : 3004   8   7   1254    51.3 %   84.8 %  25.62         1 Komodo 10.4 T32 : 3004   11  11  1156    51.1 %   70.2 %  20.86
  2 Stockfish 8 T16 : 2996   7   8   1254    48.7 %   84.8 %  25.92         2 Komodo 10.4 T16 : 2996   11  11  1156    48.9 %   70.2 %  21.37

  Result     : 643.0/1254 (+111,=1064,-79)                                  Result     : 591.0/1156 (+185,=812,-159)
  Perf.      : 51.3 %                                                       Perf.      : 51.1 %
  Elo        : 3009                                                         Elo        : 3008

lucasart
Posts: 3037
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by lucasart » Sat May 06, 2017 12:11 am

Interesting how the operating system makes a difference. Once more, we see how Windows sucks and Linux rules 8-)

As for SF vs. Komodo SMP scaling, I doubt there's a meaningful difference. Although Komodo is very secretive about it, I'm pretty convinced it's more or less the same as SF: lazy SMP with ply skipping strategie. Why ? Because that's what works best in testing, and all engines are now switching to this (not only because it's easier to code, but also because it works better than split search).

2 things (other than SMP) can explain why Komodo scores better here:
* Contempt distortion
* Better scaling on single core (I think that's where K really shines)
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

mjlef
Posts: 1424
Joined: Thu Mar 30, 2006 12:08 pm

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by mjlef » Sat May 06, 2017 1:10 am

lucasart wrote:Interesting how the operating system makes a difference. Once more, we see how Windows sucks and Linux rules 8-)

As for SF vs. Komodo SMP scaling, I doubt there's a meaningful difference. Although Komodo is very secretive about it, I'm pretty convinced it's more or less the same as SF: lazy SMP with ply skipping strategie. Why ? Because that's what works best in testing, and all engines are now switching to this (not only because it's easier to code, but also because it works better than split search).

2 things (other than SMP) can explain why Komodo scores better here:
* Contempt distortion
* Better scaling on single core (I think that's where K really shines)
It is hard to know all the reasons why the Elo changes with more threads. There is an effect due to search widening, and another due to the fact that many threads working on a position have some of the same consequences as one thread taking more time. We believe Komodo scales well with more time, so some of the scaling with more threads is due to that. But the difference grows to over 100 Elo with enough cores, and that cannot be mostly due to "more time". We think it is due to our SMP search scheme.

As for the guess of "lazy SMP with ply skipping strategie (sic)": Larry has admitted in the past that we use a kind of "Lazy SMP", and we have done so since I started working on Komodo over three and a half years ago. We owe this to Don trying something people dismissed a bit too quickly, but which most of the top programs now use. We also do several other things, though not the things Stockfish does. I did try a "ply skipping scheme", but it worked worse than what we are using. It is not just one thing, it is a bunch of things, and we change it from time to time as we learn more. Basically, there is more than one way to skin a cat (but all of them are messy). I like the simplicity of Stockfish's scheme. It is a pity it does not work for Komodo as well as approaches requiring a lot more complex code.

Andreas did not mention what Contempt values he used in these runs. When running against Stockfish 8, we recommend using Contempt=0, and the same for Komodo against Komodo. There should be some Elo gain when using a positive Contempt against a program running fewer threads, since positive Contempt discourages Komodo from trading pieces and, against a weaker opponent, helps avoid some draws. I do not think this could amount to 100+ Elo though. It would be great if Andreas could repeat some of the runs with larger thread differences with Komodo's Contempt set to zero, to characterize what the Elo effect would be. Based on what I know about Komodo, I would expect a zero-Contempt version to still scale better. But that is an educated guess and I could be wrong.

Andreas is a real "chess scientist". He does great tests like this which teach us a lot. Thanks!

Michel
Posts: 2040
Joined: Sun Sep 28, 2008 11:50 pm

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by Michel » Sat May 06, 2017 6:12 am

mjlef wrote: We owe this to Don trying something people dismissed a bit too quickly.
This is distorting history in a big way. Lazy SMP was used by Toga from the start (long before Komodo had SMP) and one could easily verify on the rating lists that it had the same scaling 1->4 cores as engines that used YBW. This was pointed out regularly here.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

mar
Posts: 1992
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by mar » Sat May 06, 2017 9:30 am

Michel wrote:
mjlef wrote: We owe this to Don trying something people dismissed a bit too quickly.
This is distorting history in a big way. Lazy SMP was used by Toga from the start (long before Komodo had SMP) and one could easily verify on the rating lists that it had the same scaling 1->4 cores as engines that used YBW. This was pointed out regularly here.
Let's do some history then:

- The term Lazy SMP was coined by Julien Marcel (originally it was about parallelizing evaluation).
- Dan Homan (ExChess) liked the name and a while later reported his success with shared TT (this is what was used in Toga) plus varying depth for each other helper.
- Of course people like Bob kept saying that it "doesn't work" even though there was evidence to the contrary
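
The "varying depth for each other helper" idea from the list above can be sketched roughly as follows. This is illustrative Python rather than code from ExChess or Toga; the function name `helper_depth` and the exact +1 offset pattern are assumptions, and the shared transposition table through which the threads exchange results is left out.

```python
def helper_depth(root_depth, thread_id):
    """Depth a thread searches in the current iteration.

    Thread 0 is the main thread. Every other helper searches one ply
    deeper, so the threads do not all repeat identical work.
    The +1 offset pattern is an assumption made for illustration.
    """
    return root_depth + (thread_id % 2)

# Four threads working on iteration 10:
print([helper_depth(10, t) for t in range(4)])  # [10, 11, 10, 11]
```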

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by syzygy » Sat May 06, 2017 11:19 am

It would surprise me if the shared TT approach is not way older than Toga's implementation.

If I remember well, before the "lazy smp" variation came up, the shared TT approach was ‒ rightly or wrongly ‒ nearly universally dismissed as a poor man's attempt at SMP that could perhaps do OK at 2 threads but would give no benefit beyond that.

Dan Homan's idea and talkchess thread seem to have revived the approach (with some new ideas).

Don, who seemed to have had trouble debugging a YBWC implementation, then decided to give lazy smp a shot. It turned out to do better on modern hardware than the existing YBWC implementations in other top engines. So Don had no reason to go back to YBWC and instead continued to improve his lazy smp implementation.

My speculation: if Don had more quickly succeeded at getting YBWC to run reliably, all top engines would still be doing YBWC.

lucasart
Posts: 3037
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by lucasart » Sat May 06, 2017 12:48 pm

mar wrote: - The term Lazy SMP was coined by Julien Marcel (originally it was about parallelizing evaluation).
The term isn't important. And it's not a very good one, because it carries a pejorative connotation that was detrimental to its adoption (i.e. people dismissed it as "too simple to work" and didn't bother to try it).
mar wrote: - Dan Homan (ExChess) liked the name and a while later reported his success with shared TT (this is what was used in Toga) plus varying depth for each other helper.
Indeed, lazy SMP is *at least* as old as Toga II. Most likely it was tried decades before, because it's so simple and natural.
With more threads you need to distribute the threads across depths, in order to improve scaling. That's a natural extension, and we can thank Daniel Homan for introducing the idea.
mar wrote: Of course people like Bob kept saying that it "doesn't work" even though there was evidence to the contrary
Yes, he doesn't understand the subtle difference between "I tried but couldn't make it work", and "it doesn't work". After all these years, you'd think he'd learn...

Here's another trick for you, which worked well in testing for my engine. I don't claim to have invented the wheel, and I'm sure others have done it before, but here it is. When a thread completes a depth, it signals all other threads working on that depth or lower to stop immediately, and report back to base (to find a useful depth to work on, which is where your depth skipping strategy comes into play). Stockfish doesn't do that. Code is here: https://github.com/lucasart/demolito, it's in search.cc iterate function (where signals are raised) and the recursive search is in recurse.h (where signals are listened to).
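
The signalling step described above can be sketched roughly like this. It is illustrative Python, not Demolito's code: the `Worker` class and `signal_depth_completed` are hypothetical names, and a real engine would use atomic flags polled inside the recursive search rather than plain attributes.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    thread_id: int
    depth: int          # depth this worker is currently searching
    stop: bool = False  # flag the recursive search would poll

def signal_depth_completed(workers, finisher_id, completed_depth):
    """Raise the stop flag on every other worker that is searching at
    completed_depth or lower. Returns the ids of the signalled workers."""
    signalled = []
    for w in workers:
        if w.thread_id != finisher_id and w.depth <= completed_depth:
            w.stop = True
            signalled.append(w.thread_id)
    return signalled

workers = [Worker(0, 12), Worker(1, 13), Worker(2, 12), Worker(3, 14)]
# Worker 1 has just completed depth 13.
print(signal_depth_completed(workers, finisher_id=1, completed_depth=13))  # [0, 2]
```

Worker 3 is left alone because it is already searching a deeper iteration; the signalled workers would then pick a new depth to work on.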
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

mar
Posts: 1992
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by mar » Sat May 06, 2017 1:50 pm

lucasart wrote:Here's another trick for you, which worked well in testing for my engine. I don't claim to have invented the wheel, and I'm sure others have done it before, but here it is. When a thread completes a depth, it signals all other threads working on that depth or lower to stop immediately, and report back to base (to find a useful depth to work on, which is where your depth skipping strategy comes into play). Stockfish doesn't do that. Code is here: https://github.com/lucasart/demolito, it's in search.cc iterate function (where signals are raised) and the recursive search is in recurse.h (where signals are listened to).
Yes, this is almost exactly what I'm doing in Cheng since the early days => whenever a helper finishes iteration I notify the master to abort, grab the result and continue next iteration.
(I've noticed during analysis that sometimes master can take much longer to finish than one of the helpers).

The only difference is that I don't continue at d+1 where the helper finished (simply continue master depth+1; in my case that's the lowest depth), so it's something worth trying, however I don't work on it anymore so it's up to others to try.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by cdani » Sat May 06, 2017 3:40 pm

mar wrote:
lucasart wrote:Here's another trick for you, which worked well in testing for my engine. I don't claim to have invented the wheel, and I'm sure others have done it before, but here it is. When a thread completes a depth, it signals all other threads working on that depth or lower to stop immediately, and report back to base (to find a useful depth to work on, which is where your depth skipping strategy comes into play). Stockfish doesn't do that. Code is here: https://github.com/lucasart/demolito, it's in search.cc iterate function (where signals are raised) and the recursive search is in recurse.h (where signals are listened to).
Yes, this is almost exactly what I'm doing in Cheng since the early days => whenever a helper finishes iteration I notify the master to abort, grab the result and continue next iteration.
(I've noticed during analysis that sometimes master can take much longer to finish than one of the helpers).

The only difference is that I don't continue at d+1 where the helper finished (simply continue master depth+1; in my case that's the lowest depth), so it's something worth trying, however I don't work on it anymore so it's up to others to try.
I also do this in Andscacs: I stop all other threads when one finishes. I have not tried stopping only the ones that are working on <= depth.

And I tried the two ideas: continuing with master depth + 1 (what I'm currently doing), or with slave depth + 1. The second was dismissed, but more by intuition than by serious testing, as sometimes it resulted in skipping 3 or more iterations and I found that pretty strange; I suppose I will try it again at some point.
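
The two ways of choosing the next iteration can be compared with a small sketch. This is illustrative Python only, not code from Andscacs or Cheng; `next_depth` and the policy names are made up for the example.

```python
def next_depth(master_depth, helper_depth, policy):
    """Depth at which the master resumes after a helper finishes an iteration.

    policy "master": continue at the master's depth + 1.
    policy "helper": continue at the finishing helper's depth + 1.
    """
    if policy == "master":
        return master_depth + 1
    if policy == "helper":
        return helper_depth + 1
    raise ValueError("unknown policy: " + policy)

# The master was still on depth 10 when a helper completed depth 13.
print(next_depth(10, 13, "master"))  # 11
print(next_depth(10, 13, "helper"))  # 14, so depths 11 to 13 are skipped
```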

mjlef
Posts: 1424
Joined: Thu Mar 30, 2006 12:08 pm

Re: Symmetric multiprocessing (SMP) scaling - SF8 and K10.4

Post by mjlef » Sat May 06, 2017 4:16 pm

Michel wrote:
mjlef wrote: We owe this to Don trying something people dismissed a bit too quickly.
This is distorting history in a big way. Lazy SMP was used by Toga from the start (long before Komodo had SMP) and one could easily verify on the rating lists that it had the same scaling 1->4 cores as engines that used YBW. This was pointed out regularly here.
I never claimed all programmers dismissed Lazy SMP, but certainly a lot of them did. There are many threads here on the topic. Cheng was another program that used it. If you look at the various threads, Kai noted that Komodo's nps scaling was like Cheng's, and Cheng was known to use Lazy SMP. I am not on the Stockfish team, but it seems likely the programmers there tried Lazy SMP because of these postings. But you would have to ask them.

Of note, YBW does not scale the same as Lazy SMP, even for 2-4 cores. YBW often has threads just waiting (although the Stockfish guys' Late Join was one way to keep them better used).
