Scaling of engines from FGRL rating list

Laskos · Post by **Laskos** » Fri Apr 07, 2017 9:49 pm

The excellent FGRL rating list (http://www.fastgm.de/index.html) contains two Top 10 rating lists for 10' + 6'' and 60' + 15'' TC with identical engines on one core. We can make direct comparisons of engine performances.

1/
10' + 6''

Ordo v1.0.9.2: 3000

     Engine              :    Elo   Diff   Error    Points    (%)       W      D      L     D(%)   CFS      W/L
  ------------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8         :   3151      0       9    1916.0   70.96    1209   1414     77   52.37    89     15.70
   2 Komodo 10.4         :   3143     -8       9    1889.0   69.96    1224   1330    146   49.26    63      8.38
   3 Houdini 5.01        :   3141    -10       8    1882.0   69.70    1193   1378    129   51.04   100      9.25
   4 Deep Shredder 13    :   3009   -142       8    1390.0   51.48     630   1520    550   56.30   100      1.145
   5 Fire 5              :   2983   -168       8    1289.0   47.74     542   1494    664   55.33   100      0.816
   6 Fizbo 1.9           :   2957   -194       8    1186.0   43.93     476   1420    804   52.59   100      0.592
   7 Gull 3              :   2941   -210       8    1125.0   41.67     399   1452    849   53.78   100      0.470
   8 Andscacs 0.89       :   2901   -250       8     975.5   36.13     330   1291   1079   47.81    98      0.306
   9 Fritz 15            :   2889   -262       8     930.0   34.44     282   1296   1122   48.00    72      0.251
  10 Chiron 4            :   2885   -266       8     917.5   33.98     271   1293   1136   47.89   ---      0.239

White advantage = 40.58 +/- 2.07
Draw rate (equal opponents) = 63.46 % +/- 0.53

2/
60' + 15''

Ordo v1.2.6: 3000

     Engine              :    Elo   Diff   Error   Points    (%)      W      D      L     D(%)   CFS     W/L
 ----------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8         :   3146      0      12    950.5   70.41    587    727     36   53.85    51    16.31
   2 Komodo 10.4         :   3146      0      12    950.0   70.37    615    670     65   49.63   100     9.46
   3 Houdini 5.01        :   3119    -27      11    903.5   66.93    516    775     59   57.41   100     8.74
   4 Deep Shredder 13    :   3015   -131      11    706.5   52.33    304    805    241   59.63    99     1.261
   5 Fire 5              :   2997   -149      10    670.5   49.67    287    767    296   56.81   100     0.970
   6 Fizbo 1.9           :   2949   -197      11    577.5   42.78    208    739    403   54.74    83     0.516
   7 Gull 3              :   2941   -205      11    562.5   41.67    172    781    397   57.85    97     0.433
   8 Andscacs 0.89       :   2926   -220      11    533.0   39.48    176    714    460   52.89   100     0.383
   9 Chiron 4            :   2885   -261      11    457.0   33.85    126    662    562   49.04    88     0.224
  10 Fritz 15            :   2875   -271      11    439.0   32.52    106    666    578   49.33   ---     0.183

White advantage = 39.23 +/- 2.84
Draw rate (equal opponents) = 66.78 % +/- 0.74

Elo is not an adequate parametrization of scaling. Ratings at longer time controls are subject to Elo compression because the draw rate increases. A weaker engine might therefore appear to approach a stronger one Elo-wise (a relative gain in strength) when this is only the effect of more draws, with no change in relative strength. The Win/Loss ratio of each engine in the list is more closely related to relative strength. Here I post how each engine's Win/Loss ratio scales from Blitz TC to Long TC, along with a log10 version so that the ratings are additive.
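To make the compression argument concrete, here is a minimal Python sketch. It assumes the textbook logistic Elo formula (not necessarily the exact model Ordo fits), and the 2:1 Win/Loss ratio and the two draw rates are hypothetical numbers picked for illustration. The helper names `elo_diff` and `score_from` are made up for this example.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score (standard logistic formula)."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_from(wl_ratio: float, draw_rate: float) -> float:
    """Expected score when decisive games split wl_ratio : 1 and the rest are draws."""
    decisive = 1.0 - draw_rate
    wins = decisive * wl_ratio / (wl_ratio + 1.0)
    return wins + draw_rate / 2.0

# Hypothetical case: the same 2:1 Win/Loss ratio at two different draw rates.
for draw_rate in (0.50, 0.70):
    s = score_from(2.0, draw_rate)
    print(f"draws {draw_rate:.0%}: score {s:.4f}, Elo diff {elo_diff(s):+.1f}")
```

With the Win/Loss ratio held fixed, the implied Elo gap shrinks as the draw rate goes up, which is the effect described above.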

Scaling to Long Time Control on one core (index 1 is the 10' + 6'' list, index 2 is the 60' + 15'' list):

     Engine                 Scaling = (W2*L1)/(W1*L2)    100*log10(Scaling)
  ------------------------------------------------------------------------------------
   1 Andscacs 0.89       :          1.252                     9.76
   2 Fire 5              :          1.189                     7.52
   3 Komodo 10.4         :          1.129                     5.27
   4 Deep Shredder 13    :          1.101                     4.18
   5 Stockfish 8         :          1.039                     1.66
   6 Houdini 5.01        :          0.945                    -2.46
   7 Chiron 4            :          0.937                    -2.83
   8 Gull 3              :          0.921                    -3.57
   9 Fizbo 1.9           :          0.872                    -5.95
  10 Fritz 15            :          0.729                   -13.73
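For anyone who wants to redo the arithmetic, the following Python sketch recomputes the scaling column from the W and L counts in the two rating lists above. The names `RESULTS`, `scaling` and `scaling_table` are made up for this example. Because it works from the raw counts rather than the rounded W/L column, some values may differ from the posted table in the last digits.

```python
import math

# (wins, losses) per engine: 10' + 6'' list first, 60' + 15'' list second.
RESULTS = {
    "Stockfish 8":      ((1209, 77),   (587, 36)),
    "Komodo 10.4":      ((1224, 146),  (615, 65)),
    "Houdini 5.01":     ((1193, 129),  (516, 59)),
    "Deep Shredder 13": ((630, 550),   (304, 241)),
    "Fire 5":           ((542, 664),   (287, 296)),
    "Fizbo 1.9":        ((476, 804),   (208, 403)),
    "Gull 3":           ((399, 849),   (172, 397)),
    "Andscacs 0.89":    ((330, 1079),  (176, 460)),
    "Fritz 15":         ((282, 1122),  (106, 578)),
    "Chiron 4":         ((271, 1136),  (126, 562)),
}

def scaling(blitz, long_tc):
    """Ratio of Win/Loss ratios: (W2*L1)/(W1*L2)."""
    (w1, l1), (w2, l2) = blitz, long_tc
    return (w2 * l1) / (w1 * l2)

def scaling_table(results):
    """Rows of (engine, scaling, 100*log10(scaling)), best scaling first."""
    rows = [(name, scaling(b, l)) for name, (b, l) in results.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, s, 100.0 * math.log10(s)) for name, s in rows]

for name, s, log_s in scaling_table(RESULTS):
    print(f"{name:<18} {s:6.3f} {log_s:8.2f}")
```

The log10 column is what makes the figures additive: the scaling of one engine relative to another is the difference of their log values.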

Dann Corbit · Post by **Dann Corbit** » Fri Apr 07, 2017 10:08 pm

So using this measure, Andscacs scales best with longer time and Fritz the worst.

Laskos · Post by **Laskos** » Fri Apr 07, 2017 10:16 pm

Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.

Yes.

fastgm · Post by **fastgm** » Fri Apr 07, 2017 10:49 pm

Hello Kai,

thank you very much for the comparison.
Here is the data from my third rating list, 60'' + 0.6'':

60'' + 0.6''

Ordo v1.2.6: 3000

     Engine                  :    Elo   Diff   Games    Points    (%)      W       D      L     D(%)     W/L
 ----------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8             :   3208      0    2250    1722.0   76.53    1308    828    114   36.80    11.47
   2 Houdini 5               :   3205     -3    2250    1714.0   76.18    1319    790    141   35.11     9.35
   3 Komodo 10.4             :   3184    -24    2250    1663.0   73.91    1263    800    187   35.56     6.75
   4 Deep Shredder 13        :   3004   -204    2250    1148.0   51.02     675    946    629   42.04     1.073
   5 Fire 5                  :   2973   -235    2250    1053.0   46.80     635    836    779   37.16     0.815
   6 Fizbo 1.9               :   2947   -261    2250     974.0   43.29     575    798    877   35.47     0.656
   7 Gull 3                  :   2918   -290    2250     884.5   39.31     459    851    940   37.82     0.489
   8 Fritz 15                :   2858   -350    2250     711.0   31.60     337    748   1165   33.24     0.289
   9 Andscacs 0.89           :   2858   -350    2250     708.5   31.49     372    673   1205   29.91     0.309
  10 Chiron 4                :   2844   -364    2250     672.0   29.87     291    762   1197   33.87     0.243

White advantage = 34.62
Draw rate (equal opponents) = 46.34 %

Laskos · Post by **Laskos** » Fri Apr 07, 2017 10:55 pm

fastgm wrote:Hello Kai,

thank you very much for the comparison.
Here is the data from my third rating list, 60'' + 0.6'': [...]

Thank you very much. Tomorrow morning I will compute the relative ratios from Bullet to Long Time Control.

cdani · Post by **cdani** » Fri Apr 07, 2017 11:28 pm

Laskos wrote:
Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.
Yes.

You can bet I tried very hard to obtain this

Dann Corbit · Post by **Dann Corbit** » Fri Apr 07, 2017 11:42 pm

cdani wrote:You can bet I tried very hard to obtain this

I guess that all the efforts to obtain this are via pruning, since all the experiments ran on a single thread (so it has nothing to do with SMP).

I think that this is the right direction for a giant win (the next big revolution, like null move and LMR were in their day).

cdani · Post by **cdani** » Fri Apr 07, 2017 11:51 pm

Dann Corbit wrote:I guess that all the efforts to obtain this are via pruning [...]

I can't point to one concrete cause. I try to make sure that every patch I accept scales well, or is at least neutral, so it's an accumulated effect. Anyway, even though this goes back a long way, I'm never sure whether my next patch will kill part of what has been achieved, since of course I cannot test at very long time controls.

JJJ · Post by **JJJ** » Sat Apr 08, 2017 12:00 am

This confirms my intuition that Komodo scales better than Stockfish 8 with time.

mjlef · Post by **mjlef** » Sat Apr 08, 2017 2:50 am

Laskos wrote:Thank you very much. Tomorrow morning I will compute the relative ratios from Bullet to Long Time Control.

It would be great to calculate the same kind of scaling based on number of cores/threads. Of course more cores help you search deeper, just as longer time does, so that would have to be taken into account.

Kai, as always, great stuff! Thanks.

Mark
