Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 9:49 pm
by Laskos
The excellent FGRL rating list (http://www.fastgm.de/index.html) contains two Top 10 rating lists for 10' + 6'' and 60' + 15'' TC with identical engines on one core. We can make direct comparisons of engine performances.

1/
10' + 6''

Code:

10' + 6''

Ordo v1.0.9.2: 3000

     Engine              :    Elo   Diff   Error    Points    (%)       W      D      L     D(%)   CFS      W/L
  ------------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8         :   3151      0       9    1916.0   70.96    1209   1414     77   52.37    89     15.70
   2 Komodo 10.4         :   3143     -8       9    1889.0   69.96    1224   1330    146   49.26    63      8.38
   3 Houdini 5.01        :   3141    -10       8    1882.0   69.70    1193   1378    129   51.04   100      9.25
   4 Deep Shredder 13    :   3009   -142       8    1390.0   51.48     630   1520    550   56.30   100      1.145
   5 Fire 5              :   2983   -168       8    1289.0   47.74     542   1494    664   55.33   100      0.816
   6 Fizbo 1.9           :   2957   -194       8    1186.0   43.93     476   1420    804   52.59   100      0.592
   7 Gull 3              :   2941   -210       8    1125.0   41.67     399   1452    849   53.78   100      0.470
   8 Andscacs 0.89       :   2901   -250       8     975.5   36.13     330   1291   1079   47.81    98      0.306
   9 Fritz 15            :   2889   -262       8     930.0   34.44     282   1296   1122   48.00    72      0.251
  10 Chiron 4            :   2885   -266       8     917.5   33.98     271   1293   1136   47.89   ---      0.239

White advantage = 40.58 +/- 2.07
Draw rate (equal opponents) = 63.46 % +/- 0.53
2/
60' + 15''

Code:

60' + 15''

 Ordo v1.2.6: 3000

     Engine              :    Elo   Diff   Error   Points    (%)      W      D      L     D(%)   CFS     W/L
 ----------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8         :   3146      0      12    950.5   70.41    587    727     36   53.85    51    16.31
   2 Komodo 10.4         :   3146      0      12    950.0   70.37    615    670     65   49.63   100     9.46
   3 Houdini 5.01        :   3119    -27      11    903.5   66.93    516    775     59   57.41   100     8.74
   4 Deep Shredder 13    :   3015   -131      11    706.5   52.33    304    805    241   59.63    99     1.261
   5 Fire 5              :   2997   -149      10    670.5   49.67    287    767    296   56.81   100     0.970
   6 Fizbo 1.9           :   2949   -197      11    577.5   42.78    208    739    403   54.74    83     0.516
   7 Gull 3              :   2941   -205      11    562.5   41.67    172    781    397   57.85    97     0.433
   8 Andscacs 0.89       :   2926   -220      11    533.0   39.48    176    714    460   52.89   100     0.383
   9 Chiron 4            :   2885   -261      11    457.0   33.85    126    662    562   49.04    88     0.224
  10 Fritz 15            :   2875   -271      11    439.0   32.52    106    666    578   49.33   ---     0.183

White advantage = 39.23 +/- 2.84
Draw rate (equal opponents) = 66.78 % +/- 0.74
Elo is not an adequate parametrization of the scaling. Ratings at longer time controls are subject to Elo compression, due to the increasing draw rate. A weaker engine might therefore appear to approach a stronger one Elo-wise (to gain strength relatively), but this might be due just to the increasing number of draws, without any change in relative strength. The Win/Loss ratio of each engine in the list is more closely related to relative strength. Here I post the list of engines ranked by how their Win/Loss ratio scales from Blitz TC (index 1) to Long TC (index 2). I also give 100*log10 of the scaling, so that the values are additive.
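The compression effect can be illustrated with a small sketch. It assumes the plain logistic Elo formula (not the model Ordo actually fits) and a hypothetical pair of engines whose Win/Loss ratio is held fixed while the draw rate rises. The function name and the sample numbers are made up for illustration.

```python
import math

def elo_diff(win_loss_ratio, draw_rate):
    """Elo difference implied by a fixed W/L ratio at a given draw rate.

    Assumption: plain logistic Elo, diff = 400 * log10(s / (1 - s)),
    where s is the expected score. This is not Ordo's fitting procedure.
    """
    decisive = 1.0 - draw_rate                      # fraction of decisive games
    wins = decisive * win_loss_ratio / (1.0 + win_loss_ratio)
    score = wins + 0.5 * draw_rate                  # a draw counts as half a point
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical engine that wins twice as often as it loses (W/L = 2).
for d in (0.0, 0.5, 0.7):
    print(f"draw rate {d:.0%}: Elo difference {elo_diff(2.0, d):6.1f}")
```

In this toy case the W/L ratio stays at 2 throughout, yet the Elo gap shrinks from about 120 with no draws to about 58 at a 50% draw rate and about 35 at 70%.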

Scaling to Long Time Control on one core:

Code:

     Engine                 Scaling = (W2*L1)/(W1*L2)    100*log10(Scaling)
  ------------------------------------------------------------------------------------
   1 Andscacs 0.89       :          1.252                     9.76 
   2 Fire 5              :          1.189                     7.52
   3 Komodo 10.4         :          1.129                     5.27
   4 Deep Shredder 13    :          1.101                     4.18
   5 Stockfish 8         :          1.039                     1.66
   6 Houdini 5.01        :          0.945                    -2.46
   7 Chiron 4            :          0.937                    -2.83
   8 Gull 3              :          0.921                    -3.57
   9 Fizbo 1.9           :          0.872                    -5.95
  10 Fritz 15            :          0.729                   -13.73
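
The list above can be recomputed from the W and L columns of the two tables. This is a minimal sketch, not the script actually used; the names `RESULTS`, `scaling` and `ranked` are made up here. Because it works from the raw game counts rather than from the rounded W/L column, a few values may differ slightly in the third decimal.

```python
import math

# (W1, L1) at 10' + 6'' and (W2, L2) at 60' + 15'', copied from the two tables.
RESULTS = {
    "Stockfish 8":      (1209,   77, 587,  36),
    "Komodo 10.4":      (1224,  146, 615,  65),
    "Houdini 5.01":     (1193,  129, 516,  59),
    "Deep Shredder 13": ( 630,  550, 304, 241),
    "Fire 5":           ( 542,  664, 287, 296),
    "Fizbo 1.9":        ( 476,  804, 208, 403),
    "Gull 3":           ( 399,  849, 172, 397),
    "Andscacs 0.89":    ( 330, 1079, 176, 460),
    "Fritz 15":         ( 282, 1122, 106, 578),
    "Chiron 4":         ( 271, 1136, 126, 562),
}

def scaling(w1, l1, w2, l2):
    """W/L ratio at the long TC divided by the W/L ratio at the blitz TC."""
    return (w2 * l1) / (w1 * l2)

# Rank engines from best to worst scaling.
ranked = sorted(
    ((name, scaling(*counts)) for name, counts in RESULTS.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, s in ranked:
    # Second column: scaling factor; third column: 100 * log10(scaling).
    print(f"{name:<18} {s:6.3f} {100 * math.log10(s):7.2f}")
```

Taking the logarithm turns the ratio of ratios into a difference, which is why the log10 values can be added: the log scaling across two successive steps in time control is the sum of the log scalings of each step.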

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 10:08 pm
by Dann Corbit
So using this measure, Andscacs scales best with longer time and Fritz the worst.

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 10:16 pm
by Laskos
Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.
Yes.

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 10:49 pm
by fastgm
Hello Kai,

thank you very much for the comparison.
Here is the data from my third rating list, 60'' + 0.6'':

Code:

60'' + 0.6''

Ordo v1.2.6: 3000 

     Engine                  :    Elo   Diff   Games    Points    (%)      W       D      L     D(%)     W/L
 ----------------------------------------------------------------------------------------------------  ------
   1 Stockfish 8             :   3208      0    2250    1722.0   76.53    1308    828    114   36.80    11.47
   2 Houdini 5               :   3205     -3    2250    1714.0   76.18    1319    790    141   35.11     9.35
   3 Komodo 10.4             :   3184    -24    2250    1663.0   73.91    1263    800    187   35.56     6.75
   4 Deep Shredder 13        :   3004   -204    2250    1148.0   51.02     675    946    629   42.04     1.073
   5 Fire 5                  :   2973   -235    2250    1053.0   46.80     635    836    779   37.16     0.815
   6 Fizbo 1.9               :   2947   -261    2250     974.0   43.29     575    798    877   35.47     0.656
   7 Gull 3                  :   2918   -290    2250     884.5   39.31     459    851    940   37.82     0.489
   8 Fritz 15                :   2858   -350    2250     711.0   31.60     337    748   1165   33.24     0.289
   9 Andscacs 0.89           :   2858   -350    2250     708.5   31.49     372    673   1205   29.91     0.309   
  10 Chiron 4                :   2844   -364    2250     672.0   29.87     291    762   1197   33.87     0.243

White advantage = 34.62
Draw rate (equal opponents) = 46.34 %

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 10:55 pm
by Laskos
fastgm wrote:Hello Kai,

thank you very much for the comparison.
Here is the data from my third rating list, 60'' + 0.6'':

Thank you very much, I will compute tomorrow morning the relative ratios from Bullet to Long Time Control.

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 11:28 pm
by cdani
Laskos wrote:
Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.
Yes.
You can bet I tried very hard to obtain this :-)

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 11:42 pm
by Dann Corbit
cdani wrote:
Laskos wrote:
Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.
Yes.
You can bet I tried very hard to obtain this :-)
I guess that all the efforts to obtain this are via pruning, since it has to do with all experiments running a single thread (so it has nothing to do with SMP).

I think that this is the right direction for a giant win (next big revolution like null move and LMR were in their day).

Re: Scaling of engines from FGRL rating list

Posted: Fri Apr 07, 2017 11:51 pm
by cdani
Dann Corbit wrote:
cdani wrote:
Laskos wrote:
Dann Corbit wrote:So using this measure, Andscacs scales best with longer time and Fritz the worst.
Yes.
You can bet I tried very hard to obtain this :-)
I guess that all the efforts to obtain this are via pruning, since it has to do with all experiments running a single thread (so it has nothing to do with SMP).

I think that this is the right direction for a giant win (next big revolution like null move and LMR were in their day).
I can't point to a concrete cause. I try to make sure that every patch I accept scales well, or is at least neutral. So it's an accumulated effect. Anyway, even if this comes from long ago, I'm never sure whether the next patch I make will kill part of the achievements, as of course I cannot test at very long time controls.

Re: Scaling of engines from FGRL rating list

Posted: Sat Apr 08, 2017 12:00 am
by JJJ
This confirms my intuition that Komodo scales better than Stockfish 8 with time.

Re: Scaling of engines from FGRL rating list

Posted: Sat Apr 08, 2017 2:50 am
by mjlef
Laskos wrote:
fastgm wrote:Hello Kai,

thank you very much for the comparison.
Here is the data from my third rating list, 60'' + 0.6'':

Thank you very much, I will compute tomorrow morning the relative ratios from Bullet to Long Time Control.
It would be great to calculate the same kind of scaling based on number of cores/threads. Of course more cores help you search deeper, just as longer time does, so that would have to be taken into account.

Kai, as always, great stuff! Thanks.

Mark