Doubling of time control

Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Doubling of time control

Post by Guenther »

corres wrote:Thanks for the correction.
But where are the 1500 opening positions?
I would like to know how you can calculate an Elo number from a self-play match. An Elo number is always relative, so in the case of self-play, what is the reference point?
I think some facts are still missing from your post.
Robert
This thread is not about Elo as you think of it, but about measuring the difference in strength related to time. (=>thread header)

Edit: Maybe you and others were confused by Dann's post and other posts, which failed to see the goal of this test...
(The data in the first diagram should read 'Elo Diff' instead of 'Elo', but Andreas surely thought it was self-explanatory.)
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Doubling of time control.

Post by IWB »

Thx for that (Andreas and Jesus)

The interesting thing is that adding time and cores does not necessarily mean better play from a certain point onwards. Cores and long games can be used differently!

... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Doubling of time control

Post by beram »

Uri Blass wrote:
shrapnel wrote:
Dann Corbit wrote:This testing clearly shows a fundamental flaw in the Elo model.
The engine does not get weaker at long time control but stronger.
When given an hour to think, compared to one second, the move chosen with more time allowed will clearly be a much better move.

The increase in draws probably just shows that more careful chess is played by both sides at slower time control.
I've been crying myself hoarse for the last few years that testing through Blitz/STC games is NO SUBSTITUTE for testing through LTC games.
Glad to be vindicated.
It's not just a matter of engines being "more careful"... LTC games ruthlessly expose any weakness in the evaluation and search patterns of the engine.
That is why in the TCEC, Houdini winning the Rapids is no guarantee that it will beat SF in the super-final ( though I hope it does, just to discomfit Adam :D ).
Adam has a point though, when he says that the Super-Final will be a different kind of ballgame.
Houdini can win only if it is REALLY the better Engine.
I believe that the difference between rapid TCEC and slow time control TCEC is small, and we do not have enough games.

Conditions are also not the same at long and rapid time controls, because at the rapid time control there are many weak opponents.

Rapid TCEC is clearly a slower time control than blitz, especially when you consider the number of cores, and it is clearly slower than the time controls in this thread.
I would like to add that Shrapnel's remarks have nothing to do with the meaning of this test.
That the overall quality of chess play improves with LTC has never been questioned.
What has been questioned very much is whether the results of matches between opponents A and B at STC are proportional to those at LTC.

When engine B lost substantially (say >55%) to engine A at STC, programmers and fanboys of B would (and will) often claim that it would all be different under LTC conditions.
When the same good winning results of engine A over B were then achieved at LTC, these people simply repeated the mantra, saying that it would all be very different if the match were played at a very long time control.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Doubling of time control

Post by corres »

Guenther wrote:(The data in the first diagram should read 'Elo Diff' instead of 'Elo', but Andreas surely thought it was self-explanatory.)

I am not a mind reader, and I prefer precise reports.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Doubling of time control.

Post by corres »

IWB wrote:... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...

You would be right if all engines responded in the same way to a change in move time.
But that is not the case.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Doubling of time control.

Post by IWB »

corres wrote:
IWB wrote: ... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
You would be right if all engines responded in the same way to a change in move time.
But that is not the case.
I am pretty sure I am (given a decent time overall)!
I am running 5 + 3, and the only difference from longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within the error margins (unfortunately the longer lists have ridiculously huge error margins). The boundary for producing a proper list seems to be below that 5 + 3, ponder on, one core, average hardware. With ponder off I don't know, but the CEGT 40/4 list is not that different from mine, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Doubling of time control.

Post by Laskos »

Ajedrecista wrote:Hello:

I have been looking for a fit for this data and I think I have something decent. Here I go:

I looked into a Gompertz function (an example is here) and I came up with the following:

1.- I converted accumulated Elo gain (0, 144, 277,...) into score with the Elo model µ = 1/[1 + 10^(-Elo/400)].

2.- I used the TC values of 0, 1, 2 and so on in the horizontal axis, like it is seen in the cited paper just after equation 1.

3.- I used the numbers ln[ln(µ_1/µ_0)], ln[ln(µ_2/µ_1)], ..., ln[ln(µ_8/µ_7)] in the vertical axis. (Equation 4 of the paper).

4.- I did a linear regression with Excel to obtain beta and gamma parameters:

Code: Select all

Gompertz fit:
Fitted_µ = alpha*exp[-beta*exp(-gamma*TC)]

Linear fit of the 8 data points = m*TC + n ~ -0.64087232*TC - 0.51556368 (R² ~ 0.99744438)

(Equation 4): gamma = -m ~ 0.64087232
(Equation 4): beta = exp(n)/[exp(gamma) - 1] ~ 0.66489741
5.- By definition, alpha is the saturation level, so we can expect max(µ) = 1 = alpha --> horizontal asymptote. In that case:

Code: Select all

Fitted_µ ~ exp[-0.66489741*exp(-0.64087232*TC)]

Converting fitted_µ into Elo gain (rounding to the nearest Elo integer):

TC  Elo   Fitted Elo   Elo - (fitted Elo)
 1  144       151               -7
 2  277       277                0
 3  389       396               -7
 4  490       512              -22
 5  583       625              -42
 6  656       738              -82
 7  715       850             -135
 8  766       961             -195

Average error = -61.25 Elo
6.- Equation 5 of the paper proposes the following:

Code: Select all

alpha_TC = exp[ln(µ_TC) + beta*exp(-gamma*TC)]

I obtain 8 values of alpha_TC. If I arbitrarily choose alpha = average(alpha_TC) ~ 0.99310185:

Fitted_µ ~ 0.99310185*exp[-0.66489741*exp(-0.64087232*TC)]

TC  Elo   Fitted Elo   Elo - (fitted Elo)
 1  144       147               -3
 2  277       270               +7
 3  389       384               +5
 4  490       489               +1
 5  583       585               -2
 6  656       668              -12
 7  715       735              -20
 8  766       785              -19

Average error ~ -5.4 Elo
I know that this sets an upper bound of 99.31% on the score, that is, an Elo gain of circa 863.3 at most. But the average error has improved a lot.

Furthermore, I did not take into account error bars.

Bonus: if I continue giving increasing values of TC to fitted_µ ~ 0.99310185*exp[-0.66489741*exp(-0.64087232*TC)], I get the following estimated Elo gains:

Code: Select all

Converting fitted_µ into Elo gain (rounding to the nearest Elo integer):

          Comparison               TC  Fitted Elo
 5120 +  51.2 vs  2560 +  25.6      8  785
10240 + 102.4 vs  5120 +  51.2      9  818 (+33)
20480 + 204.8 vs 10240 + 102.4     10  838 (+20)
40960 + 409.6 vs 20480 + 204.8     11  849 (+11)
81920 + 819.2 vs 40960 + 409.6     12  856 ( +7)
I hope there are no typos. 818 (+33) should be understood as 818 - 785 = +33 Elo in (10240 + 102.4 vs 5120 + 51.2) and +818 Elo in (10240 + 102.4 vs 10 + 0.1).

It might be interesting to fit the win ratio, draw ratio and loss ratio in similar ways.

Last but not least: thank you very much, Andreas.

Regards from Spain.

Ajedrecista.
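
The steps in the quoted post can be reproduced with a short script. What follows is only a sketch in Python (the post itself used Excel): the helper names elo_to_score, score_to_elo and linear_fit are made up for illustration, and the baseline of 0 Elo at TC = 0 is assumed from the "accumulated Elo gain (0, 144, 277,...)" wording.

```python
import math

# Accumulated Elo gain per doubling, from the quoted post.
# Index = TC; TC 0 is the baseline with 0 Elo gain (assumed).
ELO = [0, 144, 277, 389, 490, 583, 656, 715, 766]


def elo_to_score(elo):
    """Step 1: expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_to_elo(score):
    """Inverse of elo_to_score."""
    return 400.0 * math.log10(score / (1.0 - score))


def linear_fit(xs, ys):
    """Ordinary least squares for y = m*x + n (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


mu = [elo_to_score(e) for e in ELO]
tcs = list(range(1, len(mu)))  # TC = 1..8

# Step 3: ln[ln(mu_TC / mu_(TC-1))] against TC.
ys = [math.log(math.log(mu[t] / mu[t - 1])) for t in tcs]

# Step 4: the slope and intercept give gamma and beta.
m, n = linear_fit(tcs, ys)
gamma = -m
beta = math.exp(n) / (math.exp(gamma) - 1.0)


def fitted_elo(tc, alpha=1.0):
    """Gompertz curve alpha*exp[-beta*exp(-gamma*TC)], converted to Elo."""
    return score_to_elo(alpha * math.exp(-beta * math.exp(-gamma * tc)))


# Step 6: one alpha per data point (equation 5), then their average.
alphas = [mu[t] * math.exp(beta * math.exp(-gamma * t)) for t in tcs]
alpha_avg = sum(alphas) / len(alphas)

print(f"gamma = {gamma:.6f}, beta = {beta:.6f}, alpha_avg = {alpha_avg:.6f}")
for t in tcs:
    print(t, ELO[t], round(fitted_elo(t)), round(fitted_elo(t, alpha_avg)))
```

The last loop prints TC, the measured Elo, the fit with alpha = 1 and the fit with the averaged alpha, so it can be compared line by line with the two tables in the post.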
I took the simplest asymptotic function that fits the data well and gives 0 Elo points gain at infinity. I converted the Elo gains per doubling to score percentages for successive doublings and got the following:

Image

The average error is 4 Elo points. Integrating over doublings above 8 to infinity, I got a gain of about 420 Elo points compared to the last data point, a total of close to 3700 Elo points for a perfect engine. My other extrapolation, from the draw rate, seems to show a similar result. This is somewhat lower than I got in my own tests, but Andreas' data is much better: it has more games and goes to much longer time controls.
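
The post does not spell out which function was fitted. As a rough illustration only, the sketch below assumes a decaying exponential a*exp(-b*n) for the Elo gain of the n-th doubling (the per-doubling gains are the differences of the accumulated values 144, 277, ... quoted above) and fits the gains directly instead of the score percentages, so its numbers are not expected to match the 420 Elo quoted here. The names GAINS, fit_exponential and tail_gain are made up for this example.

```python
import math

# Elo gain of each doubling n = 1..8: differences of the accumulated
# gains 0, 144, 277, 389, 490, 583, 656, 715, 766 quoted in the thread.
GAINS = [144, 133, 112, 101, 93, 73, 59, 51]


def fit_exponential(gains):
    """Least-squares fit of ln(gain) = ln(a) - b*n, for n = 1..len(gains)."""
    ns = range(1, len(gains) + 1)
    logs = [math.log(g) for g in gains]
    mean_n = sum(ns) / len(gains)
    mean_l = sum(logs) / len(logs)
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (l - mean_l) for n, l in zip(ns, logs))
    slope = sxy / sxx
    a = math.exp(mean_l - slope * mean_n)
    return a, -slope


def tail_gain(a, b, start):
    """Integral of a*exp(-b*n) from n = start to infinity."""
    return a * math.exp(-b * start) / b


a, b = fit_exponential(GAINS)
print(f"a = {a:.1f}, b = {b:.4f}")
print(f"extrapolated gain beyond doubling 8: {tail_gain(a, b, 8):.0f} Elo")
```

Because the assumed curve goes to zero at infinity, the integral is finite, which is what makes a "perfect engine" estimate possible at all. The result is sensitive to the choice of function and to which quantity is fitted.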
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Doubling of time control.

Post by corres »

IWB wrote:
corres wrote:
IWB wrote: ... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
You would be right if all engines responded in the same way to a change in move time.
But that is not the case.
I am pretty sure I am (given a decent time overall)!
I am running 5 + 3, and the only difference from longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within the error margins (unfortunately the longer lists have ridiculously huge error margins). The boundary for producing a proper list seems to be below that 5 + 3, ponder on, one core, average hardware. With ponder off I don't know, but the CEGT 40/4 list is not that different from mine, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...

If the difference in Elo between the participants is rather large and/or the difference in move time is not too big, the order of the engines may be the same. However, if shifting the move time had no effect, then the TCEC Rapid would have been won by Stockfish and not Houdini, for example.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Doubling of time control.

Post by Laskos »

Laskos wrote:I took the simplest asymptotic function that fits the data well and gives 0 Elo points gain at infinity. I converted the Elo gains per doubling to score percentages for successive doublings and got the following:

Image

The average error is 4 Elo points. Integrating over doublings above 8 to infinity, I got a gain of about 420 Elo points compared to the last data point, a total of close to 3700 Elo points for a perfect engine. My other extrapolation, from the draw rate, seems to show a similar result. This is somewhat lower than I got in my own tests, but Andreas' data is much better: it has more games and goes to much longer time controls.
Fitting the draw rate similarly with a decaying exponential:

Image

Assuming a constant Win/Loss ratio given by the last data point, the Elo gain to infinity (perfect engine) is about 480 Elo points above the last data point, or about 3750 CCRL Elo points. This seems consistent.
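
How a draw rate and a fixed Win/Loss ratio translate into an Elo gain can be sketched as follows. The draw rates and the ratio in the usage lines are placeholders, not values from Andreas' data, and the function name elo_gain is made up; only the arithmetic (score = wins + draws/2, then the logistic Elo formula) is standard.

```python
import math


def elo_gain(win_loss_ratio, draw_rate):
    """Elo difference implied by a Win/Loss ratio and a draw rate.

    Decisive games make up (1 - draw_rate) of all games and are split
    between wins and losses according to win_loss_ratio.
    """
    decisive = 1.0 - draw_rate
    wins = decisive * win_loss_ratio / (1.0 + win_loss_ratio)
    score = wins + 0.5 * draw_rate
    return 400.0 * math.log10(score / (1.0 - score))


# Placeholder inputs (NOT from the thread): a fixed W/L ratio of 10
# and a draw rate that creeps towards 100%.
for draw_rate in (0.80, 0.90, 0.95, 0.99):
    print(f"draw rate {draw_rate:.2f}: {elo_gain(10.0, draw_rate):+.1f} Elo")
```

As the draw rate approaches 100%, the implied gain per doubling shrinks towards zero whatever the W/L ratio is, which is why summing the gains over all further doublings can give a finite total.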
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Doubling of time control

Post by Adam Hair »

fastgm wrote:Ferdinand Mosca helped me. Thanks a lot!

He found a program (Protools) written by Ed Schröder:
http://www.top-5000.nl/dl/protools15.zip

and the latest prodeo.exe
http://www.talkchess.com/forum/viewtopi ... 77&t=61721

With this tool I have created the following additional data:

Code: Select all

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 20+0.2      17.40   28:34:57    3000    256891    0.40     0   26737 (8.91)
Komodo 9.3 T1 10+0.1      15.44   14:19:45    3000    256342    0.20     0   24694 (8.23)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 40+0.4      19.05   58:18:17    3000    265755    0.79     0   27193 (9.06)
Komodo 9.3 T1 20+0.2      17.07   29:24:34    3000    265279    0.40     0   25454 (8.48)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 80+0.8      20.70  116:39:50    3000    267468    1.57     0   27456 (9.15)
Komodo 9.3 T1 40+0.4      18.85   58:53:16    3000    267048    0.79     0   25729 (8.58)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 160+1.6     22.45  234:51:52    3000    267920    3.16     0   27371 (9.12)
Komodo 9.3 T1 80+0.8      20.50  118:06:24    3000    267555    1.59     0   26095 (8.70)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 320+3.2     24.30  476:30:53    3000    274164    6.26     0   27771 (9.26)
Komodo 9.3 T1 160+1.6     22.29  239:40:46    3000    273826    3.15     0   26428 (8.81)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 640+6.4     26.28  950:20:50    3000    272343    12.56    0   27972 (9.32)
Komodo 9.3 T1 320+3.2     24.26  478:20:21    3000    272091    6.33     0   26638 (8.88)

Engine                    Depth        Time  Games     Moves  Average Forfeit Book
Komodo 9.3 T1 1280+12.8   28.09  1908:54:31   3000    276907    24.82    0   28475 (9.49)
Komodo 9.3 T1 640+6.4     26.18   960:08:07   3000    276750    12.49    0   27004 (9.00)

Engine                    Depth        Time  Games     Moves  Average Forfeit Book
Komodo 9.3 T1 2560+25.6   29.92  3806:05:02   3000    275195    49.79    0   28760 (9.59)
Komodo 9.3 T1 1280+12.8   28.01  1914:36:07   3000    275034    25.06    0   27544 (9.18)


Time control comparison between engines

Depth     : Average search depth
Time      : Total time engine used
Moves     : Total moves engine played
Average   : Average time per move in seconds
Forfeit   : Games engine lost due to time forfeit

List is sorted on Average Time indicating the engine that uses the most time tops.
Thanks for investing the time and electricity to perform this test, Andreas!
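
The table can be cross-checked with a few lines of Python. This is a sketch only: the rows are typed in by hand from the output above, and the names MATCHES and to_seconds are made up. It divides Time by Moves for every row and computes how much deeper the slower side searched in each match.

```python
# (time control, depth, total time h:mm:ss, moves, reported average)
# Each pair is one match: slower side first, faster side second.
MATCHES = [
    (("20+0.2", 17.40, "28:34:57", 256891, 0.40),
     ("10+0.1", 15.44, "14:19:45", 256342, 0.20)),
    (("40+0.4", 19.05, "58:18:17", 265755, 0.79),
     ("20+0.2", 17.07, "29:24:34", 265279, 0.40)),
    (("80+0.8", 20.70, "116:39:50", 267468, 1.57),
     ("40+0.4", 18.85, "58:53:16", 267048, 0.79)),
    (("160+1.6", 22.45, "234:51:52", 267920, 3.16),
     ("80+0.8", 20.50, "118:06:24", 267555, 1.59)),
    (("320+3.2", 24.30, "476:30:53", 274164, 6.26),
     ("160+1.6", 22.29, "239:40:46", 273826, 3.15)),
    (("640+6.4", 26.28, "950:20:50", 272343, 12.56),
     ("320+3.2", 24.26, "478:20:21", 272091, 6.33)),
    (("1280+12.8", 28.09, "1908:54:31", 276907, 24.82),
     ("640+6.4", 26.18, "960:08:07", 276750, 12.49)),
    (("2560+25.6", 29.92, "3806:05:02", 275195, 49.79),
     ("1280+12.8", 28.01, "1914:36:07", 275034, 25.06)),
]


def to_seconds(hms):
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


depth_gains = []
for slow, fast in MATCHES:
    for tc, depth, total_time, moves, reported in (slow, fast):
        per_move = to_seconds(total_time) / moves
        print(f"{tc:>10}: {per_move:6.2f} s/move (table: {reported})")
    depth_gains.append(slow[1] - fast[1])

mean_gain = sum(depth_gains) / len(depth_gains)
print(f"mean depth gain per doubling: {mean_gain:.2f} plies")
```

For these rows Time/Moves agrees with the Average column to two decimals when taken in seconds, and the slower side searches between about 1.85 and 2.02 plies deeper, that is, roughly two plies per doubling.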