Doubling of time control

Guenther · Post by **Guenther** » Sat Oct 22, 2016 2:41 pm

corres wrote: Thanks for the correction.
But where are the 1500 opening positions?
I would like to know how you can calculate an Elo number from a self-play match. An Elo number is always relative. But in the case of self-play, what is the reference point?
I think some facts are still missing from your post.
Robert

This thread is not about Elo as you think of it, but about measuring the difference in strength related to time. (=>thread header)

Edit: Maybe you and others were confused by Dann's post and other posts, which failed to see the goal of this test...
(The data in the first diagram should read 'Elo Diff' instead of 'Elo', but Andreas surely thought it was self-explanatory)

IWB · Post by **IWB** » Sat Oct 22, 2016 2:49 pm

Thx for that (Andreas and Jesus)

The interesting thing is that, from a certain point onwards, adding time and cores does not necessarily mean better game play. Cores and long games can be used differently!

... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...

beram · Post by **beram** » Sat Oct 22, 2016 2:56 pm

Uri Blass wrote:
shrapnel wrote:
Dann Corbit wrote:This testing clearly shows a fundamental flaw in the Elo model.
The engine does not get weaker at long time control but stronger.
When given an hour to think, compared to one second, the move chosen with more time allowed will clearly be a much better move.

The increase in draws probably just shows that more careful chess is played by both sides at slower time control.
I've been crying myself hoarse for the last few years that testing through Blitz/STC games is NO SUBSTITUTE for testing through LTC games.
Glad to be vindicated.
It's not just a matter of engines being "more careful"... LTC games ruthlessly expose any weakness in the evaluation and search patterns of the engine.
That is why in the TCEC, Houdini winning the Rapid is no guarantee that it will beat SF in the super-final (though I hope it does, just to discomfit Adam).
Adam has a point though, when he says that the Super-Final will be a different kind of ballgame.
Houdini can win only if it is REALLY the better Engine.
I believe that difference between rapid TCEC and slow time control TCEC is small and we do not have enough games.

Conditions are also not the same in long and rapid time control because in rapid time control there are many weak opponents.

Rapid TCEC is clearly a slower time control than blitz, especially when you consider the number of cores, and is clearly slower than the time controls in this thread.

I'd like to add that Shrapnel's remarks have nothing to do with the meaning of this test.
That the overall quality of chess play improves with LTC has never been questioned.
What has been questioned very much is whether results of matches between opponents A and B at STC carry over proportionally to LTC.

When engine B lost substantially to engine A at STC (say, A scoring >55%), programmers and fanboys of B would (and will) often claim that it would all be different under LTC conditions.
When the same good results for engine A over B were then obtained at LTC, these people simply repeated the mantra, saying that it would all be very different if the match were played at very long time controls.

corres · Post by **corres** » Sat Oct 22, 2016 2:59 pm

[quote="Guenther"]
(The data in the first diagram should read 'Elo Diff' instead of 'Elo', but Andreas surely thought it was self-explanatory)[/quote]

I am not a mind reader, and I like precise reports.

corres · Post by **corres** » Sat Oct 22, 2016 3:06 pm

[quote="IWB"]
... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...[/quote]

You would be right if all engines responded in the same way to a change in move time.
But that is not the case.

IWB · Post by **IWB** » Sat Oct 22, 2016 3:33 pm

corres wrote:
IWB wrote: ... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...
You would be right if all engines responded in the same way to a change in move time.
But that is not the case.

I am pretty sure I am (given a decent time overall)!
I am running 5 + 3 and the only difference from longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within error margins (unfortunately the longer lists have ridiculously huge error margins). The boundary for producing a proper list seems to be below that 5 + 3, ponder on, one core, average HW. With ponder off I don't know, but the CEGT 40/4 list is not that different from mine, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...

Laskos · Post by **Laskos** » Sat Oct 22, 2016 3:45 pm

Ajedrecista wrote:Hello:

I have been looking for a fit of this data and I think I have something decent. Here I go:

I looked into a Gompertz function (an example is here) and I came up with the following:

1.- I converted accumulated Elo gain (0, 144, 277,...) into score with the Elo model µ = 1/[1 + 10^(-Elo/400)].

2.- I used the TC values of 0, 1, 2 and so on in the horizontal axis, like it is seen in the cited paper just after equation 1.

3.- I used the numbers ln[ln(µ_1/µ_0)], ln[ln(µ_2/µ_1)], ..., ln[ln(µ_8/µ_7)] in the vertical axis. (Equation 4 of the paper).

4.- I did a linear regression with Excel to obtain beta and gamma parameters:
Code:
Gompertz fit:
Fitted_µ = alpha*exp[-beta*exp(-gamma*TC)]

Linear fit of the 8 data points = m*TC + n ~ -0.64087232*TC - 0.51556368 (R² ~ 0.99744438)

(Equation 4): gamma = -m ~ 0.64087232
(Equation 4): beta = exp(n)/[exp(gamma) - 1] ~ 0.66489741
5.- By definition, alpha is the saturation level, so we can expect that max(µ) = 1 = alpha --> horizontal asymptote. In that case:
Code:
Fitted_µ ~ exp[-0.66489741*exp(-0.64087232*TC)]

Converting fitted_µ into Elo gain (rounding to the nearest Elo integer):

TC  Elo   Fitted Elo   Elo - (fitted Elo)
 1  144       151               -7
 2  277       277                0
 3  389       396               -9
 4  490       512              -22
 5  583       625              -42
 6  656       738              -82
 7  715       850             -135
 8  766       961             -195

Average error = -61.5 Elo
6.- Equation 5 of the paper proposes the following:
Code:
alpha_TC = exp[ln(µ_TC) + beta*exp(-gamma*TC)]

I obtain 8 values of alpha_TC. If I arbitrarily choose alpha = average(alpha_TC) ~ 0.99310185

Fitted_µ ~ 0.99310185*exp[-0.66489741*exp(-0.64087232*TC)]

TC  Elo   Fitted Elo   Elo - (fitted Elo)
 1  144       147               -3
 2  277       270               -7
 3  389       384               +5
 4  490       489               +1
 5  583       585               -2
 6  656       668              -12
 7  715       735              -20
 8  766       785              -19

Average error ~ -7.1 Elo
I know that it sets the upper bound of 99.31% of score, that is, circa 863.3 Elo gain at most. But the average error has improved a lot.

Furthermore, I did not take into account error bars.
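For anyone who wants to retrace steps 1 to 4 without Excel, here is a minimal Python sketch. It assumes the horizontal axis for the eight ln[ln(µ_T/µ_(T-1))] points is T = 1..8, which is my reading of the formula given for beta; the function names are made up for illustration and error bars are ignored, as in the post.

```python
import math

# Accumulated Elo gains from the post, for TC index 0..8 (0 = base time control).
ELO_GAIN = [0, 144, 277, 389, 490, 583, 656, 715, 766]

def elo_to_score(elo):
    """Expected score under the logistic Elo model (step 1)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def fit_gompertz(elo_gain):
    """Least-squares line through (T, ln ln(mu_T / mu_(T-1))) for T = 1..n.

    Returns (gamma, beta) using gamma = -m and
    beta = exp(n) / (exp(gamma) - 1), as in step 4.
    """
    mu = [elo_to_score(e) for e in elo_gain]
    xs = list(range(1, len(mu)))  # assumed x-axis: T = 1..8
    ys = [math.log(math.log(mu[t] / mu[t - 1])) for t in xs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope of the linear fit
    n = y_mean - m * x_mean    # intercept of the linear fit
    gamma = -m
    beta = math.exp(n) / (math.exp(gamma) - 1.0)
    return gamma, beta

gamma, beta = fit_gompertz(ELO_GAIN)
print(f"gamma ~ {gamma:.4f}, beta ~ {beta:.4f}")
```

With the rounded Elo gains listed in the tables, this should land close to the gamma and beta values quoted in step 4; small differences in the last digits are to be expected.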

Bonus: if I continue giving increasing values of TC to fitted_µ ~ 0.99310185*exp[-0.66489741*exp(-0.64087232*TC)], I get the following estimated Elo gains:
Code:
Converting fitted_µ into Elo gain (rounding to the nearest Elo integer):

          Comparison               TC  Fitted Elo
 5120 +  51.2 vs  2560 +  25.6      8  785
10240 + 102.4 vs  5120 +  51.2      9  818 (+33)
20480 + 204.8 vs 10240 + 102.4     10  838 (+20)
40960 + 409.6 vs 20480 + 204.8     11  849 (+11)
81920 + 819.2 vs 40960 + 409.6     12  856 ( +7)
I hope no typos. 818 (+33) should be understood as 818 - 785 = +33 Elo in (10240 + 102.4 vs 5120 + 51.2) and +818 Elo in (10240 + 102.4 vs 10 + 0.1).
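The extrapolation above can be retraced with a few lines of Python. This is only a sketch that plugs the quoted alpha, beta and gamma into the Gompertz formula and converts the score back to Elo; the function names are made up for illustration.

```python
import math

# Parameters quoted in the post (ALPHA is the averaged alpha_TC value).
ALPHA = 0.99310185
BETA = 0.66489741
GAMMA = 0.64087232

def score_to_elo(mu):
    """Inverse of the logistic Elo model: score -> Elo difference."""
    return 400.0 * math.log10(mu / (1.0 - mu))

def fitted_elo(tc, alpha=ALPHA):
    """Fitted accumulated Elo gain after `tc` doublings (Gompertz model)."""
    mu = alpha * math.exp(-BETA * math.exp(-GAMMA * tc))
    return score_to_elo(mu)

# Fitted gains for the last measured doubling (8) and the extrapolated ones.
for tc in range(8, 13):
    print(tc, f"{fitted_elo(tc):.1f}")

# As TC grows, fitted_µ tends to ALPHA, which caps the Elo gain.
print("limit", f"{score_to_elo(ALPHA):.1f}")
```

The last line is the upper bound implied by the 99.31% saturation score, so every fitted gain stays below it however many doublings are added.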

It might be interesting to fit win ratio, draw ratio and lose ratio in similar ways.

Last but not least: thank you very much, Andreas.

Regards from Spain.

Ajedrecista.

I took the simplest asymptote which fits well the data and 0 ELO points gain at infinity. I converted ELO gains per doubling to score percentages for successive doublings, got the following:

The average error is 4 ELO points. Integrating over doublings above 8 to infinity, I got about 420 ELO points gain compared to the last data point, a total of close to 3700 ELO points for a perfect engine. My other extrapolation, from draw rate, seems to show a similar result. This is somewhat lower than I got in my own tests, but Andreas' data is much better, has more games, and goes to much longer time controls.

corres · Post by **corres** » Sat Oct 22, 2016 4:03 pm

[quote="IWB"][quote="corres"][quote="IWB"]
... and you don't need long time controls to get a proper ranking for rating lists, you just compress the result and make it more difficult to produce and to distinguish entries ...[/quote]

You would be right if all engines responded in the same way to a change in move time.
But that is not the case.[/quote]

I am pretty sure I am (given a decent time overall)!
I am running 5 + 3 and the only difference from longer time controls I can see is that the longer ones are more compressed. Every change in ranking is within error margins (unfortunately the longer lists have ridiculously huge error margins). The boundary for producing a proper list seems to be below that 5 + 3, ponder on, one core, average HW. With ponder off I don't know, but the CEGT 40/4 list is not that different from mine, so I assume that is OK too, even if the overall processing time (with one core) is below mine ...[/quote]

If the difference in Elo between participants is rather high and/or the difference in move time is not too big, the order of engines may be the same. However, if changing the move time had no effect, then the TCEC Rapid would have been won by Stockfish and not Houdini, for example.

Laskos · Post by **Laskos** » Sat Oct 22, 2016 4:11 pm

Laskos wrote:I took the simplest asymptote which fits well the data and 0 ELO points gain at infinity. I converted ELO gains per doubling to score percentages for successive doublings, got the following:

The average error is 4 ELO points. Integrating over doublings above 8 to infinity, I got about 420 ELO points gain compared to the last data point, a total of close to 3700 ELO points for a perfect engine. My other extrapolation, from draw rate, seems to show a similar result. This is somewhat lower than I got in my own tests, but Andreas' data is much better, has more games, and goes to much longer time controls.

Fitting the draw rate similarly with a decaying exponential:

Assuming constant Win/Loss ratio given by the last data point, the ELO gain to infinity (perfect engine) is about 480 ELO points above the last data point, or about 3750 CCRL ELO points. Seems consistent.
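To make the "constant Win/Loss ratio" assumption concrete, here is a small Python sketch of the arithmetic that turns a W/L ratio and a draw rate into an Elo difference per doubling. It is not the fit itself: the W/L ratio of 3.0 and the draw rates below are hypothetical numbers chosen only for illustration, and the function name is made up.

```python
import math

def elo_from_wl_and_draws(wl_ratio, draw_rate):
    """Elo difference implied by a Win/Loss ratio and a draw rate.

    Decisive games (1 - draw_rate) are split between wins and losses
    according to wl_ratio; draws count half a point.
    """
    decisive = 1.0 - draw_rate
    wins = decisive * wl_ratio / (1.0 + wl_ratio)
    score = wins + 0.5 * draw_rate
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical inputs: W/L ratio held at 3.0 while the draw rate climbs.
for draw_rate in (0.5, 0.7, 0.9, 0.99):
    print(draw_rate, round(elo_from_wl_and_draws(3.0, draw_rate), 1))
```

With the W/L ratio fixed, the Elo gain per doubling shrinks towards zero as the draw rate approaches 100%, which is why extrapolating the draw rate gives a finite total.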

Adam Hair · Post by **Adam Hair** » Sat Oct 22, 2016 4:15 pm

fastgm wrote:Ferdinand Mosca helped me. Thanks a lot!

He found a program (Protools) written by Ed Schröder:
http://www.top-5000.nl/dl/protools15.zip

and the latest prodeo.exe
http://www.talkchess.com/forum/viewtopi ... 77&t=61721

With this tool I have created the following additional data:

Code:

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 20+0.2      17.40   28:34:57    3000    256891    0.40     0   26737 (8.91)
Komodo 9.3 T1 10+0.1      15.44   14:19:45    3000    256342    0.20     0   24694 (8.23)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 40+0.4      19.05   58:18:17    3000    265755    0.79     0   27193 (9.06)
Komodo 9.3 T1 20+0.2      17.07   29:24:34    3000    265279    0.40     0   25454 (8.48)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 80+0.8      20.70  116:39:50    3000    267468    1.57     0   27456 (9.15)
Komodo 9.3 T1 40+0.4      18.85   58:53:16    3000    267048    0.79     0   25729 (8.58)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 160+1.6     22.45  234:51:52    3000    267920    3.16     0   27371 (9.12)
Komodo 9.3 T1 80+0.8      20.50  118:06:24    3000    267555    1.59     0   26095 (8.70)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 320+3.2     24.30  476:30:53    3000    274164    6.26     0   27771 (9.26)
Komodo 9.3 T1 160+1.6     22.29  239:40:46    3000    273826    3.15     0   26428 (8.81)

Engine                    Depth       Time   Games     Moves  Average Forfeit Book
Komodo 9.3 T1 640+6.4     26.28  950:20:50    3000    272343    12.56    0   27972 (9.32)
Komodo 9.3 T1 320+3.2     24.26  478:20:21    3000    272091    6.33     0   26638 (8.88)

Engine                    Depth        Time  Games     Moves  Average Forfeit Book
Komodo 9.3 T1 1280+12.8   28.09  1908:54:31   3000    276907    24.82    0   28475 (9.49)
Komodo 9.3 T1 640+6.4     26.18   960:08:07   3000    276750    12.49    0   27004 (9.00)

Engine                    Depth        Time  Games     Moves  Average Forfeit Book
Komodo 9.3 T1 2560+25.6   29.92  3806:05:02   3000    275195    49.79    0   28760 (9.59)
Komodo 9.3 T1 1280+12.8   28.01  1914:36:07   3000    275034    25.06    0   27544 (9.18)


Time control comparison between engines

Depth     : Average search depth
Time      : Total time engine used
Moves     : Total moves engine played
Average   : Average time per move in centi-seconds
Forfeit   : Games engine lost due to time forfeit

List is sorted on Average Time indicating the engine that uses the most time tops.

Thanks for investing the time and electricity to perform this test Andreas!
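As a quick sanity check on the table, the Average column can be recomputed from the Time and Moves columns. The Python sketch below does this for two rows copied from the table (the first 20+0.2 row and the 2560+25.6 row); the helper names are made up for illustration.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def seconds_per_move(total_time, moves):
    """Average thinking time per move, in seconds."""
    return hms_to_seconds(total_time) / moves

# (time control, total time, total moves, Average as listed in the table)
ROWS = [
    ("20+0.2", "28:34:57", 256891, 0.40),
    ("2560+25.6", "3806:05:02", 275195, 49.79),
]

for name, total, moves, listed in ROWS:
    print(name, round(seconds_per_move(total, moves), 2), listed)
```

Dividing total time by total moves reproduces the listed Average figures when the result is expressed in seconds, which suggests the values shown are seconds per move rather than centiseconds as the legend states.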
