Elo Increase per Doubling

Adam Hair · Post by **Adam Hair** » Mon May 07, 2012 1:40 pm

In the past few months, several people (such as Peter Österlund and myself) have measured the increase in Elo per doubling of time control. The increases found in each case were, on average, much higher than the often-quoted range of 50 to 70 Elo per doubling of speed (which is emulated by doubling the time control). However, one objection that has been raised is that the time controls were too short. While I do not believe the previous results were invalid, I do know that the short base time control added noise to the measurement. So I decided to test the Elo increase per doubling at a longer base time control.

I used Gaviota 0.85.1 64-bit as the engine to measure.

Gaviota played against 10 opponents (Jonny 4.00, SmarThink 1.20 64-bit, Glaurung 2.2 64-bit, Booot 5.1.0, Quazar 0.4 64-bit, Nemo 1.01b 64-bit, Naum 4.2 64-bit, Gull 1.0a 64-bit, Spike 1.4, Hannibal 1.1 64-bit).

The base time control was 1 minute + 1 second per move.

I randomly chose 100 positions in EPD format to use for all games, with each position played with reversed colors, so that there were 200 games in each match.
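The position selection described above can be sketched as follows. This is only an illustration, not the script actually used: the function name, the seed, and the placeholder position strings are assumptions, and a real run would read EPD lines from a file.

```python
import random

def build_match_schedule(epd_lines, n_positions=100, seed=None):
    """Pick n_positions at random and schedule each one twice,
    once with the tested engine as White and once as Black."""
    rng = random.Random(seed)
    chosen = rng.sample(epd_lines, n_positions)  # sampling without replacement
    schedule = []
    for epd in chosen:
        schedule.append((epd, "white"))
        schedule.append((epd, "black"))
    return schedule

# Hypothetical pool of positions standing in for a real EPD file.
positions = [f"hypothetical position {i}" for i in range(500)]
schedule = build_match_schedule(positions, n_positions=100, seed=1)
print(len(schedule))  # 200 games per match
```

Reusing the same schedule for every opponent and every time control keeps the opening set identical across matches.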

Gaviota played a gauntlet against the opponents at the base time control, and then it played with 2 times, 4 times, and 8 times the base time control. In the rating lists below, the suffix after Gaviota's name (2, 4, or 8) is that multiplier.

The following ratings were computed with Bayeselo with the commands "mm 0 1" and "covariance":

Rank Name                       Elo    +    - games score oppo. draws
   1 Naum 4.2 64-bit            163   21   21   800   74%   -24   27%
   2 Gull 1.0a 64-bit           113   20   20   800   68%   -24   28%
   3 Gaviota 0.85.1 64-bit(8)    97   12   12  2000   62%    10   30%
   4 Quazar 0.4 64-bit           39   19   19   800   59%   -24   32%
   5 Hannibal 1.1 64-bit         33   19   19   800   58%   -24   31%
   6 Spike 1.4                   33   19   19   800   58%   -24   31%
   7 Gaviota 0.85.1 64-bit(4)    20   12   12  2000   51%    10   32%
   8 Nemo 1.01 beta 64-bit       -6   19   19   800   52%   -24   32%
   9 Booot 5.1.0                -35   19   19   801   48%   -24   33%
  10 Glaurung 2.2 JA 64-bit     -43   19   19   800   47%   -24   25%
  11 Gaviota 0.85.1 64-bit(2)   -65   12   12  2001   40%    10   28%
  12 SmarThink 1.20 64-bit      -99   20   20   800   40%   -24   25%
  13 Jonny 4.00                -100   20   20   800   40%   -24   26%
  14 Gaviota 0.85.1 64-bit     -148   13   13  2000   29%    10   26%

As can be seen, the increase in Elo per doubling was 83, 85, and 77 Elo (from 1x to 2x, 2x to 4x, and 4x to 8x the base time control, respectively). The score increased by 11 percentage points with each doubling.
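The per-doubling figures are simply differences between successive Gaviota ratings in the gauntlet list. A small Python sketch of that arithmetic, using the ratings from the list above (the variable and function names are illustrative only):

```python
# Gaviota's Bayeselo ratings from the gauntlet list, keyed by
# time-control multiplier (1 = base time control).
gaviota_elo = {1: -148, 2: -65, 4: 20, 8: 97}

def gains_per_doubling(ratings):
    """Return the Elo difference between each multiplier and the next."""
    mults = sorted(ratings)
    return [ratings[b] - ratings[a] for a, b in zip(mults, mults[1:])]

gains = gains_per_doubling(gaviota_elo)
print(gains)                               # [83, 85, 77]
print(round(sum(gains) / len(gains), 1))   # 81.7 Elo per doubling on average
```

Note that each rating carries an error margin of about 12 to 13 Elo, so the individual differences are less precise than the bare numbers suggest.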

I also performed a RR tournament with Gaviota's opponents. When I include those games, the ratings are as follows:

Rank Name                       Elo    +    - games score oppo. draws
   1 Naum 4.2 64-bit            165   11   11  2600   73%   -13   28%
   2 Gull 1.0a 64-bit           103   10   10  2600   65%    -8   32%
   3 Gaviota 0.85.1 64-bit(8)    98   12   12  2000   62%    10   30%
   4 Spike 1.4                   49   10   10  2600   57%    -4   32%
   5 Hannibal 1.1 64-bit         40   10   10  2600   56%    -3   34%
   6 Quazar 0.4 64-bit           22   10   10  2600   53%    -2   35%
   7 Gaviota 0.85.1 64-bit(4)    21   12   12  2000   51%    10   32%
   8 Nemo 1.01 beta 64-bit      -10   10   10  2600   48%     1   32%
   9 Booot 5.1.0                -29   10   10  2601   46%     2   34%
  10 Glaurung 2.2 JA 64-bit     -47   10   10  2600   43%     4   29%
  11 Gaviota 0.85.1 64-bit(2)   -65   12   12  2001   40%    10   28%
  12 Jonny 4.00                 -95   11   11  2600   36%     7   27%
  13 SmarThink 1.20 64-bit     -102   11   11  2600   35%     8   26%
  14 Gaviota 0.85.1 64-bit     -149   13   13  2000   29%    10   26%

The increase per doubling is now 84, 86, and 77 Elo.

From my previous test, using a base time control of 6 seconds + 0.1 seconds per move, I found that Gaviota 0.84 gained 104 Elo per doubling (on average). Though the method of measurement differed between the two tests, and two different versions of Gaviota are being compared, it may not be unreasonable to claim that this shows that:

1) At depths greater than those originally used to determine the estimate of 50 to 70 Elo per doubling of speed ("How Computers Play Chess" by David Levy and Monty Newborn?), the expected increase per doubling for modern engines is greater than the quoted numbers.

2) The expected increase per doubling of speed may decrease at longer time controls. If the base time control were 120 minutes + 90 seconds per move, the measured Elo increase per doubling might be 40 to 70 Elo.

All the games used to compute the ratings can be found here:

http://www.mediafire.com/file/guwf0e3x9 ... me_Odds.7z

JuLieN · Post by **JuLieN** » Mon May 07, 2012 2:04 pm

Thanks Adam, very interesting!

You're becoming a chess stats specialist!

A suggestion: could you run a tournament using "max depth" parameters to see what the Elo delta is when searching 1 ply, 2 plies, 3 plies... 15 plies, etc.? UCI engines have a "search depth x" command, and WB ones probably have an equivalent, so given time and methodology that could be an interesting experiment. I've been wanting to do that with Prédateur, but I always end up using my computer time to test new versions rather than running such experiments...
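For reference, in the UCI protocol a depth-limited search is requested with `go depth N`, and xboard/WinBoard engines accept `sd N` to set a depth limit. A minimal Python sketch that only builds those command strings for a depth ladder; the function name and the depth range are illustrative assumptions, and no engine is actually launched:

```python
def fixed_depth_commands(max_depth, protocol="uci"):
    """Return the depth-limit command for each depth from 1 to max_depth."""
    depths = range(1, max_depth + 1)
    if protocol == "uci":
        return [f"go depth {d}" for d in depths]   # sent for every search
    if protocol == "xboard":
        return [f"sd {d}" for d in depths]         # sets the limit once per game
    raise ValueError(f"unknown protocol: {protocol}")

commands = fixed_depth_commands(15)
print(commands[0], "...", commands[-1])  # go depth 1 ... go depth 15
```

In practice a tournament manager would be configured with the depth limit per engine rather than sending these strings by hand.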

hgm · Post by **hgm** » Mon May 07, 2012 2:13 pm

Well, I always counted 70 Elo / doubling (100 * ln(TC)), and what you find (slightly over 80) is not shockingly different. It is still conceivable there could be a small systematic error due to the fact that the fastest Gaviota version is tested only against stronger opponents.
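To make the rule of thumb concrete: with a gain of 100 * ln(TC), a factor of 2 in time is worth about 69 Elo. The Python sketch below evaluates that and also back-computes the coefficient that an average gain of about 81.7 Elo per doubling (the mean of 83, 85, and 77 from the first post) would imply. The function names are illustrative, and treating the gain as exactly logarithmic in time is the assumption being illustrated, not an established result.

```python
import math

def elo_gain(time_ratio, coeff=100.0):
    """Elo gain for multiplying the thinking time by time_ratio,
    under the rule of thumb gain = coeff * ln(time_ratio)."""
    return coeff * math.log(time_ratio)

def implied_coeff(elo_per_doubling):
    """Coefficient that would reproduce a given Elo gain per doubling."""
    return elo_per_doubling / math.log(2)

print(round(elo_gain(2), 1))          # 69.3
print(round(implied_coeff(81.7), 1))  # coefficient implied by the measured average
```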

Another possible source of error is communication lag: at 6"+0.1" you have about 200 msec/move, and the delay is not completely negligible. But if there is 20 msec delay, the effective thinking time goes from 180 to 380, when you 'double' it, which is actually a factor 2.11.
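The lag arithmetic can be checked with a short Python sketch. The 200 ms per move and 20 ms delay are the figures from the post; the 2000 ms case is a made-up comparison value, included only to show how the distortion shrinks when the time per move is longer.

```python
def effective_time_ratio(nominal_ms, lag_ms, factor=2):
    """Ratio of real thinking times when the nominal time per move is
    multiplied by `factor` but a fixed lag is lost on every move."""
    return (factor * nominal_ms - lag_ms) / (nominal_ms - lag_ms)

print(round(effective_time_ratio(200, 20), 2))   # 2.11 (380 ms vs 180 ms)
print(round(effective_time_ratio(2000, 20), 2))  # 2.01 (hypothetical longer time per move)
```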

It is also dangerous to draw any conclusion from measurements on just a single engine. Perhaps Gaviota does have an above-average scaling.

Michel · Post by **Michel** » Mon May 07, 2012 4:01 pm

hgm wrote:It is also dangerous to draw any conclusion from measurements on just a single engine. Perhaps Gaviota does have an above-average scaling.

The point is not that Gaviota gets a little more than 70 Elo/doubling but rather that recent tests had suggested 150-200 Elo/doubling for "modern" engines. The current post suggests that these earlier results were caused by the fast time control that was used.

I am familiar with the fast time control issue. GNU Chess, for example, seems to lose much more than 1 Elo per percent slowdown at fast time controls.

Adam Hair · Post by **Adam Hair** » Mon May 07, 2012 6:48 pm

JuLieN wrote:Thanks Adam, very interesting! You're becoming a chess stats specialist!

A suggestion: could you run a tournament using "max depth" parameters to see what the Elo delta is when searching 1 ply, 2 plies, 3 plies... 15 plies, etc.? UCI engines have a "search depth x" command, and WB ones probably have an equivalent, so given time and methodology that could be an interesting experiment. I've been wanting to do that with Prédateur, but I always end up using my computer time to test new versions rather than running such experiments...

Hi Julien,

I have some partial self-test data for Fruit 2.1 and Houdini 1.03 that I can post when I get back home. I believe that I tested Fruit from depth 4 to depth 12, and Houdini from depth 4 to depth 16. I am doing some testing at the moment with IvanHoe so that I can prove a point, but I might drop that and do something more useful to everyone, such as measuring the Elo delta for several engines as the number of plies increases.

Adam

JuLieN · Post by **JuLieN** » Mon May 07, 2012 6:54 pm

Adam Hair wrote:
JuLieN wrote:Thanks Adam, very interesting! You're becoming a chess stats specialist!

A suggestion: could you run a tournament using "max depth" parameters to see what the Elo delta is when searching 1 ply, 2 plies, 3 plies... 15 plies, etc.? UCI engines have a "search depth x" command, and WB ones probably have an equivalent, so given time and methodology that could be an interesting experiment. I've been wanting to do that with Prédateur, but I always end up using my computer time to test new versions rather than running such experiments...
Hi Julien,

I have some partial self-test data for Fruit 2.1 and Houdini 1.03 that I can post when I get back home. I believe that I tested Fruit from depth 4 to depth 12, and Houdini from depth 4 to depth 16. I am doing some testing at the moment with IvanHoe so that I can prove a point, but I might drop that and do something more useful to everyone, such as measuring the Elo delta for several engines as the number of plies increases.

Adam

Nothing urgent, so take your time and do that when you want, but yes, that would be very interesting.

Btw, for such experiments we should add a setting in our engines to turn off the quiescence search when we reach the horizon, because my experience is that QS kind of leverages the tactical strength.

Adam Hair · Post by **Adam Hair** » Mon May 07, 2012 7:12 pm

hgm wrote:Well, I always counted 70 Elo / doubling (100 * ln(TC)), and what you find (slightly over 80) is not shockingly different. It is still conceivable there could be a small systematic error due to the fact that the fastest Gaviota version is tested only against stronger opponents.

Another possible source of error is communication lag: at 6"+0.1" you have about 200 msec/move, and the delay is not completely negligible. But if there is 20 msec delay, the effective thinking time goes from 180 to 380, when you 'double' it, which is actually a factor 2.11.

It is also dangerous to draw any conclusion from measurements on just a single engine. Perhaps Gaviota does have an above-average scaling.

I am not trying to suggest that the measurement at the longer time control disproves 70 Elo per doubling. I am just fairly certain, though I have not accumulated enough data to prove it, that approximately 70 Elo per doubling does not hold as the speed/time varies substantially. I base this on two things I have observed. First, I have used the similarity tool to compare Gaviota's move selection as its depth varies. The move selection correlation between successive plies increases as the number of plies increases. Second, I have done some self-tests for Fruit and Houdini that seem to indicate that the increase in Elo grows smaller as the ply depth increases.

Perhaps I can post all my data this evening and we can discuss possible errors and interpretations.

Don · Post by **Don** » Mon May 07, 2012 8:40 pm

Adam Hair wrote:
hgm wrote:Well, I always counted 70 Elo / doubling (100 * ln(TC)), and what you find (slightly over 80) is not shockingly different. It is still conceivable there could be a small systematic error due to the fact that the fastest Gaviota version is tested only against stronger opponents.

Another possible source of error is communication lag: at 6"+0.1" you have about 200 msec/move, and the delay is not completely negligible. But if there is 20 msec delay, the effective thinking time goes from 180 to 380, when you 'double' it, which is actually a factor 2.11.

It is also dangerous to draw any conclusion from measurements on just a single engine. Perhaps Gaviota does have an above-average scaling.
I am not trying to suggest that the measurement at the longer time control disproves 70 Elo per doubling. I am just fairly certain, though I have not accumulated enough data to prove it, that approximately 70 Elo per doubling does not hold as the speed/time varies substantially. I base this on two things I have observed. First, I have used the similarity tool to compare Gaviota's move selection as its depth varies. The move selection correlation between successive plies increases as the number of plies increases. Second, I have done some self-tests for Fruit and Houdini that seem to indicate that the increase in Elo grows smaller as the ply depth increases.

Perhaps I can post all my data this evening and we can discuss possible errors and interpretations.

Larry and I have done substantial studies of that. At low depths the amount of Elo per doubling is quite large and at high depths it is much lower. There is absolutely no question about that.

A big part of the reason for this is the gradual increase in the number of draws as the programs get stronger. It gets more and more difficult to beat much weaker programs but also less likely you will lose to them.

Adam Hair · Post by **Adam Hair** » Mon May 07, 2012 10:50 pm

JuLieN wrote:Thanks Adam, very interesting! You're becoming a chess stats specialist!

A suggestion: could you run a tournament using "max depth" parameters to see what the Elo delta is when searching 1 ply, 2 plies, 3 plies... 15 plies, etc.? UCI engines have a "search depth x" command, and WB ones probably have an equivalent, so given time and methodology that could be an interesting experiment. I've been wanting to do that with Prédateur, but I always end up using my computer time to test new versions rather than running such experiments...

And I always seem to use my time running and examining experiments, as well as figuring out how to work around problems that arise due to my lack of programming knowledge, instead of learning how to write code. Well, I do know some programming, but it is in the "monkey see, monkey do" language.

JuLieN · Post by **JuLieN** » Mon May 07, 2012 11:00 pm

Adam Hair wrote:
JuLieN wrote:Thanks Adam, very interesting! You're becoming a chess stats specialist!

A suggestion: could you run a tournament using "max depth" parameters to see what the Elo delta is when searching 1 ply, 2 plies, 3 plies... 15 plies, etc.? UCI engines have a "search depth x" command, and WB ones probably have an equivalent, so given time and methodology that could be an interesting experiment. I've been wanting to do that with Prédateur, but I always end up using my computer time to test new versions rather than running such experiments...
And I always seem to use my time running and examining experiments, as well as figuring out how to work around problems that arise due to my lack of programming knowledge, instead of learning how to write code. Well, I do know some programming, but it is in the "monkey see, monkey do" language.

I know many programmers here that use this language as well.

I have names! For instance: <censored>

Learning a programming language is not difficult at all. What is difficult is learning to program. What's the difference? Well, imagine learning your first natural language (English, when you were a baby): you don't say you learnt English but instead that you learnt to speak.

My first programming language was Amiga Basic. It was a VERY easy language to learn... but then it took months of practice to really get a grasp of what programming is. So although the language was trivial, learning to program was not a trivial task. Then I learnt assembly (68000 assembly) and, although it was a much more difficult "language" (because there are lots of things to know), it was much easier, because I had already learnt to program.

So, if you want to learn how to program, the language you pick doesn't really matter: what you'll need is dedication.
