The following list shows the results of my testing of the approximate Elo gain for each doubling of thinking time, which should be roughly equal to the gain from doubling hardware speed:
Base time control: 6 sec + 0.1 sec
(2) : 2 x (6 sec + 0.1 sec) = 12 sec + 0.2 sec
(4) : 24 sec + 0.4 sec
(8) : 48 sec + 0.8 sec
(16): 96 sec + 1.6 sec
QX6700 @ 3.05 GHz
100 positions per match, each position twice (reversed colors)
The mean Elo increase per doubling in time is 117.94 +/- 54. If the Elo increase for the doubling from the base time to 12 sec + 0.2 sec is discarded, the increase per doubling is 108.16 +/- 38.52.
I will not claim that there are no problems with my testing methodology, but you will have to point out to me exactly what the problems are. I understand that these numbers do not correspond with your expectations, but this data does have a little bit of support in that it corresponds with the results found by the authors of Dirty and Spandrel.
If you do point out some plausible flaws, I would be willing to redo the study with corrections made to the methodology.[/quote]
Ok, I was away for the weekend, so the answer comes with a bit of delay :).
I have 2 major problems with your testing method.
First, and most important, is the time controls. At least for the first 3 time controls (basically for any TC where the average time per move is less than 1 sec), the Elo differences tend to be greatly exaggerated. This is a well-known fact. It's not due to weaknesses in the search/evaluation algorithms, but to TC and interface problems, which cause more frequent blunders and therefore effectively lower the Elo. You can also see this in the extremely low draw percentage. Btw., these extremely short TCs are used in engine tuning exactly because they exaggerate differences, so any change you make to the engine is more easily recognized as positive or negative.
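To make the "less than 1 sec per move" point concrete, here is a rough sketch that rebuilds the time control ladder from the quoted post and estimates the time available per move. The game length of 60 moves is my own assumption (it is not stated anywhere in the thread), and the function names are just illustrative, so treat the numbers as a ballpark only.

```python
# Assumption: average game length in moves. Not given in the thread;
# a different value shifts where the 1 sec/move boundary falls.
ASSUMED_MOVES_PER_GAME = 60


def time_control_ladder(base_sec, inc_sec, doublings):
    """Return (multiplier, base, increment) for 0..doublings doublings."""
    return [(2 ** k, base_sec * 2 ** k, inc_sec * 2 ** k)
            for k in range(doublings + 1)]


def avg_time_per_move(base_sec, inc_sec, moves=ASSUMED_MOVES_PER_GAME):
    """Rough per-move budget: base time spread over the game, plus increment."""
    return base_sec / moves + inc_sec


if __name__ == "__main__":
    # Base time control from the quoted post: 6 sec + 0.1 sec, doubled 4 times.
    for mult, base, inc in time_control_ladder(6.0, 0.1, 4):
        t = avg_time_per_move(base, inc)
        flag = "below 1 sec/move" if t < 1.0 else "at or above 1 sec/move"
        print(f"x{mult:<2} {base:g} sec + {inc:g} sec -> ~{t:.2f} sec/move ({flag})")
```

Under that 60-move assumption the first three TCs (base, x2, x4) come out below 1 sec per move, and only x8 and x16 get above it.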
The second problem is related to the wide range of Elo differences in your list. I have no clue how you are selecting opponents, but when testing 1 engine at various TCs, all opponents have to be the same, and each opponent must always play at the same TC (the same opponent should never appear at different TCs). Opponents should also be selected so that the weakest is slightly stronger than the tested engine at the shortest TC, while the strongest is slightly weaker than the tested engine at the longest TC. There should also be no more than 3 doubling TCs for the tested engine, since that would already give more than 200 Elo difference, which is quite large.
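As a rough illustration of why a 200+ Elo gap is already large, the sketch below uses the standard logistic Elo expectation formula. The 108.16 Elo per doubling is simply the figure from the quoted post, taken at face value for the arithmetic only, and the helper names are hypothetical.

```python
# Figure from the quoted post, used here only for illustration.
ELO_PER_DOUBLING = 108.16


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def rating_span(doublings, per_doubling=ELO_PER_DOUBLING):
    """Total Elo gap between shortest and longest TC, assuming a constant gain."""
    return doublings * per_doubling


if __name__ == "__main__":
    for d in range(1, 5):
        span = rating_span(d)
        print(f"{d} doubling(s): ~{span:.0f} Elo span, "
              f"expected score ~{expected_score(span):.2f}")
```

At a 200 Elo gap the expected score is already about 0.76, so a fixed set of opponents cannot stay close to the tested engine at both ends of a long TC ladder.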