In an earlier post in a thread here:
http://www.talkchess.com/forum/viewtopi ... 8&start=56 ,
I suggested that "Normalized ELO" (see Michel's paper, http://hardy.uhasselt.be/Toga/normalized_elo.pdf ), which is basically (score-1/2)/(sigma*sqrt(N)) with sigma the standard error of the mean score over N games (in other words, score-1/2 divided by the per-game standard deviation of the result), is the correct time-control-invariant measure of engine strength. It also has a nice statistical interpretation: the number of games needed to reach a desired LOS, p-value or SPRT stop scales with its inverse square. While I was on vacation I left my PC running to check the invariance hypothesis, and in the meantime I remembered a very important experiment by Andreas Strangmüller:
http://www.talkchess.com/forum/viewtopic.php?t=61784
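As a rough illustration of the inverse-square interpretation mentioned above, here is a minimal sketch covering the LOS case only. It assumes the test statistic is approximately Normalized ELO times sqrt(N) and standard normal, which is a large-sample approximation; the function name `games_for_los` and the input values are made up for the example.

```python
from math import ceil
from statistics import NormalDist

def games_for_los(norm_elo, los=0.975):
    """Approximate number of games needed to reach a target LOS.

    Assumption: the test statistic is roughly norm_elo * sqrt(N) and
    standard normal (large-sample approximation).
    """
    if norm_elo <= 0 or not 0.5 < los < 1:
        raise ValueError("need norm_elo > 0 and 0.5 < los < 1")
    z = NormalDist().inv_cdf(los)      # z-score matching the target LOS
    return ceil((z / norm_elo) ** 2)   # N grows as 1 / norm_elo**2

# Halving the normalized Elo roughly quadruples the games required.
for ne in (0.5, 0.25):                 # hypothetical values
    print(ne, games_for_los(ne))
```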
So I basically re-checked his results over a limited span. The test is the following: take a strong engine, play self-play matches at double versus single time control, and measure the difference under each rating scheme, for different time controls. I took Komodo 11.01 and Stockfish dev, 3000 self-play games per match, at 60''+ 0.6'' vs 30''+ 0.3'' and at 300''+ 3'' vs 150''+ 1.5'', to see whether Normalized ELO stays roughly constant while the W/L ratio increases. The opening suite was 3moves_Elo2200.epd. The results are here:
K 60''+ 0.6'' vs 30''+ 0.3'':
Score of K2 vs K1: 1007 - 127 - 1866 [0.647] 3000
ELO difference: 105.00 +/- 7.34
W/L: 7.93
Normalized ELO: 0.543 +/- 0.0358
K 300''+ 3'' vs 150''+ 1.5'':
Score of K2 vs K1: 703 - 104 - 2193 [0.600] 3000
ELO difference: 70.32 +/- 6.19
W/L: 6.76
Normalized ELO: 0.417 +/- 0.0358
SF 60''+ 0.6'' vs 30''+ 0.3'':
Score of SF2 vs SF1: 890 - 100 - 2010 [0.632] 3000
ELO difference: 93.70 +/- 6.81
W/L: 8.90
Normalized ELO: 0.516 +/- 0.0358
SF 300''+ 3'' vs 150''+ 1.5'':
Score of SF2 vs SF1: 547 - 96 - 2357 [0.575] 3000
ELO difference: 52.63 +/- 5.56
W/L: 5.70
Normalized ELO: 0.343 +/- 0.0358
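As a cross-check, the derived figures above can be recomputed from the raw win - loss - draw counts. This is only a sketch: it assumes the usual logistic Elo formula and the per-game standard deviation of the result, and the helper name `summarize` is made up for illustration. The ± on Normalized ELO is taken as 1.96/sqrt(N), an assumption that happens to match the reported 0.0358. The ± on the ELO difference is left out, since the exact formula behind it is not given here.

```python
from math import log10, sqrt

def summarize(wins, losses, draws):
    """Recompute score, Elo difference, W/L and normalized Elo from W-L-D."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - score ** 2
    return {
        "score": score,
        "elo": -400 * log10(1 / score - 1),    # logistic Elo model (assumed)
        "w_l": wins / losses,
        "norm_elo": (score - 0.5) / sqrt(var),
        "norm_elo_err": 1.96 / sqrt(n),        # assumed 95% half-width
    }

# (wins, losses, draws) copied from the "Score of ..." lines above.
matches = {
    "K  60+0.6 vs 30+0.3":   (1007, 127, 1866),
    "K  300+3 vs 150+1.5":   (703, 104, 2193),
    "SF 60+0.6 vs 30+0.3":   (890, 100, 2010),
    "SF 300+3 vs 150+1.5":   (547, 96, 2357),
}

for name, wld in matches.items():
    s = summarize(*wld)
    print(f"{name}: score {s['score']:.3f}, Elo {s['elo']:.2f}, "
          f"W/L {s['w_l']:.2f}, "
          f"Normalized ELO {s['norm_elo']:.3f} +/- {s['norm_elo_err']:.4f}")
```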
This is not what I expected. W/L decreases at the longer time control instead of increasing, and Normalized ELO, instead of staying constant, decreases even more. This contradicts the model I built in http://www.talkchess.com/forum/viewtopi ... 8&start=56 .