Most, if not all computer chess rating lists are based on logistic curve with,
D_ELO = -400/Log* Log[1/s-1],
It is based on the assumption that if player A scores k times as many points (with wins counting for 1, draws counting for 1/2) against player B, and player B scores k times as many points than his opponent C, then in a match between A and C, A should score k*k times as many points as C. Its inverse is
FIDE however uses Gaussian distribution for calculating ELO according to Arpad Elo:
http://www.fide.com/fide/handbook.html? ... ew=article
Arpad Elo assumed that, at any given time, every chess player has a normal (Gaussian) distribution of chess levels (i.e.ratings), all with the same standard deviation, sigma = 200 points, but each with a specific mean level. The distribution of the the difference of two Gaussian-distributed variables of the same standard deviation is itself a Gaussian, whose means the difference of the two means, and whose standard deviation is sqrt(2)*sigma.
s = (1+Erf[D_ELO/400])/2
D_ELO = 400 * InverseErf[2*s-1]
What started as a game to compare two widely spread engines, turned out more serious. I took Houdini 1.5a 64 bit, Houdini 1.5a 32 bit, and SOS 5.1 engine (and AnMon engine, which behaves similarly to SOS, but I will talk about SOS 5.1 mainly). I know how H15a x64 is compared to H15a x32 at desired time control (250ms per move) on my PC, 64 bit is 26% faster and 36 +/- 3 ELO points stronger than 32 bit one in these conditions. I ran H1.5a x64 against SOS 5.1 for 10,000 games at 250ms per move:
Code: Select all
Program Score 1 Houdini 1.5a 64 : 9732.5/10000 2 SOS 5.1 : 267.5/10000
Code: Select all
Program Score 1 Houdini 1.5a 32 : 9638.0/10000 2 SOS 5.1 : 362.0/10000
-400/Log* Log[1/0.97325-1]+400/Log* Log[1/0.9638-1] = 54.2 ELO points
The prediction of the Gaussian Model is that the difference is
400 * InverseErf[-1+2*0.97325]-400 * InverseErf[-1+2*0.9638] = 38.0 ELO points
The real difference in both ELO models is 36+/-3 points, which is predicted well by the Gaussian Model. The Logistic Model is completely off. The same happened with another engine, AnMon, and it seems the Gaussian Model (Arpad Elo's) is a better predictor for engine ratings comprising large ELO differences. A larger study including many engines would be nice, taking something like CCRL or CEGT database to verify this. One either has few engines with many games played, or many engines with fewer games played to be statistically consistent.
Some properties of the Gaussian Model compared to the Logistic one:
The ratio of the D_ELO Gaussian over D_ELO Logistic -400/Log* Log[1/s-1] / (400* InverseErf[2*s-1]
The ratio of the derivatives Logistic/Gaussian (400/((-1 + 1/s) s^2 Log)) / (400 E^InverseErf[-1 + 2 s]^2 Sqrt[Pi])
As can be seen, the ELO differences on the tails can be by 50% off between two models, so knowing which model to use is important for large ELO differences. As it is in engine ratings, the ratings are probably inflated, and a comparison across a wide range gives bad predictions using the Logistic Model.
For close matches (s around 1/2) Taylor series expansions to first order of formula gives
Logistic: D_ELO = 1600/Log * (s-1/2) ~ 694.9*(s-1/2)
Gaussian: D_ELO = 400*Sqrt[Pi] * (s-1/2) ~ 709.0*(s-1/2)
So, for small ELO differences, the model is not that important.