There are two competing models relating the draw fraction d to the win fraction w (the loss fraction being 1 - w - d):
Rao-Kupper:
d = C*w*(1 - w - d)
d -> (C*w - C*w^2)/(1 + C*w)
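A quick way to check the solved form is to substitute it back into the defining equation. The Python sketch below does that. The function name `rao_kupper_draw` and the sample values w = 0.4, C = 0.9 are illustrative choices, not taken from the data.

```python
def rao_kupper_draw(w: float, C: float) -> float:
    """Draw fraction d from the Rao-Kupper relation d = C*w*(1 - w - d),
    using the closed form solved for d."""
    return (C * w - C * w**2) / (1 + C * w)


w, C = 0.4, 0.9  # illustrative values, not fitted to any data
d = rao_kupper_draw(w, C)
# The closed form should satisfy the original equation up to rounding error.
residual = d - C * w * (1 - w - d)
print(f"d = {d:.4f}, residual = {residual:.1e}")
```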
Davidson:
d^2 = C*w*(1 - w - d)
d -> 1/2 (-C*w + Sqrt[C]*Sqrt[w]*Sqrt[4 - 4 w + C*w])
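The same check works for the Davidson form, which is the non-negative root of the quadratic d^2 + C*w*d - C*w*(1 - w) = 0. Again a minimal sketch; `davidson_draw` and the sample values are hypothetical.

```python
import math


def davidson_draw(w: float, C: float) -> float:
    """Draw fraction d from the Davidson relation d^2 = C*w*(1 - w - d),
    taking the non-negative root of d^2 + C*w*d - C*w*(1 - w) = 0."""
    return 0.5 * (-C * w + math.sqrt(C) * math.sqrt(w) * math.sqrt(4 - 4 * w + C * w))


w, C = 0.4, 0.9  # illustrative values only
d = davidson_draw(w, C)
# Plug the root back into d^2 = C*w*(1 - w - d).
residual = d**2 - C * w * (1 - w - d)
print(f"d = {d:.4f}, residual = {residual:.1e}")
```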
Both are special cases of a more general model:
d^a = C*w*(1 - w - d)
where a and C are free parameters fitted to the data points; a = 1 gives the Rao-Kupper model (used in BayesElo) and a = 2 gives the Davidson model.
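For a value of a other than 1 or 2 there is generally no simple closed form, but d can still be found numerically. A sketch, assuming 0 <= w <= 1, C >= 0 and a > 0: the left side d^a grows with d while the right side C*w*(1 - w - d) shrinks, so bisection on [0, 1 - w] converges to the unique root. The helper `draw_fraction` is hypothetical, not something taken from BayesElo or Ordo.

```python
import math


def draw_fraction(w: float, C: float, a: float, tol: float = 1e-12) -> float:
    """Solve d**a = C*w*(1 - w - d) for d in [0, 1 - w] by bisection.

    For a > 0 and C*w >= 0, g(d) = d**a - C*w*(1 - w - d) is increasing in d,
    g(0) <= 0 and g(1 - w) >= 0, so there is exactly one root in the interval.
    """
    def g(d: float) -> float:
        return d**a - C * w * (1 - w - d)

    lo, hi = 0.0, 1.0 - w
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w, C = 0.4, 0.9  # illustrative values only
# a = 1 should reproduce the Rao-Kupper closed form, a = 2 the Davidson one.
rk = (C * w - C * w**2) / (1 + C * w)
dav = 0.5 * (-C * w + math.sqrt(C * w) * math.sqrt(4 - 4 * w + C * w))
print(draw_fraction(w, C, 1.0), rk)
print(draw_fraction(w, C, 2.0), dav)
```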
Each series of matches spans 1000-1500 Elo points, so the tails of the distribution are visible.
Anchor: SF depth 6
Komodo depths:
K1 : 1000 (+986,= 12,- 2), 99.2 %
K2 : 1000 (+930,= 60,- 10), 96.0 %
K3 : 1000 (+837,= 95,- 68), 88.4 %
K4 : 1000 (+663,=215,-122), 77.0 %
K5 : 1000 (+488,=207,-305), 59.2 %
K6 : 1000 (+192,=248,-560), 31.6 %
K7 : 1000 (+ 84,=168,-748), 16.8 %
K8 : 1000 (+ 18,=108,-874), 7.2 %
K9 : 1000 (+ 14,= 46,-940), 3.7 %
K10 : 1000 (+ 2,= 26,-972), 1.5 %
model: d^a = C*w*(1-d-w)
Least Squares:
a = 1.955
C = 0.426
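The percentage column appears to be the usual score (W + D/2)/N. The post does not say so explicitly; this sketch assumes that convention, checks it against the SF depth 6 table, and converts each row into the fractions w and d that enter the model.

```python
# W/D/L counts for K1..K10 against the SF depth 6 anchor, copied from the table above.
results = [
    (986, 12, 2), (930, 60, 10), (837, 95, 68), (663, 215, 122), (488, 207, 305),
    (192, 248, 560), (84, 168, 748), (18, 108, 874), (14, 46, 940), (2, 26, 972),
]
listed = [99.2, 96.0, 88.4, 77.0, 59.2, 31.6, 16.8, 7.2, 3.7, 1.5]

for (wins, draws, losses), pct in zip(results, listed):
    n = wins + draws + losses
    w, d = wins / n, draws / n               # fractions used in d^a = C*w*(1-d-w)
    score = 100 * (wins + 0.5 * draws) / n   # assumption: a draw counts as half a point
    print(f"w={w:.3f} d={d:.3f} score={score:.2f}% (listed {pct}%)")
```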
Anchor: IvanHoe depth 5
Komodo depths:
K1 : 1000 (+982,= 14,- 4), 98.9 %
K2 : 1000 (+946,= 39,- 15), 96.5 %
K3 : 1000 (+834,= 99,- 67), 88.3 %
K4 : 1000 (+630,=214,-156), 73.7 %
K5 : 1000 (+440,=249,-311), 56.5 %
K6 : 1000 (+173,=248,-579), 29.7 %
K7 : 1000 (+ 55,=169,-776), 14.0 %
K8 : 1000 (+ 19,=102,-879), 7.0 %
K9 : 1000 (+ 4,= 34,-962), 2.1 %
model: d^a = C*w*(1-d-w)
Least Squares:
a = 1.639
C = 0.818
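The post does not state which residual was minimized in the least-squares fit. As one hedged possibility, the sketch below fits a and C to the IvanHoe depth 5 table by minimizing the squared error between observed and predicted draw fractions, using a grid over a and a golden-section search over ln C. Because the criterion is an assumption, the resulting a and C need not reproduce the 1.639 and 0.818 above. All function names are hypothetical.

```python
import math

# W/D/L counts for K1..K9 against the IvanHoe depth 5 anchor, from the table above.
results = [
    (982, 14, 4), (946, 39, 15), (834, 99, 67), (630, 214, 156), (440, 249, 311),
    (173, 248, 579), (55, 169, 776), (19, 102, 879), (4, 34, 962),
]
points = [(W / (W + D + L), D / (W + D + L)) for W, D, L in results]


def draw_fraction(w, C, a, tol=1e-10):
    """Root of d**a = C*w*(1 - w - d) on [0, 1 - w], found by bisection."""
    lo, hi = 0.0, 1.0 - w
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid**a < C * w * (1 - w - mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sse(a, C):
    """Sum of squared errors between predicted and observed draw fractions."""
    return sum((draw_fraction(w, C, a) - d) ** 2 for w, d in points)


def best_C(a, lo=math.log(0.01), hi=math.log(10.0), iters=40):
    """Golden-section search over ln C for fixed a (assumes one minimum in range)."""
    phi = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - phi * (hi - lo), lo + phi * (hi - lo)
    f1, f2 = sse(a, math.exp(x1)), sse(a, math.exp(x2))
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - phi * (hi - lo)
            f1 = sse(a, math.exp(x1))
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + phi * (hi - lo)
            f2 = sse(a, math.exp(x2))
    return math.exp(0.5 * (lo + hi))


# Coarse grid over a = 1.00, 1.02, ..., 3.00; for each a, refine C.
fits = []
for k in range(50, 151):
    a = k / 50
    C = best_C(a)
    fits.append((sse(a, C), a, C))
err, a_fit, C_fit = min(fits)
print(f"a = {a_fit:.2f}, C = {C_fit:.3f}, SSE = {err:.5f}")
```

In practice a proper optimizer (for example SciPy's `scipy.optimize.least_squares`) would be the usual choice; the pure-Python search here only keeps the sketch dependency-free.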
Anchor: SF depth 5
Houdini depths:
H1 : 1000 (+907,= 79,- 14), 94.7 %
H2 : 1000 (+815,=148,- 37), 88.9 %
H3 : 1000 (+594,=293,-113), 74.1 %
H4 : 1000 (+403,=375,-222), 59.1 %
H5 : 1000 (+203,=413,-384), 40.9 %
H6 : 1000 (+ 86,=312,-602), 24.2 %
H7 : 1000 (+ 30,=243,-727), 15.2 %
H8 : 1000 (+ 18,=191,-791), 11.3 %
H9 : 1000 (+ 3,= 95,-902), 5.1 %
model: d^a = C*w*(1-d-w)
Least Squares:
a = 1.941
C = 1.771
Anchor: Komodo depth 5
Houdini depths:
H1 : 1000 (+928,= 67,- 5), 96.2 %
H2 : 1000 (+878,=106,- 16), 93.1 %
H3 : 1000 (+724,=211,- 65), 83.0 %
H4 : 1000 (+524,=301,-175), 67.5 %
H5 : 1000 (+310,=396,-294), 50.8 %
H6 : 1000 (+139,=385,-476), 33.1 %
H7 : 1000 (+ 54,=327,-619), 21.8 %
H8 : 1000 (+ 22,=252,-726), 14.8 %
H9 : 1000 (+ 5,=113,-882), 6.2 %
model: d^a = C*w*(1-d-w)
Least Squares:
a = 2.476
C = 0.931
Anchor: Houdini depth 5
SF depths:
S1 : 1000 (+925,= 72,- 3), 96.1 %
S2 : 1000 (+883,=109,- 8), 93.8 %
S3 : 1000 (+762,=208,- 30), 86.6 %
S4 : 1000 (+620,=311,- 69), 77.5 %
S5 : 1000 (+405,=394,-201), 60.2 %
S6 : 1000 (+241,=380,-379), 43.1 %
S7 : 1000 (+117,=315,-568), 27.5 %
S8 : 1000 (+ 41,=235,-724), 15.8 %
S9 : 1000 (+ 5,=143,-852), 7.6 %
S10 : 1000 (+ 7,= 72,-921), 4.3 %
S11 : 1000 (+ 0,= 49,-951), 2.5 %
model: d^a = C*w*(1-d-w)
Least Squares:
a = 2.167
C = 1.457
All in all, the Davidson model, P(D|E) = C*Sqrt[P(W|E)*P(L|E)], which assumes that 1 win + 1 loss = 2 draws, fits the empirical data much better than the Rao-Kupper model used in BayesElo, P(D|E) = C*P(W|E)*P(L|E), where 1 win + 1 loss = 1 draw: in all five data sets the fitted exponent a (between 1.639 and 2.476) is closer to 2 than to 1. So BayesElo is not advisable for computer chess ratings. Ordo assumes 1 win + 1 loss = 2 draws, so it would be advisable to use Ordo instead of BayesElo for computer chess ratings.
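The comparison can also be made directly by fixing a at 1 (Rao-Kupper) and at 2 (Davidson) and fitting only C. The sketch below does this for the SF depth 6 table, again assuming squared error in the draw fraction as the criterion and using a plain log-spaced grid over C in place of a real optimizer. C plays a different role in the two models, so only the SSE values are comparable. The function names are hypothetical.

```python
import math

# W/D/L counts for K1..K10 against the SF depth 6 anchor, from the first table.
results = [
    (986, 12, 2), (930, 60, 10), (837, 95, 68), (663, 215, 122), (488, 207, 305),
    (192, 248, 560), (84, 168, 748), (18, 108, 874), (14, 46, 940), (2, 26, 972),
]
points = [(W / (W + D + L), D / (W + D + L)) for W, D, L in results]


def rao_kupper(w, C):  # d = C*w*(1 - w - d), i.e. a = 1
    return (C * w - C * w**2) / (1 + C * w)


def davidson(w, C):  # d^2 = C*w*(1 - w - d), i.e. a = 2
    return 0.5 * (-C * w + math.sqrt(C * w) * math.sqrt(4 - 4 * w + C * w))


def best_fit(model):
    """Scan C over a log grid from 0.01 to 100; return the (SSE, C) pair with lowest SSE."""
    grid = [10 ** (-2 + 4 * k / 2000) for k in range(2001)]
    return min(
        (sum((model(w, C) - d) ** 2 for w, d in points), C) for C in grid
    )


sse_rk, C_rk = best_fit(rao_kupper)
sse_dav, C_dav = best_fit(davidson)
print(f"Rao-Kupper: C = {C_rk:.3f}, SSE = {sse_rk:.5f}")
print(f"Davidson:   C = {C_dav:.3f}, SSE = {sse_dav:.5f}")
```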