Empirically Logistic ELO model better suited than Gaussian
Posted: Tue Jul 12, 2016 8:37 am
I ran a large number of games (105,000 in total) as a round-robin at fixed nodes between different engines such as Stockfish, Texel, Andscacs, etc., for accuracy. The engines are spaced on the order of 200 ELO points apart, so that over each individual interval between neighbouring engines the relation between score and ELO is almost linear and nearly independent of the ELO model. The largest total difference between engines is on the order of 1400 ELO points. I needed large differences because the ELO models only diverge noticeably at large ELO differences.
For each individual match I computed the Logistic ELO difference directly from the match score, over the whole (large) ELO interval. This is the horizontal axis. The consistent ELO, plotted against it, is the sum of the small differences between neighbouring engines, accumulated to give the total difference. If the Logistic model is consistent, these two should be equal, and the diagonal from (0,0) to (1400,1400) would be the fit. If the Gaussian or some other model were more consistent, the dots would deviate from the diagonal. They do not deviate much. The Gaussian model seems ruled out, and the Logistic ELO model for computer chess engines seems to hold up well in this test. My earlier results were mixed because of fewer data points and fewer games per data point.
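To make the two conversions concrete, here is a minimal sketch of the score-to-ELO mapping under each model. The function names are mine, and the Gaussian scale of 200·√2 is an assumption (the classical normal model with a per-player standard deviation of 200); it is not taken from the tool used for the plot. The synthetic chain at the end only illustrates why small steps are nearly model-independent while a large gap is not.

```python
import math
from statistics import NormalDist

# Assumed scale of the Gaussian model: per-player sigma of 200,
# so the rating difference has sigma 200 * sqrt(2).
GAUSS_SIGMA = 200.0 * math.sqrt(2.0)


def logistic_expected(diff):
    """Expected score for an ELO difference under the Logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def logistic_elo(score):
    """ELO difference implied by a score in (0, 1) under the Logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def gaussian_elo(score):
    """ELO difference implied by a score in (0, 1) under the Gaussian model."""
    return GAUSS_SIGMA * NormalDist().inv_cdf(score)


# Synthetic illustration: suppose results follow the Logistic model exactly
# and seven engines-gaps of 200 ELO each add up to a 1400 ELO total.
step_score = logistic_expected(200.0)
direct_score = logistic_expected(1400.0)

print("one 200-point step read as Gaussian :", round(gaussian_elo(step_score), 1))
print("seven steps summed, read as Gaussian:", round(7 * gaussian_elo(step_score), 1))
print("1400-point gap read directly as Gaussian:", round(gaussian_elo(direct_score), 1))
```

A single small step comes out almost the same under either model, but the direct conversion of the large gap does not match the sum of the steps unless the model used for conversion is the one that generated the scores. That mismatch is what the diagonal test looks for.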
The plot:
The data:
Individual statistics:
1 SF2 : 2381 35000 (+32134,=1275,-1591), 93.6 %
      T1  : 7000 (+6950,= 44,-   6), 99.6 %
      Ha1 : 7000 (+6996,=  4,-   0), 100.0 %
      T2  : 7000 (+4574,=969,-1457), 72.3 %
      R2  : 7000 (+6625,=248,- 127), 96.4 %
      R1  : 7000 (+6989,= 10,-   1), 99.9 %
2 T2 : 2232 35000 (+28760,=1323,-4917), 84.1 %
      SF2 : 7000 (+1457,=969,-4574), 27.7 %
      T1  : 7000 (+6968,= 29,-   3), 99.8 %
      Ha1 : 7000 (+6991,=  8,-   1), 99.9 %
      R2  : 7000 (+6355,=308,- 337), 93.0 %
      R1  : 7000 (+6989,=  9,-   2), 99.9 %
3 R2 : 2051 35000 (+20528,=1016,-13456), 60.1 %
      SF2 : 7000 (+ 127,=248,-6625), 3.6 %
      T1  : 7000 (+6302,=332,- 366), 92.4 %
      Ha1 : 7000 (+6910,= 45,-  45), 99.0 %
      T2  : 7000 (+ 337,=308,-6355), 7.0 %
      R1  : 7000 (+6852,= 83,-  65), 98.5 %
4 T1 : 1898 35000 (+11060,=1952,-21988), 34.4 %
      SF2 : 7000 (+   6,= 44,-6950), 0.4 %
      Ha1 : 7000 (+5750,=554,- 696), 86.1 %
      T2  : 7000 (+   3,= 29,-6968), 0.2 %
      R2  : 7000 (+ 366,=332,-6302), 7.6 %
      R1  : 7000 (+4935,=993,-1072), 77.6 %
5 R1 : 1778 35000 (+5667,=1666,-27667), 18.6 %
      SF2 : 7000 (+   1,= 10,-6989), 0.1 %
      T1  : 7000 (+1072,=993,-4935), 22.4 %
      Ha1 : 7000 (+4527,=571,-1902), 68.8 %
      T2  : 7000 (+   2,=  9,-6989), 0.1 %
      R2  : 7000 (+  65,= 83,-6852), 1.5 %
6 Ha1 : 1661 35000 (+2644,=1182,-31174), 9.2 %
      SF2 : 7000 (+   0,=  4,-6996), 0.0 %
      T1  : 7000 (+ 696,=554,-5750), 13.9 %
      T2  : 7000 (+   1,=  8,-6991), 0.1 %
      R2  : 7000 (+  45,= 45,-6910), 1.0 %
      R1  : 7000 (+1902,=571,-4527), 31.2 %
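
For anyone who wants to redo one point of the check from the numbers above, here is a rough sketch for the largest gap (SF2 vs Ha1). It assumes the "small differences" are the matches between neighbours in the ranking order SF2, T2, R2, T1, R1, Ha1, which is my reading of the method, and it assumes a Gaussian scale of 200·√2. The win/draw/loss triples are copied from the table; all names in the code are mine.

```python
import math
from statistics import NormalDist

# (stronger, weaker, wins, draws, losses) for neighbouring engines,
# copied from the "Individual statistics" table.
CHAIN = [
    ("SF2", "T2", 4574, 969, 1457),
    ("T2", "R2", 6355, 308, 337),
    ("R2", "T1", 6302, 332, 366),
    ("T1", "R1", 4935, 993, 1072),
    ("R1", "Ha1", 4527, 571, 1902),
]
# Direct match between the two ends of the chain.
DIRECT = ("SF2", "Ha1", 6996, 4, 0)

GAUSS_SIGMA = 200.0 * math.sqrt(2.0)  # assumed Gaussian scale


def match_score(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def logistic_elo(score):
    return 400.0 * math.log10(score / (1.0 - score))


def gaussian_elo(score):
    return GAUSS_SIGMA * NormalDist().inv_cdf(score)


def chained_and_direct(to_elo):
    """Sum of neighbour-to-neighbour differences vs. the direct difference."""
    chained = sum(to_elo(match_score(w, d, l)) for _, _, w, d, l in CHAIN)
    direct = to_elo(match_score(*DIRECT[2:]))
    return chained, direct


log_chained, log_direct = chained_and_direct(logistic_elo)
gau_chained, gau_direct = chained_and_direct(gaussian_elo)

print(f"Logistic: chained {log_chained:.0f}, direct {log_direct:.0f}")
print(f"Gaussian: chained {gau_chained:.0f}, direct {gau_direct:.0f}")
```

The model under which the chained and direct values agree more closely is the more self-consistent one for this pair. The same comparison can be repeated for the other non-neighbouring pairs in the table.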