I agree with Miguel here: if SF won the RR (more points than its opponents), then it should have the top rating. Of course, the differences are too narrow and well inside the error bars, as both Ingo and Miguel wrote. A similar thing happens with Naum and Texel.

michiguel wrote:
There is only one correct answer, and that is that SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played each other under the same conditions, etc., so the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.

IWB wrote:
Hello all,
This is quite interesting:
The official method for the IPON is Bayeselo with mm 0 1, draw rate consideration. The pure TOP 16 one on one looks like this:
 1 Houdini 4            3111   9   9  3300  75%  2921  31%
 2 Stockfish 5          3106   9   8  3300  75%  2921  39%
 3 Komodo 7a            3088   9   9  3300  72%  2922  37%
 4 Gull 3               3057   8   8  3300  68%  2924  41%
 5 Critter 1.4a         2980   8   8  3300  57%  2930  46%
 6 Equinox 2.02         2975   8   8  3300  56%  2930  47%
 7 Deep Rybka 4.1       2959   8   8  3300  54%  2931  45%
 8 Deep Fritz 14        2894   8   8  3300  44%  2935  45%
 9 Chiron 2             2889   8   8  3300  44%  2936  45%
10 Protector 1.6.0      2870   8   8  3300  41%  2937  44%
11 Hannibal 1.4b        2870   8   8  3300  41%  2937  43%
12 Naum 4.2             2838   8   9  3300  36%  2939  41%
13 Texel 1.04           2838   8   8  3300  37%  2939  38%
14 Senpai 1.0           2838   8   8  3300  36%  2939  41%
15 HIARCS 14 WCSC 32b   2812   9   9  3300  33%  2941  37%
16 Jonny 6.00           2798   9   9  3300  31%  2942  36%

The same set of data with Bayes default:

 1 Houdini 4            3111  11  11  3300  75%  2931  31%
 2 Stockfish 5          3105  10  10  3300  75%  2931  39%
 3 Komodo 7a            3088  10  10  3300  72%  2932  37%
 4 Gull 3               3057  10  10  3300  68%  2934  41%
 5 Critter 1.4a         2984  10   9  3300  57%  2939  46%
 6 Equinox 2.02         2980   9  10  3300  56%  2939  47%
 7 Deep Rybka 4.1       2964  10  10  3300  54%  2940  45%
 8 Deep Fritz 14        2905   9  10  3300  44%  2944  45%
 9 Chiron 2             2900  10  10  3300  44%  2945  45%
10 Protector 1.6.0      2883  10  10  3300  41%  2946  44%
11 Hannibal 1.4b        2883  10  10  3300  41%  2946  43%
12 Naum 4.2             2854  10  10  3300  36%  2948  41%
13 Texel 1.04           2854  10  10  3300  37%  2948  38%
14 Senpai 1.0           2853  10  10  3300  36%  2948  41%
15 HIARCS 14 WCSC 32b   2830  10  10  3300  33%  2949  37%
16 Jonny 6.00           2816  10  10  3300  31%  2950  36%

Now with Elostat:

 1 Stockfish 5          :  3115  10  10  3300  74.9 %  2924  38.6 %
 2 Houdini 4            :  3111  11  10  3300  74.5 %  2925  30.7 %
 3 Komodo 7a            :  3091  10  10  3300  72.1 %  2926  37.0 %
 4 Gull 3               :  3059   9   9  3300  68.0 %  2928  41.0 %
 5 Critter 1.4a         :  2982   9   9  3300  57.0 %  2933  46.1 %
 6 Equinox 2.02         :  2978   9   9  3300  56.3 %  2933  46.9 %
 7 Deep Rybka 4.1       :  2962   9   9  3300  53.9 %  2935  45.2 %
 8 Deep Fritz 14        :  2899   9   9  3300  44.4 %  2939  44.9 %
 9 Chiron 2             :  2894   9   9  3300  43.5 %  2939  45.1 %
10 Protector 1.6.0      :  2877   9   9  3300  40.9 %  2940  44.1 %
11 Hannibal 1.4b        :  2875   9   9  3300  40.7 %  2940  42.6 %
12 Texel 1.04           :  2846   9   9  3300  36.5 %  2942  38.5 %
13 Naum 4.2             :  2845   9   9  3300  36.4 %  2942  40.9 %
14 Senpai 1.0           :  2845   9   9  3300  36.3 %  2942  40.7 %
15 HIARCS 14 WCSC 32b   :  2822  10  10  3300  33.2 %  2944  37.5 %
16 Jonny 6.00           :  2808  10  10  3300  31.2 %  2945  35.7 %

and finally with ORDO:

 # PLAYER               :  RATING  POINTS  PLAYED    (%)
 1 Stockfish 5          :  3115.1  2473.0    3300  74.9%
 2 Houdini 4            :  3111.0  2458.5    3300  74.5%
 3 Komodo 7a            :  3089.3  2379.0    3300  72.1%
 4 Gull 3               :  3054.9  2245.5    3300  68.0%
 5 Critter 1.4a         :  2968.9  1882.0    3300  57.0%
 6 Equinox 2.02         :  2963.8  1859.5    3300  56.3%
 7 Deep Rybka 4.1       :  2945.6  1778.5    3300  53.9%
 8 Deep Fritz 14        :  2875.7  1464.5    3300  44.4%
 9 Chiron 2             :  2869.4  1436.5    3300  43.5%
10 Protector 1.6.0      :  2850.1  1351.0    3300  40.9%
11 Hannibal 1.4b        :  2848.3  1343.0    3300  40.7%
12 Texel 1.04           :  2816.4  1204.5    3300  36.5%
13 Naum 4.2             :  2815.5  1200.5    3300  36.4%
14 Senpai 1.0           :  2814.9  1198.0    3300  36.3%
15 HIARCS 14 WCSC 32b   :  2790.6  1096.0    3300  33.2%
16 Jonny 6.00           :  2774.4  1030.0    3300  31.2%

That is very good, as everyone can take the list he likes.
Regards
Ingo
1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%
Miguel
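Miguel's criterion (in a round robin, the rating order should follow the points order) can be checked mechanically. Below is a minimal Python sketch that uses only figures copied from the lists above, restricted to the top four engines for brevity. The function name `rank_inversions` is just an illustration, not part of Bayeselo, EloSTAT or Ordo.

```python
from itertools import combinations

# Points come from the Ordo output; ratings come from the lists quoted above.
points = {"Stockfish 5": 2473.0, "Houdini 4": 2458.5,
          "Komodo 7a": 2379.0, "Gull 3": 2245.5}
bayeselo_mm01 = {"Stockfish 5": 3106, "Houdini 4": 3111,
                 "Komodo 7a": 3088, "Gull 3": 3057}
ordo = {"Stockfish 5": 3115.1, "Houdini 4": 3111.0,
        "Komodo 7a": 3089.3, "Gull 3": 3054.9}

def rank_inversions(points, ratings):
    """Return (more_points, fewer_points) pairs that the ratings order the other way.

    Only strict inversions are reported; equal ratings are not flagged.
    """
    inversions = []
    for a, b in combinations(points, 2):
        hi, lo = (a, b) if points[a] >= points[b] else (b, a)
        if points[hi] > points[lo] and ratings[hi] < ratings[lo]:
            inversions.append((hi, lo))
    return inversions

print(rank_inversions(points, bayeselo_mm01))  # [('Stockfish 5', 'Houdini 4')]
print(rank_inversions(points, ordo))           # []
```

Because ties are not flagged, three engines sharing the same rounded rating (as Naum, Texel and Senpai do in the Bayeselo mm 0 1 list) would not show up here even though their listing order differs from their points order.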
I ran my own clumsy rating programme (its results are very similar to those of EloSTAT). Engine 01, Engine 02, etc. correspond to the engines in those positions of the EloSTAT and Ordo lists:
Round Robin with 16 engines and 3300 games per engine.
Total number of games: 26400 games.
Engines: Performance: Score:
Engine 01: 3114.77 74.94 %
Engine 02: 3111.00 74.50 %
Engine 03: 3091.03 72.09 %
Engine 04: 3059.72 68.05 %
Engine 05: 2983.07 57.03 %
Engine 06: 2978.56 56.35 %
Engine 07: 2962.47 53.89 %
Engine 08: 2900.56 44.38 %
Engine 09: 2894.97 43.53 %
Engine 10: 2877.75 40.94 %
Engine 11: 2876.12 40.70 %
Engine 12: 2847.39 36.50 %
Engine 13: 2846.54 36.38 %
Engine 14: 2846.01 36.30 %
Engine 15: 2823.90 33.21 %
Engine 16: 2809.04 31.21 %
Mean of ratings: 2938.93 Elo.
Bayeselo    Bayeselo    EloSTAT     Ordo        My tool
(mm 0 1)    (default)
 313         295         307         340.7       305.73     Max.(list) - min.(list)
2932.69     2941.69     2938.06     2918.99     2938.93     Average of ratings.
 108.63      101.75      106.67      118.76      106.17     Sample standard deviation of ratings.

  1.6415      1.6641      1.6212      1.6168      1.6207    Houdini 4
  1.5955      1.6051      1.6587      1.6513      1.6562    Stockfish 5
  1.4298      1.4380      1.4337      1.4341      1.4326    Komodo 7a
  1.1444      1.1333      1.1337      1.1444      1.1377    Gull 3
  0.4356      0.4159      0.4119      0.4202      0.4157    Critter 1.4a
  0.3895      0.3765      0.3744      0.3773      0.3733    Equinox 2.02
  0.2422      0.2193      0.2244      0.2240      0.2217    Deep Rybka 4.1
 -0.3561     -0.3606     -0.3662     -0.3646     -0.3614    Deep Fritz 14
 -0.4022     -0.4097     -0.4131     -0.4176     -0.4141    Chiron 2
 -0.5771     -0.5768     -0.5724     -0.5801     -0.5763    Protector 1.6.0
 -0.5771     -0.5768     -0.5912     -0.5953     -0.5916    Hannibal 1.4b
 -0.8717     -0.8618     -0.8724     -0.8715     -0.8702    Naum 4.2
 -0.8717     -0.8618     -0.8630     -0.8639     -0.8622    Texel 1.04
 -0.8717     -0.8717     -0.8724     -0.8765     -0.8752    Senpai 1.0
 -1.1110     -1.0977     -1.0880     -1.0812     -1.0835    HIARCS 14 WCSC 32b
 -1.2399     -1.2353     -1.2193     -1.2176     -1.2235    Jonny 6.00
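The per-engine rows in the table above appear to be standardised ratings: each rating minus the list average, divided by the sample standard deviation. That reading is an inference from the numbers rather than something stated in the post, but the short Python sketch below reproduces the Bayeselo (mm 0 1) column to within rounding. The function name `standardise` is only illustrative.

```python
from statistics import mean, stdev

# Ratings copied from the Bayeselo (mm 0 1) list quoted earlier in the thread.
bayeselo_mm01 = {
    "Houdini 4": 3111, "Stockfish 5": 3106, "Komodo 7a": 3088, "Gull 3": 3057,
    "Critter 1.4a": 2980, "Equinox 2.02": 2975, "Deep Rybka 4.1": 2959,
    "Deep Fritz 14": 2894, "Chiron 2": 2889, "Protector 1.6.0": 2870,
    "Hannibal 1.4b": 2870, "Naum 4.2": 2838, "Texel 1.04": 2838,
    "Senpai 1.0": 2838, "HIARCS 14 WCSC 32b": 2812, "Jonny 6.00": 2798,
}

def standardise(ratings):
    """Map each rating to (rating - list average) / sample standard deviation."""
    values = list(ratings.values())
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {name: (r - avg) / sd for name, r in ratings.items()}

values = list(bayeselo_mm01.values())
print(max(values) - min(values))   # spread: max.(list) - min.(list)
print(round(mean(values), 2))      # average of ratings
print(round(stdev(values), 2))     # sample standard deviation of ratings
for name, z in standardise(bayeselo_mm01).items():
    print(f"{z:8.4f}  {name}")
```

On this scale the lists can be compared regardless of each tool's offset and spread: the difference between Houdini 4 and Stockfish 5 is under 0.05 standard deviations in every column, whichever engine comes first.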
Regards from Spain.
Ajedrecista.