Questions have been raised about Elo differences and error margins calculated with Bayeselo in the following threads:
SF-McBrain v3.0 TCEC-X RELEASE
Noomen KI Fianchetto H6.02 vs. K11.2.2 30m+30s 16-core
Fishy Bayeselo numbers?
Re: Fishy Bayeselo numbers?
Indeed, for example,
how can a match with a final score of 56-44 (+15 =82 -3),
leading to the "usual" (Elostat) +42 Elo difference,
give a measly +15 Elo under the BayesElo computation?
(Or does the +15 for the winner, -15 for the loser mean a +30 Elo difference? Still quite different from +42, but less startling...)
And results can also be very different concerning the error bars (2-sigma).
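For reference, the "usual" Elo difference comes from inverting the logistic expected-score curve at the observed score. A minimal sketch (assuming the standard 400-point logistic model that Elostat-style tools use):

```python
import math

def elo_from_score(score: float) -> float:
    """Invert the logistic expected-score curve:
    score = 1 / (1 + 10**(-elo/400))  =>  elo = 400 * log10(score / (1 - score))."""
    return 400.0 * math.log10(score / (1.0 - score))

# 56 points out of 100 games (+15 =82 -3):
print(round(elo_from_score(0.56), 1))  # ~ 41.9, i.e. the "usual" +42
```

The draw count does not enter this formula at all, which is exactly where BayesElo differs: its scale is fitted jointly with a draw parameter, so 1 Bayeselo is not 1 logistic Elo.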
Re: Fishy Bayeselo numbers?
ernest wrote:
Indeed, for example,
how can a match with final score: 56-44 (+15=82-3),
leading to the "usual" (Elostat) +42 Elo difference,
give a measly +15 Elo using the BayesElo computation!
(or does the +15 for the winner, -15 for the loser mean a +30 Elo difference, still quite different from +42, but less startling...)
And results can also be very different concerning the error-bars (2-sigma).

This just illustrates the different settings that can be used. This pgn file is the current TCEC 10 pgn file:
Code:
[Mac-Pro:~/cluster.mfb]
michaelbyrne% bay
version 0058, Copyright (C) 1997-2016 Remi Coulom and updated by Michael Byrne.
compiled Jul 24 2016 00:03:35.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>
ResultSet>rp /Users/michaelbyrne/Downloads/dl.php-7.pgn
112 game(s) loaded
ResultSet>elo
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>r
Rank Name Rating Δ + - # Σ Σ% W L D W% =% OppR
---------------------------------------------------------------------------------------------------------
1 Stockfish 041017 3394 0.0 211 211 9 8.5 94.4 8 0 1 88.9 11.1 3105
2 Komodo 1937.00 3302 92.5 174 174 10 8.5 85.0 7 0 3 70.0 30.0 3089
3 Fire 6.1 3212 89.5 180 180 9 6.0 66.7 5 2 2 55.6 22.2 3109
4 Houdini 6.02 3200 11.7 155 155 10 7.0 70.0 4 0 6 40.0 60.0 3086
5 Ginkgo 2 3188 12.3 170 170 9 6.0 66.7 4 1 4 44.4 44.4 3101
6 Chiron 040917 3187 1.4 160 160 9 5.5 61.1 3 1 5 33.3 55.6 3139
7 Andscacs 0.92 3182 4.9 158 158 9 6.0 66.7 3 0 6 33.3 66.7 3072
8 Jonny 8.1 3180 2.0 172 172 9 5.0 55.6 4 3 2 44.4 22.2 3154
9 Bobcat 8 3158 21.4 168 168 9 6.0 66.7 4 1 4 44.4 44.4 3060
10 Vajolet2 2.3.2 3142 16.7 155 155 9 5.0 55.6 2 1 6 22.2 66.7 3110
11 Hannibal 121017 3136 5.9 159 159 9 4.5 50.0 2 2 5 22.2 55.6 3132
12 Gull 3 3133 2.4 167 167 9 6.0 66.7 4 1 4 44.4 44.4 3038
13 Booot 6.2 3114 19.9 150 150 10 6.0 60.0 3 1 6 30.0 60.0 3060
14 Nirvana 2.4 3104 9.1 154 154 10 4.5 45.0 2 3 5 20.0 50.0 3128
15 Texel 1.07a35 3092 12.8 158 158 10 5.0 50.0 3 3 4 30.0 40.0 3076
16 Fizbo 1.91 3091 1.0 179 179 9 4.0 44.4 3 4 2 33.3 22.2 3112
17 Wasp 2.5 3055 35.3 170 170 9 4.0 44.4 2 3 4 22.2 44.4 3084
18 Rybka 4.1 3037 18.2 161 161 9 3.5 38.9 1 3 5 11.1 55.6 3102
19 Gaviota 1.01 2983 54.0 172 172 9 2.5 27.8 1 5 3 11.1 33.3 3099
20 Arasan 20.2 2977 6.2 176 176 10 3.0 30.0 2 6 2 20.0 20.0 3099
21 Fruit 3.2 2969 7.9 182 182 9 2.5 27.8 2 6 1 22.2 11.1 3097
22 Nemorino 3.04 2894 75.0 186 186 10 1.0 10.0 0 8 2 0.0 20.0 3132
23 Laser 200917 2859 35.2 202 202 9 1.0 11.1 0 7 2 0.0 22.2 3099
24 Hakkapeliitta 210416 2811 48.1 211 211 10 1.0 10.0 1 9 0 10.0 0.0 3089
---------------------------------------------------------------------------------------------------------
Δ = delta from the next higher rated opponent
# = number of games played
Σ = total score, 1 point for win, 1/2 point for draw
Code:
ResultSet-EloRating>x    ## moves back to the previous menu
ResultSet>reset          ## resets bayeselo to a clean state
ResultSet>rp /Users/michaelbyrne/Downloads/dl.php-7.pgn
112 game(s) loaded
ResultSet>elo
ResultSet-EloRating>mm 1 1 ## what I normally use
Iteration 100: 3e-05
00:00:00,00
ResultSet-EloRating>covariance ## what I normally use
ResultSet-EloRating>r
Rank Name Rating Δ + - # Σ Σ% W L D W% =% OppR
---------------------------------------------------------------------------------------------------------
1 Stockfish 041017 3360 0.0 117 117 9 8.5 94.4 8 0 1 88.9 11.1 3105
2 Komodo 1937.00 3285 75.0 102 102 10 8.5 85.0 7 0 3 70.0 30.0 3089
3 Fire 6.1 3206 79.2 110 110 9 6.0 66.7 5 2 2 55.6 22.2 3109
4 Houdini 6.02 3193 13.0 102 102 10 7.0 70.0 4 0 6 40.0 60.0 3087
5 Chiron 040917 3190 3.2 106 106 9 5.5 61.1 3 1 5 33.3 55.6 3134
6 Ginkgo 2 3189 0.7 107 107 9 6.0 66.7 4 1 4 44.4 44.4 3100
7 Jonny 8.1 3175 14.2 106 106 9 5.0 55.6 4 3 2 44.4 22.2 3148
8 Andscacs 0.92 3173 2.0 106 106 9 6.0 66.7 3 0 6 33.3 66.7 3077
9 Bobcat 8 3153 20.5 103 103 9 6.0 66.7 4 1 4 44.4 44.4 3066
10 Vajolet2 2.3.2 3136 16.5 104 104 9 5.0 55.6 2 1 6 22.2 66.7 3109
11 Gull 3 3131 5.6 101 101 9 6.0 66.7 4 1 4 44.4 44.4 3045
12 Hannibal 121017 3128 2.9 103 103 9 4.5 50.0 2 2 5 22.2 55.6 3128
13 Booot 6.2 3114 13.3 95 95 10 6.0 60.0 3 1 6 30.0 60.0 3063
14 Nirvana 2.4 3099 15.2 93 93 10 4.5 45.0 2 3 5 20.0 50.0 3127
15 Texel 1.07a35 3086 12.7 101 101 10 5.0 50.0 3 3 4 30.0 40.0 3077
16 Fizbo 1.91 3081 5.8 105 105 9 4.0 44.4 3 4 2 33.3 22.2 3110
17 Wasp 2.5 3063 17.6 105 105 9 4.0 44.4 2 3 4 22.2 44.4 3086
18 Rybka 4.1 3040 23.3 100 100 9 3.5 38.9 1 3 5 11.1 55.6 3102
19 Arasan 20.2 2992 48.1 102 102 10 3.0 30.0 2 6 2 20.0 20.0 3101
20 Gaviota 1.01 2986 5.5 99 99 9 2.5 27.8 1 5 3 11.1 33.3 3098
21 Fruit 3.2 2974 12.5 102 102 9 2.5 27.8 2 6 1 22.2 11.1 3096
22 Nemorino 3.04 2918 56.1 104 104 10 1.0 10.0 0 8 2 0.0 20.0 3129
23 Laser 200917 2872 45.3 118 118 9 1.0 11.1 0 7 2 0.0 22.2 3100
24 Hakkapeliitta 210416 2856 16.2 115 115 10 1.0 10.0 1 9 0 10.0 0.0 3090
---------------------------------------------------------------------------------------------------------
Δ = delta from the next higher rated opponent
# = number of games played
Σ = total score, 1 point for win, 1/2 point for draw
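The difference between the two tables comes from the likelihood model BayesElo fits: besides the ratings, it uses an eloAdvantage (White's first-move bonus) and an eloDraw (draw margin), and `mm 1 1` asks the maximizer to estimate both from the data instead of fixing them. A rough sketch of the model's win/draw/loss probabilities; the default parameter values shown are the documented bayeselo defaults, used here only for illustration, not the ones fitted above:

```python
import math

def logistic(x: float) -> float:
    """Expected score for an Elo difference x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def wdl_probs(elo_white: float, elo_black: float,
              elo_advantage: float = 32.8, elo_draw: float = 97.3):
    """Win/draw/loss probabilities in the BayesElo model: the draw
    margin elo_draw pushes both decisive outcomes toward the draw."""
    delta = elo_white - elo_black + elo_advantage
    p_win = logistic(delta - elo_draw)
    p_loss = logistic(-delta - elo_draw)
    return p_win, 1.0 - p_win - p_loss, p_loss

w, d, l = wdl_probs(3360, 3285)  # e.g. the fitted Stockfish/Komodo gap above
```

Because a large eloDraw soaks up probability mass into draws, the same score difference maps to a smaller rating spread than in a pure two-outcome logistic fit, which is why the `mm 1 1` table is more compressed than the default one.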
ResultSet-EloRating>
Re: Fishy Bayeselo numbers?
Hello:

Just my two cents.

ernest wrote:
Indeed, for example,
how can a match with final score: 56-44 (+15=82-3),
leading to the "usual" (Elostat) +42 Elo difference,
give a measly +15 Elo using the BayesElo computation!
(or does the +15 for the winner, -15 for the loser mean a +30 Elo difference, still quite different from +42, but less startling...)
And results can also be very different concerning the error-bars (2-sigma).

Be sure that it means 15 - (-15) = 30 Bayeselo difference. Please remember that 1 Bayeselo =/= 1 logistic Elo!

The draw ratio is very high, so unexpected results might show.
I tried the following version of Bayeselo with the 100-game PGN provided by Michael:

Code:
version 0057.2, Copyright (C) 1997-2010 Remi Coulom.
compiled Apr 5 2012 17:26:01.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>readpgn H602-K1122 Noomen KI Fianchetto.pgn
100 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elo

These are the results with some settings:
------------------------
1.- Poppins' settings:
http://www.talkchess.com/forum/viewtopi ... 84&t=65404

tpoppins wrote:
Code:
ResultSet>elo
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo + - games score oppo. draws
1 Houdini 6.02 Pro x64-popcnt 15 24 24 100 56% -15 82%
2 Komodo 11.2.2 64-bit -15 24 24 100 44% 15 82%

Myself:
Code:
ResultSet>elo
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo Diff + - Games Score Oppo. Draws Win W-L-D
1 Houdini 6.02 Pro x64-popcnt 15.25 0.00 24.46 24.25 100 56.00% -15.25 82.00% 15.00% 15-3-82
2 Komodo 11.2.2 64-bit -15.25 -30.50 24.25 24.46 100 44.00% 15.25 82.00% 3.00% 3-15-82

Identical results except for the number of decimals.
------------------------
2.- Own settings (mm 0 1; 68.27% confidence ~ 1-sigma confidence):

Code:
ResultSet>elo
ResultSet-EloRating>confidence 0.6827
0.6827
ResultSet-EloRating>mm 0 1
Iteration 100: 0.00214375
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo Diff + - Games Score Oppo. Draws Win W-L-D
1 Houdini 6.02 Pro x64-popcnt 18.78 0.00 11.06 11.06 100 56.00% -18.78 82.00% 15.00% 15-3-82
2 Komodo 11.2.2 64-bit -18.78 -37.56 11.06 11.06 100 44.00% 18.78 82.00% 3.00% 3-15-82
ResultSet-EloRating>los
Ho Ko
Houdini 6.02 Pro x64-popcnt 99
Komodo 11.2.2 64-bit 0
ResultSet-EloRating>

I get a 37.56 Bayeselo difference with 1-sigma error bars of ±11.06 Bayeselo. LOS(Houdini) > 99% and LOS(Komodo) < 1%.
------------------------
3.- Own settings (mm 0 1; 95% confidence ~ 1.96-sigma confidence):

Code:
ResultSet>elo
ResultSet-EloRating>confidence 0.95
0.95
ResultSet-EloRating>mm 0 1
Iteration 100: 0.00214375
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo Diff + - Games Score Oppo. Draws Win W-L-D
1 Houdini 6.02 Pro x64-popcnt 18.78 0.00 21.67 21.67 100 56.00% -18.78 82.00% 15.00% 15-3-82
2 Komodo 11.2.2 64-bit -18.78 -37.56 21.67 21.67 100 44.00% 18.78 82.00% 3.00% 3-15-82
ResultSet-EloRating>los
Ho Ko
Houdini 6.02 Pro x64-popcnt 99
Komodo 11.2.2 64-bit 0

I get the same 37.56 Bayeselo difference again; the 1.96-sigma error bars are ±21.67 Bayeselo. LOS is the same as before.
------------------------
I used two different confidence levels to see what happens. Since the scores are close enough to 50%, I would expect that:

Code:
z2-sigma ==> ±err2
z3-sigma ==> ±err3
(z3)/(z2) ~ |err3|/|err2|

This happens in the EloSTAT model for scores close to 50%, IIRC. In fact, with z2 = 1 and z3 = 1.96:

Code:
(21.67)/(11.06) ~ 1.9593

Which confirms that the confidence setting works in the version that I use.

I have not used Bayeselo for years and I struggled a bit to remember that I used to type:

Code:
readpgn [...].pgn
elo
confidence 0.95
mm 0 1
ratings
los

There is no special reason to use these settings, but people could feel more comfortable in this particular 100-game match with 37.56 ± 21.67 than with 30 ± 24.

Just for the record, I get more or less 42 ± 28 Elo (95% confidence) and LOS(Houdini) ~ 99.8% with my own calculator, which gives very similar results to EloSTAT in two-engine matches. Again, it has been a long time since I last computed error bars.

Regards from Spain.
Ajedrecista.
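The classical, EloSTAT-style figure quoted above can be reproduced directly from the W/D/L counts. A minimal sketch, assuming the usual normal approximation on the mean score and propagating its standard error through the logistic Elo curve:

```python
import math

def elo_error_bars(wins: int, draws: int, losses: int, z: float = 1.96):
    """Elo difference and z-sigma error bar for a two-engine match,
    using the normal approximation on the mean per-game score."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n                 # mean score
    var = (wins + 0.25 * draws) / n - mu * mu     # per-game score variance
    se = math.sqrt(var / n)                       # std. error of the mean
    elo = 400.0 * math.log10(mu / (1.0 - mu))
    # propagate through elo(s) = 400*log10(s/(1-s)):  d(elo)/ds = 400/(ln 10 * s * (1-s))
    deriv = 400.0 / (math.log(10.0) * mu * (1.0 - mu))
    return elo, z * se * deriv

elo, err = elo_error_bars(15, 82, 3)  # the 56-44 (+15 =82 -3) match
# elo ~ +41.9, err ~ ±28: matching the "42 ± 28 Elo (95% confidence)" above
```

Note how the 82% draw rate shrinks the variance: with the same 56% score from decisive games only, the error bar would be considerably wider.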
Re: Fishy Bayeselo numbers?
A run using the pgn above with those parameters, which is aligned with your last two runs. I will use these settings going forward.
Code:
Mac-Pro:cluster.mfb michaelbyrne$ bay
version 0058, Copyright (C) 1997-2016 Remi Coulom and updated by Michael Byrne.
compiled Jul 24 2016 00:03:35.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>rp /Users/michaelbyrne/Downloads/H602-K1122NoomenKIFianchetto.pgn
100 game(s) loaded
ResultSet>elo
ResultSet-EloRating>confidence 0.95
0.9
ResultSet-EloRating>mm 0 1
Iteration 100: 0.002
00:00:00,00
ResultSet-EloRating>r
Rank Name Rating Δ + - # Σ Σ% W L D W% =% OppR
---------------------------------------------------------------------------------------------------------
1 Houdini 6.02 Pro x64-popcnt 3119 0.0 22 22 100 56.0 56.0 15 3 82 15.0 82.0 3081
2 Komodo 11.2.2 64-bit 3081 37.6 22 22 100 44.0 44.0 3 15 82 3.0 82.0 3119
---------------------------------------------------------------------------------------------------------
Δ = delta from the next higher rated opponent
# = number of games played
Σ = total score, 1 point for win, 1/2 point for draw
ResultSet-EloRating>los
Ho Ko
Houdini 6.02 Pro x64-popcnt 99
Komodo 11.2.2 64-bit 0
ResultSet-EloRating>
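The `los` tables above report likelihood of superiority. A commonly used closed form (not necessarily the exact computation bayeselo performs internally, which works from the rating covariance) uses only the decisive games, since draws say nothing about which engine is stronger:

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from decisive games only:
    Phi((wins - losses) / sqrt(wins + losses)) via the error function."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

print(round(los(15, 3), 4))  # ~ 0.9977, i.e. the ~99.8% LOS quoted by Ajedrecista
```

With +15 -3, this gives roughly 99.8% for Houdini, consistent with the 99 / 0 rows bayeselo prints (it rounds to whole percent).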