H4 or S5 !?

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: H4 or S5 !?

Post by michiguel »

lkaufman wrote:
Modern Times wrote:
michiguel wrote: I do not see any compelling reason why a given game should weigh more than the others.

Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent?
All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
Right, that is the way to frame it.

You play A and B. Should winning against A and losing to B be treated differently from losing to A and beating B? Or differently from drawing against both?

Miguel
PK
Posts: 904
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: H4 or S5 !?

Post by PK »

Ideally, winning against a stronger opponent and losing to a weaker one should give the same rating but a bigger rating variance (in other words, a two-dimensional model should be used).
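PK's two-dimensional idea (a rating plus an uncertainty) is close to what the Glicko system does, which tracks a rating and a rating deviation (RD) jointly. A caveat: in Glicko-1 the post-period RD depends only on who you played, not on the results, so the sketch below reproduces the "same rating" half of the claim but not the "bigger variance" half. All player numbers here are made up for illustration:

```python
import math

Q = math.log(10) / 400

def g(rd):
    # Glicko-1 attenuation factor for an opponent's rating uncertainty
    return 1 / math.sqrt(1 + 3 * Q * Q * rd * rd / math.pi ** 2)

def glicko_update(r, rd, opponents):
    # one Glicko-1 rating period; opponents: list of (opp_rating, opp_rd, score)
    d2_inv = 0.0   # 1/d^2, the information gained from the games
    delta = 0.0    # sum of g * (score - expected)
    for rj, rdj, s in opponents:
        gj = g(rdj)
        e = 1 / (1 + 10 ** (-gj * (r - rj) / 400))
        d2_inv += Q * Q * gj * gj * e * (1 - e)
        delta += gj * (s - e)
    denom = 1 / rd ** 2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1 / denom)

# win vs a stronger player plus loss vs a weaker one...
rA, rdA = glicko_update(1800, 200, [(2000, 50, 1.0), (1600, 50, 0.0)])
# ...lands at almost the same rating as two draws against the same pair
rB, rdB = glicko_update(1800, 200, [(2000, 50, 0.5), (1600, 50, 0.5)])
```

Capturing the "bigger variance" part would need a model whose uncertainty update also depends on how surprising the individual results were, which Glicko-1 does not do.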
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: H4 or S5 !?

Post by Milos »

lkaufman wrote:
Modern Times wrote:
michiguel wrote: I do not see any compelling reason why a given game should weight more than others.

Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent ?
All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
A rating is a statistical quantity: it stands for the strength of an engine, which is unknown and can only be estimated from its performance. You therefore cannot use strength estimated from the same results as a priori information to estimate that strength, because that would be circular reasoning.
You could only do that if your a priori strength estimate were independent (not derived from the current results) and if you knew how reliable that estimate is (i.e. its error bars).
Last edited by Milos on Wed Jun 04, 2014 10:15 pm, edited 2 times in total.
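Milos's point can be made concrete: a tournament performance rating is, by definition, a function of the opponents' ratings, which must come from somewhere. A minimal sketch of such a calculation under the usual logistic Elo model (bisecting for the rating whose expected score matches the actual score; the numbers are illustrative, and this is not any particular rating program's method):

```python
def expected_total(r, opp_ratings):
    # expected score of a player rated r against the listed opponents
    return sum(1 / (1 + 10 ** ((o - r) / 400)) for o in opp_ratings)

def performance_rating(opp_ratings, score):
    # bisect for the rating whose expected total equals the actual score;
    # requires 0 < score < len(opp_ratings), i.e. not a 100% or 0% result
    lo, hi = -1000.0, 5000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_total(mid, opp_ratings) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 50% against 2000-rated opposition gives a 2000 performance;
# 75% against the same field gives roughly 2191
even = performance_rating([2000, 2000, 2000, 2000], 2.0)
plus = performance_rating([2000, 2000, 2000, 2000], 3.0)
```

The opponent ratings are taken as given here; if they were themselves derived from the same games, Milos's circularity objection applies.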
Modern Times
Posts: 3703
Joined: Thu Jun 07, 2012 11:02 pm

Re: H4 or S5 !?

Post by Modern Times »

Can I point you to one of HG's online tournaments:

http://www.talkchess.com/forum/viewtopi ... =&start=10

Look at the engine rankings, then look at the "Performance" column, which is the Elo calculation for the engines based only on the games played in that tournament. Note that WaDuuttie, in 8th place with 5 points, has a higher Elo performance rating than Telepath in 4th with 5.5 points. That directly contradicts what Larry and Miguel have been saying, yes? Or have I misunderstood? I don't know what tool is used for those ratings.

Code: Select all

:Tourney Players: Round 9 of 9 
: 
:     Name              Rating Score Perfrm Upset  Results 
:     ----------------- ------ ----- ------ ------ ------- 
:  1 -NightmareX        [1958]  8.0  [2133] [ 162] +05w +18w +03b =02w +12b +08w +11b =09b +07w 
:  2 +Texel             [2163]  7.5  [2049] [   0] +19w =12w +06b =01b +08w +03w =07b +11w +09b 
:  3 +Goldbar           [1891]  6.0  [1980] [ 155] +13w +17b -01w +07b =11w -02b +06w =08w +12b 
:  4 +Telepath          [   0]  5.5  [1825] [9213] =07b -08w -11b +16w +20b +17w -09w +14b +19b 
:  5 -Fizbo             [   0]  5.5  [1791] [8908] -01b +20w +17w -08b -09b +19w +13w =06b +16w 
:  6 +Almere            [   0]  5.5  [1928] [9824] =08b +11w -02w +10b +07w =12w -03b =05w +20b 
:  7 +GaviotaRB         [2018]  5.0  [1850] [  72] =04w +14b +12w -03w -06b +13b =02w +15w -01b 
:  8 -WaDuuttie         [1947]  5.0  [1846] [   0] =06w +04b +09w +05w -02b -01b =12w =03b =11w 
:  9 +Baron             [   0]  5.0  [1869] [8714] =11b +15w -08b -12b +05w +14w +04b =01w -02w 
: 10 +NebiyuHG          [   0]  5.0  [1702] [8048] -12b +19b +18w -06w +14b -11w -15b +20w +17b 
: 11 -Rookie            [1783]  4.5  [1814] [ 136] =09w -06b +04w +13w =03b +10b -01w -02b =08b 
: 12 +ArasanX           [1727]  4.5  [1853] [ 328] +10w =02b -07b +09w -01w =06b =08b +17w -03w 
: 13 +RuyDos            [   0]  4.5  [1747] [7525] -03b +16w +14w -11b =17b -07w -05b +18b +15w 
: 14 +Joker             [1808]  4.0  [1617] [   0] +16w -07w -13b +19b -10w -09b +20b -04w +18w 
: 15 +PuppetMaster      [1686]  4.0  [1617] [   0] -17w -09b -16b +20w +19w +18b +10w -07b -13b 
: 16 +Spartacus         [   0]  3.5  [1551] [5491] -14b -13b +15w -04b =18w +20w -17b +19w -05b 
: 17 -Capivara          [   0]  3.5  [1599] [5717] +15b -03w -05b +18b =13w -04b +16w -12b -10w 
: 18 +microMax          [1519]  2.5  [1510] [ 234] +20w -01b -10b -17w =16b -15w +19b -13w -14b 
: 19 +Skipper           [1371]  1.0  [1416] [ 304] -02b -10w +20b -14w -15b -05b -18w -16b -04w 
: 20 +NumptyX           [   0]  0.0  [1239] [   0] -18b -05b -19w -15b -04w -16b -14w -10b -06w 
: 
:     Average Rating    1806.5 
: 
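The effect in the table is easy to reproduce with the common linear approximation of tournament performance, average opponent rating plus 400 × (wins − losses) / games. The players and numbers below are invented for illustration, not taken from the table:

```python
def linear_performance(avg_opp, wins, losses, games):
    # common linear approximation of tournament performance rating
    return avg_opp + 400 * (wins - losses) / games

# Player X scores 5.5/9 (4 wins, 3 draws, 2 losses) vs weak opposition (avg 1500)
x = linear_performance(1500, 4, 2, 9)   # about 1589
# Player Y scores 5.0/9 (4 wins, 2 draws, 3 losses) vs strong opposition (avg 1900)
y = linear_performance(1900, 4, 3, 9)   # about 1944
```

Y has half a point less than X but the higher performance, purely because of the stronger opposition; this is not a contradiction once the players have different schedules.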
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: H4 or S5 !?

Post by Adam Hair »

Milos wrote:
lkaufman wrote:
Modern Times wrote:
michiguel wrote: I do not see any compelling reason why a given game should weight more than others.

Miguel
A win against a strong opponent is surely worth more than a win against a weaker opponent ?
All rating systems give that outcome. The question is: Should a win against a strong opponent and a loss to a weak opponent be treated differently than a loss to a strong opponent and a win against a weak opponent?
A rating is a statistical quantity: it stands for the strength of an engine, which is unknown and can only be estimated from its performance. You therefore cannot use strength estimated from the same results as a priori information to estimate that strength, because that would be circular reasoning.
You could only do that if your a priori strength estimate were independent (not derived from the current results) and if you knew how reliable that estimate is (i.e. its error bars).
I agree. All of the games must be weighted the same unless we have a priori information on which to base the weighting on.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: H4 or S5 !?

Post by Adam Hair »

Modern Times wrote:Can I point you to one of HG's online tournaments:

http://www.talkchess.com/forum/viewtopi ... =&start=10

Look at the engine rankings, then look at the "Performance" column, which is the Elo calculation for the engines based only on the games played in that tournament. Note that WaDuuttie, in 8th place with 5 points, has a higher Elo performance rating than Telepath in 4th with 5.5 points. That directly contradicts what Larry and Miguel have been saying, yes? Or have I misunderstood? I don't know what tool is used for those ratings.

Code: Select all

:Tourney Players: Round 9 of 9 
: 
:     Name              Rating Score Perfrm Upset  Results 
:     ----------------- ------ ----- ------ ------ ------- 
:  1 -NightmareX        [1958]  8.0  [2133] [ 162] +05w +18w +03b =02w +12b +08w +11b =09b +07w 
:  2 +Texel             [2163]  7.5  [2049] [   0] +19w =12w +06b =01b +08w +03w =07b +11w +09b 
:  3 +Goldbar           [1891]  6.0  [1980] [ 155] +13w +17b -01w +07b =11w -02b +06w =08w +12b 
:  4 +Telepath          [   0]  5.5  [1825] [9213] =07b -08w -11b +16w +20b +17w -09w +14b +19b 
:  5 -Fizbo             [   0]  5.5  [1791] [8908] -01b +20w +17w -08b -09b +19w +13w =06b +16w 
:  6 +Almere            [   0]  5.5  [1928] [9824] =08b +11w -02w +10b +07w =12w -03b =05w +20b 
:  7 +GaviotaRB         [2018]  5.0  [1850] [  72] =04w +14b +12w -03w -06b +13b =02w +15w -01b 
:  8 -WaDuuttie         [1947]  5.0  [1846] [   0] =06w +04b +09w +05w -02b -01b =12w =03b =11w 
:  9 +Baron             [   0]  5.0  [1869] [8714] =11b +15w -08b -12b +05w +14w +04b =01w -02w 
: 10 +NebiyuHG          [   0]  5.0  [1702] [8048] -12b +19b +18w -06w +14b -11w -15b +20w +17b 
: 11 -Rookie            [1783]  4.5  [1814] [ 136] =09w -06b +04w +13w =03b +10b -01w -02b =08b 
: 12 +ArasanX           [1727]  4.5  [1853] [ 328] +10w =02b -07b +09w -01w =06b =08b +17w -03w 
: 13 +RuyDos            [   0]  4.5  [1747] [7525] -03b +16w +14w -11b =17b -07w -05b +18b +15w 
: 14 +Joker             [1808]  4.0  [1617] [   0] +16w -07w -13b +19b -10w -09b +20b -04w +18w 
: 15 +PuppetMaster      [1686]  4.0  [1617] [   0] -17w -09b -16b +20w +19w +18b +10w -07b -13b 
: 16 +Spartacus         [   0]  3.5  [1551] [5491] -14b -13b +15w -04b =18w +20w -17b +19w -05b 
: 17 -Capivara          [   0]  3.5  [1599] [5717] +15b -03w -05b +18b =13w -04b +16w -12b -10w 
: 18 +microMax          [1519]  2.5  [1510] [ 234] +20w -01b -10b -17w =16b -15w +19b -13w -14b 
: 19 +Skipper           [1371]  1.0  [1416] [ 304] -02b -10w +20b -14w -15b -05b -18w -16b -04w 
: 20 +NumptyX           [   0]  0.0  [1239] [   0] -18b -05b -19w -15b -04w -16b -14w -10b -06w 
: 
:     Average Rating    1806.5 
: 
One assumption being used in the discussion is that the engines play a round robin (RR). Telepath and WaDuuttie did not play the same opponents.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: H4 or S5 !?

Post by syzygy »

Adam Hair wrote:One assumption being used in the discussion is that the engines play a round robin (RR). Telepath and WaDuuttie did not play the same opponents.
Exactly.

In a chess tournament with 10 players that play a single or double RR, would anyone dispute that the player with the highest number of points at the end is the winner?

If the tournament is not RR, then obviously you need to resort to tricky calculations that take into account the strengths of the opponents you've played.
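A quick way to see why raw points suffice in a round robin: every player's schedule is the whole field minus themselves, so the average opposition of any two players differs by at most (strongest − weakest) / (n − 1), which shrinks as the field grows. With ten illustrative ratings:

```python
ratings = [1500 + 100 * i for i in range(10)]   # ten players, 1500..2400
total = sum(ratings)
# each player's average opponent rating: the field average excluding themselves
avg_opp = [(total - r) / (len(ratings) - 1) for r in ratings]
spread = max(avg_opp) - min(avg_opp)   # 100.0, one ninth of the 900-point field
```

In a non-RR event like the one above, schedules can differ by far more than this, which is why opponent strength must enter the calculation there.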
User avatar
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: H4 or S5 !?

Post by Dr.Wael Deeb »

syzygy wrote:
Adam Hair wrote:One assumption being used in the discussion is that the engines play a round robin (RR). Telepath and WaDuuttie did not play the same opponents.
Exactly.

In a chess tournament with 10 players that play a single or double RR, would anyone dispute that the player with the highest number of points at the end is the winner?

If the tournament is not RR, then obviously you need to resort to tricky calculations that take into account the strengths of the opponents you've played.
RR is the best format in my opinion....
Dr.D
_No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: H4 or S5 !?

Post by michiguel »

IWB wrote:Hello all,

This is quite interesting:

The official method for the IPON is Bayeselo with mm 0 1, draw rate consideration. The pure TOP 16 one on one looks like this:

Code: Select all

   1 Houdini 4           3111    9    9  3300   75%  2921   31% 
   2 Stockfish 5         3106    9    8  3300   75%  2921   39% 
   3 Komodo 7a           3088    9    9  3300   72%  2922   37% 
   4 Gull 3              3057    8    8  3300   68%  2924   41% 
   5 Critter 1.4a        2980    8    8  3300   57%  2930   46% 
   6 Equinox 2.02        2975    8    8  3300   56%  2930   47% 
   7 Deep Rybka 4.1      2959    8    8  3300   54%  2931   45% 
   8 Deep Fritz 14       2894    8    8  3300   44%  2935   45% 
   9 Chiron 2            2889    8    8  3300   44%  2936   45% 
  10 Protector 1.6.0     2870    8    8  3300   41%  2937   44% 
  11 Hannibal 1.4b       2870    8    8  3300   41%  2937   43% 
  12 Naum 4.2            2838    8    9  3300   36%  2939   41% 
  13 Texel 1.04          2838    8    8  3300   37%  2939   38% 
  14 Senpai 1.0          2838    8    8  3300   36%  2939   41% 
  15 HIARCS 14 WCSC 32b  2812    9    9  3300   33%  2941   37% 
  16 Jonny 6.00          2798    9    9  3300   31%  2942   36%
The same set of data with Bayes default:

Code: Select all

   1 Houdini 4           3111   11   11  3300   75%  2931   31% 
   2 Stockfish 5         3105   10   10  3300   75%  2931   39% 
   3 Komodo 7a           3088   10   10  3300   72%  2932   37% 
   4 Gull 3              3057   10   10  3300   68%  2934   41% 
   5 Critter 1.4a        2984   10    9  3300   57%  2939   46% 
   6 Equinox 2.02        2980    9   10  3300   56%  2939   47% 
   7 Deep Rybka 4.1      2964   10   10  3300   54%  2940   45% 
   8 Deep Fritz 14       2905    9   10  3300   44%  2944   45% 
   9 Chiron 2            2900   10   10  3300   44%  2945   45% 
  10 Protector 1.6.0     2883   10   10  3300   41%  2946   44% 
  11 Hannibal 1.4b       2883   10   10  3300   41%  2946   43% 
  12 Naum 4.2            2854   10   10  3300   36%  2948   41% 
  13 Texel 1.04          2854   10   10  3300   37%  2948   38% 
  14 Senpai 1.0          2853   10   10  3300   36%  2948   41% 
  15 HIARCS 14 WCSC 32b  2830   10   10  3300   33%  2949   37% 
  16 Jonny 6.00          2816   10   10  3300   31%  2950   36%
Now with Elostat:

Code: Select all

  1 Stockfish 5                    : 3115   10  10  3300    74.9 %   2924   38.6 %
  2 Houdini 4                      : 3111   11  10  3300    74.5 %   2925   30.7 %
  3 Komodo 7a                      : 3091   10  10  3300    72.1 %   2926   37.0 %
  4 Gull 3                         : 3059    9   9  3300    68.0 %   2928   41.0 %
  5 Critter 1.4a                   : 2982    9   9  3300    57.0 %   2933   46.1 %
  6 Equinox 2.02                   : 2978    9   9  3300    56.3 %   2933   46.9 %
  7 Deep Rybka 4.1                 : 2962    9   9  3300    53.9 %   2935   45.2 %
  8 Deep Fritz 14                  : 2899    9   9  3300    44.4 %   2939   44.9 %
  9 Chiron 2                       : 2894    9   9  3300    43.5 %   2939   45.1 %
 10 Protector 1.6.0                : 2877    9   9  3300    40.9 %   2940   44.1 %
 11 Hannibal 1.4b                  : 2875    9   9  3300    40.7 %   2940   42.6 %
 12 Texel 1.04                     : 2846    9   9  3300    36.5 %   2942   38.5 %
 13 Naum 4.2                       : 2845    9   9  3300    36.4 %   2942   40.9 %
 14 Senpai 1.0                     : 2845    9   9  3300    36.3 %   2942   40.7 %
 15 HIARCS 14 WCSC 32b             : 2822   10  10  3300    33.2 %   2944   37.5 %
 16 Jonny 6.00                     : 2808   10  10  3300    31.2 %   2945   35.7 %
and finally with ORDO:

Code: Select all

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Stockfish 5           : 3115.1    2473.0    3300   74.9%
   2 Houdini 4             : 3111.0    2458.5    3300   74.5%
   3 Komodo 7a             : 3089.3    2379.0    3300   72.1%
   4 Gull 3                : 3054.9    2245.5    3300   68.0%
   5 Critter 1.4a          : 2968.9    1882.0    3300   57.0%
   6 Equinox 2.02          : 2963.8    1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2945.6    1778.5    3300   53.9%
   8 Deep Fritz 14         : 2875.7    1464.5    3300   44.4%
   9 Chiron 2              : 2869.4    1436.5    3300   43.5%
  10 Protector 1.6.0       : 2850.1    1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2848.3    1343.0    3300   40.7%
  12 Texel 1.04            : 2816.4    1204.5    3300   36.5%
  13 Naum 4.2              : 2815.5    1200.5    3300   36.4%
  14 Senpai 1.0            : 2814.9    1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2790.6    1096.0    3300   33.2%
  16 Jonny 6.00            : 2774.4    1030.0    3300   31.2%
That is very good, as everyone can take the list he likes :-)

Regards
Ingo
I took the results that ended in a draw and saved them into a separate file. Then I pasted those results back into the original file (which means that every draw is present twice). Then I ran ordo with that file (td.pgn) and reproduced exactly what BayesElo had, except that I had to expand the scale a bit (with -z243.5; the default is 202, for a 76% winning expectancy), since counting the draws twice contracted the scale.

ordo -p td.pgn -a3111.5 -A"Houdini 4" -W -z243.5

Code: Select all

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Houdini 4             : 3111.5    2965.0    4313   68.7%
   2 Stockfish 5           : 3104.9    3110.0    4574   68.0%
   3 Komodo 7a             : 3087.5    2990.0    4522   66.1%
   4 Gull 3                : 3055.8    2922.0    4653   62.8%
   5 Critter 1.4a          : 2982.7    2642.0    4820   54.8%
   6 Equinox 2.02          : 2978.4    2633.0    4847   54.3%
   7 Deep Rybka 4.1        : 2962.8    2524.0    4791   52.7%
   8 Deep Fritz 14         : 2903.8    2206.0    4783   46.1%
   9 Chiron 2              : 2899.7    2181.0    4789   45.5%
  10 Hannibal 1.4b         : 2882.6    2046.0    4706   43.5%
  11 Protector 1.6.0       : 2882.6    2078.0    4754   43.7%
  12 Naum 4.2              : 2853.6    1875.0    4649   40.3%
  13 Texel 1.04            : 2853.1    1839.0    4569   40.2%
  14 Senpai 1.0            : 2852.8    1870.0    4644   40.3%
  15 HIARCS 14 WCSC 32b    : 2830.0    1714.0    4536   37.8%
  16 Jonny 6.00            : 2815.9    1619.0    4478   36.2%
So, the reason for the discrepancy in the ranking order is exactly that: BE counts the draws twice, Ordo once.

Miguel
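The scale contraction Miguel describes can be reproduced with a toy maximum-likelihood Elo fit (a gradient sketch under the logistic model, not the method of any particular rating program; the players and results are made up): double-counting the draws dilutes the decisive games and pulls the fitted ratings together.

```python
def fit_elo(games, iters=2000, lr=10.0):
    # games: list of (white, black, score_for_white), score in {1, 0.5, 0}
    players = sorted({p for g in games for p in g[:2]})
    r = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for i, j, s in games:
            e = 1 / (1 + 10 ** ((r[j] - r[i]) / 400))
            grad[i] += s - e          # gradient of the log-likelihood
            grad[j] += e - s
        for p in players:
            r[p] += lr * grad[p]
    return r

single = [("A", "B", 1.0)] * 6 + [("A", "B", 0.5)] * 4
doubled = single + [("A", "B", 0.5)] * 4        # every draw counted twice
d1 = fit_elo(single)    # gap about 241 Elo (80% score)
d2 = fit_elo(doubled)   # gap about 159 Elo (10/14 score): contracted
```

This is why matching the two programs' outputs required stretching the scale back out (the -z option above).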
carldaman
Posts: 2284
Joined: Sat Jun 02, 2012 2:13 am

Re: H4 or S5 !?

Post by carldaman »

michiguel wrote:
IWB wrote:Hello all,

This is quite interesting:

The official method for the IPON is Bayeselo with mm 0 1, draw rate consideration. The pure TOP 16 one on one looks like this:

Code: Select all

   1 Houdini 4           3111    9    9  3300   75%  2921   31% 
   2 Stockfish 5         3106    9    8  3300   75%  2921   39% 
   3 Komodo 7a           3088    9    9  3300   72%  2922   37% 
   4 Gull 3              3057    8    8  3300   68%  2924   41% 
   5 Critter 1.4a        2980    8    8  3300   57%  2930   46% 
   6 Equinox 2.02        2975    8    8  3300   56%  2930   47% 
   7 Deep Rybka 4.1      2959    8    8  3300   54%  2931   45% 
   8 Deep Fritz 14       2894    8    8  3300   44%  2935   45% 
   9 Chiron 2            2889    8    8  3300   44%  2936   45% 
  10 Protector 1.6.0     2870    8    8  3300   41%  2937   44% 
  11 Hannibal 1.4b       2870    8    8  3300   41%  2937   43% 
  12 Naum 4.2            2838    8    9  3300   36%  2939   41% 
  13 Texel 1.04          2838    8    8  3300   37%  2939   38% 
  14 Senpai 1.0          2838    8    8  3300   36%  2939   41% 
  15 HIARCS 14 WCSC 32b  2812    9    9  3300   33%  2941   37% 
  16 Jonny 6.00          2798    9    9  3300   31%  2942   36%
The same set of data with Bayes default:

Code: Select all

   1 Houdini 4           3111   11   11  3300   75%  2931   31% 
   2 Stockfish 5         3105   10   10  3300   75%  2931   39% 
   3 Komodo 7a           3088   10   10  3300   72%  2932   37% 
   4 Gull 3              3057   10   10  3300   68%  2934   41% 
   5 Critter 1.4a        2984   10    9  3300   57%  2939   46% 
   6 Equinox 2.02        2980    9   10  3300   56%  2939   47% 
   7 Deep Rybka 4.1      2964   10   10  3300   54%  2940   45% 
   8 Deep Fritz 14       2905    9   10  3300   44%  2944   45% 
   9 Chiron 2            2900   10   10  3300   44%  2945   45% 
  10 Protector 1.6.0     2883   10   10  3300   41%  2946   44% 
  11 Hannibal 1.4b       2883   10   10  3300   41%  2946   43% 
  12 Naum 4.2            2854   10   10  3300   36%  2948   41% 
  13 Texel 1.04          2854   10   10  3300   37%  2948   38% 
  14 Senpai 1.0          2853   10   10  3300   36%  2948   41% 
  15 HIARCS 14 WCSC 32b  2830   10   10  3300   33%  2949   37% 
  16 Jonny 6.00          2816   10   10  3300   31%  2950   36%
Now with Elostat:

Code: Select all

  1 Stockfish 5                    : 3115   10  10  3300    74.9 %   2924   38.6 %
  2 Houdini 4                      : 3111   11  10  3300    74.5 %   2925   30.7 %
  3 Komodo 7a                      : 3091   10  10  3300    72.1 %   2926   37.0 %
  4 Gull 3                         : 3059    9   9  3300    68.0 %   2928   41.0 %
  5 Critter 1.4a                   : 2982    9   9  3300    57.0 %   2933   46.1 %
  6 Equinox 2.02                   : 2978    9   9  3300    56.3 %   2933   46.9 %
  7 Deep Rybka 4.1                 : 2962    9   9  3300    53.9 %   2935   45.2 %
  8 Deep Fritz 14                  : 2899    9   9  3300    44.4 %   2939   44.9 %
  9 Chiron 2                       : 2894    9   9  3300    43.5 %   2939   45.1 %
 10 Protector 1.6.0                : 2877    9   9  3300    40.9 %   2940   44.1 %
 11 Hannibal 1.4b                  : 2875    9   9  3300    40.7 %   2940   42.6 %
 12 Texel 1.04                     : 2846    9   9  3300    36.5 %   2942   38.5 %
 13 Naum 4.2                       : 2845    9   9  3300    36.4 %   2942   40.9 %
 14 Senpai 1.0                     : 2845    9   9  3300    36.3 %   2942   40.7 %
 15 HIARCS 14 WCSC 32b             : 2822   10  10  3300    33.2 %   2944   37.5 %
 16 Jonny 6.00                     : 2808   10  10  3300    31.2 %   2945   35.7 %
and finally with ORDO:

Code: Select all

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Stockfish 5           : 3115.1    2473.0    3300   74.9%
   2 Houdini 4             : 3111.0    2458.5    3300   74.5%
   3 Komodo 7a             : 3089.3    2379.0    3300   72.1%
   4 Gull 3                : 3054.9    2245.5    3300   68.0%
   5 Critter 1.4a          : 2968.9    1882.0    3300   57.0%
   6 Equinox 2.02          : 2963.8    1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2945.6    1778.5    3300   53.9%
   8 Deep Fritz 14         : 2875.7    1464.5    3300   44.4%
   9 Chiron 2              : 2869.4    1436.5    3300   43.5%
  10 Protector 1.6.0       : 2850.1    1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2848.3    1343.0    3300   40.7%
  12 Texel 1.04            : 2816.4    1204.5    3300   36.5%
  13 Naum 4.2              : 2815.5    1200.5    3300   36.4%
  14 Senpai 1.0            : 2814.9    1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2790.6    1096.0    3300   33.2%
  16 Jonny 6.00            : 2774.4    1030.0    3300   31.2%
That is very good, as everyone can take the list he likes :-)

Regards
Ingo
I took the results that ended in a draw and saved them into a separate file. Then I pasted those results back into the original file (which means that every draw is present twice). Then I ran ordo with that file (td.pgn) and reproduced exactly what BayesElo had, except that I had to expand the scale a bit (with -z243.5; the default is 202, for a 76% winning expectancy), since counting the draws twice contracted the scale.

ordo -p td.pgn -a3111.5 -A"Houdini 4" -W -z243.5

Code: Select all

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Houdini 4             : 3111.5    2965.0    4313   68.7%
   2 Stockfish 5           : 3104.9    3110.0    4574   68.0%
   3 Komodo 7a             : 3087.5    2990.0    4522   66.1%
   4 Gull 3                : 3055.8    2922.0    4653   62.8%
   5 Critter 1.4a          : 2982.7    2642.0    4820   54.8%
   6 Equinox 2.02          : 2978.4    2633.0    4847   54.3%
   7 Deep Rybka 4.1        : 2962.8    2524.0    4791   52.7%
   8 Deep Fritz 14         : 2903.8    2206.0    4783   46.1%
   9 Chiron 2              : 2899.7    2181.0    4789   45.5%
  10 Hannibal 1.4b         : 2882.6    2046.0    4706   43.5%
  11 Protector 1.6.0       : 2882.6    2078.0    4754   43.7%
  12 Naum 4.2              : 2853.6    1875.0    4649   40.3%
  13 Texel 1.04            : 2853.1    1839.0    4569   40.2%
  14 Senpai 1.0            : 2852.8    1870.0    4644   40.3%
  15 HIARCS 14 WCSC 32b    : 2830.0    1714.0    4536   37.8%
  16 Jonny 6.00            : 2815.9    1619.0    4478   36.2%
So, the reason for the discrepancy in the ranking order is exactly that: BE counts the draws twice, Ordo once.

Miguel
Rating lists should be as free of controversy as possible. It seems that BE is controversial and Ordo is less so. Maybe Ordo is better suited for rating lists.

Regards,
CL