H4 or S5 !?

IGarcia · Post by **IGarcia** » Tue Jun 03, 2014 3:12 pm

michiguel wrote:

IWB wrote:
lkaufman wrote:... I think that the decision was correct, but that now you (and CCRL and CEGT) should switch to Ordo. ...
My biggest problem with ORDO is the missing error bar. I like the concept of uncertainty, and these absolute values (with decimals?) somehow look too precise.

Is there a way to switch on an error bar in ORDO? I did not check so maybe ...

Bye
Ingo

ordo -W -p TOPRES.pgn -a2800 -s1000

where

-W automatic white advantage
-a2800 average set to 2800
-s1000 simulate ranking 1000 times to calc standard deviations

Each engine's error is the error relative to the average of the pool.

   # PLAYER                : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish 5           : 2996.1    9.3   2473.0    3300   74.9%
   2 Houdini 4             : 2992.0    9.2   2458.5    3300   74.5%
   3 Komodo 7a             : 2970.4    9.3   2379.0    3300   72.1%
   4 Gull 3                : 2935.9    8.9   2245.5    3300   68.0%
   5 Critter 1.4a          : 2849.9    8.3   1882.0    3300   57.0%
   6 Equinox 2.02          : 2844.8    8.3   1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2826.6    8.2   1778.5    3300   53.9%
   8 Deep Fritz 14         : 2756.7    8.2   1464.5    3300   44.4%
   9 Chiron 2              : 2750.4    8.4   1436.5    3300   43.5%
  10 Protector 1.6.0       : 2731.1    8.3   1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2729.3    8.1   1343.0    3300   40.7%
  12 Texel 1.04            : 2697.4    8.7   1204.5    3300   36.5%
  13 Naum 4.2              : 2696.5    8.6   1200.5    3300   36.4%
  14 Senpai 1.0            : 2695.9    8.6   1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2671.6    8.9   1096.0    3300   33.2%
  16 Jonny 6.00            : 2655.4    8.6   1030.0    3300   31.2%

or

ordo -p TOPRES.pgn -a3100 -A "Stockfish 5" -W -s1000

Here Stockfish is fixed at 3100, so it has no error of its own.
Each engine's error is then the error relative to SF. Of course, the errors are bigger (they now implicitly include SF's error too). In the previous example, each engine's error is the error relative to the average of the pool.
This is better if you want to compare one engine to the rest.

   # PLAYER                : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish 5           : 3100.0   ----   2473.0    3300   74.9%
   2 Houdini 4             : 3095.9   13.5   2458.5    3300   74.5%
   3 Komodo 7a             : 3074.3   13.0   2379.0    3300   72.1%
   4 Gull 3                : 3039.8   12.9   2245.5    3300   68.0%
   5 Critter 1.4a          : 2953.8   13.0   1882.0    3300   57.0%
   6 Equinox 2.02          : 2948.7   12.8   1859.5    3300   56.3%
   7 Deep Rybka 4.1        : 2930.5   12.6   1778.5    3300   53.9%
   8 Deep Fritz 14         : 2860.6   12.5   1464.5    3300   44.4%
   9 Chiron 2              : 2854.3   13.4   1436.5    3300   43.5%
  10 Protector 1.6.0       : 2835.1   13.1   1351.0    3300   40.9%
  11 Hannibal 1.4b         : 2833.2   12.6   1343.0    3300   40.7%
  12 Texel 1.04            : 2801.3   13.1   1204.5    3300   36.5%
  13 Naum 4.2              : 2800.4   13.5   1200.5    3300   36.4%
  14 Senpai 1.0            : 2799.8   13.4   1198.0    3300   36.3%
  15 HIARCS 14 WCSC 32b    : 2775.5   13.8   1096.0    3300   33.2%
  16 Jonny 6.00            : 2759.4   13.4   1030.0    3300   31.2%
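
A rough sanity check on why the anchored errors are larger: if the pool-relative errors of two engines were independent (an assumption made here only for illustration; in a real round robin they are correlated), the error of their difference would be the two errors added in quadrature. The sketch below applies that to a few rows of the two tables above. The helper name `combined_error` is hypothetical and is not part of Ordo.

```python
import math

# Errors relative to the pool average, copied from the first table above.
pool_error = {
    "Stockfish 5": 9.3,
    "Houdini 4": 9.2,
    "Komodo 7a": 9.3,
    "Gull 3": 8.9,
}

# Errors relative to the anchored Stockfish 5, copied from the second table.
anchored_error = {"Houdini 4": 13.5, "Komodo 7a": 13.0, "Gull 3": 12.9}


def combined_error(engine, anchor="Stockfish 5"):
    """Quadrature sum of two pool-relative errors (assumes independence)."""
    return math.hypot(pool_error[engine], pool_error[anchor])


for engine, reported in anchored_error.items():
    estimate = combined_error(engine)
    print(f"{engine:<12} estimate {estimate:5.1f}   reported {reported:5.1f}")
```

The estimates land close to the reported values but do not match them exactly, which is what you would expect if the independence assumption is only approximately right.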

You can also save a matrix of errors (each engine against each other) with the switch -e.

help with ordo

quick example: ordo -a 2500 -p input.pgn -o output.txt
  - Processes input.pgn (PGN file) to calculate ratings to output.txt.
  - The general pool will have an average of 2500

usage: ordo [-OPTION]
 -h          print this help
 -H          print just the switches
 -v          print version number and exit
 -L          display the license information
 -q          quiet mode (no screen progress updates)
 -a <avg>    set rating for the pool average
 -A <player> anchor: rating given by '-a' is fixed for <player>, if provided
 -m <file>   multiple anchors: file contains rows of "AnchorName",AnchorRating
 -w <value>  white advantage value (default=0.0)
 -W          white advantage, automatically adjusted
 -z <value>  scaling: set rating for winning expectancy of 76% (default=202)
 -T          display winning expectancy table
 -p <file>   input file in PGN format
 -c <file>   output file (comma separated value format)
 -o <file>   output file (text format), goes to the screen if not present
 -g <file>   output file with group connection info (no rating output on screen)
 -s  #       perform # simulations to calculate errors
 -e <file>   saves an error matrix, if -s was used
 -F <value>  confidence (%) to estimate error margins. Default is 95.0
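
To make the -z description concrete, here is a minimal sketch of a winning-expectancy curve that passes through 76% at a 202-point difference. It assumes a logistic curve, which the help text above does not confirm, and the function name `expected_score` and its parameters are made up for illustration.

```python
import math


def expected_score(diff, z=202.0, anchor=0.76):
    """Expected score for a rating advantage `diff`.

    Assumption: a logistic curve scaled so that a difference of `z`
    points gives an expectancy of `anchor` (76% at 202 by default).
    """
    beta = math.log(anchor / (1.0 - anchor)) / z
    return 1.0 / (1.0 + math.exp(-beta * diff))


# Equal ratings give 50%; a 202-point edge gives 76% by construction.
print(expected_score(0.0))
print(expected_score(202.0))
# The 4.1-point gap between Stockfish 5 and Houdini 4 in the tables above:
print(expected_score(4.1))
```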

ORDO, impressive. What a master, Miguel!

A classic RTFM case.

IWB · Post by **IWB** » Tue Jun 03, 2014 8:39 pm

IWB wrote: ..., as I am curious I will run the S5 match again with 4pc Syzygy bases....

Test is running. The original setup had this:

     Stockfish 5              3080.0 (2297.0 : 783.0)
                              220.0 (127.5 :  92.5) Houdini 4           3111
                              220.0 (121.5 :  98.5) Komodo 7a           3088
                              220.0 (134.0 :  86.0) Gull 3              3057
                              220.0 (150.5 :  69.5) Critter 1.4a        2980
                              220.0 (149.0 :  71.0) Equinox 2.02        2975
                              220.0 (159.5 :  60.5) Deep Rybka 4.1      2959
                              220.0 (176.0 :  44.0) Deep Fritz 14       2894
                              220.0 (170.5 :  49.5) Chiron 2            2889
                              220.0 (181.0 :  39.0) Protector 1.6.0     2870
                              220.0 (168.0 :  52.0) Hannibal 1.4b       2870
                              220.0 (183.0 :  37.0) Naum 4.2            2838
                              220.0 (187.0 :  33.0) Texel 1.04          2838
                              220.0 (187.0 :  33.0) Senpai 1.0          2838
                              220.0 (188.0 :  32.0) HIARCS 14 WCSC 32b  2812
                              220.0 (190.5 :  29.5) Jonny 6.00          2798

So 74.58% is the score to beat!

Bye
Ingo
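
For reference, the 74.58% figure follows directly from the totals on the Stockfish 5 header line of the table above. The sketch below reproduces it; the conversion to a rating gap at the end assumes the standard logistic Elo formula and is only illustrative.

```python
import math

# Totals from the "Stockfish 5" header line: points for and points against.
points_for, points_against = 2297.0, 783.0

games = points_for + points_against
score_pct = 100.0 * points_for / games
print(f"{score_pct:.2f}% of {games:.0f} games")

# Assumption: standard logistic Elo curve, 400 * log10(p / (1 - p)).
p = points_for / games
implied_gap = 400.0 * math.log10(p / (1.0 - p))
print(f"implied gap over the average opponent: {implied_gap:.0f} Elo")
```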

IWB · Post by **IWB** » Tue Jun 03, 2014 8:42 pm

michiguel wrote:
ordo -p TOPRES.pgn -a3100 -A "Stockfish 5" -W -s1000

Here Stockfish is fixed at 3100, so it has no error of its own.
Each engine's error is then the error relative to SF. Of course, the errors are bigger (they now implicitly include SF's error too). In the previous example, each engine's error is the error relative to the average of the pool.
This is better if you want to compare one engine to the rest.

Ahh thx!
That would be the most interesting variation!

Reg
Ingo

Laskos · Post by **Laskos** » Tue Jun 03, 2014 9:43 pm

michiguel wrote:
There is only one correct answer, and that is SF5 should be #1 (by a very small margin, though). Why? This is a round robin, so everybody played each other under the same conditions, so the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.

1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%

Miguel

Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A

and total points in RR are A>B>C, then Elo ratings are always also A>B>C?

Vinvin · Post by **Vinvin** » Tue Jun 03, 2014 10:24 pm

Laskos wrote:
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A

and total points in RR are A>B>C, then Elo ratings are always also A>B>C?

That seems logical!
A numerical example?
A vs B : 56-44
B vs C : 55-45
C vs A : 51-49

A : 105 ; B : 99 ; C : 96
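
To put Elo numbers on this example, here is a minimal sketch that fits ratings to the three results above. It assumes 100 games per pairing, counts the listed points as wins (a draw as half a point each), and uses a plain Bradley-Terry maximum-likelihood fit via the usual iterative update. This is a stand-in for illustration, not Ordo's actual algorithm, and `fit_elo` is a hypothetical helper.

```python
import math

# Points scored by the first player against the second (100 games per pair).
points = {
    ("A", "B"): 56, ("B", "A"): 44,
    ("B", "C"): 55, ("C", "B"): 45,
    ("C", "A"): 51, ("A", "C"): 49,
}
players = ["A", "B", "C"]


def fit_elo(points, players, iterations=2000):
    """Bradley-Terry fit by repeated minorization-maximization updates."""
    strength = {p: 1.0 for p in players}
    total = {p: sum(points[(p, q)] for q in players if q != p) for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(
                (points[(p, q)] + points[(q, p)]) / (strength[p] + strength[q])
                for q in players if q != p
            )
            updated[p] = total[p] / denom
        # Normalise so the geometric mean is 1, i.e. ratings average to 0.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {p: v / math.exp(log_mean) for p, v in updated.items()}
    return {p: 400.0 * math.log10(s) for p, s in strength.items()}


ratings = fit_elo(points, players)
for name in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{name}: {ratings[name]:+.1f}")
```

Under these assumptions the fitted order follows the total points (A, then B, then C) even though C beat A head to head.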

Modern Times · Post by **Modern Times** » Tue Jun 03, 2014 10:31 pm

Laskos wrote: Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A

and total points in RR are A>B>C, then Elo ratings are always also A>B>C?

I am also not totally convinced of this.

Laskos · Post by **Laskos** » Tue Jun 03, 2014 10:34 pm

Vinvin wrote:
That seems logical!
A numerical example?
A vs B : 56-44
B vs C : 55-45
C vs A : 51-49

A : 105 ; B : 99 ; C : 96

I would need a proof and Elo points, not a numerical example. And maybe an extension to more than 3 engines. I don't have a feeling for extreme cases, say, what would happen in (unlikely) non-transitive cases with results like 99:1 and 2:98.
EDIT: Also, 3 engines is somewhat symmetric, but what about 4 engines:

Direct matches RR:
A>B
A>C
A>D
B>C
B>D
C>A

Total points in RR are A>B>C>D. Does that necessarily give Elo ratings A>B>C>D?

Vinvin · Post by **Vinvin** » Tue Jun 03, 2014 10:47 pm

Laskos wrote:
I would need a proof and Elo points, not a numerical example. And maybe an extension to more than 3 engines. I don't have a feeling for extreme cases, say, what would happen in (unlikely) non-transitive cases with results like 99:1 and 2:98.

Your last inequality is transitive "A>B>C".
I don't really get your point.

Laskos · Post by **Laskos** » Tue Jun 03, 2014 10:50 pm

Vinvin wrote:
Your last inequality is transitive "A>B>C".
I don't really get your point.

Non-transitive in direct matches, I think that was clear.

CORRECTED EDIT from previous post:

Direct matches RR:
A>B
A>C
B>C
B>D
C>D
D>A

Total points in RR are A>B>C>D. Does that necessarily give Elo ratings A>B>C>D?

michiguel · Post by **michiguel** » Tue Jun 03, 2014 11:21 pm

Laskos wrote:
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A

and total points in RR are A>B>C, then Elo ratings are always also A>B>C?

It can be demonstrated as long as certain assumptions are respected, such as two draws being equal to one win plus one loss. But I am not sure I can do it elegantly or understandably by quickly typing here.

For instance, let's try a reductio ad absurdum. Let's assume that the Elo order is EloB > EloA > EloC. In that case, A faced a stronger schedule than B: both faced C, but the head-to-head match was tougher for A (because EloB > EloA). Consequently, A faced a tougher schedule and still got more points, which means it should have a higher Elo. But that contradicts the initial assumption EloB > EloA > EloC, disproving it. If you keep doing this analysis for the other orderings, you will see that the only consistent scenario is EloA > EloB > EloC.

Miguel
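
The same argument can be written a little more formally. This is only a sketch under stated assumptions: ratings are a maximum-likelihood fit of a logistic model, every pair plays the same number of games n, colour effects are ignored, and a draw counts as half a win plus half a loss. The symbols S_i (total points), R_i (fitted rating) and E (expected score) are introduced here for illustration.

```latex
\begin{align*}
E(d) &= \frac{1}{1 + 10^{-d/400}}
  && \text{strictly increasing in } d \\
S_i &= n \sum_{j \neq i} E(R_i - R_j)
  && \text{the fitted ratings reproduce each total score} \\
S_A - S_B &= n \bigl[ E(R_A - R_B) - E(R_B - R_A) \bigr]
  + n \sum_{j \neq A, B} \bigl[ E(R_A - R_j) - E(R_B - R_j) \bigr]
\end{align*}
```

If R_A <= R_B, every bracket on the last line is <= 0 because E is increasing, so S_A <= S_B. Hence S_A > S_B forces R_A > R_B, whatever the head-to-head results were, and the same step works for any number of engines in a balanced round robin.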
