lkaufman wrote:... I think that the decision was correct, but that now you (and CCRL and CEGT) should switch to Ordo. ...
My biggest problem with Ordo is the missing error bar. I like the concept of uncertainty, and these absolute values (with decimals?) somehow look too precise.
Is there a way to switch on an error bar in Ordo? I did not check, so maybe ...
Bye
Ingo
ordo -W -p TOPRES.pgn -a2800 -s1000
where
-W automatic white advantage
-a2800 average set to 2800
-s1000 simulate the ranking 1000 times to calculate standard deviations
Each engine's error is the error relative to the average of the pool.
or
ordo -p TOPRES.pgn -a3100 -A "Stockfish 5" -W -s1000
Where Stockfish is fixed to 3100, so it will have no error for that reason.
Then, each engine's error is the error relative to SF. Of course, the errors are bigger (they now implicitly include SF's error too). In the previous example, each engine's error is the error relative to the average of the pool.
This is better if you want to compare one engine to the rest.
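To see why the anchored errors come out larger, here is a toy simulation (my own sketch, not Ordo's internals; the engine names, ratings, and noise model are invented): each engine's estimated rating is its true rating plus independent noise, and we compare the spread measured against the pool average with the spread measured against a single anchor engine.

```python
import random
import statistics

random.seed(1)
true = {"A": 3100, "B": 3050, "C": 3000}  # invented pool
sims = 5000
rel_avg = {e: [] for e in true}     # error measured against the pool average
rel_anchor = {e: [] for e in true}  # error measured against anchor engine A

for _ in range(sims):
    # toy model: each estimate is the true rating plus independent noise
    est = {e: r + random.gauss(0, 10) for e, r in true.items()}
    avg = sum(est.values()) / len(est)
    for e in true:
        rel_avg[e].append(est[e] - avg)
        rel_anchor[e].append(est[e] - est["A"])

for e in ("B", "C"):
    # the anchored spread is larger because it adds A's own noise on top
    print(e,
          round(statistics.stdev(rel_avg[e]), 2),
          round(statistics.stdev(rel_anchor[e]), 2))
```

The pool-average reference partially cancels the shared noise, while anchoring to one engine stacks that engine's uncertainty onto everyone else's, which is the effect described above.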
quick example: ordo -a 2500 -p input.pgn -o output.txt
- Processes input.pgn (a PGN file) and writes the calculated ratings to output.txt.
- The general pool will have an average rating of 2500.
usage: ordo [-OPTION]
-h print this help
-H print just the switches
-v print version number and exit
-L display the license information
-q quiet mode (no screen progress updates)
-a <avg> set rating for the pool average
-A <player> anchor: rating given by '-a' is fixed for <player>, if provided
-m <file> multiple anchors: file contains rows of "AnchorName",AnchorRating
-w <value> white advantage value (default=0.0)
-W white advantage, automatically adjusted
-z <value> scaling: set rating for winning expectancy of 76% (default=202)
-T display winning expectancy table
-p <file> input file in PGN format
-c <file> output file (comma separated value format)
-o <file> output file (text format), goes to the screen if not present
-g <file> output file with group connection info (no rating output on screen)
-s # perform # simulations to calculate errors
-e <file> saves an error matrix, if -s was used
-F <value> confidence (%) to estimate error margins. Default is 95.0
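Incidentally, the -z default of 202 lines up with the familiar logistic Elo curve: on the standard 400-point scale, a 202-point gap gives roughly a 76% expected score. A quick sanity check (plain Python, not Ordo code):

```python
# Expected score for a rating gap d on the standard logistic Elo scale.
def expected_score(d, scale=400):
    return 1 / (1 + 10 ** (-d / scale))

print(round(expected_score(202), 4))  # ~0.76, matching Ordo's -z default
print(expected_score(0))              # 0.5 for equal ratings
```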
michiguel wrote:
There is only one correct answer, and that is that SF5 should be #1 (by a very tiny margin, though). Why? This is a round robin, so everybody played each other under the same conditions, etc., so the program that scores more points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament and should be #1.
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A
and total points in RR are A>B>C, then Elo ratings are always also A>B>C?
That seems logical !
Numerical example ?
A vs B : 56-44
B vs C : 55-45
C vs A : 51-49
A : 105 ; B : 99 ; C : 96
I would need a proof and Elo points, not a numerical example. And maybe an extension to more than 3 engines. I don't have a feeling for extreme cases, say, what would happen with (unlikely) non-transitive cases ranging from 99:1 to 2:98.
EDIT: Also, 3 engines is somewhat symmetric, but what about 4 engines:
Direct matches RR:
A>B
A>C
A>D
B>C
B>D
C>A
Total points in RR are A>B>C>D. Does that necessarily give Elo ratings A>B>C>D?
Last edited by Laskos on Tue Jun 03, 2014 10:49 pm, edited 1 time in total.
Your last inequality is transitive "A>B>C".
I don't really get your point.
Non-transitive in direct matches, I think that was clear.
CORRECTED EDIT from previous post:
Direct matches RR:
A>B
A>C
B>C
B>D
C>D
D>A
Total points in RR are A>B>C>D. Does that necessarily give Elo ratings A>B>C>D?
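For what it's worth, both cases can be checked with a quick Bradley-Terry fit (a rough stand-in for what an Elo tool computes, not Ordo itself; the 4-engine scores below are my own invention, chosen only to satisfy the stated head-to-head inequalities). In both cases the fitted order follows the total points:

```python
import math

def fit_bt(score, games, iters=2000):
    """Bradley-Terry MLE via minorization-maximization.
    score[i][j] = points i took off j; games[i][j] = games between i and j."""
    n = len(score)
    p = [1.0] * n
    for _ in range(iters):
        for i in range(n):
            wins = sum(score[i][j] for j in range(n) if j != i)
            denom = sum(games[i][j] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = wins / denom
        norm = sum(p) / n
        p = [x / norm for x in p]
    # convert strengths to an Elo-like scale
    return [400 * math.log10(x) for x in p]

# 3-engine example from the post: A>B, B>C, C>A head to head.
score3 = [[0, 56, 49], [44, 0, 55], [45, 51, 0]]
games3 = [[0, 100, 100], [100, 0, 100], [100, 100, 0]]
elo3 = fit_bt(score3, games3)
print([round(e, 1) for e in elo3])  # order: A > B > C

# 4-engine case: invented scores satisfying A>B, A>C, B>C, B>D, C>D, D>A.
score4 = [[0, 55, 60, 49],
          [45, 0, 55, 60],
          [40, 45, 0, 65],
          [51, 40, 35, 0]]
games4 = [[0, 100, 100, 100],
          [100, 0, 100, 100],
          [100, 100, 0, 100],
          [100, 100, 100, 0]]
elo4 = fit_bt(score4, games4)
print([round(e, 1) for e in elo4])  # order: A > B > C > D
```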
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A
and total points in RR are A>B>C, then Elo ratings are always also A>B>C?
It can be demonstrated as long as certain assumptions are respected, like two draws equaling one win plus one loss. But I am not sure I can do it elegantly or understandably by quickly typing here.
For instance, let's try a reductio ad absurdum. Assume that the Elo order is EloB > EloA > EloC. In that case, A faced a stronger schedule than B: both faced C, but the head-to-head match was tougher for A (because EloB > EloA). Consequently, A faced a tougher schedule and got more points, which means it should have a higher Elo. But that contradicts the initial assumption EloB > EloA > EloC, disproving it. If you keep doing this analysis, you will see that the only consistent scenario is EloA > EloB > EloC.
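The argument can also be probed numerically. In the Bradley-Terry model (the logistic model underlying Elo, with draws counted as half a point each), the maximum-likelihood order follows total points whenever every pair plays the same number of games, a classical property of balanced designs. A randomized sanity check, my own sketch:

```python
import math
import random

def fit_bt(score, games, iters=3000):
    """Bradley-Terry MLE via the standard minorization-maximization updates."""
    n = len(score)
    p = [1.0] * n
    for _ in range(iters):
        for i in range(n):
            wins = sum(score[i][j] for j in range(n) if j != i)
            denom = sum(games[i][j] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = wins / denom
        norm = sum(p) / n
        p = [x / norm for x in p]
    return [400 * math.log10(x) for x in p]  # Elo-like scale

random.seed(7)
n = 5
for trial in range(20):
    # balanced round robin: every pair plays 100 games
    games = [[0 if i == j else 100 for j in range(n)] for i in range(n)]
    score = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pts = random.uniform(20, 80)  # i's points out of 100; no shutouts
            score[i][j], score[j][i] = pts, 100 - pts
    totals = [sum(row) for row in score]
    elo = fit_bt(score, games)
    by_points = sorted(range(n), key=lambda k: -totals[k])
    by_elo = sorted(range(n), key=lambda k: -elo[k])
    assert by_points == by_elo
print("Elo order matched the points order in all 20 random round robins")
```

Unbalanced schedules are exactly where this guarantee breaks down, which is why the round-robin condition matters in Miguel's argument.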