Ordo 0.9.6

michiguel
Posts: 6386
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Ordo 0.9.6

Post by michiguel » Wed Sep 10, 2014 4:34 pm

https://sites.google.com/site/gaviotach ... e/releases

Finally, I merged the experimental branch with "loose" anchors and relative ones, mentioned a while ago.
http://www.talkchess.com/forum/viewtopi ... o+approach

In addition, now the user can round the output number (switch -N)

Switches added:

-N <value> Output, number of decimals, minimum is 0 (default=1)
-y <file> loose anchors: file contains rows of "Player",Rating,Uncertainty
-r <file> relations: rows of "PlayerA","PlayerB",delta_rating,uncertainty
-u <value> white advantage uncertainty value (default=0.0)
-k <value> draw rate uncertainty value % (default=0.0 %)

Excerpts from the readme file:

Loose anchors with prior information

Ordo offers an alternative approach to calculate ratings with previous knowledge from the user (using Bayesian concepts). With the switch -y, the user can provide a file with a list of players whose ratings will float around an estimated value. Those players will work as loose anchors in the list. This strategy is useful when the data is scarce and, as a consequence, wild swings could appear in the ratings. This is what happens at the beginning of a new rating list or tournament. Ordo accepts an estimated rating for a player, but takes into account how uncertain that value is. In other words, the user also has to provide the standard error for the estimated value.... etc

Relative anchors

Another problem in some engine tournaments is that version upgrades enter with no previous ratings. However, we know in certain situations that the new versions cannot have very different ratings from the previous one. Therefore, the user can make a good educated guess about the rating of the new version.... etc
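
For illustration, the two file layouts listed under the new switches could be produced with a short Python sketch like the one below. The player names, ratings, and uncertainties are made up, and whether Ordo accepts exactly this quoting and spacing is an assumption; only the column order comes from the switch descriptions.

```python
import csv
import io


def format_rows(rows):
    """Render rows as CSV text with quoted names and bare numbers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical loose anchors for -y: "Player",Rating,Uncertainty
loose_anchors = [("Engine X 1.0", 2800.0, 30.0), ("Engine Y 2.1", 2650.0, 50.0)]

# Hypothetical relation for -r: "PlayerA","PlayerB",delta_rating,uncertainty
relations = [("Engine X 1.1", "Engine X 1.0", 10.0, 20.0)]

if __name__ == "__main__":
    # Redirect each part to its own file before passing it to -y or -r.
    print(format_rows(loose_anchors), end="")
    print(format_rows(relations), end="")
```

The uncertainty column is meant as a standard error, as the readme excerpt says, so a larger value lets the player float more freely.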

Latest readme file

Miguel

Milos
Posts: 3383
Joined: Wed Nov 25, 2009 12:47 am

Re: Ordo 0.9.6

Post by Milos » Wed Sep 10, 2014 11:19 pm

I've tested it. While the standard rating calculation is almost correct and the error bar calculation is close (but not totally accurate), the loose anchors feature is quite flawed (as I pointed out a long time ago).
Here is an example: a pgn file between 2 opponents contains the result +46/=52/-22.
Ordo gives a 71 Elo difference with 22.7 Elo error bars (with the -s10000 switch).
The real Elo difference is 70 Elo and the error bars are 23.2 Elo.
Now I've added a loose anchor file with a 147 Elo difference and 24.2 Elo error bars (which is equivalent to a match with the result +68/=32/-20).
After running with the -y switch, Ordo gives a 95 Elo difference and 15.8 Elo error bars.

The real values can be calculated by adding these 2 matches together, which gives the result +114/=84/-42, equivalent to a 108 Elo difference and 16.9 Elo error bars.

So the error in the calculation is 13 Elo, almost as big as the error margin itself.

This is just a small proof that your "loose anchors" calculation is pretty crappy.

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo 0.9.6

Post by Adam Hair » Fri Sep 12, 2014 3:41 am

Milos wrote:
I've tested it. While the standard rating calculation is almost correct and the error bar calculation is close (but not totally accurate), the loose anchors feature is quite flawed (as I pointed out a long time ago).
Here is an example: a pgn file between 2 opponents contains the result +46/=52/-22.
Ordo gives a 71 Elo difference with 22.7 Elo error bars (with the -s10000 switch).
The real Elo difference is 70 Elo and the error bars are 23.2 Elo.
By default, Ordo uses a draw rate of 50%. To change this, you either have to use -d [draw rate] or use -D to let Ordo calculate the draw rate. Also, the default scale for Ordo is set so that 76% corresponds with 202 Elo. In order to match the scale used in your calculation, you may need to adjust Ordo's scale by using the -z switch.
Milos wrote: Now I've added a loose anchor file with a 147 Elo difference and 24.2 Elo error bars (which is equivalent to a match with the result +68/=32/-20).
After running with the -y switch, Ordo gives a 95 Elo difference and 15.8 Elo error bars.

The real values can be calculated by adding these 2 matches together, which gives the result +114/=84/-42, equivalent to a 108 Elo difference and 16.9 Elo error bars.

So the error in the calculation is 13 Elo, almost as big as the error margin itself.

This is just a small proof that your "loose anchors" calculation is pretty crappy.
I do believe that there is a problem with the loose anchors calculation at the moment, but I am pretty certain that your demonstration is faulty. Let's say that we use a prior that corresponds to +68/=32/-20. Then we run a match that ends +26/=92/-2. According to your logic, the posterior estimation of the difference should be 108 +/-15 even though the match gives us a more precise estimation than the prior. Does that seem right to you?
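
One way to see the point about precision is to treat the prior and the match as two independent Gaussian estimates and weight each by the inverse of its variance. The sketch below is only that textbook approximation, not Ordo's algorithm; the function name is made up, and the first call uses the figures Milos quoted (147 +/- 24.2 and 70 +/- 23.2), while the second uses a hypothetical, much tighter match error.

```python
import math


def combine_gaussian(mu_a, sd_a, mu_b, sd_b):
    """Combine two independent Gaussian estimates by inverse-variance weighting."""
    w_a = 1.0 / sd_a ** 2
    w_b = 1.0 / sd_b ** 2
    mean = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    sd = math.sqrt(1.0 / (w_a + w_b))
    return mean, sd


# Figures quoted above: prior 147 +/- 24.2, match 70 +/- 23.2.
print(combine_gaussian(147.0, 24.2, 70.0, 23.2))

# Same prior, but a hypothetical match with the same mean and a much smaller
# error: the combined value moves toward the match instead of the midpoint.
print(combine_gaussian(147.0, 24.2, 70.0, 10.0))
```

With nearly equal uncertainties the result lands close to the midpoint, which is why pooling the games looks right in that one case; once the match is more precise than the prior, the combined value should sit nearer the match.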

By the way, how are you calculating the error bars? They seem to be incorrect to me. I would be happy for you to show me that I am wrong.

Ajedrecista
Posts: 1395
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.

Re: Ordo 0.9.6.

Post by Ajedrecista » Fri Sep 12, 2014 8:00 am

Hello:

I am not an expert in rating calculations, but when Milos says 'standard rating calculation is almost correct' I think that it is because one Ordo point is not equivalent to one logistic Elo (like Celsius and Fahrenheit degrees), which Adam explains as 'the default scale for Ordo is set so that 76% corresponds with 202 Elo'.

Regarding error bars, I do not know how you are calculating them or with which confidence level, but I get completely different results. My upper and lower bounds are not symmetric (they are symmetric only when the score is 50%) but nearly so (they differ by a few Elo in absolute value in these examples with 95% confidence ~ 1.96-sigma confidence), so I report my own results giving an average error bar:

Code:

95% confidence ~ 1.96-sigma confidence:

+ 46 = 52 -22      70.44 ± 47.52
+ 68 = 32 -20     147.19 ± 57.02
+114 = 84 -42     107.54 ± 36.39
+ 26 = 92 - 2      70.44 ± 28.66
+ 94 =124 -22     107.54 ± 30.46
My tool calculates error bars with confidence levels between 65% and 99.9%, and your reported error bars are a little smaller than those at 65% confidence ~ 0.9346-sigma confidence, according to my computations, which may not be perfect, although I have never seen such differences. I use the sample mean and sample standard deviation. Just search my name and 'error bars' in this forum and you will get tons of examples.
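
A rough Python sketch of the sample mean and sample standard deviation approach follows. It is a guess at the method, not the actual tool: it assumes the standard logistic Elo curve, an n-1 denominator for the variance, and an error bar taken as half the distance between the converted upper and lower bounds. The function names are made up.

```python
import math


def elo_from_score(p):
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_with_error(wins, draws, losses, z=1.96):
    """Return (Elo difference, average half-width of the error bar)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Sample variance of the per-game scores (1, 0.5, 0).
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / (n - 1)
    std_error = math.sqrt(variance / n)
    lower = elo_from_score(mean - z * std_error)
    upper = elo_from_score(mean + z * std_error)
    return elo_from_score(mean), (upper - lower) / 2.0


for result in [(46, 52, 22), (68, 32, 20), (114, 84, 42)]:
    elo, error = elo_with_error(*result)
    print(result, round(elo, 2), round(error, 2))
```

Under these assumptions the first two rows come out very close to the 70.44 ± 47.52 and 147.19 ± 57.02 listed in the table.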

What confidence interval are you using?

------------

@Miguel and Adam: thank you very much for your effort!

Regards from Spain.

Ajedrecista.

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo 0.9.6.

Post by Adam Hair » Fri Sep 12, 2014 10:10 am

Ajedrecista wrote:
@Miguel and Adam: thank you very much for your effort!

Regards from Spain.

Ajedrecista.
Hello Jesús :)

Thank you! Most of the effort is made by Miguel. But Ordo is a project that is interesting to both of us, and we hope that we can continue to improve it.

Adam

michiguel
Posts: 6386
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: Ordo 0.9.6.

Post by michiguel » Sun Sep 14, 2014 5:28 am

Ajedrecista wrote:
[snip]

Code:

95% confidence ~ 1.96-sigma confidence:

+ 46 = 52 -22      70.44 ± 47.52
+ 68 = 32 -20     147.19 ± 57.02
+114 = 84 -42     107.54 ± 36.39
+ 26 = 92 - 2      70.44 ± 28.66
+ 94 =124 -22     107.54 ± 30.46
[snip]
The sample s1.pgn I uploaded temporarily here
https://sites.google.com/site/gaviotachessengine/ordo

If I run it this way

ordo -p s1.pgn -a0 -A"B" -F95 -z200 -s10000 -D -q
which means
-a0 -A "B" --> anchor player "B" to zero, so only A will oscillate and its error will represent the error of the "difference"
-F95 ---> confidence 95% (it is the default, just to make it explicit)
-z200 ---> forces 200 points (default is 202) to be 76% performance to exactly align the scale with yours, so the error will mean the same
-s10000 ----> ten thousand simulations
-D ---> calculate draw rate
-q ---> quiet

I get

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :   70.4   46.5     72.0     120   60.0%
   2 B         :    0.0   ----     48.0     120   40.0%
which is pretty close to your numbers.

Milos must have somehow run it without fixing one engine and without the switch -D (in which case a 50% draw rate is assumed).

ordo -p s1.pgn -s10000 -q

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         : 2335.5   22.3     72.0     120   60.0%
   2 B         : 2264.5   22.3     48.0     120   40.0%
Which gives an error over the average of the pool (the error of the difference would be double).
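
The factor of two can be checked with a small resampling sketch. This is a plain parametric bootstrap written for illustration, not Ordo's simulation code: it redraws the 120 games from the observed win/draw/loss frequencies, converts each simulated score with the standard logistic Elo formula, and compares the spread of the difference with the spread of each player's rating about the pool average. The function names, simulation count, and seed are arbitrary choices.

```python
import math
import random
import statistics


def elo_from_score(p):
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


def simulate_spread(wins, draws, losses, n_sims=5000, seed=0):
    """Return (sd of the A-B difference, sd of A about the pool average)."""
    rng = random.Random(seed)
    n = wins + draws + losses
    outcomes = [1.0, 0.5, 0.0]
    weights = [wins, draws, losses]
    diffs = []
    for _ in range(n_sims):
        total = sum(rng.choices(outcomes, weights=weights, k=n))
        # Keep the score away from 0 and 1 so the logarithm stays finite.
        p = min(max(total / n, 0.5 / n), 1.0 - 0.5 / n)
        diffs.append(elo_from_score(p))
    sd_difference = statistics.stdev(diffs)
    # With two players, each rating sits at +/- diff/2 from the pool average.
    sd_about_average = statistics.stdev([d / 2.0 for d in diffs])
    return sd_difference, sd_about_average


sd_diff, sd_avg = simulate_spread(46, 52, 22)
print("95% error of the difference:", round(1.96 * sd_diff, 1))
print("95% error about the pool average:", round(1.96 * sd_avg, 1))
```

Because each of the two ratings is just half the difference measured from the average, its error is exactly half the error of the difference in this setup.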

Miguel

michiguel
Posts: 6386
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: Ordo 0.9.6

Post by michiguel » Sun Sep 14, 2014 6:06 am

Adam Hair wrote:
Milos wrote:
I've tested it. While the standard rating calculation is almost correct and the error bar calculation is close (but not totally accurate), the loose anchors feature is quite flawed (as I pointed out a long time ago).
Here is an example: a pgn file between 2 opponents contains the result +46/=52/-22.
Ordo gives a 71 Elo difference with 22.7 Elo error bars (with the -s10000 switch).
The real Elo difference is 70 Elo and the error bars are 23.2 Elo.
By default, Ordo uses a draw rate of 50%. To change this, you either have to use -d [draw rate] or use -D to let Ordo calculate the draw rate. Also, the default scale for Ordo is set so that 76% corresponds with 202 Elo. In order to match the scale used in your calculation, you may need to adjust Ordo's scale by using the -z switch.
running
ordo -p s1.pgn -s10000 -a0 -z199 -D -q

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :   35.0   23.1     72.0     120   60.0%
   2 B         :  -34.9   23.1     48.0     120   40.0%
gives numbers that are almost identical. -z199 forces the scale to be 70 points for 60% performance, in order to compare the errors (so they will mean the same). -D will calculate the draw rate (for evenly matched engines). It is 45% rather than the default 50%.
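
The effect of the scale can be sketched in Python as follows. This assumes a logistic curve rescaled so that the -z value corresponds to a 76% score, which is consistent with the figures in this thread but is not taken from Ordo's source; the function name is made up.

```python
import math


def rating_difference(score, scale=202.0):
    """Rating difference for a score, with `scale` points meaning 76%."""
    logit = math.log(score / (1.0 - score))
    logit_76 = math.log(0.76 / 0.24)
    return scale * logit / logit_76


for scale in (199.0, 200.0, 202.0):
    print(scale, round(rating_difference(0.60, scale), 2))

# The standard logistic Elo formula puts 76% at about 200.2 points.
print(round(400.0 * math.log10(0.76 / 0.24), 2))
```

On this assumption a 60% score maps to 70.0 points at scale 199, about 70.4 at scale 200, and about 71.1 at the default 202, which matches the 70 vs 71 discrepancy discussed above.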


Milos wrote: Now I've added a loose anchor file with a 147 Elo difference and 24.2 Elo error bars (which is equivalent to a match with the result +68/=32/-20).
After running with the -y switch, Ordo gives a 95 Elo difference and 15.8 Elo error bars.

The real values can be calculated by adding these 2 matches together, which gives the result +114/=84/-42, equivalent to a 108 Elo difference and 16.9 Elo error bars.

So the error in the calculation is 13 Elo, almost as big as the error margin itself.

This is just a small proof that your "loose anchors" calculation is pretty crappy.
I do believe that there is a problem with the loose anchors calculation at the moment, but I am pretty certain that your demonstration is faulty. Let's say that we use a prior that corresponds to +68/=32/-20. Then we run a match that ends +26/=92/-2. According to your logic, the posterior estimation of the difference should be 108 +/-15 even though the match gives us a more precise estimation than the prior. Does that seem right to you?

By the way, how are you calculating the error bars? They seem to be incorrect to me. I would be happy for you to show me that I am wrong.
Ordo's calculation is fine, but if priors (loose anchors) are used, the errors will be underestimated in version 0.9.6. The reason is simple: errors are calculated based on the simulated variability of the sample, ignoring the contribution of the priors to the error. In other words, calculating the value with the maximum probability of the posterior is straightforward, but calculating the whole distribution is not. For that reason, I am resampling the prior information in the next version, and it seems to work fine.

Milos's results seem to be contradictory because there are several issues in the design, but Ordo's numbers are correct. It could be done as follows
(files s0.pgn, s1.pgn, sc.pgn, and p.csv are here https://sites.google.com/site/gaviotachessengine/ordo)

ordo -p s0.pgn -a0 -A "B" -F68 -z199 -s10000 -q

This fixes player B, so A will give the difference and the error of the difference. -F68 forces the error to be one sigma, since that is what is used for the uncertainty of the loose anchors. The default is 95% and that was one of the problems. -z199 keeps the same scale as above. I get:

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :  146.3   24.1     84.0     120   70.0%
   2 B         :    0.0   ----     36.0     120   30.0%
This will become prior information as a "relative anchor" in file r.csv
"A", "B", 146.3, 24.1
That is, A is stronger than B by 146.3 with an uncertainty of 24.1 (68%).

Then using this and running
ordo -p s1.pgn -a0 -A "B" -F68 -z199 -s10000 -q -r r.csv

we get

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :  106.4   12.0     72.0     120   60.0%
     B         :    0.0   ----     48.0     120   40.0%
which needs to get compared to the sample that combines s0.pgn + s1.pgn = sc.pgn

ordo -p sc.pgn -a0 -A "B" -F68 -z199 -s10000 -q

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :  106.9   16.4    156.0     240   65.0%
   2 B         :    0.0   ----     84.0     240   35.0%
The rating of the combined sample is virtually the same as that of s1 using prior information (106.9 vs 106.4), but the error is underestimated (12.0 vs 16.4). It is not a coincidence that the ratio is near sqrt(2).

In the unreleased version, in which I am re-sampling the prior information too, I get

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :  106.4   17.0     72.0     120   60.0%
     B         :    0.0   ----     48.0     120   40.0%
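
The sqrt(2) factor and the effect of resampling the prior can be illustrated with a Gaussian toy model in Python. This is not Ordo's code: the posterior value is taken as an inverse-variance weighted mean, the prior is the 146.3 +/- 24.1 from r.csv, and the one-sigma error of the s1 match (23.6 here) is an assumed figure. In one run only the match result is resampled; in the other the prior value is resampled as well.

```python
import random
import statistics


def posterior_mean(prior_mu, prior_sd, data_mu, data_sd):
    """Inverse-variance weighted mean of a Gaussian prior and a measurement."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_sd ** 2
    return (w_prior * prior_mu + w_data * data_mu) / (w_prior + w_data)


def simulated_error(prior_mu, prior_sd, data_mu, data_sd,
                    resample_prior, n_sims=20000, seed=1):
    """Standard deviation of the posterior mean over simulated replicates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        data = rng.gauss(data_mu, data_sd)
        if resample_prior:
            prior = rng.gauss(prior_mu, prior_sd)
        else:
            prior = prior_mu  # prior treated as a fixed number
        estimates.append(posterior_mean(prior, prior_sd, data, data_sd))
    return statistics.stdev(estimates)


# Prior from r.csv; the data sigma of 23.6 is an assumption for illustration.
fixed = simulated_error(146.3, 24.1, 70.4, 23.6, resample_prior=False)
resampled = simulated_error(146.3, 24.1, 70.4, 23.6, resample_prior=True)
print(round(fixed, 1), round(resampled, 1), round(resampled / fixed, 2))
```

In this toy model, holding the prior fixed gives an error of roughly 12, and resampling it raises the error to roughly 17, the same pattern as the 12.0 and 17.0 in the two tables above. When the prior and the match have similar uncertainties, the ratio between the two is close to sqrt(2).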
Miguel

michiguel
Posts: 6386
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: Ordo 0.9.6

Post by michiguel » Mon Sep 15, 2014 4:38 am

[snip]
In the unreleased version, in which I am re-sampling the prior information too, I get

Code:

   # PLAYER    : RATING  ERROR   POINTS  PLAYED    (%)
   1 A         :  106.4   17.0     72.0     120   60.0%
     B         :    0.0   ----     48.0     120   40.0%
Miguel
That is version v0.9.7, with the above-mentioned modification and other minor changes
https://sites.google.com/site/gaviotach ... e/releases

Miguel

PaulieD
Posts: 205
Joined: Tue Jun 25, 2013 6:19 pm

Re: Ordo 0.9.6

Post by PaulieD » Mon Sep 15, 2014 10:16 pm

Any chance a gui will be developed for Ordo usage?
