## Ordo 0.9.6

### Ordo 0.9.6

https://sites.google.com/site/gaviotach ... e/releases

Finally, I merged the experimental branch with "loose" anchors and relative ones, mentioned a while ago.

http://www.talkchess.com/forum/viewtopi ... o+approach

In addition, now the user can round the output number (switch -N)

Switches added:

`-N <value>` Output, number of decimals, minimum is 0 (default=1)

`-y <file>` loose anchors: file contains rows of "Player",Rating,Uncertainty

`-r <file>` relations: rows of "PlayerA","PlayerB",delta_rating,uncertainty

`-u <value>` white advantage uncertainty value (default=0.0)

`-k <value>` draw rate uncertainty value % (default=0.0 %)

Excerpts from the readme file:

Loose anchors with prior information

Ordo offers an alternative approach to calculate ratings with previous knowledge from the user (using Bayesian concepts). With the switch -y, the user can provide a file with a list of players whose ratings will float around an estimated value. Those players will work as loose anchors in the list. This strategy is useful when the data is scarce and, as a consequence, wild swings could appear in the ratings. This is what happens at the beginning of a new rating list or tournament. Ordo accepts an estimated rating for a player, but takes into account how uncertain that value is. In other words, the user also has to provide the standard error for the estimated value... etc.

Relative anchors

Another problem in some engine tournaments is that version upgrades enter with no previous ratings. However, in certain situations we know that a new version cannot have a very different rating from the previous one. Therefore, the user can make a good educated guess about the rating of the new version... etc.
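As a rough illustration of the two row formats listed above, here is a small Python sketch that writes one loose-anchors file and one relations file. The engine names, ratings and uncertainties are made up, and the exact quoting and spacing Ordo expects is an assumption based only on the one-line descriptions of -y and -r; the readme is the authority.

```python
import csv
import io


def render_rows(rows):
    """Return CSV text with quoted names and bare numbers, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical loose anchors (-y): "Player",Rating,Uncertainty
loose_anchors = [
    ("Engine X 1.0", 2800, 25),
    ("Engine Y 2.3", 2650, 40),
]

# Hypothetical relation (-r): "PlayerA","PlayerB",delta_rating,uncertainty
# Assumed meaning: the new version is expected to be about 20 points above the old one.
relations = [
    ("Engine X 1.1", "Engine X 1.0", 20, 15),
]

print(render_rows(loose_anchors), end="")
print(render_rows(relations), end="")
```

The text returned by `render_rows` could be saved to files and passed with -y and -r respectively.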

Latest readme file

Miguel

### Re: Ordo 0.9.6

I've tested it. While the standard rating calculation is almost correct and the error bar calculation is close (but not totally accurate), the loose anchors feature is quite flawed (as I pointed out a long time ago).

Here is an example: a PGN file between 2 opponents contains the result +46/=52/-22.

Ordo gives a 71 Elo difference with 22.7 Elo error bars (with the -s10000 switch).

The real Elo difference is 70 Elo and the error bars are 23.2 Elo.

Now I've added a loose anchor file with a 147 Elo difference and 24.2 Elo error bars (which is equivalent to a match with the result +68/=32/-20).

After running with the -y switch, Ordo gives a 95 Elo difference and 15.8 Elo error bars.

The real values can be calculated by adding these 2 matches together, which gives the result +114/=84/-42; that is equivalent to a 108 Elo difference and 16.9 Elo error bars.

So the error in the calculation is 13 Elo, almost as big as the error margin itself.

This is just a small proof that your "loose anchors" calculation is pretty crappy.
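For readers who want to check the Elo differences quoted in this post, here is a minimal Python sketch. It assumes the standard logistic Elo formula applied to the match score, which is not necessarily how Ordo computes ratings; the function name is made up for the example.

```python
import math


def elo_difference(wins, draws, losses):
    """Elo difference implied by a match score, using the logistic formula."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))


match = (46, 52, 22)            # the PGN file
prior_as_match = (68, 32, 20)   # the loose anchor expressed as a match
pooled = tuple(a + b for a, b in zip(match, prior_as_match))

for label, result in (("match", match), ("prior", prior_as_match), ("pooled", pooled)):
    print(f"{label:6s} +{result[0]}/={result[1]}/-{result[2]}: {elo_difference(*result):.2f}")
```

Under this assumption the three results come out close to the 70, 147 and 108 Elo figures given in the post.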

### Re: Ordo 0.9.6

Milos wrote:

> Here is an example, a pgn file between 2 opponents contains result +46/=52/-22.
>
> Ordo gives 71 Elo difference with 22.7 Elo error bars (with -s10000 switch).
>
> Real Elo difference is 70 Elo and error bars are 23.2 Elo.

By default, Ordo uses a draw rate of 50%. To change this, you either have to use -d [draw rate] or use -D to let Ordo calculate the draw rate. Also, the default scale for Ordo is set so that 76% corresponds with 202 Elo. In order to match the scale used in your calculation, you may need to adjust Ordo's scale by using the -z switch.

Milos wrote:

> Now I've added loose anchor file with 147 Elo difference and 24.2 Elo error bars (which is equivalent to a match with result +68/=32/-20).
>
> And after running with -y switch Ordo gives 95 Elo difference and 15.8 Elo error bars.
>
> Real values can be calculated by adding these 2 matches together which gives result +114/=84/-42 that is equivalent to 108 Elo difference and 16.9 Elo error bars.
>
> So the error in calculation is 13 Elo almost as big as the error margin itself.
>
> This is just a small proof that your "loose anchors" calculation is pretty crappy.

I do believe that there is a problem with the loose anchors calculation at the moment, but I am pretty certain that your demonstration is faulty. Let's say that we use a prior that corresponds to +68/=32/-20. Then we run a match that ends +26/=92/-2. According to your logic, the posterior estimate of the difference should be 108 +/-15, even though the match gives us a more precise estimate than the prior. Does that seem right to you?

By the way, how are you calculating the error bars? They seem to be incorrect to me. I would be happy for you to show me that I am wrong.
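One way to look at the disagreement is the textbook precision-weighted combination of two independent normal estimates. The Python sketch below is only that textbook approximation, not Ordo's algorithm. The first call uses the figures quoted above (prior 147 +/- 24.2, match 70 +/- 23.2); the tighter error bar in the second call is a hypothetical value chosen for illustration.

```python
import math


def combine_estimates(mean_a, sigma_a, mean_b, sigma_b):
    """Precision-weighted combination of two independent normal estimates."""
    weight_a = 1.0 / sigma_a ** 2
    weight_b = 1.0 / sigma_b ** 2
    mean = (mean_a * weight_a + mean_b * weight_b) / (weight_a + weight_b)
    sigma = math.sqrt(1.0 / (weight_a + weight_b))
    return mean, sigma


# Figures quoted in the thread: prior 147 +/- 24.2, match 70 +/- 23.2.
mean, sigma = combine_estimates(147.0, 24.2, 70.0, 23.2)
print(f"similar precision: {mean:.1f} +/- {sigma:.1f}")

# Hypothetical: the same prior, but a match with a much tighter error bar.
TIGHT_SIGMA = 12.0  # assumed value, for illustration only
tight_mean, tight_sigma = combine_estimates(147.0, 24.2, 70.0, TIGHT_SIGMA)
print(f"tighter match:     {tight_mean:.1f} +/- {tight_sigma:.1f}")
```

When the two error bars are similar, the weighted result lands near the value obtained by pooling the games. When the match is much more precise than the prior, the weighted result moves toward the match, while pooling the game counts does not.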

### Re: Ordo 0.9.6.

Hello:

I am not an expert in rating calculations, but when Milos says 'standard rating calculation is almost correct' I think that it is because one Ordo point is not equivalent to one logistic Elo (like Celsius and Fahrenheit degrees), which Adam explains as 'the default scale for Ordo is set so that 76% corresponds with 202 Elo'.

Regarding error bars, I do not know how you are calculating them and with which confidence level, but I get completely different results. My upper and lower bounds are not symmetric (they are symmetric only when the score is 50%) but almost (a few Elo of difference in absolute value in these examples with 95% confidence ~ 1.96-sigma confidence), so I report my own results giving an average error bar:

```
95% confidence ~ 1.96-sigma confidence:
+ 46 = 52 -22 70.44 ± 47.52
+ 68 = 32 -20 147.19 ± 57.02
+114 = 84 -42 107.54 ± 36.39
+ 26 = 92 - 2 70.44 ± 28.66
+ 94 =124 -22 107.54 ± 30.46
```

My tool calculates error bars with confidence levels between 65% and 99.9%, and your reported error bars are a little smaller than mine with 65% confidence ~ 0.9346-sigma confidence... according to my computations, which may not be perfect, although I have never seen such differences. I use the sample mean and sample standard deviation. Just search my name and 'error bars' in this forum and you will get tons of examples.

What confidence interval are you using?

------------

@Miguel and Adam: thank you very much for your effort!

Regards from Spain.

Ajedrecista.
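The table in this post can be approximated with a short Python sketch. It assumes the sample standard deviation with an n-1 divisor, a 1.96-sigma interval on the score, and the logistic Elo formula applied to both bounds. This is a guess at the method described as "sample mean and sample standard deviation", not the actual tool.

```python
import math

Z_95 = 1.96  # sigma multiplier for 95% confidence


def elo_from_score(score):
    """Logistic Elo difference for a score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def average_error_bar(wins, draws, losses, z=Z_95):
    """Return (Elo difference, average of upper and lower Elo error)."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    squares = (wins * (1.0 - mean) ** 2
               + draws * (0.5 - mean) ** 2
               + losses * (0.0 - mean) ** 2)
    sample_sd = math.sqrt(squares / (games - 1))   # n-1 divisor (assumed)
    standard_error = sample_sd / math.sqrt(games)
    lower = elo_from_score(mean - z * standard_error)
    upper = elo_from_score(mean + z * standard_error)
    return elo_from_score(mean), (upper - lower) / 2.0


for result in ((46, 52, 22), (68, 32, 20), (26, 92, 2)):
    elo, error = average_error_bar(*result)
    print(f"+{result[0]} ={result[1]} -{result[2]}: {elo:.2f} +/- {error:.2f}")
```

With these assumptions the rows checked here agree with the table to two decimals.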

### Re: Ordo 0.9.6.

Hello Jesús,

Ajedrecista wrote:

> @Miguel and Adam: thank you very much for your effort!

Thank you! Most of the effort is made by Miguel. But Ordo is a project that is interesting to both of us, and we hope that we can continue to improve it.

Adam

### Re: Ordo 0.9.6.

Ajedrecista wrote:

> What confidence interval are you using?

The sample s1.pgn I uploaded temporarily here:

https://sites.google.com/site/gaviotachessengine/ordo

If I run it this way

**ordo -p s1.pgn -a0 -A"B" -F95 -z200 -s10000 -D -q**

which means

**-a0 -A "B"** --> anchor player "B" to zero, so only A will oscillate and its error represents the error of the "difference"

**-F95** --> 95% confidence (it is the default, just to make it explicit)

**-z200** --> forces 200 points (default is 202) to correspond to a 76% performance, to align the scale exactly with yours, so the error will mean the same

**-s10000** --> ten thousand simulations

**-D** --> calculate the draw rate

**-q** --> quiet

I get

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 70.4 46.5 72.0 120 60.0%
2 B : 0.0 ---- 48.0 120 40.0%
```
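The effect of -z can be sketched in a few lines of Python. This assumes that Ordo's curve has the logistic shape and that -z only rescales it so that a 76% score corresponds to z points; both are assumptions, although they agree with the numbers reported in this thread.

```python
import math

# On the standard logistic Elo scale, a 76% score is about 200.24 Elo.
LOGISTIC_ELO_AT_76 = 400.0 * math.log10(0.76 / 0.24)


def to_ordo_scale(logistic_elo, z):
    """Rescale a logistic Elo difference so that 76% corresponds to z points."""
    return logistic_elo * z / LOGISTIC_ELO_AT_76


logistic_difference = 70.44  # +46/=52/-22 on the standard logistic scale
for z in (202, 200, 199):
    print(f"-z{z}: {to_ordo_scale(logistic_difference, z):.1f}")
```

Under these assumptions the default scale (202) turns 70.44 into about 71, which would explain the one-point gap between the 71 reported from Ordo and the 70 computed by hand.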

Milos must have run it without fixing one engine and without the switch -D (in which case Ordo assumes a 50% draw rate):

**ordo -p s1.pgn -s10000 -q**

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 2335.5 22.3 72.0 120 60.0%
2 B : 2264.5 22.3 48.0 120 40.0%
```
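Why the error roughly halves when neither player is fixed can be illustrated with a small bootstrap in Python. This is a generic resampling sketch, not Ordo's simulation code, and it assumes that with a fixed average the two ratings move symmetrically, so each one carries half of the difference.

```python
import math
import random
import statistics


def elo_from_score(score):
    """Logistic Elo difference for a score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def bootstrap_differences(wins, draws, losses, simulations=2000, seed=1):
    """Resample the games with replacement and return the simulated Elo differences."""
    rng = random.Random(seed)
    outcomes = [1.0] * wins + [0.5] * draws + [0.0] * losses
    games = len(outcomes)
    differences = []
    for _ in range(simulations):
        sample = rng.choices(outcomes, k=games)
        differences.append(elo_from_score(sum(sample) / games))
    return differences


diffs = bootstrap_differences(46, 52, 22)
sd_difference = statistics.stdev(diffs)                        # B anchored: A carries it all
sd_each_player = statistics.stdev([d / 2.0 for d in diffs])    # average fixed: split in two

print(f"95% error of the difference: {1.96 * sd_difference:.1f}")
print(f"95% error of each player:    {1.96 * sd_each_player:.1f}")
```

The second figure is exactly half of the first, which is the same relation as between the 46.5 and 22.3 shown in the two outputs above (the second run also used the default scale and draw rate, so the match is only approximate).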

Miguel

### Re: Ordo 0.9.6

Adam Hair wrote:

> By default, Ordo uses a draw rate of 50%. To change this, you either have to use -d [draw rate] or use -D to let Ordo calculate the draw rate. Also, the default scale for Ordo is set so that 76% corresponds with 202 Elo. In order to match the scale used in your calculation, you may need to adjust Ordo's scale by using the -z switch.

Running

**ordo -p s1.pgn -s10000 -a0 -z199 -D -q**

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 35.0 23.1 72.0 120 60.0%
2 B : -34.9 23.1 48.0 120 40.0%
```

Adam Hair wrote:

> I do believe that there is a problem with the loose anchors calculation at the moment, but I am pretty certain that your demonstration is faulty. Let's say that we use a prior that corresponds to +68/=32/-20. Then we run a match that ends +26/=92/-2. According to your logic, the posterior estimation of the difference should be 108 +/-15 even though the match gives us a more precise estimation than the prior. Does that seem right to you?
>
> By the way, how are you calculating the error bars? They seem to be incorrect to me. I would be happy for you to show me that I am wrong.

Ordo's calculation is fine, but if priors (loose anchors) are used, the errors will be underestimated in version 0.9.6. The reason is simple: errors are calculated based on the simulated variability of the sample, ignoring the contribution of the priors to the error. In other words, calculating the value with the maximum probability of the posterior is straightforward, but the whole distribution is not. For that reason, I am resampling the prior information in the next version, and it seems to work fine.

Milos' results seem to be contradictory because there are several issues in the design, but Ordo's numbers are correct. It could be done as follows

(files s0.pgn, s1.pgn, sc.pgn, and p.csv are here https://sites.google.com/site/gaviotachessengine/ordo)

**ordo -p s0.pgn -a0 -A "B" -F68 -z199 -s10000 -q**

This fixes player B, so A will give the difference and the error of the difference. -F68 forces the confidence level to be one sigma, since that is what is used for the uncertainty of the loose anchors. The default is 95%, and that was one of the problems. -z199 keeps the same scale as above. I get:

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 146.3 24.1 84.0 120 70.0%
2 B : 0.0 ---- 36.0 120 30.0%
```

"A", "B", 146.3, 24.1

That is, A is stronger than B by 146.3 with an uncertainty of 24.1 (68%). This row goes into the relations file passed with -r.
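The link between a confidence level such as -F68 or -F95 and a "sigma" multiplier can be checked with Python's standard library. This sketch assumes normally distributed errors, which is only an approximation for simulated rating errors.

```python
from statistics import NormalDist


def sigma_multiplier(confidence):
    """Two-sided normal multiplier for a confidence level given as a fraction."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


for level in (0.65, 0.68, 0.95):
    print(f"{level:.0%} confidence ~ {sigma_multiplier(level):.4f} sigma")

# Rough rescaling of a 95% error bar to a 68% one (normal approximation only).
error_95 = 46.5
error_68 = error_95 * sigma_multiplier(0.68) / sigma_multiplier(0.95)
print(f"{error_95} at 95% is roughly {error_68:.1f} at 68%")
```

A 68% interval is close to one sigma, which is why it is the natural level for an uncertainty entered in a loose anchors or relations file, while the default 95% interval is about twice as wide.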

Then using this and running

**ordo -p s1.pgn -a0 -A "B" -F68 -z199 -s10000 -q -r r.csv**

we get

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 106.4 12.0 72.0 120 60.0%
B : 0.0 ---- 48.0 120 40.0%
```

For comparison, running on sc.pgn, which holds the games of both files combined (240 games):

**ordo -p sc.pgn -a0 -A "B" -F68 -z199 -s10000 -q**

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 106.9 16.4 156.0 240 65.0%
2 B : 0.0 ---- 84.0 240 35.0%
```

In the unreleased version, in which I am re-sampling the prior information too, I get

```
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 A : 106.4 17.0 72.0 120 60.0%
B : 0.0 ---- 48.0 120 40.0%
```

### Re: Ordo 0.9.6

michiguel wrote:

> [snip]
>
> In the unreleased version, in which I am re-sampling the prior information too, I get
>
> ```
> # PLAYER : RATING ERROR POINTS PLAYED (%)
> 1 A : 106.4 17.0 72.0 120 60.0%
> B : 0.0 ---- 48.0 120 40.0%
> ```

That is version v0.9.7, with the above-mentioned modification and other minor things.

https://sites.google.com/site/gaviotach ... e/releases

Miguel

### Re: Ordo 0.9.6

Any chance a GUI will be developed for Ordo?