## Critter 1.6 - Critter 1.4a ponder ON/OFF

### Critter 1.6 - Critter 1.4a ponder ON/OFF

Blitz 10s per game, ponder ON

1 Critter 1.6 64-bit +2 +54/=93/-53 50.25% 100.5/200

2 Critter 1.4a 64-bit SSE4 -2 +53/=93/-54 49.75% 99.5/200

Blitz 10s per game, ponder OFF

1 Critter 1.6 64-bit +23 +56/=101/-43 53.25% 106.5/200

2 Critter 1.4a 64-bit SSE4 -23 +43/=101/-56 46.75% 93.5/200

Games played at chess960

Fritz 13 GUI

1 core used

i7 980X

Windows 7

Best Regards

MM

### Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

> MM wrote:
>
> `1 Critter 1.6 64-bit +23 +56/=101/-43 53.25% 106.5/200`
>
> `2 Critter 1.4a 64-bit SSE4 -23 +43/=101/-56 46.75% 93.5/200`

95% error bar is **±34 Elo**

### Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

Hi all,

just for the record.

utente-PC, Blitz 1m ponder ON, 1 core, 3.33 GHz, no tablebases.

1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00

2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00

Best Regards

MM

### Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

> MM wrote:
>
> Hi all,
>
> just for the record.
>
> utente-PC, Blitz 1m ponder ON, 1 core, 3.33 GHz, no tablebases.
>
> 1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00
>
> 2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00
>
> Best Regards

utente-PC, Blitz 1m ponder OFF

1 Critter 1.6 64-bit +23 +125/=283/-92 53.30% 266.5/500

2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500

MM

### Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

> MM wrote:
>
> `1 Critter 1.6 64-bit +23 +125/=283/-92 53.30% 266.5/500`
>
> `2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500`

The 95% error bar is now ±21 Elo, which means that there is more than 95% probability that, with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4.

However, it cannot yet be said with 95% probability that the (ponder OFF) and (ponder ON) distributions are distinct, the global result (summation ON+OFF) being:

```
1 Critter 1.6 64-bit +15 +235/=563/-202 51.65% 516.5/1000
2 Critter 1.4a 64-bit SSE4 -15 +202/=563/-235 48.35% 483.5/1000
```

### Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Hello Ernest:

> ernest wrote:
>
> > MM wrote:
> >
> > `1 Critter 1.6 64-bit +23 +125/=283/-92 53.30% 266.5/500`
> >
> > `2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500`
>
> 95% error bar is now ±21 Elo, which means that there is more than 95% probability that with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4
>
> However, it cannot yet be said that the (ponder OFF) and (ponder ON) distributions are distinct, with 95% probability, the global result (summation ON+OFF) being:
>
> `1 Critter 1.6 64-bit +15 +235/=563/-202 51.65% 516.5/1000`
>
> `2 Critter 1.4a 64-bit SSE4 -15 +202/=563/-235 48.35% 483.5/1000`
>
> This means that it cannot be said (with 95% probability), as of yet, that compared to Critter 1.4a SSE4, Critter 1.6 performs better with ponder OFF than with Ponder ON.

I suppose that all these error bars were obtained with the great BayesElo. I ran my own small programme, just to compare my results.

It seems that my programme more or less agrees with BayesElo in the first match, which is an achievement! However, in the second match, the rating difference is ~ 11.5 Elo, not 15 Elo; 15 Elo looks more like the error bar. I notice that you are adding two 500-game matches, so I do not know whether I can simply treat +235 =563 -202 as one 1000-game match or not.

The output of my programme for both matches is below (first the 500-game ponder OFF match, then the 1000-game sum):

```
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
125
Write down the number of loses:
92
Write down the number of draws:
283
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference: 22.96 Elo
Lower rating difference: 12.75 Elo
Upper rating difference: 33.22 Elo
Lower bound uncertainty: -10.21 Elo
Upper bound uncertainty: 10.25 Elo
Average error: +/- 10.23 Elo
K = (average error)*[sqrt(n)] = 228.80
Elo interval: ] 12.75, 33.22[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference: 22.96 Elo
Lower rating difference: 2.56 Elo
Upper rating difference: 43.53 Elo
Lower bound uncertainty: -20.40 Elo
Upper bound uncertainty: 20.56 Elo
Average error: +/- 20.48 Elo
K = (average error)*[sqrt(n)] = 458.00
Elo interval: ] 2.56, 43.53[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference: 22.96 Elo
Lower rating difference: -7.62 Elo
Upper rating difference: 53.91 Elo
Lower bound uncertainty: -30.59 Elo
Upper bound uncertainty: 30.95 Elo
Average error: +/- 30.77 Elo
K = (average error)*[sqrt(n)] = 688.01
Elo interval: ] -7.62, 53.91[
---------------------------------------
Number of games of the match: 500
Score: 53.30 %
Elo rating difference: 22.96 Elo
Draw ratio: 56.60 %
**********************************************
1 sigma: 1.4657 % of the points of the match.
2 sigma: 2.9314 % of the points of the match.
3 sigma: 4.3970 % of the points of the match.
**********************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS: 98.78 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time: 50 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
```

```
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
235
Write down the number of loses:
202
Write down the number of draws:
563
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference: 11.47 Elo
Lower rating difference: 4.21 Elo
Upper rating difference: 18.74 Elo
Lower bound uncertainty: -7.26 Elo
Upper bound uncertainty: 7.27 Elo
Average error: +/- 7.26 Elo
K = (average error)*[sqrt(n)] = 229.67
Elo interval: ] 4.21, 18.74[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference: 11.47 Elo
Lower rating difference: -3.04 Elo
Upper rating difference: 26.02 Elo
Lower bound uncertainty: -14.51 Elo
Upper bound uncertainty: 14.55 Elo
Average error: +/- 14.53 Elo
K = (average error)*[sqrt(n)] = 459.55
Elo interval: ] -3.04, 26.02[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference: 11.47 Elo
Lower rating difference: -10.30 Elo
Upper rating difference: 33.33 Elo
Lower bound uncertainty: -21.77 Elo
Upper bound uncertainty: 21.86 Elo
Average error: +/- 21.81 Elo
K = (average error)*[sqrt(n)] = 689.83
Elo interval: ] -10.30, 33.33[
---------------------------------------
Number of games of the match: 1000
Score: 51.65 %
Elo rating difference: 11.47 Elo
Draw ratio: 56.30 %
**********************************************
1 sigma: 1.0439 % of the points of the match.
2 sigma: 2.0878 % of the points of the match.
3 sigma: 3.1318 % of the points of the match.
**********************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS: 94.30 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time: 47 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
```
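The LOS figures printed in the two outputs above can be reproduced, at least to the displayed precision, with a one-sided normal test on the score. The Python sketch below is an assumption about the method, not the calculator's actual source (which is not shown in this thread); it uses the standard deviation formula that Ajedrecista gives later in the thread, and the function name is hypothetical.

```python
import math

def likelihood_of_superiority(wins, losses, draws):
    """One-sided normal approximation of P(true score > 50%).

    Assumed method; sigma follows the formula quoted later in the thread:
    sigma = sqrt((mu*(1 - mu) - D/4) / n).
    """
    n = wins + losses + draws
    mu = (wins + draws / 2) / n              # score as a fraction
    d = draws / n                            # draw ratio
    sigma = math.sqrt((mu * (1 - mu) - d / 4) / n)
    z = (mu - 0.5) / sigma                   # distance from 50% in SD units
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Ponder OFF match (500 games) and the summed result (1000 games)
print(f"{100 * likelihood_of_superiority(125, 92, 283):.2f} %")
print(f"{100 * likelihood_of_superiority(235, 202, 563):.2f} %")
```

With these inputs the sketch should land on the 98.78 % and 94.30 % shown above. It divides by zero when every game is drawn (sigma = 0), the same edge case Ajedrecista mentions for his model.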

Regards from Spain.

Ajedrecista.

### Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

> Ajedrecista wrote:
>
> I suppose that all these error bars were obtained with the great BayesElo.

Hi Jesus,

Not at all, I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.

Here we have a trinomial distribution (win-loss-draw), and when the result is close to 50% you have SD(sigma) = sqrt(W + L)/(2N) (the formula is a little more complicated if the result is not close to 50%).

So I find the 2SD error bar (95% probability) of

```
P OFF
1 Critter 1.6 64-bit +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
```

and multiplying that % (here 2SD = sqrt(125 + 92)/500 ≈ 2.95%) by 7 (valid for low %) you get the Elo error bar = 20.6, **rounded to 21**.
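As a minimal Python sketch of the hand calculation just described (the function name and the `n_sd` parameter are illustrative, not from the thread), assuming the score is close to 50% so that both the simplified SD and the factor of 7 apply:

```python
import math

def quick_elo_error_bar(wins, losses, draws, n_sd=2.0):
    """Back-of-envelope Elo error bar for a score near 50%."""
    n = wins + losses + draws
    sd = math.sqrt(wins + losses) / (2 * n)   # SD of the score, as a fraction
    sd_percent = 100 * n_sd * sd              # n_sd standard deviations, in %
    return 7 * sd_percent                     # roughly 7 Elo per 1% near 50%

# Ponder OFF match: +125 =283 -92
print(round(quick_elo_error_bar(125, 92, 283), 1))  # 20.6
```

The draws enter only through the total number of games, which is why a high draw ratio shrinks the error bar.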

Now if we want to see if Ponder OFF or ON makes a **significant** difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution

```
P ON+OFF
1 Critter 1.6 64-bit +12 +235/=563/-202 51.65% 516.5/1000
2 Critter 1.4a 64-bit SSE4 -12 +202/=563/-235 48.35% 483.5/1000
```

**Note:** see the +12 Elo advantage, not +15 as I wrote by mistake previously; +15 is actually the 2SD of this global (sum) distribution (you were perfectly right with your *However, in the second match, rating difference is ~ 11.5 Elo, not 15 Elo... which is more likely the error bar*).

If from this 1000-game distribution you pick a 500-game sample, you expect that sample to have a mean of 51.65% (or +12 Elo) and an SD of sqrt(1000/500)·15/2 ≈ 11 Elo.

Since the actual P ON sample (50%, 0 Elo) is 12 Elo away from that mean, SD being 11 Elo, that P ON sample does not distinguish itself enough from the P ON+OFF distribution.

Same reasoning for the actual P OFF sample (53.3%, 23 Elo).
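The arithmetic of this check can be written out as a short Python sketch. It only reproduces the rough reasoning above with the rounded figures from the post (12 Elo pooled mean, 15 Elo pooled 2SD); it is not a rigorous significance test, and the function name is hypothetical.

```python
import math

def sample_sd_elo(total_games, sample_games, two_sd_total_elo):
    """SD (in Elo) expected for a sub-sample, scaled up from the pooled 2SD."""
    return math.sqrt(total_games / sample_games) * two_sd_total_elo / 2

pooled_mean_elo = 12.0                    # Elo of the summed 1000-game result
sd = sample_sd_elo(1000, 500, 15.0)       # about 10.6, rounded to 11 in the post

for label, sample_elo in (("ponder ON", 0.0), ("ponder OFF", 23.0)):
    z = abs(sample_elo - pooled_mean_elo) / sd
    print(f"{label}: {z:.2f} SD from the pooled mean; beyond 2 SD: {z > 2}")
```

Both samples sit roughly 1 SD from the pooled mean, well inside the 2 SD band, which is the basis for saying the ON/OFF difference is not yet significant.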

### Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Hi again!

> ernest wrote:
>
> I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.

I see. I also used to calculate them by hand with the only help of a hand calculator, until I wrote a programme in Fortran. I use this standard deviation:

n = wins + draws + loses

µ = (wins + draws/2)/n

D = draws/n

σ = sqrt{[µ·(1 - µ) - D/4]/n}

I took this formula from the 22nd post of this thread. I posted two messages in January that might be useful: #1 and #2.

> ernest wrote:
>
> Here we have a trinomial distribution (win-loss-draw) and when the result is close to 50%, you have SD(sigma) = sqrt(W + L)/(2N) (formula is a little more complicated if the result is not close to 50%)

I did not know that, when µ ~ 0.5, then σ ~ sqrt(wins + loses)/(2n) in this trinomial distribution. It is interesting, so thank you for sharing it. Rewriting your standard deviation using the draw ratio D: σ = sqrt[n·(1 - D)]/(2n) = (1/2)·sqrt[(1 - D)/n]. If I compare our nσ², I obtain:

(Yours): nσ² = (1 - D)/4

(Mine): nσ² = µ·(1 - µ) - D/4; (mine with µ = 0.5): nσ² = (1 - D)/4

These are exactly the same with µ = 1/2. Your nσ² does not depend on µ, while mine does... although the expression of σ that I use is not good for µ (or 1 - µ) > 0.85 or 0.9, to give a rough figure. For your info: µ must be in the interval [0.15, 0.85] in my programme, else it does not calculate anything. The farther µ is from 1/2, the less accurate the value of σ is; it also has a problem with the extreme case of D = 1 (100% of draws), when σ = 0. But it is just a model that works reasonably well in real cases.

You will find a value for the average error |<e>| in the post I called #2. This is:

|<e>| = 200·log{(µ + kσ)(1 - µ + kσ)/[(µ - kσ)(1 - µ - kσ)]}

Where k denotes the confidence level (k = 1.96 for ~ 95% confidence, k = 2 for ~ 95.45% confidence, etc.) and log is the base-10 logarithm. If I replace µ = 1/2 in that equation:

|<e>| = 200·log{(0.5 + kσ)(0.5 + kσ)/[(0.5 - kσ)(0.5 - kσ)]} = 400·log[(0.5 + kσ)/(0.5 - kσ)] = [400/ln(10)]·[ln(1 + 2kσ) - ln(1 - 2kσ)]

With kσ > 0 and kσ << 1 (lots of games): ln(1 + 2kσ) ~ 2kσ; ln(1 - 2kσ) ~ -2kσ

|<e>| ~ 400·4kσ/ln(10) = [1600/ln(10)]·kσ

Here, σ is not in percentage; if you want σ in percentage, then the constant that multiplies kσ is 16/ln(10) ~ 6.9487, which is almost your seven. This could be valid in my approximation with a normal distribution, although it should be valid only when µ = 0.5 and kσ (or σ, for reasonable confidence levels, where k is finite) tends to zero, because it is a rough approximation with those assumptions.
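The formulas in this post can be checked numerically. The following Python sketch is one possible transcription, assuming the usual logistic Elo curve, Elo(p) = 400·log10[p/(1 - p)]; it is not the Fortran programme itself, and the function names are hypothetical. It also evaluates the linearised estimate [1600/ln(10)]·kσ for comparison.

```python
import math

def elo(p):
    """Elo difference for an expected score p (logistic model, assumed)."""
    return 400 * math.log10(p / (1 - p))

def elo_interval(wins, losses, draws, k=2.0):
    """Return (elo_diff, lower, upper, average_error, linear_estimate)."""
    n = wins + losses + draws
    mu = (wins + draws / 2) / n
    if not 0.15 <= mu <= 0.85:
        # Same validity range that the post states for the programme
        raise ValueError("score outside [0.15, 0.85]; model unreliable")
    d = draws / n
    sigma = math.sqrt((mu * (1 - mu) - d / 4) / n)
    lower = elo(mu - k * sigma)
    upper = elo(mu + k * sigma)
    average_error = (upper - lower) / 2            # equals the |<e>| formula
    linear_estimate = 1600 / math.log(10) * k * sigma  # small-k*sigma limit
    return elo(mu), lower, upper, average_error, linear_estimate

diff, lo, hi, avg, lin = elo_interval(125, 92, 283, k=2.0)
print(f"{diff:.2f} Elo, interval ]{lo:.2f}, {hi:.2f}[, "
      f"average error {avg:.2f}, linearised {lin:.2f}")
```

For the 500-game ponder OFF match this should agree with the 2-sigma block of the calculator output earlier in the thread, and the linearised value should come out slightly below the exact average error because µ is not exactly 0.5.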

If you had not posted the trick of multiplying by seven, I would never have noticed this number 16/ln(10), so today I have learnt something. Thanks!

@Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.

Regards from Spain.

Ajedrecista.

### Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

> Ajedrecista wrote:
>
> @Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.
>
> Regards from Spain.
>
> Ajedrecista.

Hi, I'm only glad of this interest, and I'm interested too. Thanks

MM

### Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

> Ajedrecista wrote:
>
> Hi again!

Hi Jesus,

Thanks for this detailed post, I will study it carefully!

Of course, your program gives more accurate numbers, I only get (not too bad) approximations.

Do you have a comment on my section starting with *Now if we want to see if Ponder OFF or ON makes a **significant** difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution*, which shows that so far (i.e. with only those 500+500 games) the difference is NOT significant?