Critter 1.6 - Critter 1.4a ponder ON/OFF

MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Critter 1.6 - Critter 1.4a ponder ON/OFF

Post by MM »

Blitz 10s per game, ponder ON


1   Critter 1.6 64-bit              +2  +54/=93/-53   50.25%  100.5/200
2   Critter 1.4a 64-bit SSE4        -2  +53/=93/-54   49.75%   99.5/200



Blitz 10s per game, ponder OFF


1   Critter 1.6 64-bit             +23  +56/=101/-43  53.25%  106.5/200
2   Critter 1.4a 64-bit SSE4       -23  +43/=101/-56  46.75%   93.5/200


Games played at chess960

Fritz 13 GUI

1 core used

i7 980x

windows 7


Best Regards
MM
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

Post by ernest »

MM wrote:

1   Critter 1.6 64-bit             +23  +56/=101/-43 53.25%  106.5/200
2   Critter 1.4a 64-bit SSE4       -23  +43/=101/-56 46.75%   93.5/200
95% error bar is ±34 Elo 8-)
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

Post by MM »

Hi all,

just for the record.

utente-PC, Blitz 1m ponder ON, 1 core, 3,33 ghz, no tablebases.

1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00
2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00

Best Regards
MM
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

Post by MM »

MM wrote:Hi all,

just for the record.

utente-PC, Blitz 1m ponder ON, 1 core, 3,33 ghz, no tablebases.

1 Critter 1.4a 64-bit SSE4 +110/=280/-110 50.00% 250.0/500 -3036.00
2 Critter 1.6 64-bit +110/=280/-110 50.00% 250.0/500 -3036.00

Best Regards

utente-PC, Blitz 1m ponder OFF

1 Critter 1.6 64-bit +23 +125/=283/-92 53.30% 266.5/500
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
MM
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Critter 1.6 - Critter 1.4a ponder ON/OFF

Post by ernest »

MM wrote:

1	Critter 1.6 64-bit	       +23	+125/=283/-92	53.30%		266.5/500
2	Critter 1.4a 64-bit SSE4	 -23	+92/=283/-125	46.70%		233.5/500
95% error bar is now ±21 Elo, which means that there is more than 95% probability that with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4

However, it cannot yet be said that the (ponder OFF) and (ponder ON) distributions are distinct, with 95% probability, the global result (summation ON+OFF) being:

1 Critter 1.6 64-bit	       +15	+235/=563/-202	51.65%		516.5/1000
2 Critter 1.4a 64-bit SSE4	 -15	+202/=563/-235	48.35%		483.5/1000
This means that it cannot be said (with 95% probability), as of yet, that compared to Critter 1.4a SSE4, Critter 1.6 performs better with ponder OFF than with Ponder ON.
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Post by Ajedrecista »

Hello Ernest:
ernest wrote:
MM wrote:

1	Critter 1.6 64-bit	       +23	+125/=283/-92	53.30%		266.5/500
2	Critter 1.4a 64-bit SSE4	 -23	+92/=283/-125	46.70%		233.5/500
95% error bar is now ±21 Elo, which means that there is more than 95% probability that with ponder OFF, Critter 1.6 is stronger than Critter 1.4a SSE4

However, it cannot yet be said that the (ponder OFF) and (ponder ON) distributions are distinct, with 95% probability, the global result (summation ON+OFF) being:

1 Critter 1.6 64-bit	       +15	+235/=563/-202	51.65%		516.5/1000
2 Critter 1.4a 64-bit SSE4	 -15	+202/=563/-235	48.35%		483.5/1000
This means that it cannot be said (with 95% probability), as of yet, that compared to Critter 1.4a SSE4, Critter 1.6 performs better with ponder OFF than with Ponder ON.
I suppose that all these error bars were obtained with the great BayesElo. I ran my own small programme, just to compare my results:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

125

Write down the number of loses:

92

Write down the number of draws:

283

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     22.96 Elo

Lower rating difference:   12.75 Elo
Upper rating difference:   33.22 Elo

Lower bound uncertainty:  -10.21 Elo
Upper bound uncertainty:   10.25 Elo
Average error:        +/-  10.23 Elo

K = (average error)*[sqrt(n)] =  228.80

Elo interval: ]  12.75,   33.22[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     22.96 Elo

Lower rating difference:    2.56 Elo
Upper rating difference:   43.53 Elo

Lower bound uncertainty:  -20.40 Elo
Upper bound uncertainty:   20.56 Elo
Average error:        +/-  20.48 Elo

K = (average error)*[sqrt(n)] =  458.00

Elo interval: ]   2.56,   43.53[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     22.96 Elo

Lower rating difference:   -7.62 Elo
Upper rating difference:   53.91 Elo

Lower bound uncertainty:  -30.59 Elo
Upper bound uncertainty:   30.95 Elo
Average error:        +/-  30.77 Elo

K = (average error)*[sqrt(n)] =  688.01

Elo interval: ]  -7.62,   53.91[
---------------------------------------

Number of games of the match:                500
Score: 53.30 %
Elo rating difference:   22.96 Elo
Draw ratio: 56.60 %

**********************************************
1 sigma:  1.4657 % of the points of the match.
2 sigma:  2.9314 % of the points of the match.
3 sigma:  4.3970 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  98.78 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  50 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

235

Write down the number of loses:

202

Write down the number of draws:

563

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     11.47 Elo

Lower rating difference:    4.21 Elo
Upper rating difference:   18.74 Elo

Lower bound uncertainty:   -7.26 Elo
Upper bound uncertainty:    7.27 Elo
Average error:        +/-   7.26 Elo

K = (average error)*[sqrt(n)] =  229.67

Elo interval: ]   4.21,   18.74[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     11.47 Elo

Lower rating difference:   -3.04 Elo
Upper rating difference:   26.02 Elo

Lower bound uncertainty:  -14.51 Elo
Upper bound uncertainty:   14.55 Elo
Average error:        +/-  14.53 Elo

K = (average error)*[sqrt(n)] =  459.55

Elo interval: ]  -3.04,   26.02[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     11.47 Elo

Lower rating difference:  -10.30 Elo
Upper rating difference:   33.33 Elo

Lower bound uncertainty:  -21.77 Elo
Upper bound uncertainty:   21.86 Elo
Average error:        +/-  21.81 Elo

K = (average error)*[sqrt(n)] =  689.83

Elo interval: ] -10.30,   33.33[
---------------------------------------

Number of games of the match:               1000
Score: 51.65 %
Elo rating difference:   11.47 Elo
Draw ratio: 56.30 %

**********************************************
1 sigma:  1.0439 % of the points of the match.
2 sigma:  2.0878 % of the points of the match.
3 sigma:  3.1318 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  94.30 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  47 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
It seems that my programme more or less agrees with BayesElo in the first match, which is an achievement! However, in the second match, the rating difference is ~ 11.5 Elo, not 15 Elo... which looks more like the error bar. I notice that you are adding two 500-game matches, so I do not know whether I can simply treat +235 =563 -202 as one 1000-game match or not.
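
The LOS figures in the two outputs above can be approximately reproduced with a one-sided normal test on the score. The sketch below is only an assumption about how such a value might be obtained (the programme's source is not shown in this thread); it uses the score standard deviation that Ajedrecista gives later in the thread, and the function name `los_percent` is made up for illustration.

```python
import math

def los_percent(wins, losses, draws):
    """One-sided likelihood of superiority in %, normal approximation (assumed)."""
    n = wins + losses + draws
    mu = (wins + draws / 2) / n                      # score of the first engine
    d = draws / n                                    # draw ratio
    sigma = math.sqrt((mu * (1 - mu) - d / 4) / n)   # SD of the score
    z = (mu - 0.5) / sigma                           # distance from an even match, in SDs
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

for w, l, d in ((125, 92, 283), (235, 202, 563)):
    print(f"+{w} ={d} -{l}: LOS ~ {los_percent(w, l, d):.2f} %")
```

For the 500-game and 1000-game inputs this lands close to the 98.78 % and 94.30 % printed by the programme, which suggests (but does not prove) that a similar test is used there.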

Regards from Spain.

Ajedrecista.
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Post by ernest »

Ajedrecista wrote:I suppose that all these error bars were obtained with the great BayesElo.
Hi Jesus,

Not at all, I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.
Here we have a trinomial distribution (win-loss-draw) and when the result is close to 50%, you have SD(sigma)=[sqrt(W+L)]/2N (formula is a little more complicated if the result is not close to 50%)
So I find the 2SD error bar (95% probability) of

P OFF
1 Critter 1.6 64-bit       +23 +125/=283/-92 53.30% 266.5/500 
2 Critter 1.4a 64-bit SSE4 -23 +92/=283/-125 46.70% 233.5/500
to be [sqrt(125+92)]/500 = 2.95%
and multiplying that % by 7 (valid for low %) you get the Elo error bar = 20.6 rounded to 21.
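
The hand calculation above can be sketched in a few lines of Python. This only restates the rule of thumb from this post (2SD = sqrt(W+L)/N near a 50% score, then about 7 Elo per percentage point); the function name `quick_error_bar` and its parameter names are invented for the example.

```python
import math

def quick_error_bar(wins, losses, draws, elo_per_percent=7.0):
    """Approximate 2-SD (about 95%) error bar, valid only near a 50% score."""
    n = wins + losses + draws
    sd = math.sqrt(wins + losses) / (2 * n)    # SD of the score, as a fraction
    two_sd_percent = 2 * sd * 100              # 2 SD, in percentage points
    return two_sd_percent, two_sd_percent * elo_per_percent

pct, elo = quick_error_bar(125, 92, 283)       # the ponder OFF match
print(f"2SD = {pct:.2f}% -> about {elo:.1f} Elo")
```

The factor of 7 is the weakest part of the shortcut: it is only reasonable for small percentages around an even score, as noted above.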

Now if we want to see if Ponder OFF or ON makes a significant difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution

P ON+OFF
1 Critter 1.6 64-bit       +12 +235/=563/-202 51.65% 516.5/1000 
2 Critter 1.4a 64-bit SSE4 -12 +202/=563/-235 48.35% 483.5/1000 
note: see the +12 Elo advantage, not +15 as written by mistake previously, +15 is actually the 2SD of this global (sum) distribution.
(you were perfectly right with your "However, in the second match, the rating difference is ~ 11.5 Elo, not 15 Elo... which looks more like the error bar" :) ).

If from this 1000-game distribution you pick a 500-game sample, you expect that sample to have a mean of 51.65% (or +12 Elo) and a SD of sqrt(1000/500)*15/2= 11 Elo
Since the actual P ON sample (50%, 0 Elo) is 12 Elo away from that mean, SD being 11 Elo, that P ON sample does not distinguish itself enough from the P ON+OFF distribution.
Same reasoning for the actual P OFF sample (53.3%, 23 Elo).
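
The comparison above can be written out as a small sketch. It follows the reasoning in this post as stated (pooled mean of +12 Elo, pooled 2SD of 15 Elo, 500-game samples) and is not a formal two-sample test; the helper name `subsample_sd` is hypothetical.

```python
import math

def subsample_sd(pooled_2sd_elo, pooled_games, sample_games):
    """1 SD (in Elo) expected for a sample drawn from the pooled distribution."""
    return math.sqrt(pooled_games / sample_games) * pooled_2sd_elo / 2

pooled_elo = 12.0                         # ON+OFF result over 1000 games
sd = subsample_sd(15.0, 1000, 500)        # about 11 Elo

z_scores = {}
for label, sample_elo in (("ponder ON", 0.0), ("ponder OFF", 23.0)):
    z_scores[label] = (sample_elo - pooled_elo) / sd
    print(f"{label}: {sample_elo:+.0f} Elo is {z_scores[label]:+.2f} SD from the pooled mean")
```

Both samples sit roughly one SD from the pooled mean, well inside the 2SD band, which is why neither can be called distinct at the 95% level.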
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Post by Ajedrecista »

Hi again!
ernest wrote:I compute these ratings and error bars by hand (sometimes using a hand calculator), using my basic knowledge (school/university) in statistics.
I see. I also used to calculate them by hand with only the help of a hand calculator, until I wrote a programme in Fortran. I use this standard deviation:
n = wins + draws + losses
µ = (wins + draws/2)/n
D = draws/n

σ = sqrt{[µ·(1 - µ) - D/4]/n}
I took this formula from the 22nd post of this thread. I posted two messages in January that might be useful: #1 and #2.
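
A minimal Python sketch of this standard deviation, assuming the usual logistic Elo conversion 400·log10[p/(1 - p)] and error bars taken at µ ± kσ. It is not the Fortran programme itself; the names `score_sigma`, `elo` and `elo_interval` are made up for the example.

```python
import math

def score_sigma(wins, losses, draws):
    """Return the score mu and its standard deviation sigma."""
    n = wins + losses + draws
    mu = (wins + draws / 2) / n
    d = draws / n                                    # draw ratio D
    return mu, math.sqrt((mu * (1 - mu) - d / 4) / n)

def elo(p):
    """Elo difference for an expected score p (logistic model, assumed)."""
    return 400 * math.log10(p / (1 - p))

def elo_interval(wins, losses, draws, k):
    """Elo difference with lower and upper bounds at k standard deviations."""
    mu, sigma = score_sigma(wins, losses, draws)
    return elo(mu), elo(mu - k * sigma), elo(mu + k * sigma)

for k in (1, 2, 3):
    diff, low, high = elo_interval(125, 92, 283, k)
    print(f"{k}-sigma: {diff:.2f} Elo, interval ]{low:.2f}, {high:.2f}[")
```

With +125 =283 -92 this gives intervals very close to those in the programme output quoted earlier in the thread, for example about ]2.56, 43.53[ at 2 sigma.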
ernest wrote:Here we have a trinomial distribution (win-loss-draw) and when the result is close to 50%, you have SD(sigma)=[sqrt(W+L)]/2N (formula is a little more complicated if the result is not close to 50%)
I did not know that, when µ ~ 0.5, σ ~ sqrt(wins + losses)/2n in this trinomial distribution. It is interesting, so thank you for sharing it. Rewriting your standard deviation using the draw ratio D: σ = sqrt[n·(1 - D)]/2n = (1/2)·sqrt[(1 - D)/n]. If I compare our nσ², I obtain:
(Yours): nσ² = (1 - D)/4

(Mine): nσ² = µ·(1 - µ) - D/4; (mine with µ = 0.5): nσ² = (1 - D)/4
Which are exactly the same with µ = 1/2. Your nσ² does not depend on µ, while mine does... although the expression of σ that I use is not good for µ (or 1 - µ) > 0.85 or 0.9, to give a rough figure. For your info: µ must be in the interval [0.15, 0.85] in my programme, otherwise it does not calculate anything. The farther µ is from 1/2, the less accurate the value of σ is; it also has a problem with the extreme case of D = 1 (100% of draws), when σ = 0. But it is just a model that works reasonably well in real cases.

You will find a value for the average error |<e>| in the post I called #2. This is:
|<e>| = 200·log{[(µ + kσ)(1 - µ + kσ)]/[(µ - kσ)(1 - µ - kσ)]}
Where k denotes the confidence level (k = 1.96 for ~ 95% confidence, k = 2 for ~ 95.45% confidence, etc.). If I replace µ = 1/2 in that equation:
|<e>| = 200·log{[(0.5 + kσ)(0.5 + kσ)]/[(0.5 - kσ)(0.5 - kσ)]} = 400·log[(0.5 + kσ)/(0.5 - kσ)] = [400/ln(10)]·[ln(1 + 2kσ) - ln(1 - 2kσ)]

With kσ > 0 and kσ << 1 (lots of games): ln(1 + 2kσ) ~ 2kσ; ln(1 - 2kσ) ~ -2kσ

|<e>| ~ 400·4kσ/ln(10) = [1600/ln(10)]·kσ
Here, σ is not in percentage; if you want σ in percentage, then the constant that multiplies kσ is 16/ln(10) ~ 6.9487, which is almost your seven. This could be valid in my approximation with a normal distribution, although this should be valid only when µ = 0.5 and kσ (or σ, for reasonable confidence levels, where k is finite) tends to zero, because it is a rough approximation with those assumptions.
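
As a quick numerical check of this approximation, the sketch below compares the exact average error at µ = 0.5 with the linear rule 16/ln(10) per percentage point. The value kσ = 2.95% is the 2SD figure from Ernest's ponder OFF calculation; the function name `average_error` is made up for the example.

```python
import math

def average_error(mu, k_sigma):
    """Average Elo error for a score mu and a half-width k*sigma (as fractions)."""
    return 200 * math.log10(
        ((mu + k_sigma) * (1 - mu + k_sigma))
        / ((mu - k_sigma) * (1 - mu - k_sigma))
    )

factor = 16 / math.log(10)          # Elo per percentage point of k*sigma
k_sigma = 0.0295                    # 2.95%, expressed as a fraction

exact = average_error(0.5, k_sigma)
approx = factor * k_sigma * 100     # linear rule, k*sigma in percent

print(f"factor = {factor:.4f}")
print(f"exact = {exact:.2f} Elo, linear = {approx:.2f} Elo")
```

For a half-width this small the two values differ by only a few hundredths of an Elo point, so rounding the factor to 7 costs very little near an even score.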

If you had not posted the trick of multiplying by seven, I would never have noticed this number 16/ln(10), so today I have learnt something. Thanks!

@Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.

Regards from Spain.

Ajedrecista.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Post by MM »

Ajedrecista wrote:

@Maurizio: There is no intention of hijacking your thread, but you can see that statistics applied to error bars could be a whole world! At least it is boundless for me. Thanks for your comprehension and your tests! Please keep up the good work.

Regards from Spain.

Ajedrecista.
Hi, I'm only glad of this interest :-) And I'm interested too. Thanks
MM
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Critter 1.6 - Critter 1.4a, ponder ON/OFF.

Post by ernest »

Ajedrecista wrote:Hi again!
Hi Jesus,

Thanks for this detailed post, I will study it carefully!

Of course, your program gives more accurate numbers, I only get (not too bad) approximations.

Do you have a comment on my section starting with
Now if we want to see if Ponder OFF or ON makes a significant difference in a match between Critter 1.6 and Critter 1.4a SSE4, we have to consider the global (sum) distribution
which shows that so far (i.e. with only those 500+500 games) the difference is NOT significant?