Komodo 5 release now available!

Rolf · Post by **Rolf** » Mon Jul 30, 2012 1:15 pm

Houdini wrote:
rvida wrote: Why not implement Chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
Code:
CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV; the results are quite different.

Robert

Why do you care about a result of 9 Elo points for the entity with a deviation of >22 Elo points? Another aspect is that everyone tunes against Houdini. In other words such results are crap. Now they are all desperately waiting for H3 or even better R5, but Vas would have to be out of his mind to deliver.

Ajedrecista · Post by **Ajedrecista** » Mon Jul 30, 2012 1:31 pm

Hello Robert:

Houdini wrote:
rvida wrote: Why not implement Chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
Code:
CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV; the results are quite different.

Robert

May I ask you for the confidence level of that error bar? For 1920 games and around 41% draws, I get ± 9 Elo (using my own programme) at a confidence level of around 86%, which is fairly low IMHO. I get ~ ± 12 Elo for the more common confidence level of 95%. If I take +616 -516 =788 (which is very close to 1010 - 910 with 41% draws), this is what I get for 95% confidence:

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

616

Write down the number of loses (up to 1825361100):

516

Write down the number of draws (up to 2147482515):

788

Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:     18.11 Elo

Lower rating difference:    6.19 Elo
Upper rating difference:   30.08 Elo

Lower bound uncertainty:  -11.92 Elo
Upper bound uncertainty:   11.96 Elo
Average error:        +/-  11.94 Elo

K = (average error)*[sqrt(n)] =  523.29

Elo interval: ]   6.19,   30.08[
---------------------------------------

Number of games of the match:      1920
Score: 52.60 %
Elo rating difference:   18.11 Elo
Draw ratio: 41.04 %

*********************************************************
Standard deviation:  1.7133 % of the points of the match.
*********************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:  99.86 % (taking into account draws).
LOS:  99.85 % (not taking into account draws).
LOS:  99.85 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   99 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.

Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data. Given a reasonable confidence level, 1920 games with more or less tied scores and 41% draws, my error bars should not differ much from BayesElo results, which is why I am confused. The LOS value is also telling: it should differ very little if you calculate it with a better programme such as BayesElo. By the way, did you calculate that error bar (± 9 Elo) with BayesElo or by yourself? Thanks in advance.
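
For readers who want to check these figures themselves, here is a minimal Python sketch. It is not the source of LOS_and_Elo_uncertainties_calculator, which is not shown in this thread. It assumes a normal approximation to the per-game result (win = 1, draw = 0.5, loss = 0) and the usual logistic Elo formula; the function names `elo_from_score`, `match_stats` and `implied_confidence` are made up for this illustration.

```python
from math import log, log10, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * log10(score / (1.0 - score))


def match_stats(wins, losses, draws, confidence=0.95):
    """Elo difference, two-sided Elo interval and a simple LOS estimate (assumed model)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result: E[x^2] - (E[x])^2.
    variance = (wins + 0.25 * draws) / games - score ** 2
    std_error = sqrt(variance / games)
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2.0)
    return {
        "score": score,
        "std_error": std_error,
        "elo": elo_from_score(score),
        "lower": elo_from_score(score - z * std_error),
        "upper": elo_from_score(score + z * std_error),
        # LOS with draws ignored: normal approximation on wins minus losses.
        "los": STD_NORMAL.cdf((wins - losses) / sqrt(wins + losses)),
    }


def implied_confidence(half_width_elo, stats):
    """Two-sided confidence level that corresponds to a given +/- Elo error bar."""
    score = stats["score"]
    # Slope of the Elo curve at this score (delta method).
    slope = 400.0 / (log(10.0) * score * (1.0 - score))
    z = half_width_elo / (slope * stats["std_error"])
    return 2.0 * STD_NORMAL.cdf(z) - 1.0


stats = match_stats(wins=616, losses=516, draws=788)
print(f"Elo difference: {stats['elo']:+.2f}")
print(f"95% interval:   {stats['lower']:+.2f} to {stats['upper']:+.2f}")
print(f"LOS (draws ignored): {100 * stats['los']:.2f} %")
print(f"Confidence implied by +/- 9 Elo: {100 * implied_confidence(9.0, stats):.1f} %")
```

Under these assumptions the sketch should land close to the programme output above: roughly +18 Elo with an interval of about 6 to 30 Elo at 95%, and a confidence level in the mid-80s percent for a ± 9 Elo bar. BayesElo uses a different model, so small deviations from its numbers are to be expected.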

People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Mon Jul 30, 2012 1:39 pm

Rolf wrote:
Houdini wrote:
rvida wrote: Why not implement Chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
Code:
CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV; the results are quite different.

Robert

Why do you care about a result of 9 Elo points for the entity with a deviation of >22 Elo points? Another aspect is that everyone tunes against Houdini. In other words such results are crap. Now they are all desperately waiting for H3 or even better R5, but Vas would have to be out of his mind to deliver.

Howdy Rolfy....Long time no see

Houdini · Post by **Houdini** » Mon Jul 30, 2012 1:45 pm

Ajedrecista wrote: Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data. Given a reasonable confidence level, 1920 games with more or less tied scores and 41% draws, my error bars should not differ much from BayesElo results, which is why I am confused.

My bad, the correct 95% confidence interval is indeed +18 ± 12 Elo.
Thanks for the correction!

Robert

Houdini · Post by **Houdini** » Mon Jul 30, 2012 5:46 pm

Ajedrecista wrote:People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.

The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.

Robert
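
As a rough plausibility check of this second result, the error bar can also be estimated from the summary figures alone (points, games, draw ratio), without the exact win/loss/draw split. The sketch below is an illustration under stated assumptions, not the method used for the match: it relies on a normal approximation, the logistic Elo formula and the identity variance = s(1 - s) - d/4 for score fraction s and draw ratio d. The function name `elo_error_from_summary` is hypothetical, and the 44% draw ratio is the rounded value quoted above.

```python
from math import log, log10, sqrt
from statistics import NormalDist


def elo_error_from_summary(points, games, draw_ratio, confidence=0.95):
    """Return (Elo difference, approximate +/- Elo error) from summary figures."""
    score = points / games
    # With win = 1, draw = 0.5, loss = 0 the per-game variance reduces to
    # s * (1 - s) - d / 4, where d is the fraction of drawn games.
    variance = score * (1.0 - score) - draw_ratio / 4.0
    std_error = sqrt(variance / games)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    elo = 400.0 * log10(score / (1.0 - score))
    # Delta method: convert the score error into Elo via the curve's slope.
    slope = 400.0 / (log(10.0) * score * (1.0 - score))
    return elo, z * slope * std_error


elo, error = elo_error_from_summary(points=1134, games=1920, draw_ratio=0.44)
print(f"Houdini 3 DEV vs Critter 1.6a: {elo:+.1f} Elo, +/- {error:.1f} at 95%")
```

With these inputs the estimate should come out near the quoted +64 ± 12 Elo; the small remaining uncertainty comes from the rounded draw percentage.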

MM · Post by MM » Mon Jul 30, 2012 9:15 pm

Houdini wrote:
Ajedrecista wrote:People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.
The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.

Robert

Thank you Robert,

perhaps i missed something, may i ask you the time control? Is it 2' + 2''?

Thxs

Best Regards

Houdini · Post by **Houdini** » Mon Jul 30, 2012 10:07 pm

Correct, like in the previous match: 1920 games at 2'+2", single thread.

beram · Post by **beram** » Mon Jul 30, 2012 11:25 pm

Wow, that is a very promising result indeed!

Good work Robert
