Komodo 5 release now available!

Rolf
Posts: 6081
Joined: Fri Mar 10, 2006 11:14 pm
Location: Munster, Nuremberg, Princeton

Re: Komodo 5 release now available!

Post by Rolf »

Houdini wrote:
rvida wrote:Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too :)... Also note the 100 Elo gap between #2 and #3 (and between #4 and #5). It would be nice if more strong engines supported FRC.

Code:

CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different ;).

Robert
Why do you care about a result of 9 Elo points for an entity with a deviation of >22 Elo points? Another aspect is that everyone tunes against Houdini. In other words, such results are crap. Now they are all desperately waiting for H3 or, even better, R5, but Vas would be gepoudert with Wäscheklammertjes if he would deliver. :wink:
-Popper and Lakatos are good but I'm stuck on Leibowitz
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

A little off-topic, sorry...

Post by Ajedrecista »

Hello Robert:
Houdini wrote:
rvida wrote:Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too :)... Also note the 100 Elo gap between #2 and #3 (and between #4 and #5). It would be nice if more strong engines supported FRC.

Code:

CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different ;).

Robert
May I ask you for the confidence level of that error bar? For 1920 games and around 41% of draws, I get ± 9 Elo (using my own programme) at a confidence level of around 86%, which is fairly low IMHO. I get ~ ± 12 Elo for the more common confidence level of 95%. If I take +616 -516 =788 (which is very close to 1010 - 910 with 41% of draws), this is what I get for 95% confidence:

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Maximum number of games supported: 2147483647.

Write down the number of wins (up to 1825361100):

616

Write down the number of loses (up to 1825361100):

516

Write down the number of draws (up to 2147482515):

788

 Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

---------------------------------------
Elo interval for 95.00 % confidence:

Elo rating difference:     18.11 Elo

Lower rating difference:    6.19 Elo
Upper rating difference:   30.08 Elo

Lower bound uncertainty:  -11.92 Elo
Upper bound uncertainty:   11.96 Elo
Average error:        +/-  11.94 Elo

K = (average error)*[sqrt(n)] =  523.29

Elo interval: ]   6.19,   30.08[
---------------------------------------

Number of games of the match:      1920
Score: 52.60 %
Elo rating difference:   18.11 Elo
Draw ratio: 41.04 %

*********************************************************
Standard deviation:  1.7133 % of the points of the match.
*********************************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS (taking into account draws) is always calculated, if possible.

LOS (not taking into account draws) is only calculated if wins + loses < 16001.

LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________

LOS:  99.86 % (taking into account draws).
LOS:  99.85 % (not taking into account draws).
LOS:  99.85 % (average value).
______________________________________________

These values of LOS are rounded up to 0.01%

End of the calculations. Approximated elapsed time:   99 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data. Given a reasonable confidence level, 1920 games with more or less tied scores and 41% of draws, my error bars should not differ much from BayesElo results, which is why I am confused. The LOS value is also telling: it should differ very little if you calculate it with a better programme such as BayesElo. By the way, did you calculate that error bar (± 9 Elo) with BayesElo or by yourself? Thanks in advance.
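
For anyone who wants to check these figures, here is a minimal Python sketch of one way to get an Elo interval from a win/loss/draw count. It assumes a plain normal approximation on the match score (per-game results treated as independent) and the usual logistic Elo formula; this is not necessarily how LOS_and_Elo_uncertainties_calculator or BayesElo work internally, and the function names are made up for illustration.

```python
from math import log10, sqrt
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) into an Elo difference."""
    return 400.0 * log10(score / (1.0 - score))


def elo_interval(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, lower, upper) for the first engine, two-sided.

    Assumption: normal approximation on the mean score per game.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    std_error = sqrt(variance / n)
    # Two-sided quantile of the standard normal for the requested level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std_error
    return (
        elo_from_score(score),
        elo_from_score(score - half_width),
        elo_from_score(score + half_width),
    )


elo, lower, upper = elo_interval(616, 516, 788, confidence=0.95)
print(f"{elo:+.2f} Elo, interval ]{lower:.2f}, {upper:.2f}[")
```

With +616 -516 =788 this lands close to the +18 Elo and the ]6, 30[ interval shown in the output above; calling it with `confidence=0.86` gives a half-width of roughly 9 Elo.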

People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Komodo 5 release now available!

Post by Dr.Wael Deeb »

Rolf wrote:
Houdini wrote:
rvida wrote:Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too :)... Also note the 100 Elo gap between #2 and #3 (and between #4 and #5). It would be nice if more strong engines supported FRC.

Code:

CCRL 404FRC Rating List - All engines, best versions only

Rank           Engine         ELO    +    -   Score  AvOp  Games
 1 Critter 1.6 64-bit         3289  +22  -22  76.7% -212.8   900
 2 Houdini 2.0 64-bit         3280  +18  -18  69.4% -156.8  1200
 3 Stockfish 2.2.2 64-bit     3182  +17  -17  60.2%  -81.7  1300
 4 Rybka 4 64-bit             3170  +14  -14  61.4%  -87.0  1800
 5 Naum 4.2 64-bit            3029  +12  -11  48.8%   +6.8  3100
 6 Shredder 12                3020  +12  -12  45.3%  +32.3  2900
A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!

I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different ;).

Robert
Why do you care about a result of 9 Elo points for an entity with a deviation of >22 Elo points? Another aspect is that everyone tunes against Houdini. In other words, such results are crap. Now they are all desperately waiting for H3 or, even better, R5, but Vas would be gepoudert with Wäscheklammertjes if he would deliver. :wink:


Howdy Rolfy....Long time no see :D
No one can hit as hard as life. But it ain't about how hard you can hit. It's about how hard you can get hit and keep moving forward. How much you can take and keep moving forward...
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: A little off-topic, sorry...

Post by Houdini »

Ajedrecista wrote:Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data (given a reasonable confidence interval and 1920 games (with more or less tied scores) and 41% of draws, my error bars should not differ a lot in comparison with BayesElo results, this is why I am confused).
My bad, the correct 95% confidence interval is indeed +18 ± 12 Elo.
Thanks for the correction!

Robert
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: A little off-topic, sorry...

Post by Houdini »

Ajedrecista wrote:People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.
The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.
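
As a rough cross-check of the +64 ± 12 Elo figure, the sketch below rebuilds a plausible win/loss/draw split from the posted totals and applies a linearised (delta-method) error bar at 95% confidence. The exact split is an assumption, since the post only gives 1134 points and a rounded 44% draw rate; the variable names are illustrative and this is not claimed to be the method actually used for the post.

```python
from math import log, log10, sqrt

GAMES = 1920
POINTS = 1134        # Houdini 3 DEV's points, as posted
DRAW_RATE = 0.44     # "44% draws", rounded in the post

# Assumption: take the even draw count nearest to 44% so that wins is an integer.
draws = 2 * round(GAMES * DRAW_RATE / 2)
wins = POINTS - draws // 2
losses = GAMES - wins - draws

score = POINTS / GAMES
elo = 400.0 * log10(score / (1.0 - score))

# Per-game variance of the result (1, 0.5 or 0) around the mean score.
variance = (
    wins * (1.0 - score) ** 2
    + draws * (0.5 - score) ** 2
    + losses * score ** 2
) / GAMES
std_error = sqrt(variance / GAMES)

# Delta method: d(Elo)/d(score) = 400 / (ln(10) * score * (1 - score)).
slope = 400.0 / (log(10.0) * score * (1.0 - score))
elo_error_95 = 1.96 * std_error * slope

print(f"+{wins} -{losses} ={draws}: {elo:+.1f} Elo, +/- {elo_error_95:.1f} (95%)")
```

Under these assumptions the split comes out as +712 -364 =844, and the result is a little under +64 Elo with an error bar a little under 12 Elo, consistent with the rounded figures in the post.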

Robert
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: A little off-topic, sorry...

Post by MM »

Houdini wrote:
Ajedrecista wrote:People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!

Regards from Spain.

Ajedrecista.
The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.

Robert
Thank you Robert,

Perhaps I missed something: may I ask you the time control? Is it 2'+2"?

Thanks

Best Regards
MM
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: A little off-topic, sorry...

Post by Houdini »

Correct, like in the previous match: 1920 games at 2'+2", single thread.
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: A little off-topic, sorry...

Post by beram »

Wow, that is a very promising result indeed!

Good work Robert