Houdini wrote:
  rvida wrote:
    Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
    Code: Select all
    CCRL 404FRC Rating List - All engines, best versions only
    Rank  Engine                    ELO    +    -  Score    AvOp  Games
       1  Critter 1.6 64-bit       3289  +22  -22  76.7%  -212.8    900
       2  Houdini 2.0 64-bit       3280  +18  -18  69.4%  -156.8   1200
       3  Stockfish 2.2.2 64-bit   3182  +17  -17  60.2%   -81.7   1300
       4  Rybka 4 64-bit           3170  +14  -14  61.4%   -87.0   1800
       5  Naum 4.2 64-bit          3029  +12  -11  48.8%    +6.8   3100
       6  Shredder 12              3020  +12  -12  45.3%   +32.3   2900
  A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
  After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!
  I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different.
  Robert
Why do you care about a result of 9 Elo points for the entity with a deviation of >22 Elo points? Another aspect is that they all tune against Houdini. In other words, such results are crap. Now they are all desperately waiting for H3 or, even better, R5, but Vas would be out of his mind if he delivered.
Komodo 5 release now available!
Moderators: hgm, Rebel, chrisw
-
- Posts: 6081
- Joined: Fri Mar 10, 2006 11:14 pm
- Location: Munster, Nuremberg, Princeton
Re: Komodo 5 release now available!
-Popper and Lakatos are good but I'm stuck on Leibowitz
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
A little off-topic, sorry...
Hello Robert:
Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data. Given a reasonable confidence level, 1920 games (with more or less tied scores) and 41% of draws, my error bars should not differ much from BayesElo results, which is why I am confused. The LOS value is also telling: it should differ very little if you calculate it with a better programme such as BayesElo. By the way, did you calculate that error bar (± 9 Elo) with BayesElo or by yourself? Thanks in advance.
Houdini wrote:
  rvida wrote:
    Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
    Code: Select all
    CCRL 404FRC Rating List - All engines, best versions only
    Rank  Engine                    ELO    +    -  Score    AvOp  Games
       1  Critter 1.6 64-bit       3289  +22  -22  76.7%  -212.8    900
       2  Houdini 2.0 64-bit       3280  +18  -18  69.4%  -156.8   1200
       3  Stockfish 2.2.2 64-bit   3182  +17  -17  60.2%   -81.7   1300
       4  Rybka 4 64-bit           3170  +14  -14  61.4%   -87.0   1800
       5  Naum 4.2 64-bit          3029  +12  -11  48.8%    +6.8   3100
       6  Shredder 12              3020  +12  -12  45.3%   +32.3   2900
  A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
  After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!
  I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different.
  Robert
May I ask you for the confidence level of that error bar? For 1920 games and around 41% of draws, I get ± 9 Elo (using my own programme) at a confidence level of around 86%, which is fairly low IMHO. I get ~ ± 12 Elo at the more common confidence level of 95%. If I take +616 -516 =788 (which is very close to 1010 - 910 with 41% of draws), this is what I get for 95% confidence:
Code: Select all
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Maximum number of games supported: 2147483647.
Write down the number of wins (up to 1825361100):
616
Write down the number of loses (up to 1825361100):
516
Write down the number of draws (up to 2147482515):
788
Write down the confidence level (in percentage) between 65% and 99.9% (it will be rounded up to 0.01%):
95
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
---------------------------------------
Elo interval for 95.00 % confidence:
Elo rating difference: 18.11 Elo
Lower rating difference: 6.19 Elo
Upper rating difference: 30.08 Elo
Lower bound uncertainty: -11.92 Elo
Upper bound uncertainty: 11.96 Elo
Average error: +/- 11.94 Elo
K = (average error)*[sqrt(n)] = 523.29
Elo interval: ] 6.19, 30.08[
---------------------------------------
Number of games of the match: 1920
Score: 52.60 %
Elo rating difference: 18.11 Elo
Draw ratio: 41.04 %
*********************************************************
Standard deviation: 1.7133 % of the points of the match.
*********************************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS (taking into account draws) is always calculated, if possible.
LOS (not taking into account draws) is only calculated if wins + loses < 16001.
LOS (average value) is calculated only when LOS (not taking into account draws) is calculated.
______________________________________________
LOS: 99.86 % (taking into account draws).
LOS: 99.85 % (not taking into account draws).
LOS: 99.85 % (average value).
______________________________________________
These values of LOS are rounded up to 0.01%
End of the calculations. Approximated elapsed time: 99 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!
Regards from Spain.
Ajedrecista.
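For anyone who wants to reproduce the error bars discussed in this post, here is a minimal Python sketch. It assumes the usual logistic Elo model and a normal approximation of the mean score; it is not the code of the programme quoted above, nor BayesElo's method, and the function names `elo_from_score`, `elo_interval` and `los` are made up for this illustration.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, lower, upper) for a two-sided interval.

    Assumption: the mean score is treated as normally distributed,
    which is reasonable for a match of this length.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points).
    variance = (wins + 0.25 * draws) / n - score ** 2
    std_err = math.sqrt(variance / n)
    # Two-sided quantile, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return (
        elo_from_score(score),
        elo_from_score(score - z * std_err),
        elo_from_score(score + z * std_err),
    )


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority, ignoring draws (normal approximation)."""
    return NormalDist().cdf((wins - losses) / math.sqrt(wins + losses))


if __name__ == "__main__":
    # Figures taken from the post: +616 -516 =788.
    for level in (0.95, 0.86):
        elo, lower, upper = elo_interval(616, 516, 788, level)
        print(f"{level:.0%}: {elo:+.2f} Elo, interval ]{lower:.2f}, {upper:.2f}[")
    print(f"LOS (draws ignored): {los(616, 516):.2%}")
```

With +616 -516 =788 this sketch gives about +18.1 Elo with a 95% interval of roughly 6.2 to 30.1, in line with the output above, and a half-width of about 9 Elo at 86% confidence.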
-
- Posts: 9773
- Joined: Wed Mar 08, 2006 8:44 pm
- Location: Amman,Jordan
Re: Komodo 5 release now available!
Rolf wrote:
  Houdini wrote:
    rvida wrote:
      Why not implement chess960 support then? It would surely help to prove or disprove your hypothesis. Btw. looking at CCRL 40/4 FRC list, I might start spreading a hypothesis too ... Also note the 100 Elo gap between #2 and #3 (and between #4 - #5). It would be nice if more strong engines supported FRC.
      Code: Select all
      CCRL 404FRC Rating List - All engines, best versions only
      Rank  Engine                    ELO    +    -  Score    AvOp  Games
         1  Critter 1.6 64-bit       3289  +22  -22  76.7%  -212.8    900
         2  Houdini 2.0 64-bit       3280  +18  -18  69.4%  -156.8   1200
         3  Stockfish 2.2.2 64-bit   3182  +17  -17  60.2%   -81.7   1300
         4  Rybka 4 64-bit           3170  +14  -14  61.4%   -87.0   1800
         5  Naum 4.2 64-bit          3029  +12  -11  48.8%    +6.8   3100
         6  Shredder 12              3020  +12  -12  45.3%   +32.3   2900
    A late reaction, but I've just run a test match showing that Critter 1.6a indeed appears to be slightly stronger than Houdini 2.0 in FRC - playing without opening book from the initial 960 positions with reversed colors.
    After the 1920 games at 2'+2", single thread, the match result was 1010-910 for Critter (41% draws), scoring 52.6% or a performance of +18 Elo +/- 9 Elo. Congrats, Richard!
    I'm now running a similar match against a pre-beta Houdini 3 DEV, results are quite different.
    Robert
  Why do you care about a result of 9 Elo points for the entity with a deviation of >22 Elo points? Another aspect is that they all tune against Houdini. In other words, such results are crap. Now they are all desperately waiting for H3 or, even better, R5, but Vas would be out of his mind if he delivered.
Howdy Rolfy....Long time no see
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward…
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: A little off-topic, sorry...
Ajedrecista wrote:
  Sorry for the long output of my programme; I get around +18 ± 12 Elo for Critter with your provided data (given a reasonable confidence interval and 1920 games (with more or less tied scores) and 41% of draws, my error bars should not differ a lot in comparison with BayesElo results, this is why I am confused).
My bad, the correct 95% confidence interval is indeed +18 ± 12 Elo.
Thanks for the correction!
Robert
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: A little off-topic, sorry...
Ajedrecista wrote:
  People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!
  Regards from Spain.
  Ajedrecista.
The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.
Robert
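As a quick sanity check of the reported performance, the score can be converted to an Elo difference with the standard logistic formula. This is a sketch of the usual conversion, not necessarily the tool Robert used; the symbols s (score fraction) and Δ (Elo difference) are introduced here for illustration.

```latex
\[
s = \frac{1134}{1920} \approx 0.5906,
\qquad
\Delta = -400 \log_{10}\!\left(\frac{1}{s} - 1\right)
       \approx -400 \log_{10}(0.6931)
       \approx +64 \text{ Elo}
\]
```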
-
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: A little off-topic, sorry...
Houdini wrote:
  Ajedrecista wrote:
    People (including me) will thank you if you post the results of Critter vs. Houdini 3 DEV once this match is finished. Good luck with Houdini and your telescope!
    Regards from Spain.
    Ajedrecista.
  The match just finished, match result was 1134-786 for Houdini 3 DEV (44% draws), scoring 59.1% or a performance of +64 ± 12 Elo.
  Robert
Thank you Robert,
perhaps I missed something, may I ask you the time control? Is it 2'+2"?
Thxs
Best Regards
MM
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: A little off-topic, sorry...
Correct, like in the previous match: 1920 games at 2'+2", single thread.
-
- Posts: 1187
- Joined: Wed Jan 06, 2010 3:11 pm
Re: A little off-topic, sorry...
Wow, that is a very promising result indeed!
Good work Robert