Hello again:
geots wrote:
Houdini 2.0c x64 vs. Engine 40x(2)-
The Battle has ended!
40x(2) held his own thru game 718. And had actually cut Houdini's lead from 49 games to 39 games. From there to game 880- Houdini was able to make those 10 games back up.
But games 881 thru 1000 Houdini went on one of the runs that, well.... make him Houdini. What can I say. He stretched a 49 game lead, by the conclusion- to 71 games. It is what it is.
Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
10'+10"
Match=1000 games
Houdini 2.0c x64 +25 +299/-228/=473 53.55% 535.5/1000
Engine 40x(2) -25 +228/-299/=473 46.45% 464.5/1000
Don't get me wrong- I enjoyed running the match. But you don't really think about how that extra 10 seconds per move can drag things out for what seems like forever when running a 50 game match or similar.
I have an Ivanhoe version I received by email- a test version that I am interested in. And what better way to find out than to run him against a strong something or other. Maybe against one of the engines that played all the Ivanhoes when they were being rated. Maybe 40x- the original version- which beat 13 of 15 Ivanhoes it faced. Sounds good to me. At the same time I will try to see if I can talk this guy out of something a bit stronger than 40x(2). I know damn well he has it. We shall see.
Stay close by Jesus-
george
Good! The title of a Nelly Furtado song comes to mind: 'All Good Things (Come to an End)'. In the end, Houdini was too much for 40x(2). But the final score is more than decent.
I have implemented likelihood of superiority (LOS) in LOS_and_Elo_uncertainties_calculator, which is Elo_uncertainties_calculator plus a LOS calculation; as before, it uses the standard deviation model taken from the ImmortalChess Forum. Running it from 40x(2)'s point of view:
LOS_and_Elo_uncertainties_calculator, ® 2012.
----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------
(The input and output data is referred to the first engine).
Please write down non-negative integers.
Write down the number of wins:
228
Write down the number of loses:
299
Write down the number of draws:
473
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************
---------------------------------------
Elo interval for 1-sigma confidence:
Elo rating difference: -24.71 Elo
Lower rating difference: -32.70 Elo
Upper rating difference: -16.74 Elo
Lower bound uncertainty: -7.99 Elo
Upper bound uncertainty: 7.97 Elo
Average error: +/- 7.98 Elo
K = (average error)*[sqrt(n)] = 252.33
Elo interval: ] -32.70, -16.74[
---------------------------------------
Elo interval for 2-sigma confidence:
Elo rating difference: -24.71 Elo
Lower rating difference: -40.73 Elo
Upper rating difference: -8.79 Elo
Lower bound uncertainty: -16.02 Elo
Upper bound uncertainty: 15.92 Elo
Average error: +/- 15.97 Elo
K = (average error)*[sqrt(n)] = 504.93
Elo interval: ] -40.73, -8.79[
---------------------------------------
Elo interval for 3-sigma confidence:
Elo rating difference: -24.71 Elo
Lower rating difference: -48.80 Elo
Upper rating difference: -0.85 Elo
Lower bound uncertainty: -24.09 Elo
Upper bound uncertainty: 23.85 Elo
Average error: +/- 23.97 Elo
K = (average error)*[sqrt(n)] = 758.07
Elo interval: ] -48.80, -0.85[
---------------------------------------
Number of games of the match: 1000
Score: 46.45 %
Elo rating difference: -24.71 Elo
Draw ratio: 47.30 %
**********************************************
1 sigma: 1.1423 % of the points of the match.
2 sigma: 2.2846 % of the points of the match.
3 sigma: 3.4270 % of the points of the match.
**********************************************
Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.
-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------
LOS: 0.09 %
This value of LOS is rounded up to 0.01%
End of the calculations. Approximated elapsed time: 53 ms.
Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
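For anyone who wants to check the figures above by hand, here is a minimal Python sketch. It is not the source of LOS_and_Elo_uncertainties_calculator. It assumes the per-match standard deviation of the score is sqrt((mu*(1 - mu) - D/4)/n), with mu the score and D the draw ratio, that the Elo bounds are simply mu -/+ k*sigma converted to Elo, and that LOS is the normal CDF of (mu - 0.5)/sigma. Under those assumptions it agrees with the printed values to the shown precision. The function names are made up for illustration.

```python
from math import log10, sqrt
from statistics import NormalDist


def elo_from_score(p):
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return 400.0 * log10(p / (1.0 - p))


def match_stats(wins, losses, draws):
    """Return (n, score, sigma, los) from the first engine's point of view.

    Assumed model: sigma = sqrt((mu*(1 - mu) - D/4) / n), where D is the
    draw ratio; LOS is a one-sided normal approximation.
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    draw_ratio = draws / n
    sigma = sqrt((mu * (1.0 - mu) - draw_ratio / 4.0) / n)
    los = NormalDist().cdf((mu - 0.5) / sigma)
    return n, mu, sigma, los


def elo_interval(mu, sigma, k):
    """Elo bounds obtained by shifting the score by k standard deviations."""
    return elo_from_score(mu - k * sigma), elo_from_score(mu + k * sigma)


if __name__ == "__main__":
    n, mu, sigma, los = match_stats(228, 299, 473)  # 40x(2) vs. Houdini 2.0c
    print(f"Games: {n}, score: {100 * mu:.2f} %, Elo: {elo_from_score(mu):.2f}")
    for k in (1, 2, 3):
        low, high = elo_interval(mu, sigma, k)
        print(f"{k}-sigma: ] {low:.2f}, {high:.2f} [  ({100 * k * sigma:.4f} % of the points)")
    print(f"LOS: {100 * los:.2f} %")
```

The bounds come out asymmetric around the Elo difference because the score interval is symmetric but the score-to-Elo conversion is not linear.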
So, LOS ~ 99.91% for Houdini. I checked some results against a LOS table hosted at the Chessprogramming Wikispace, and every result matches mine, so this 0.09% seems correct. The value is also verified by my other programme:
Minimum_score_for_no_regression, ® 2012.
Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:
Write down the number of games of the match (it must be a positive integer, up to 1073741823):
1000
Write down the draw ratio (in percentage):
47.3
Write down the confidence level (in percentage) between 75% and 99.9%:
99.9
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:
3
Calculating...
Theoretical minimum score for no regression: 53.5302 %
Theoretical standard deviation in this case: 3.5302 %
Minimum number of won points for the engine in this match: 535.5 points.
Minimum Elo advantage, which is also the negative part of the error bar:
24.7095 Elo
End of the calculations. Approximated elapsed time: 513 ms.
Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
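The output above can also be reproduced with a short Python sketch. This is a guess at the calculation, not the code of Minimum_score_for_no_regression. It assumes the minimum score is 50% + z*sigma, where z is the one-sided normal quantile for the chosen confidence and sigma = sqrt((mu*(1 - mu) - D/4)/n) is evaluated at the minimum score itself. Solving that for the margin gives the closed form used below. The points are then rounded up to the next half point, and the Elo is taken from those rounded points. The function name is invented for illustration.

```python
from math import ceil, log10, sqrt
from statistics import NormalDist


def min_score_no_regression(n_games, draw_ratio, confidence):
    """Return (min_score, min_points, min_elo) under the assumed model.

    With delta = min_score - 0.5, solving delta = z * sigma(0.5 + delta)
    for sigma = sqrt((mu*(1 - mu) - D/4) / n) gives
    delta = z * sqrt((0.25 - D/4) / (n + z**2)).
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile
    delta = z * sqrt((0.25 - draw_ratio / 4.0) / (n_games + z * z))
    min_score = 0.5 + delta
    # A match score moves in half-point steps, so round up to the next one.
    min_points = ceil(2 * n_games * min_score) / 2
    p = min_points / n_games
    min_elo = 400.0 * log10(p / (1.0 - p))
    return min_score, min_points, min_elo


if __name__ == "__main__":
    score, points, elo = min_score_no_regression(1000, 0.473, 0.999)
    print(f"Minimum score: {100 * score:.4f} %")
    print(f"z*sigma margin: {100 * (score - 0.5):.4f} %")
    print(f"Minimum points: {points}")
    print(f"Minimum Elo advantage: {elo:.4f} Elo")
```

With 1000 games, a 47.3% draw ratio and 99.9% confidence, the minimum comes out at 535.5 points, which is exactly Houdini's result. That is consistent with a LOS just above 99.9%.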
IMHO there is still a long way to go to catch Houdini 2.0c, but I wish good luck to the mysterious programmer/s of the 40x(2) chess engine. I enjoyed this match a lot.
Regards from Spain.
Ajedrecista.