TalkChess.com

Posted: **Mon Jun 18, 2012 9:21 am**

Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!

40x(2) held his own through game 718, and had actually cut Houdini's lead from 49 games to 39. From there to game 880, Houdini was able to make those 10 games back up.

But from game 881 through 1000, Houdini went on one of those runs that, well... make him Houdini. What can I say? He stretched a 49-game lead to 71 games by the conclusion. It is what it is.

Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit
10'+10"
Match=1000 games

Houdini 2.0c x64   +25    +299/-228/=473   53.55%   535.5/1000
Engine 40x(2)      -25    +228/-299/=473   46.45%   464.5/1000
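As a quick check of the table, the score and Elo columns follow from the win/loss/draw counts under the usual logistic Elo model. That the GUI uses exactly this formula is an assumption, and `score_and_elo` is a hypothetical helper name. A minimal sketch:

```python
import math


def score_and_elo(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Score fraction and logistic Elo difference from one engine's W/L/D."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model (assumed): expected score = 1 / (1 + 10**(-diff / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo


# Houdini 2.0c's result from its own point of view: +299 -228 =473
s, d = score_and_elo(299, 228, 473)
print(f"{100 * s:.2f}%  {d:+.2f} Elo")
```

The table's "+25" is this roughly 24.7 Elo rounded to a whole number.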

Don't get me wrong, I enjoyed running the match. But you don't really think about how that extra 10 seconds per move can drag things out for what seems like forever when running a 50-game match or similar.

I have an Ivanhoe version I received by email, a test version that I am interested in. And what better way to find out about him than to run him against a strong something or other? Maybe against one of the engines that played all the Ivanhoes when they were being rated. Maybe 40x, the original version, which beat 13 of the 15 Ivanhoes it faced. Sounds good to me. At the same time, I will try to talk this guy out of something a bit stronger than 40x(2). I know damn well he has it. We shall see.

Stay close by Jesus-

george

Posted: **Mon Jun 18, 2012 5:37 pm**

Hello again:

geots wrote:Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!

Good! The title of a Nelly Furtado song comes to mind: 'All Good Things (Come to an End)'. In the end, Houdini was too much for 40x(2). But the final score is more than decent.

I have implemented likelihood of superiority (LOS) in LOS_and_Elo_uncertainties_calculator (which is Elo_uncertainties_calculator plus an LOS calculation, still using the standard deviation model taken from the ImmortalChess Forum). Running it from 40x(2)'s point of view:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

228

Write down the number of loses:

299

Write down the number of draws:

473

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -32.70 Elo
Upper rating difference:  -16.74 Elo

Lower bound uncertainty:   -7.99 Elo
Upper bound uncertainty:    7.97 Elo
Average error:        +/-   7.98 Elo

K = (average error)*[sqrt(n)] =  252.33

Elo interval: ] -32.70,  -16.74[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -40.73 Elo
Upper rating difference:   -8.79 Elo

Lower bound uncertainty:  -16.02 Elo
Upper bound uncertainty:   15.92 Elo
Average error:        +/-  15.97 Elo

K = (average error)*[sqrt(n)] =  504.93

Elo interval: ] -40.73,   -8.79[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -48.80 Elo
Upper rating difference:   -0.85 Elo

Lower bound uncertainty:  -24.09 Elo
Upper bound uncertainty:   23.85 Elo
Average error:        +/-  23.97 Elo

K = (average error)*[sqrt(n)] =  758.07

Elo interval: ] -48.80,   -0.85[
---------------------------------------

Number of games of the match:               1000
Score: 46.45 %
Elo rating difference:  -24.71 Elo
Draw ratio: 47.30 %

**********************************************
1 sigma:  1.1423 % of the points of the match.
2 sigma:  2.2846 % of the points of the match.
3 sigma:  3.4270 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:   0.09 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  53 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
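The figures above can be roughly reproduced with a short sketch. It is not the programme's source code; it assumes the trinomial standard deviation sigma = sqrt([s(1 - s) - D/4] / n), with s the score, D the draw ratio and n the number of games (this assumption gives the 1.1423% printed above), Elo bounds taken as the logistic Elo of s - k*sigma and s + k*sigma, and LOS taken as the one-sided normal probability Phi((s - 0.5) / sigma). The function names are hypothetical.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for an expected score 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins: int, losses: int, draws: int):
    """Return (games, score, standard deviation of the mean score)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    draw_ratio = draws / n
    # Assumed trinomial model: per-game score variance is s(1-s) - D/4;
    # dividing by n gives the variance of the match score.
    sigma = math.sqrt((s * (1.0 - s) - draw_ratio / 4.0) / n)
    return n, s, sigma


def error_bars(wins: int, losses: int, draws: int, k: float):
    """Elo difference plus the lower and upper ends of its k-sigma interval."""
    _, s, sigma = match_stats(wins, losses, draws)
    return elo(s), elo(s - k * sigma), elo(s + k * sigma)


def los(wins: int, losses: int, draws: int) -> float:
    """One-sided likelihood of superiority under a normal approximation."""
    _, s, sigma = match_stats(wins, losses, draws)
    z = (s - 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# 40x(2)'s point of view: +228 -299 =473
for k in (1, 2, 3):
    diff, lo, hi = error_bars(228, 299, 473, k)
    print(f"{k}-sigma: {diff:.2f} Elo in ] {lo:.2f}, {hi:.2f}[")
print(f"LOS: {100 * los(228, 299, 473):.2f} %")
```

Because the logistic curve is not linear, the interval comes out slightly asymmetric (-7.99 vs. +7.97 Elo at 1 sigma), just as in the programme's output.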

So, LOS ~ 99.91% for Houdini. I checked some results against an LOS table hosted at the Chessprogramming Wikispace, and every result matched mine, so this 0.09% seems correct. The value is also confirmed by my other programme:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

1000

Write down the draw ratio (in percentage):

47.3

Write down the confidence level (in percentage) between 75% and 99.9%:

99.9

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Calculating...

Theoretical minimum score for no regression: 53.5302 %
Theoretical standard deviation in this case:  3.5302 %

Minimum number of won points for the engine in this match:       535.5 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 24.7095 Elo

End of the calculations. Approximated elapsed time:  513 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
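One reading of Minimum_score_for_no_regression that reproduces its output: find the score s where s - z*sigma(s) = 50%, with z the one-sided normal quantile for the chosen confidence and sigma the same assumed trinomial standard deviation as before, then round the points up to the next half point. Under this reading, the printed "standard deviation" of 3.5302% is z*sigma (53.5302% - 50%), not sigma itself. This is a guess at the method, not the programme's code, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_of_mean(score: float, draw_ratio: float, games: int) -> float:
    # Assumed trinomial model, same as in the LOS sketch.
    return math.sqrt((score * (1.0 - score) - draw_ratio / 4.0) / games)


def min_score_no_regression(games: int, draw_ratio: float, confidence: float) -> float:
    """Score s satisfying s = 0.5 + z * sigma(s), z = one-sided normal quantile."""
    z = NormalDist().inv_cdf(confidence)
    s = 0.5
    for _ in range(100):  # fixed-point iteration; sigma barely changes with s
        s_next = 0.5 + z * sigma_of_mean(s, draw_ratio, games)
        if abs(s_next - s) < 1e-12:
            break
        s = s_next
    return s_next


games, draws, conf = 1000, 0.473, 0.999
s_min = min_score_no_regression(games, draws, conf)
points = math.ceil(2 * s_min * games) / 2  # match results come in half points
elo_needed = -400.0 * math.log10(games / points - 1.0)
print(f"{100 * s_min:.4f} %  {points} points  {elo_needed:.4f} Elo")
```

Houdini's 535.5/1000 sits exactly on this threshold, which fits the LOS of about 99.91%, just above 99.9%.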

IMHO there is still a long way to go to catch Houdini 2.0c, but I wish good luck to the mysterious programmer(s) of the 40x(2) chess engine. I enjoyed this match a lot.

Regards from Spain.

Ajedrecista.

Posted: **Mon Jun 18, 2012 11:27 pm**

Ajedrecista wrote:Hello again:

Thank you Jesus for your interest and effort. Stay with me. We shall have bigger fish to fry.

george

Posted: **Tue Jun 19, 2012 12:57 am**

geots wrote:Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!

Which game did you enjoy most and why?

Posted: **Tue Jun 19, 2012 3:34 am**

marcelk wrote:
geots wrote:Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!
Which game did you enjoy most and why?

I want to ask you if you are serious, but quite possibly it is not a bad question. The problem is that this was a 10'+10" time control, and those can be long games. No one has the time to sit there and watch even 30% of them as they are being played, not from start to finish.

I watched many more games in my 40/3 repeating matches. But in general, here are the games I enjoy most in both:

Engine A thinks he has a winning line (he shows maybe +1.45), but Engine B has found a winning line of his own, starts at +1.50, and 2 or 3 moves later his "plus figure" is still rising. By now Engine A sees his line doesn't win and shows 0.00, but it is still a couple more moves before he realizes he is waist-deep in quicksand with no escape. I just love it when his 5 or so thinking lines are all in red!

Hope this will do-

george

Posted: **Tue Jun 19, 2012 11:53 pm**

geots wrote:
Engine A thinks he has a winning line (he shows maybe +1.45), but Engine B has found a winning line of his own, starts at +1.50, and 2 or 3 moves later his "plus figure" is still rising. By now Engine A sees his line doesn't win and shows 0.00, but it is still a couple more moves before he realizes he is waist-deep in quicksand with no escape. I just love it when his 5 or so thinking lines are all in red!

Those games are like curveballs. I like them a lot, especially when there are material imbalances.

40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!