40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Discussion of computer chess matches and engine tournaments.

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Post by geots »

Houdini 2.0c x64 vs. Engine 40x(2): the battle has ended!

40x(2) held his own through game 718, and had actually cut Houdini's lead from 49 games to 39. From there to game 880, Houdini was able to make those 10 games back up.

But in games 881 through 1000, Houdini went on one of those runs that, well... make him Houdini. What can I say? By the conclusion he had stretched a 49-game lead to 71 games. It is what it is.


Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

10'+10"
Match=1000 games



Code:

Engine             Elo    +W/-L/=D         Score    Points
Houdini 2.0c x64   +25    +299/-228/=473   53.55%   535.5/1000
Engine 40x(2)      -25    +228/-299/=473   46.45%   464.5/1000
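
The points and percentages in the table follow directly from the win/loss/draw counts, and the +25/-25 column is consistent with the usual logistic Elo formula. The sketch below assumes that formula (the GUI's own method is not stated in the post), and `score_and_elo` is a hypothetical helper name used only for illustration.

```python
import math

def score_and_elo(wins, losses, draws):
    """Return (points, score fraction, Elo difference) for one side of a match."""
    games = wins + losses + draws
    points = wins + 0.5 * draws          # a draw counts as half a point
    score = points / games
    # Logistic Elo model (assumed): score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return points, score, elo

# Houdini 2.0c x64 side of the match: +299/-228/=473
points, score, elo = score_and_elo(299, 228, 473)
print(f"{points}/1000  {100 * score:.2f}%  {elo:+.0f} Elo")
```

Swapping the win and loss counts gives the 40x(2) row, with the same Elo magnitude and the opposite sign.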


Don't get me wrong, I enjoyed running the match. But you don't really think about how that extra 10 seconds per move can drag things out for what seems like forever when running a 50-game match or similar.

I have an Ivanhoe version I received by email, a test version that I am interested in. And what better way to find out how good he is than to run him against a strong something or other. Maybe against one of the engines that played all the Ivanhoes when they were being rated. Maybe 40x, the original version, which beat 13 of the 15 Ivanhoes it faced. Sounds good to me. At the same time I will try to see if I can talk this guy out of something a bit stronger than 40x(2). I know damn well he has it. We shall see.



Stay close by, Jesus.

george
Ajedrecista
Posts: 1966
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: 40x(2) vs. Houdini 2.0c has ended! FINAL RESULTS!

Post by Ajedrecista »

Hello again:
Good! The title of a Nelly Furtado song comes to mind: 'All Good Things (Come to an End)'. In the end, Houdini was too much for 40x(2). But the final score is more than decent.

I have implemented likelihood of superiority (LOS) in LOS_and_Elo_uncertainties_calculator (which is Elo_uncertainties_calculator with the bonus of calculating LOS, always using the standard deviation model taken from the ImmortalChess Forum). Running it from the point of view of 40x(2):

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

228

Write down the number of loses:

299

Write down the number of draws:

473

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -32.70 Elo
Upper rating difference:  -16.74 Elo

Lower bound uncertainty:   -7.99 Elo
Upper bound uncertainty:    7.97 Elo
Average error:        +/-   7.98 Elo

K = (average error)*[sqrt(n)] =  252.33

Elo interval: ] -32.70,  -16.74[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -40.73 Elo
Upper rating difference:   -8.79 Elo

Lower bound uncertainty:  -16.02 Elo
Upper bound uncertainty:   15.92 Elo
Average error:        +/-  15.97 Elo

K = (average error)*[sqrt(n)] =  504.93

Elo interval: ] -40.73,   -8.79[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -48.80 Elo
Upper rating difference:   -0.85 Elo

Lower bound uncertainty:  -24.09 Elo
Upper bound uncertainty:   23.85 Elo
Average error:        +/-  23.97 Elo

K = (average error)*[sqrt(n)] =  758.07

Elo interval: ] -48.80,   -0.85[
---------------------------------------

Number of games of the match:               1000
Score: 46.45 %
Elo rating difference:  -24.71 Elo
Draw ratio: 47.30 %

**********************************************
1 sigma:  1.1423 % of the points of the match.
2 sigma:  2.2846 % of the points of the match.
3 sigma:  3.4270 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:   0.09 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  53 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
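
The calculator's source is not shown in the thread, so the following is only a sketch of one way to arrive at figures like those in the output above. It assumes the standard deviation of the score is sqrt((mu*(1 - mu) - D/4)/n), with mu the score fraction, D the draw ratio and n the number of games, and that the confidence limits and LOS come from a normal approximation. The function names `match_stats`, `elo_from_score` and `los` are hypothetical.

```python
import math

def match_stats(wins, losses, draws):
    """Return (games, score fraction, standard deviation of the score)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    d = draws / n
    # Assumed model: per-game variance of a win/draw/loss result, divided by n.
    sigma = math.sqrt((mu * (1.0 - mu) - d / 4.0) / n)
    return n, mu, sigma

def elo_from_score(score):
    """Logistic Elo difference corresponding to a score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def los(mu, sigma):
    """One-sided likelihood of superiority under a normal approximation."""
    z = (mu - 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 40x(2) point of view: +228/-299/=473
n, mu, sigma = match_stats(228, 299, 473)
print(f"Score: {100 * mu:.2f} %, 1 sigma: {100 * sigma:.4f} % of the points")
print(f"Elo rating difference: {elo_from_score(mu):.2f}")
for k in (1, 2, 3):
    lower = elo_from_score(mu - k * sigma)
    upper = elo_from_score(mu + k * sigma)
    print(f"{k}-sigma Elo interval: ]{lower:.2f}, {upper:.2f}[")
print(f"LOS: {100 * los(mu, sigma):.2f} %")
```

Under these assumptions the intervals are asymmetric in Elo because the score limits are symmetric and the Elo conversion is not linear, which is in line with the slightly different lower and upper uncertainties in the output.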
So LOS ~ 99.91% for Houdini. I checked some results against a LOS table hosted at Chessprogramming Wikispace, and every result matches mine, so this 0.09% seems correct. The value is also verified by my other programme:

Code:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

1000

Write down the draw ratio (in percentage):

47.3

Write down the confidence level (in percentage) between 75% and 99.9%:

99.9

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Calculating...

Theoretical minimum score for no regression: 53.5302 %
Theoretical standard deviation in this case:  3.5302 %

Minimum number of won points for the engine in this match:       535.5 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 24.7095 Elo

End of the calculations. Approximated elapsed time:  513 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
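
As a rough cross-check of the output above, the minimum score can be sketched in closed form. This assumes the same standard deviation model, sqrt((mu*(1 - mu) - D/4)/n), and a one-sided normal quantile z for the chosen confidence level. Writing x = mu - 0.5, the condition x = z*sigma rearranges to x = z*sqrt((0.25 - D/4)/(n + z^2)). The programme itself may well work differently (for example by iterating), and `min_score_for_no_regression` is a hypothetical name.

```python
import math
from statistics import NormalDist

def min_score_for_no_regression(games, draw_ratio, confidence):
    """Smallest score fraction that lies z standard deviations above 50%."""
    z = NormalDist().inv_cdf(confidence)   # one-sided normal quantile
    # Closed-form solution of x = z * sqrt((0.25 - x**2 - draw_ratio / 4) / games)
    x = z * math.sqrt((0.25 - draw_ratio / 4.0) / (games + z * z))
    return 0.5 + x

games = 1000
score = min_score_for_no_regression(games, 0.473, 0.999)
min_points = math.ceil(2 * score * games) / 2      # round up to the next half point
min_elo = -400.0 * math.log10(games / min_points - 1.0)

print(f"Theoretical minimum score: {100 * score:.4f} %")
print(f"Minimum number of points:  {min_points}")
print(f"Minimum Elo advantage:     {min_elo:.4f} Elo")
```

With 1000 games, a 47.3% draw ratio and 99.9% confidence, the minimum rounds up to 535.5 points, which is exactly Houdini's final score.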
IMHO there is still a long way to go to catch Houdini 2.0c, but I wish good luck to the mysterious programmer/s of the 40x(2) chess engine. I enjoyed this match a lot.

Regards from Spain.

Ajedrecista.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: 40x(2) vs. Houdini 2.0c has ended! FINAL RESULTS!

Post by geots »



Thank you, Jesus, for your interest and effort. Stay with me. We shall have bigger fish to fry.

george
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Post by marcelk »

geots wrote:Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!

40x(2) held his own thru game 718. And had actually cut Houdini's lead from 49 games to 39 games. From there to game 880- Houdini was able to make those 10 games back up.

But games 881 thru 1000 Houdini went on one of the runs that, well.... make him Houdini. What can I say. He stretched a 49 game lead, by the conclusion- to 71 games. It is what it is.
Which game did you enjoy most and why?
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Post by geots »

marcelk wrote: Which game did you enjoy most and why?

I want to ask you if you are serious, but quite possibly it is not a bad question. The problem is that this was a time control of 10'+10", and those can be long games. No one has the time to sit there and watch even 30% of them as they are being played. Not from start to finish.

I watched many more games in my 40/3 repeating matches. But in general I will tell you the kind of games, found in both, that I enjoy the most for sure:

Engine A thinks he has a line that will win, maybe showing +1.45, but Engine B has found a winning line of his own. He starts at +1.50, and 2 or 3 moves later his "plus figure" is still rising. By now Engine A sees his line doesn't win and shows 0.00. But it is still a couple more moves before he realizes he is waist deep in quicksand, from which there is no escape. I just love it when his 5 or so thinking lines are all in red!



Hope this will do-

george
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Post by marcelk »

Those games are like curveballs. I like them a lot, especially when there are material imbalances.