40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Posted: Mon Jun 18, 2012 9:21 am
by geots
Houdini 2.0c x64 vs. Engine 40x(2)- The Battle has ended!

40x(2) held his own thru game 718, and had actually cut Houdini's lead from 49 games to 39. From there to game 880, Houdini was able to make those 10 games back up.

But in games 881 thru 1000 Houdini went on one of those runs that, well... make him Houdini. What can I say. He stretched a 49-game lead to 71 games by the conclusion. It is what it is.


Intel i5 w/4TCs
Fritz 11 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

10'+10"
Match=1000 games



Engine             Elo    +W/-L/=D         Score    Points
Houdini 2.0c x64   +25    +299/-228/=473   53.55%   535.5/1000
Engine 40x(2)      -25    +228/-299/=473   46.45%   464.5/1000
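
For reference, the +25 / -25 figures can be reproduced from the score with the usual logistic Elo formula. This is only a minimal sketch in Python; the function name `elo_diff` is made up for illustration and is not taken from the Fritz GUI or any tool used in the match.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1).

    Uses the standard logistic model: score = 1 / (1 + 10 ** (-diff / 400)).
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


# Final result from Houdini's point of view.
wins, losses, draws = 299, 228, 473
games = wins + losses + draws
points = wins + 0.5 * draws
score = points / games

print(f"{points}/{games} = {score:.2%} -> {elo_diff(score):+.2f} Elo")
```

With 535.5/1000 this comes out at roughly +24.7 Elo, which rounds to the +25 shown in the table.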


Don't get me wrong- I enjoyed running the match. But you don't really think about how that extra 10 seconds per move can drag things out for what seems like forever when running a 50 game match or similar.

I have an Ivanhoe version I received by email- a test version that I am interested in. And what better way to find out how strong he is than to run him against a strong something or other. Maybe against one of the engines that played all the Ivanhoes when they were being rated. Maybe 40x- the original version- which beat 13 of 15 Ivanhoes it faced. Sounds good to me. At the same time I will try to see if I can talk this guy out of something a bit stronger than 40x(2). I know damn well he has it. We shall see.



Stay close by Jesus-

george

Re: 40x(2) vs. Houdini 2.0c has ended! FINAL RESULTS!

Posted: Mon Jun 18, 2012 5:37 pm
by Ajedrecista
Hello again:
Good! The title of a Nelly Furtado song comes to mind: 'All Good Things (Come to an End)'. In the end, Houdini was too much for 40x(2). But the final score is more than decent.

I have implemented likelihood of superiority (LOS) in LOS_and_Elo_uncertainties_calculator (which is Elo_uncertainties_calculator with the bonus of calculating LOS, always using the model of standard deviation taken from ImmortalChess Forum). Running from 40x(2) POV:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

228

Write down the number of loses:

299

Write down the number of draws:

473

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -32.70 Elo
Upper rating difference:  -16.74 Elo

Lower bound uncertainty:   -7.99 Elo
Upper bound uncertainty:    7.97 Elo
Average error:        +/-   7.98 Elo

K = (average error)*[sqrt(n)] =  252.33

Elo interval: ] -32.70,  -16.74[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -40.73 Elo
Upper rating difference:   -8.79 Elo

Lower bound uncertainty:  -16.02 Elo
Upper bound uncertainty:   15.92 Elo
Average error:        +/-  15.97 Elo

K = (average error)*[sqrt(n)] =  504.93

Elo interval: ] -40.73,   -8.79[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:    -24.71 Elo

Lower rating difference:  -48.80 Elo
Upper rating difference:   -0.85 Elo

Lower bound uncertainty:  -24.09 Elo
Upper bound uncertainty:   23.85 Elo
Average error:        +/-  23.97 Elo

K = (average error)*[sqrt(n)] =  758.07

Elo interval: ] -48.80,   -0.85[
---------------------------------------

Number of games of the match:               1000
Score: 46.45 %
Elo rating difference:  -24.71 Elo
Draw ratio: 47.30 %

**********************************************
1 sigma:  1.1423 % of the points of the match.
2 sigma:  2.2846 % of the points of the match.
3 sigma:  3.4270 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:   0.09 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  53 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.
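
The numbers in this output can be approximated with a few lines of Python. This is a sketch under assumptions, not the source of LOS_and_Elo_uncertainties_calculator: it assumes the standard deviation of the score is the usual trinomial one, sqrt((s*(1 - s) - d/4) / n), with s the score and d the draw ratio, and that LOS is the normal CDF of (s - 0.5) / sigma. The function names are invented for illustration. These assumptions do reproduce the 1.1423 % sigma, the Elo intervals and the 0.09 % LOS shown above.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction, logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    """Return (score, sigma) for the first engine.

    sigma is the standard deviation of the mean score, taken from the
    per-game variance of a result that is 1, 0.5 or 0.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    draw_ratio = draws / n
    variance = score * (1.0 - score) - draw_ratio / 4.0
    return score, math.sqrt(variance / n)


def elo_interval(score, sigma, k):
    """Elo bounds for a k-sigma band around the score."""
    return elo_diff(score - k * sigma), elo_diff(score + k * sigma)


def los(score, sigma):
    """Likelihood of superiority, assumed here to be a one-sided normal test."""
    z = (score - 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# 40x(2) point of view: +228 -299 =473
score, sigma = match_stats(228, 299, 473)
print(f"score {score:.2%}, 1 sigma {sigma:.4%}, Elo {elo_diff(score):+.2f}")
for k in (1, 2, 3):
    low, high = elo_interval(score, sigma, k)
    print(f"{k}-sigma Elo interval: ]{low:.2f}, {high:.2f}[")
print(f"LOS: {los(score, sigma):.2%}")
```

Note that the Elo bounds are not symmetric around -24.71 because the band is symmetric in score, not in Elo.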

So, LOS ~ 99.91% for Houdini. I checked some results against a LOS table hosted at the Chessprogramming Wikispace, and every result matches mine. Knowing that, this 0.09% seems correct. This value is verified with my other programme:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

1000

Write down the draw ratio (in percentage):

47.3

Write down the confidence level (in percentage) between 75% and 99.9%:

99.9

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Calculating...

Theoretical minimum score for no regression: 53.5302 %
Theoretical standard deviation in this case:  3.5302 %

Minimum number of won points for the engine in this match:       535.5 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 24.7095 Elo

End of the calculations. Approximated elapsed time:  513 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.
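
The 53.5302 % threshold can be reconstructed the same way. Again this is only a sketch: it assumes the programme solves score - 0.5 = z * sigma(score) with the same trinomial standard deviation and a one-sided normal quantile z, which is a guess that happens to match the output above. The function names are hypothetical, and `statistics.NormalDist` needs Python 3.8 or later.

```python
import math
from statistics import NormalDist


def min_score_no_regression(games, draw_ratio, confidence):
    """Smallest score that is z sigma above 50 % (one-sided).

    With x = score - 0.5 the condition is
        x = z * sqrt((0.25 - x**2 - draw_ratio / 4) / games),
    which has the closed-form solution used below.
    """
    z = NormalDist().inv_cdf(confidence)
    x = z * math.sqrt((0.25 - draw_ratio / 4.0) / (games + z * z))
    return 0.5 + x


def min_points(score, games):
    """Round the required score up to the next reachable half point."""
    return math.ceil(2.0 * score * games) / 2.0


threshold = min_score_no_regression(1000, 0.473, 0.999)
print(f"minimum score: {threshold:.4%}")
print(f"minimum points: {min_points(threshold, 1000)}")
```

Under these assumptions the threshold needs 535.5 points out of 1000, which is exactly what Houdini scored. That is consistent with a LOS just above 99.9 % for Houdini.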
IMHO there is still a long way to go to catch Houdini 2.0c, but I wish good luck to the mysterious programmer/s of the 40x(2) chess engine. I enjoyed this match a lot.

Regards from Spain.

Ajedrecista.

Re: 40x(2) vs. Houdini 2.0c has ended! FINAL RESULTS!

Posted: Mon Jun 18, 2012 11:27 pm
by geots
Thank you Jesus for your interest and effort. Stay with me. We shall have bigger fish to fry.

george

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Posted: Tue Jun 19, 2012 12:57 am
by marcelk
Which game did you enjoy most and why?

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Posted: Tue Jun 19, 2012 3:34 am
by geots
I want to ask you if you are serious, but quite possibly it is not a bad question. Problem is this was a time control of 10'+10"- and those can be long games. No one has the time to sit there and watch even 30% of them as they are being played. Not from start to finish.

I watched many more games in my 40/3 repeating matches. But in general I can tell you the kind of game, found in both, that I for sure enjoy the most:

Engine A thinks he has a line that will win- he maybe shows +1.45- but Engine B has found a winning line of his own- and starts at +1.50 and 2 or 3 moves later his "plus figure" continues to rise. By now Engine A sees his line doesn't win and shows 0.00- But it is still a couple more moves before he realizes he is waist deep in quicksand- from which there is no escape. I just love it when his 5 or so thinking lines are all in red!



Hope this will do-

george

Re: 40x(2) v Houdini 2.0c Has Ended! FINAL RESULTS!

Posted: Tue Jun 19, 2012 11:53 pm
by marcelk
Those games are like curveballs. I like them a lot, especially when there are imbalances in material.