ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

Houdini & Rainbow Ltd.- beta 2: TOE TO TOE!
TalkChess.com Forum Index -> Computer Chess Club: Tournaments and Matches
Jesús Muñoz



Joined: 13 Jul 2011
Posts: 690
Location: Madrid, Spain.

Post subject: Re: Houdini & Rainbow Ltd. - beta 2: TOE TO TOE!    Posted: Wed Jul 04, 2012 10:25 am

Hello!

geots wrote:
Houdini 2.0c x64 vs Rainbow Limited- beta 2


Houdini has been able to slightly increase his lead, and at some point in time Limited- beta 2 needs to make a run at him. "Holding his own" and "playing even with him" won't get the job done now. If "Limited" wants to have any chance at all- he is going to have to soon make a run at Houdini. He can't afford to get any further behind.


Intel i5 w/4TCs
Fritz 13 gui
1CPU/64bit
128MB hash
Bases=NONE
Ponder_Learning=OFF
Perfect 12.32 book w/12-move limit

5'+5"
Match=500 games



[after 245 games]

Code:
Houdini 2.0c x64           +23    +74/-58/=113   53.27%   130.5/245
Rainbow Limited- beta 2    -23    +58/-74/=113   46.73%   114.5/245




Close enough to call this the halfway mark. Hopefully Limited- beta 2 can begin to make this a close match again. It's not out of reach yet.




Back soon-

george


Beta 2 seems to be holding up a little better than beta 1, although they are playing at different time controls. Here are my results regarding error bars and LOS:

Code:

LOS_and_Elo_uncertainties_calculator, ® 2012.

----------------------------------------------------------------
Calculation of Elo uncertainties in a match between two engines:
----------------------------------------------------------------

(The input and output data is referred to the first engine).

Please write down non-negative integers.

Write down the number of wins:

74

Write down the number of loses:

58

Write down the number of draws:

113

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

(Only 1, 2 and 3-sigma confidence error bars are calculated, if possible).

***************************************
1-sigma confidence ~ 68.27% confidence.
2-sigma confidence ~ 95.45% confidence.
3-sigma confidence ~ 99.73% confidence.
***************************************

---------------------------------------

Elo interval for 1-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:    6.46 Elo
Upper rating difference:   39.08 Elo

Lower bound uncertainty:  -16.26 Elo
Upper bound uncertainty:   16.36 Elo
Average error:        +/-  16.31 Elo

K = (average error)*[sqrt(n)] =  255.29

Elo interval: ]   6.46,   39.08[
---------------------------------------

Elo interval for 2-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:   -9.77 Elo
Upper rating difference:   55.62 Elo

Lower bound uncertainty:  -32.49 Elo
Upper bound uncertainty:   32.89 Elo
Average error:        +/-  32.69 Elo

K = (average error)*[sqrt(n)] =  511.72

Elo interval: ]  -9.77,   55.62[
---------------------------------------

Elo interval for 3-sigma confidence:

Elo rating difference:     22.72 Elo

Lower rating difference:  -26.04 Elo
Upper rating difference:   72.40 Elo

Lower bound uncertainty:  -48.77 Elo
Upper bound uncertainty:   49.68 Elo
Average error:        +/-  49.22 Elo

K = (average error)*[sqrt(n)] =  770.48

Elo interval: ] -26.04,   72.40[
---------------------------------------

Number of games of the match:                245
Score: 53.27 %
Elo rating difference:   22.72 Elo
Draw ratio: 46.12 %

**********************************************
1 sigma:  2.3354 % of the points of the match.
2 sigma:  4.6708 % of the points of the match.
3 sigma:  7.0063 % of the points of the match.
**********************************************

 Error bars were calculated with two-sided tests; values are rounded up to 0.01 Elo, or 0.01 in the case of K.

-------------------------------------------------------------------
Calculation of likelihood of superiority (LOS) in a one-sided test:
-------------------------------------------------------------------

LOS:  91.90 %

This value of LOS is rounded up to 0.01%

End of the calculations. Approximated elapsed time:  57 ms.

Thanks for using LOS_and_Elo_uncertainties_calculator. Press Enter to exit.


After 245 games, Houdini is in the lead by ~ +23 ± 33 Elo (with ~ 95.45% confidence, i.e. roughly 21 out of 22 times) and has a LOS of about 91.9%, which is not very significant IMHO. Anyway, I think that Houdini will win this match, and that is no surprise at all.
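For anyone who wants to check these figures without the Fortran tool, here is a minimal Python sketch. It assumes a plain normal approximation on the mean score (trinomial per-game variance) and the usual logistic Elo formula. The function names `elo_from_score` and `match_stats` are made up for this example, and this is not necessarily how LOS_and_Elo_uncertainties_calculator works internally, although with +74/-58/=113 it should land very close to the output above.

```python
from math import log10, sqrt
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return 400.0 * log10(score / (1.0 - score))


def match_stats(wins: int, losses: int, draws: int, sigmas: float = 2.0):
    """Return (elo, lower, upper, los), all referred to the first engine.

    Assumption: the mean score is treated as normally distributed.
    The bounds fail if score +/- sigmas * sigma leaves the (0, 1) range.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    sigma = sqrt(var / n)  # standard deviation of the mean score
    lower = elo_from_score(score - sigmas * sigma)
    upper = elo_from_score(score + sigmas * sigma)
    # One-sided LOS: probability that the true score is above 50 %.
    los = NormalDist().cdf((score - 0.5) / sigma)
    return elo_from_score(score), lower, upper, los


elo, lower, upper, los = match_stats(74, 58, 113)
print(f"Elo {elo:+.2f}  ]{lower:+.2f}, {upper:+.2f}[  LOS {100 * los:.2f} %")
```

Passing `sigmas=1.0` or `sigmas=3.0` gives the other two intervals. Note that the interval is not symmetric in Elo, because the symmetric bounds on the score are converted to Elo afterwards.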

With the model I use, the score that Houdini needs in order to reach a LOS of 95% is:

Code:

Minimum_score_for_no_regression, ® 2012.

 Calculation of the minimum score for no regression (i.e. negative Elo gain) in a match between two engines:

 Write down the number of games of the match (it must be a positive integer, up to 1073741823):

245

Write down the draw ratio (in percentage):

46.1224489795

Write down the confidence level (in percentage) between 75% and 99.9%:

95

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3

Theoretical minimum score for no regression: 53.8356 %
Theoretical standard deviation in this case:  3.8356 %

Minimum number of won points for the engine in this match:       132.0 points.

Minimum Elo advantage, which is also the negative part of the error bar:
 26.9982 Elo

End of the calculations. Approximated elapsed time:  19 ms.

Thanks for using Minimum_score_for_no_regression. Press Enter to exit.


The score should be 132/245, which is very close to the actual 130.5/245. Running Minimum_score_for_no_regression again with 500 games and a draw ratio of 46%, Houdini will reach a LOS of 95% (in a one-sided test) with a score of 263.5/500 = 52.7%. That implies an advantage of ~ 19 Elo, with error bars of around ± 22 or ± 23 Elo at 95% confidence in a two-sided test. So it looks reasonable that Houdini is stronger than this beta 2 IMHO. Thanks for running this match!
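The minimum scores quoted above can be approximated with a short Python sketch. It assumes the threshold score s satisfies s - 0.5 = z * sqrt((s*(1 - s) - D/4) / n), where D is the draw ratio, n the number of games and z the one-sided normal quantile, which has a closed-form solution. This is only a guess at the model that happens to reproduce the figures in this post, not a confirmed description of Minimum_score_for_no_regression, and the names `min_score_for_los`, `min_points` and `project_error` are made up for the example.

```python
from math import ceil, log10, sqrt
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return 400.0 * log10(score / (1.0 - score))


def min_score_for_los(games: int, draw_ratio: float, confidence: float) -> float:
    """Smallest score fraction whose one-sided LOS reaches `confidence`.

    Closed-form solution of
        s - 0.5 = z * sqrt((s * (1 - s) - draw_ratio / 4) / games)
    (assumed model: normal approximation with a fixed draw ratio).
    """
    z = NormalDist().inv_cdf(confidence)
    return 0.5 + z * sqrt((1.0 - draw_ratio) / (4.0 * (games + z * z)))


def min_points(games: int, draw_ratio: float, confidence: float) -> float:
    """Minimum score rounded up to the next half point."""
    return ceil(2.0 * games * min_score_for_los(games, draw_ratio, confidence)) / 2.0


def project_error(k: float, games: int) -> float:
    """Scale an error bar to another match length via K = error * sqrt(n).

    Only meaningful if score and draw ratio stay roughly the same.
    """
    return k / sqrt(games)


for games, draw_ratio in ((245, 113 / 245), (500, 0.46)):
    points = min_points(games, draw_ratio, 0.95)
    print(f"{games} games: {points:.1f} points, "
          f"{elo_from_score(points / games):.2f} Elo")

# K = 511.72 is the 2-sigma value from the first calculation above.
print(f"Projected 2-sigma error at 500 games: +/- {project_error(511.72, 500):.2f} Elo")
```

The Elo figure is computed from the rounded-up number of points rather than from the exact threshold score, which is what seems to match the 26.9982 Elo reported for 132/245.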

Regards from Spain.

Ajedrecista.
_________________
Six Fortran 95 tools.

Chess will never be solved.
Subject Author Date/Time
Houdini & Rainbow Ltd.- beta 2: TOE TO TOE! George Speight Wed Jul 04, 2012 9:39 am
      Re: Houdini & Rainbow Ltd. - beta 2: TOE TO TOE! Jesús Muñoz Wed Jul 04, 2012 10:25 am