Poll: Which participant is stronger?

Sedat Canbaz · Post by **Sedat Canbaz** » Mon Nov 14, 2011 9:52 pm

Hello dear chess friends!

What is your estimate:
- Which participant is stronger: Deep Rybka 4.1 x64 6c or Houdini 2.0b Pro x64 4c?
http://www.sedatcanbaz.com/chess/scct-auto232-poll/

Some notes:
- Testing of Deep Rybka 4.1 x64 6c has just begun in SCCT Auto232:
http://www.sedatcanbaz.com/chess/ratings/scct-auto232/
- Deep Rybka 4.1 x64 6 cores will play on an i7 980X @4.33 GHz (HT OFF/Large Pages ON)
- Houdini 2.0b Pro x64 4 cores plays on an i7 920 @3.0GHz/QX9650 @3.66GHz (HT OFF/Large Pages OFF)

Thanks in advance,
Sedat Canbaz

Sedat Canbaz · Post by **Sedat Canbaz** » Tue Nov 15, 2011 10:21 am

Current standings: Deep Rybka 4.1 x64 6 cores is the leader (in both tables)

Best Wishes,
Sedat

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Nov 17, 2011 12:01 am

Deep Rybka 4.1 x64 6 cores' performance so far:
http://www.sedatcanbaz.com/chess/ratings/scct-auto232/

Some notes about the current match (it's too early for any conclusions; more games need to be played):
- But anyway, Deep Rybka 4.1 x64 6 cores seems to be slightly stronger than Houdini 2.0b Pro x64 4 cores
- The current score shows that Deep Rybka 4.1 x64 6 cores is approximately 70 Elo stronger than Deep Rybka 4.1 x64 4 cores
- The latest release, Houdini 2.0b Pro x64, performs around 10 Elo weaker than Houdini 2.0 Pro x64 (probably due to its TB bug)
- Deep Rybka 4.1 x64 and Houdini 2.0b Pro x64 are playing with 4-men EGTB (DVD Endgame Turbo 3/Gaviota TB are disabled)

And here is the current voting table

Best,
Sedat

Ajedrecista · Post by **Ajedrecista** » Thu Nov 17, 2011 9:03 pm

Hello Sedat:

Sedat Canbaz wrote: Current standings: Deep Rybka 4.1 x64 6 cores is the Leader (in both tables)

Best Wishes,
Sedat

I agree with you: more games are needed.

I took a look at the PGN file and found this (hopefully without typos or repeated games):

Code: Select all

Deep Rybka 4.1 x64 6c vs. Houdini 2.0b Pro x64 4c.

Rybka - Houdini: +40 -18 =45 (103 games).
Houdini - Rybka: +32 -18 =52 (102 games).

TOTAL: +58 -50 =97 (205 games). Rybka is ahead by ~ 14 Elo.
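
The tally above can be double-checked with a short Python sketch. It assumes the per-colour results are written from White's point of view (which is what makes the totals add up to +58 -50 =97) and that the rating difference uses the usual logistic Elo formula with a base-10 logarithm; the name `elo_diff` is only an illustrative choice.

```python
import math

# Results from White's point of view: (white wins, white losses, draws).
rybka_white = (40, 18, 45)    # Rybka - Houdini, 103 games
houdini_white = (32, 18, 52)  # Houdini - Rybka, 102 games

# Rybka's wins = its wins as White + Houdini's losses as White.
rybka_wins = rybka_white[0] + houdini_white[1]
houdini_wins = houdini_white[0] + rybka_white[1]
draws = rybka_white[2] + houdini_white[2]
games = rybka_wins + houdini_wins + draws


def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result (draws count as half a point)."""
    points_for = wins + draws / 2
    points_against = losses + draws / 2
    return 400 * math.log10(points_for / points_against)


print(f"TOTAL: +{rybka_wins} -{houdini_wins} ={draws} ({games} games)")
print(f"Rybka is ahead by ~ {elo_diff(rybka_wins, houdini_wins, draws):.0f} Elo")
```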

I hope I have counted correctly. A long time ago I read in the ImmortalChess Forum a way to calculate 'uncertainties' in a rating difference according to the number of games and the draw ratio (only the formula for the standard deviation). Here is the complete method as I usually implement it (almost everything written below the standard deviation is my own, so I may be making wrong assumptions):

Code: Select all

Number of games = n = 58 + 50 + 97 = 205
Draw_ratio = (number of draws)/n = 97/205
Score_Rybka = (58 + 97/2)/n = 106.5/205 = 1 - Score_Houdini
Score_Houdini = (50 + 97/2)/n = 98.5/205 = 1 - Score_Rybka

Rating_difference = rd = 400·log(Score_Rybka/Score_Houdini) ~ 13.57
Standard_deviation = sd = sqrt{[(Score_Rybka)·(Score_Houdini) - (Draw_ratio)/4]/n} ~ 0.02531 ~ 2.531% of the n games.

(Referring to the n games, and taking a confidence interval of 95.45%, that is 2·sd):

2n·(sd) ~ 10.3773

rd(+) ~ 400·log[(106.5 + 10.3773)/(98.5 - 10.3773)] ~ 49.06
rd(-) ~ 400·log[(106.5 - 10.3773)/(98.5 + 10.3773)] ~ -21.64

Error(+) = rd(+) - rd ~ 35.49
Error(-) = rd(-) - rd ~ -35.21
|Error| = [|error(+)| + |error(-)|]/2 ~ 35.35 ; error ~ ± 35.35

With confidence ~ 95.45%:

Deep Rybka 4.1 x64 6c is stronger than Houdini 2.0b x64 4c by the Elo points given in the following interval:

13.57 (+35.49, -35.21) ~ ]-21.64, +49.06[

Or taking the average error (± 35.35):

13.57 ± 35.35 ~ ]-21.78, +48.92[ (Few changes, as expected).

(Sorry for my unpleasant notation.) So, with 205 games, a draw ratio of ~ 47.32% and Rybka scoring ~ 51.95%, I get around ± 35 Elo of 'uncertainty' and an interval of ~ ]-22, +49[ (with 2-sigma confidence). Is this right, or am I wrong? I do not know whether the formula I have applied comes from BayesElo or EloStat, or is something that someone 'invented' and posted in ImmortalChess. Could you please give the rating-difference intervals for +58 -50 =97 (the result of this (unfinished?) match)?

I do not think I have any programs to calculate this; the numbers in the code box were calculated with an ordinary Casio calculator (and the results are rounded), so please keep in mind that they may contain errors... although an uncertainty of ± 35 Elo seems reasonable for 205 games. Thanks in advance and please keep up this great work!
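
For anyone without a calculator at hand, the method in the code box can be sketched in Python as follows. This is only a transcription of the steps above, not the algorithm used by BayesElo or EloStat; the function name `elo_interval` and its `sigmas` parameter are illustrative. Note that the expression under the square root, score·(1 - score) - draw_ratio/4, is algebraically the per-game variance of the result when a draw counts as half a point, so `sd` is the standard error of the mean score.

```python
import math


def elo_interval(wins, losses, draws, sigmas=2.0):
    """Return (rd, rd_minus, rd_plus) following the steps in the code box."""
    n = wins + losses + draws
    points_for = wins + draws / 2
    points_against = n - points_for
    score = points_for / n
    draw_ratio = draws / n

    # Standard deviation of the mean score, as a fraction of the n games.
    sd = math.sqrt((score * (1 - score) - draw_ratio / 4) / n)

    # Shift the points by sigmas * n * sd in each direction.
    delta = sigmas * n * sd
    rd = 400 * math.log10(points_for / points_against)
    rd_plus = 400 * math.log10((points_for + delta) / (points_against - delta))
    rd_minus = 400 * math.log10((points_for - delta) / (points_against + delta))
    return rd, rd_minus, rd_plus


rd, rd_minus, rd_plus = elo_interval(58, 50, 97)
average_error = ((rd_plus - rd) + (rd - rd_minus)) / 2
print(f"rd = {rd:.2f}, interval ]{rd_minus:.2f}, {rd_plus:.2f}[, "
      f"average error = {average_error:.2f}")
```

Calling `elo_interval(58, 50, 97, sigmas=1.96)` gives the corresponding interval for 95% confidence under the same assumptions.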

Regards from Spain.

Ajedrecista.

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Nov 17, 2011 10:05 pm

Hello dear Jesús Muñoz,

You are welcome, it's my pleasure...

This test will probably be helpful for those who are considering upgrading their hardware

And now about the current issue:
Yes... I agree with you, your counts are right (as far as I've checked)

Individual statistics:

Code: Select all

2 Deep Rybka 4.1 x64 6c     : 3380  328 (+101,=173,- 54), 57.2 %

IvanHoe 47c GH x64 4c         : 123 (+ 43,= 76,-  4), 65.9 %
Houdini 2.0b Pro x64 4c       : 205 (+ 58,= 97,- 50), 52.0 %

The Deep Rybka 4.1 x64 6 cores test is still running, and at least 400-500 games are needed for a better conclusion

BTW, its current opponent is Deep Rybka 4.1 x64 4 cores
And after 38 games, Rybka 6 cores is clearly ahead: +11 =25 -2 (+84 Elo)

Another interesting note: when we compare the two calculation programs (EloStat and BayesElo),
we see a difference of approx. 15-20 Elo between them

Calculation by Bayeselo 0056 (start Elo: 3270): approx. 70 Elo difference between i7 980X @4.33GHz and i7 920 @3.0GHz

Code: Select all

Rank Name                        Elo    +    - games score oppo. draws 

   1 Houdini 2.0b Pro x64 6c    3417   18   18  1009   69%  3298   46% 
   2 Deep Rybka 4.1 x64 6c      3364   29   29   328   57%  3323   53% 
   3 Houdini 2.0 Pro x64 4c     3360   16   16  1131   61%  3291   49% 
   4 Houdini 2.0b Pro x64 4c    3349   19   19   851   53%  3329   47% 
   5 Houdini 1.5a x64 4c        3342   17   17  1086   62%  3270   48% 
   6 Deep Rybka 4.1 x64 4c      3290   15   15  1318   49%  3297   56% 
   7 Critter 1.2 x64 4c         3286   16   16  1037   50%  3287   58% 
   8 IvanHoe 47c GH x64 4c      3279   16   16  1110   49%  3286   60% 
   9 Fire 2.2 xTreme x64 4c     3273   15   15  1165   41%  3326   59% 
  10 IvanHoe 0B.09.18 x64 4c    3269   17   17   924   47%  3286   58% 
  11 DeepSaros 2.3i x64 4c      3266   17   17   888   50%  3268   61% 
  12 IvanHoe B47d x64 4c        3265   18   18   886   44%  3302   57% 
  13 IvanHoe B47f02 x64 4c      3258   17   17   984   48%  3270   59% 
  14 Houdini 2.0 Pro x64 1c     3252   14   14  1543   54%  3228   48% 
  15 Stockfish 2.1.1 JA x64 4c  3251   17   17   941   47%  3266   55% 
  16 Strelka 5.1 x64 1c         3248   20   20   715   57%  3205   52% 
  17 Rybka 4.1 x64 1c           3194   20   20   677   49%  3202   54% 
  18 Komodo 3.0 x64 1c          3184   15   15  1360   40%  3250   44% 
  19 Ivanhoe B46a x64 1c        3176   20   20   716   44%  3214   54% 
  20 Stockfish 111026 x64 1c    3175   22   22   613   45%  3207   50% 
  21 Naum 4.2 x64 4c            3172   18   18   898   36%  3256   46%

Calculation by EloStat 1.3 (start Elo: 3270): approx. 80 Elo difference between i7 980X @4.33GHz and i7 920 @3.0GHz

Code: Select all

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Houdini 2.0b Pro x64 6c        : 3440   16  16  1009    69.0 %   3301   46.0 %
  2 Deep Rybka 4.1 x64 6c          : 3380   26  26   328    57.2 %   3330   52.7 %
  3 Houdini 2.0 Pro x64 4c         : 3371   14  14  1131    61.1 %   3293   49.2 %
  4 Houdini 2.0b Pro x64 4c        : 3361   17  17   851    53.2 %   3339   47.4 %
  5 Houdini 1.5a x64 4c            : 3351   15  15  1086    61.6 %   3269   47.7 %
  6 Deep Rybka 4.1 x64 4c          : 3295   12  12  1318    49.2 %   3300   56.1 %
  7 Critter 1.2 x64 4c             : 3287   14  14  1037    49.8 %   3289   57.6 %
  8 IvanHoe 47c GH x64 4c          : 3280   13  13  1110    48.8 %   3288   60.4 %
  9 Fire 2.2 xTreme x64 4c         : 3273   13  13  1165    41.3 %   3333   59.0 %
 10 IvanHoe 0B.09.18 x64 4c        : 3268   15  15   924    47.3 %   3287   57.7 %
 11 DeepSaros 2.3i x64 4c          : 3265   14  14   888    49.7 %   3267   60.7 %
 12 IvanHoe B47d x64 4c            : 3264   15  15   886    44.0 %   3306   56.5 %
 13 IvanHoe B47f02 x64 4c          : 3255   14  14   984    48.0 %   3269   59.0 %
 14 Houdini 2.0 Pro x64 1c         : 3246   13  13  1543    53.8 %   3220   48.0 %
 15 Stockfish 2.1.1 JA x64 4c      : 3246   15  15   941    47.3 %   3265   54.8 %
 16 Strelka 5.1 x64 1c             : 3243   18  18   715    57.1 %   3193   51.6 %
 17 Rybka 4.1 x64 1c               : 3180   18  18   677    48.7 %   3189   53.6 %
 18 Komodo 3.0 x64 1c              : 3171   14  14  1360    39.6 %   3245   44.5 %
 19 Stockfish 111026 x64 1c        : 3159   19  19   613    44.8 %   3195   50.1 %
 20 Ivanhoe B46a x64 1c            : 3158   17  17   716    43.6 %   3203   53.9 %
 21 Naum 4.2 x64 4c                : 3155   17  17   898    36.4 %   3252   45.8 %
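
The 6-core versus 4-core gaps in the two lists can be read off with a few lines of Python. The ratings are copied from the tables above; that the "approx. 70" and "approx. 80" figures refer to Houdini 2.0b Pro 6c versus 4c is an assumption, and the helper name `core_gap` is only illustrative.

```python
# Ratings copied from the BayesElo and EloStat tables above.
bayeselo = {
    "Houdini 2.0b Pro x64 6c": 3417,
    "Houdini 2.0b Pro x64 4c": 3349,
    "Deep Rybka 4.1 x64 6c": 3364,
    "Deep Rybka 4.1 x64 4c": 3290,
}
elostat = {
    "Houdini 2.0b Pro x64 6c": 3440,
    "Houdini 2.0b Pro x64 4c": 3361,
    "Deep Rybka 4.1 x64 6c": 3380,
    "Deep Rybka 4.1 x64 4c": 3295,
}


def core_gap(ratings, engine):
    """Rating of the 6-core entry minus the 4-core entry of the same engine."""
    return ratings[f"{engine} 6c"] - ratings[f"{engine} 4c"]


for engine in ("Houdini 2.0b Pro x64", "Deep Rybka 4.1 x64"):
    print(f"{engine}: BayesElo {core_gap(bayeselo, engine):+d}, "
          f"EloStat {core_gap(elostat, engine):+d}")
```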

Best Regards,
Sedat

Ajedrecista · Post by **Ajedrecista** » Fri Nov 18, 2011 10:41 am

Hello again:

Thanks for your fast answer. I did a quick calculation with 328 games, and my average error is ~ ± 26.4 Elo (for 2-sigma confidence, ~ 95.45%) and ~ ± 25.86 Elo (for 95% confidence, ~ 1.96 sigma), so I guess that my uncertainties match EloStat better than BayesElo (I read a while ago that BayesElo is a bit better than EloStat). To get ± 29 Elo with the formula I apply, the confidence would have to be (hoping there are no errors in my calculation) around 2.2 sigma (around 97.25% confidence, more or less).
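
The sigma multiplier that would give ± 29 Elo can be searched for numerically. The sketch below reuses the same formula on Deep Rybka 4.1 x64 6c's overall result (+101 =173 -54 in 328 games) and finds the multiplier by bisection; the function names are illustrative, and the conversion from sigmas to a two-sided confidence level assumes a normal distribution.

```python
import math


def average_error(wins, losses, draws, sigmas):
    """Average of the upper and lower Elo errors for a given sigma multiplier."""
    n = wins + losses + draws
    points_for = wins + draws / 2
    points_against = n - points_for
    score = points_for / n
    sd = math.sqrt((score * (1 - score) - draws / n / 4) / n)
    delta = sigmas * n * sd
    rd = 400 * math.log10(points_for / points_against)
    rd_plus = 400 * math.log10((points_for + delta) / (points_against - delta))
    rd_minus = 400 * math.log10((points_for - delta) / (points_against + delta))
    return ((rd_plus - rd) + (rd - rd_minus)) / 2


def sigmas_for_error(wins, losses, draws, target, low=0.5, high=4.0):
    """Bisection: the average error grows with the sigma multiplier."""
    for _ in range(60):
        mid = (low + high) / 2
        if average_error(wins, losses, draws, mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


k = sigmas_for_error(101, 54, 173, target=29.0)
confidence = math.erf(k / math.sqrt(2))  # two-sided normal confidence level
print(f"{k:.2f} sigma, about {100 * confidence:.2f}% confidence")
```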

Regards from Spain.

Ajedrecista.
