AH_LTC Made In Heaven Class (1st. Dec. 2013)

Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 8:09 am
Location: Spain

AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Aser Huerga » Sun Dec 01, 2013 9:07 am

AH_LTC Stairway to Heaven Competition

AH_LTC Made In Heaven Class

Code:

01/12/2013 9:44:18 :

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish_SZ 13110122          :   10   18  17   600    52.1 %     -5   60.5 %
  2 Houdini 4                      :    5   19  19   600    51.1 %     -3   55.5 %
  3 Komodo 6                       :  -15   18  18   600    46.8 %      7   57.7 %

Code:

01/12/2013 9:44:18 :
Individual statistics:

1 Stockfish_SZ 13110122     :   10  600 (+131,=363,-106), 52.1 %

Houdini_4_x64A                : 300 (+ 61,=175,- 64), 49.5 %
Komodo 6                      : 300 (+ 70,=188,- 42), 54.7 %

2 Houdini_4_x64A            :    5  600 (+140,=333,-127), 51.1 %

Komodo 6                      : 300 (+ 76,=158,- 66), 51.7 %
Stockfish_SZ 13110122         : 300 (+ 64,=175,- 61), 50.5 %

3 Komodo 6                  :  -15  600 (+108,=346,-146), 46.8 %

Houdini 4                     : 300 (+ 66,=158,- 76), 48.3 %
Stockfish_SZ 13110122         : 300 (+ 42,=188,- 70), 45.3 %

AH_LTC results suggest:

Stockfish_SZ 13110122 is 5 Elo points ahead of Houdini 4
Houdini 4 is 20 Elo points ahead of Komodo 6

(under AH_LTC conditions)
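
For anyone who wants to check the figures by hand, here is a minimal Python sketch that recomputes the score percentages from the win/draw/loss counts above and converts them to an Elo difference against the average opponent. It assumes the standard logistic Elo formula, which may not be exactly what the rating program uses internally, and the function names are my own.

```python
import math

def score_fraction(wins, draws, losses):
    """Score as a fraction of games played (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_diff(score):
    """Elo difference implied by a score fraction, using the logistic Elo curve.

    Assumption: the standard 400 * log10(s / (1 - s)) formula. A rating
    program may use a slightly different model.
    """
    return 400.0 * math.log10(score / (1.0 - score))

# Win/draw/loss totals copied from the "Individual statistics" block.
results = {
    "Stockfish_SZ 13110122": (131, 363, 106),
    "Houdini 4": (140, 333, 127),
    "Komodo 6": (108, 346, 146),
}

for name, (w, d, l) in results.items():
    s = score_fraction(w, d, l)
    print(f"{name:24s} {100 * s:5.1f} %  {elo_diff(s):+6.1f} Elo vs. average opponent")
```

The value printed for each engine should be close to its Elo minus its Av.Op. in the table.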

lkaufman
Posts: 4201
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by lkaufman » Sun Dec 01, 2013 3:47 pm

I'm glad to see your new list. I have some modest suggestions:

1. I think you should include all the matches you ran (including for example Houdini 3 matches) and rate them all together, because your sample sizes are too small for reliable ratings if you only include the top three. Maybe eventually prune old versions so as to limit the number of versions of one engine to 3.
2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.
3. Set some minimum time interval between versions of one engine that you rate. Rate all official releases, and include dev. versions if that interval has passed and you have reason to believe that the newer version is significantly stronger than the older one.

Finally, one question: Which rating program did you use, EloStat, BayesElo, Ordo, or some other one? If you use BayesElo, you should specify the parameters used.

Best regards,
Larry

Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 8:09 am
Location: Spain

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Aser Huerga » Sun Dec 01, 2013 4:45 pm

Thanks, Larry. I will take your points into account. I'm using ELOStat 1.3.

lkaufman
Posts: 4201
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by lkaufman » Sun Dec 01, 2013 5:01 pm

Aser Huerga wrote: Thanks Larry, I will take your points into account. I'm using ELOStat 1.3
ELOStat works well enough when the engines are fairly close in rating, as is the case here. As long as the opposing engines in matches are separated by less than about fifty Elo, I think it's okay. It is, however, quite unsound if you plan to run matches between engines far apart in strength, but I don't think you will be doing that. If you do, I would recommend Ordo.
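
One way to see why the rating gap matters: as far as I know, ELOStat works from the average opponent rating (the "Av.Op." column), and the Elo curve is not a straight line, so averaging opponents first is only accurate when they are close together. The Python sketch below is a simplified illustration of that effect under the logistic Elo model, not ELOStat's actual algorithm; the function names are my own.

```python
def expected_score(diff):
    """Expected score for a rating advantage of `diff` Elo (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def averaging_error(advantages):
    """Error from replacing several opponents with one average opponent.

    `advantages` holds the player's rating advantage over each opponent,
    with an equal number of games against each.
    """
    true_score = sum(expected_score(a) for a in advantages) / len(advantages)
    approx_score = expected_score(sum(advantages) / len(advantages))
    return approx_score - true_score

# One equal opponent plus one opponent weaker by `spread` Elo.
for spread in (50, 200, 400):
    err = averaging_error([0, spread])
    print(f"spread {spread:3d} Elo: expected-score error {100 * err:.2f} percentage points")
```

With a 50 Elo spread the error is negligible, while at several hundred Elo it becomes large enough to distort the ratings.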

Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 8:09 am
Location: Spain

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Aser Huerga » Sun Dec 01, 2013 6:07 pm

lkaufman wrote: 2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.
What about using 3100, 3000 and 2900 (instead of 0) as the "Start Rating" for each class? I only want to highlight the strength differences between engines of the same class (engines within a class will be of similar strength).

lkaufman
Posts: 4201
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by lkaufman » Sun Dec 01, 2013 6:28 pm

Aser Huerga wrote:
lkaufman wrote: 2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.
What about using 3100, 3000 and 2900 (instead of 0) as the "Start Rating" for each class? I only want to highlight the strength differences between engines of the same class (engines within a class will be of similar strength).
That's reasonable as long as there are no matches between different classes, but once there are such matches it is best to rate everything together, even if you choose to display the data as three separate lists. Also, there is some advantage to picking one version as a fixed reference point as long as all data is being rated, regardless of what is actually displayed. That way newer and stronger versions get higher and higher ratings as they should. But that isn't essential.

Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 8:09 am
Location: Spain

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Aser Huerga » Mon Dec 02, 2013 8:52 pm

lkaufman wrote:Also, there is some advantage to picking one version as a fixed reference point as long as all data is being rated, regardless of what is actually displayed. That way newer and stronger versions get higher and higher ratings as they should. But that isn't essential.
I'm going to base the Start Rating in ELOStat on the CCRL 40/40 Elo ratings of H3 and K6. In each later list the reference will be updated with the ratings generated so far, so stronger versions will show a higher Elo.

With this approach, the results so far are:

Code:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish_SZ 13110122          : 3178   15  15   900    54.0 %   3151   58.0 %
  2 Houdini 4                      : 3172   19  19   600    51.1 %   3165   55.5 %
  3 Komodo 6                       : 3151   15  15   900    48.8 %   3160   57.1 %
  4 Houdini 3                      : 3128   19  19   600    44.8 %   3165   54.5 %
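
A quick way to read the +/- columns is to check which rating intervals overlap. The Python sketch below does that for the four engines in the table. Treat it as a rough heuristic only: overlapping intervals suggest the order is not settled, but this is not a proper significance test, and the helper names are my own.

```python
from itertools import combinations

# (Elo, plus, minus) copied from the table above.
table = {
    "Stockfish_SZ 13110122": (3178, 15, 15),
    "Houdini 4": (3172, 19, 19),
    "Komodo 6": (3151, 15, 15),
    "Houdini 3": (3128, 19, 19),
}

def interval(entry):
    """Return (low, high) for an (elo, plus, minus) entry."""
    elo, plus, minus = entry
    return (elo - minus, elo + plus)

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

for (name_a, a), (name_b, b) in combinations(table.items(), 2):
    verdict = "overlap" if overlaps(interval(a), interval(b)) else "separated"
    print(f"{name_a} vs {name_b}: {verdict}")
```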

Laskos
Posts: 10240
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Laskos » Tue Dec 03, 2013 11:50 am

Thanks for the test.

ouachita
Posts: 454
Joined: Tue Jan 15, 2013 3:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by ouachita » Sat Dec 14, 2013 7:25 pm

Aser,
Does AH_LTC for this event mean 90'+30"?
SIM, PhD, MBA, PE

Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 8:09 am
Location: Spain

Re: AH_LTC Made In Heaven Class (1st. Dec. 2013)

Post by Aser Huerga » Sat Dec 14, 2013 10:33 pm

ouachita wrote:Aser,
Does AH_LTC for this event mean 90'+30"?
Yes, all my AH_LTC games are played at 90'+30".