AH_LTC Made In Heaven Class (1st. Dec. 2013)

Aser Huerga · Post by **Aser Huerga** » Sun Dec 01, 2013 10:07 am

AH_LTC Stairway to Heaven Competition

AH_LTC Made In Heaven Class

01/12/2013 9:44:18 :

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish_SZ 13110122          :   10   18  17   600    52.1 %     -5   60.5 %
  2 Houdini 4                      :    5   19  19   600    51.1 %     -3   55.5 %
  3 Komodo 6                       :  -15   18  18   600    46.8 %      7   57.7 %

01/12/2013 9:44:18 :
Individual statistics:

1 Stockfish_SZ 13110122     :   10  600 (+131,=363,-106), 52.1 %

Houdini_4_x64A                : 300 (+ 61,=175,- 64), 49.5 %
Komodo 6                      : 300 (+ 70,=188,- 42), 54.7 %

2 Houdini_4_x64A            :    5  600 (+140,=333,-127), 51.1 %

Komodo 6                      : 300 (+ 76,=158,- 66), 51.7 %
Stockfish_SZ 13110122         : 300 (+ 64,=175,- 61), 50.5 %

3 Komodo 6                  :  -15  600 (+108,=346,-146), 46.8 %

Houdini 4                     : 300 (+ 66,=158,- 76), 48.3 %
Stockfish_SZ 13110122         : 300 (+ 42,=188,- 70), 45.3 %

AH_LTC results suggest:

Stockfish_SZ 13110122 is 5 ELO points ahead of Houdini 4
Houdini 4 is 20 ELO points ahead of Komodo 6

(under AH_LTC conditions)
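
As a rough cross-check of the list above, the match scores can be converted into rating differences with the standard logistic Elo formula. This is only a sketch: ELOStat's own calculation may differ in detail, and the function name `elo_diff_from_score` is made up for illustration.

```python
import math

def elo_diff_from_score(wins, draws, losses):
    """Rating difference implied by a match score (logistic Elo model)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Invert the expected-score formula E = 1 / (1 + 10 ** (-d / 400)).
    return 400.0 * math.log10(score / (1.0 - score))

# Totals copied from the individual statistics above.
results = {
    "Stockfish_SZ 13110122": (131, 363, 106),
    "Houdini 4": (140, 333, 127),
    "Komodo 6": (108, 346, 146),
}

for name, (w, d, l) in results.items():
    diff = elo_diff_from_score(w, d, l)
    print(f"{name:24s} {diff:+6.1f} Elo vs. average opponent")
```

The printed values should land close to the gap between the Elo and Av.Op. columns of the list (for example 10 - (-5) = 15 for Stockfish_SZ), which suggests the table is internally consistent under this model.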

lkaufman · Post by **lkaufman** » Sun Dec 01, 2013 4:47 pm

I'm glad to see your new list. I have some modest suggestions:

1. I think you should include all the matches you ran (including for example Houdini 3 matches) and rate them all together, because your sample sizes are too small for reliable ratings if you only include the top three. Maybe eventually prune old versions so as to limit the number of versions of one engine to 3.
2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.
3. Set some minimum time interval between versions of one engine that you rate. Rate all official releases, and include dev. versions if that interval has passed and you have reason to believe that the newer version is significantly stronger than the older one.

Finally, one question: Which rating program did you use, EloStat, BayesElo, Ordo, or some other one? If you use BayesElo, you should specify the parameters used.

Best regards,
Larry
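
The sample-size concern in point 1 can be made concrete with a back-of-the-envelope error margin. The sketch below assumes independent games, a normal approximation for the mean score, and the logistic Elo formula; none of this is taken from any rating program's source, and `elo_margin_95` is a hypothetical helper name.

```python
import math

def to_elo(score):
    """Logistic Elo difference corresponding to an expected score."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin_95(wins, draws, losses):
    """Approximate 95% error margins (minus, plus) in Elo for a match result."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the score, with outcomes worth 1, 0.5 and 0 points.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / games
    stderr = math.sqrt(var / games)
    low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
    return to_elo(mean) - to_elo(low), to_elo(high) - to_elo(mean)

# 600-game total for Stockfish_SZ, and its 300-game match against Houdini 4.
for label, wdl in [("600 games", (131, 363, 106)), ("300 games", (61, 175, 64))]:
    minus, plus = elo_margin_95(*wdl)
    print(f"{label}: -{minus:.0f} / +{plus:.0f} Elo")
```

Under these assumptions, the margin for 600 games comes out in the same range as the + and - columns of the list, while a single 300-game match is noticeably less precise. Since the engines are only 5 to 20 points apart, margins of this size are wider than the gaps being measured.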

Aser Huerga · Post by **Aser Huerga** » Sun Dec 01, 2013 5:45 pm

Thanks Larry, I will take your points into account. I'm using ELOStat 1.3.

lkaufman · Post by **lkaufman** » Sun Dec 01, 2013 6:01 pm

Aser Huerga wrote: Thanks Larry, I will take your points into account. I'm using ELOStat 1.3.

ELOStat works well enough when the engines are fairly close in rating, as is the case here. As long as the opposing engines in matches are separated by less than about fifty elo I think it's okay. It is however quite unsound if you plan to run matches between engines far apart in strength, but I don't think you will be doing that. If you do, I would recommend Ordo.

Aser Huerga · Post by **Aser Huerga** » Sun Dec 01, 2013 7:07 pm

lkaufman wrote: 2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.

What about using 3100, 3000 and 2900 (instead of 0) as the "Start Rating" for each class? I only want to highlight the strength differences between engines in the same class (engines within a class will be of similar strength).

lkaufman · Post by **lkaufman** » Sun Dec 01, 2013 7:28 pm

Aser Huerga wrote:
lkaufman wrote: 2. Pick some engine as your reference version and define it as 3000 (or whatever number you like). It's much easier to compare and remember ratings this way.
What about using 3100, 3000 and 2900 (instead of 0) as the "Start Rating" for each class? I only want to highlight the strength differences between engines in the same class (engines within a class will be of similar strength).

That's reasonable as long as there are no matches between different classes, but once there are such matches it is best to rate everything together, even if you choose to display the data as three separate lists. Also, there is some advantage to picking one version as a fixed reference point as long as all data is being rated, regardless of what is actually displayed. That way newer and stronger versions get higher and higher ratings as they should. But that isn't essential.
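
The reference-point idea amounts to adding one constant offset to every rating, so differences between engines are unchanged. A minimal sketch follows; the choice of Komodo 6 as the reference and 3000 as its value is purely illustrative, and `anchor_ratings` is a hypothetical helper, not a feature of any rating program.

```python
def anchor_ratings(ratings, reference, anchor=3000):
    """Shift all ratings so that `reference` is rated exactly `anchor`."""
    offset = anchor - ratings[reference]
    return {name: elo + offset for name, elo in ratings.items()}

# Relative ratings from the first list in this thread (average = 0).
relative = {
    "Stockfish_SZ 13110122": 10,
    "Houdini 4": 5,
    "Komodo 6": -15,
}

anchored = anchor_ratings(relative, reference="Komodo 6", anchor=3000)
for name, elo in sorted(anchored.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {elo}")
```

Because the reference stays fixed from one list to the next, a newer version that beats it by a wider margin would simply show a higher number, rather than pulling the whole list back toward a fixed average.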

Aser Huerga · Post by **Aser Huerga** » Mon Dec 02, 2013 9:52 pm

lkaufman wrote:Also, there is some advantage to picking one version as a fixed reference point as long as all data is being rated, regardless of what is actually displayed. That way newer and stronger versions get higher and higher ratings as they should. But that isn't essential.

I'm going to seed the Start Rating in ELOStat with a reference based on the CCRL 40/40 ELOs of H3 and K6. In every later list the reference will be updated with the ELOs generated so far, so stronger versions will show a higher ELO.

With this approach, the results so far are:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish_SZ 13110122          : 3178   15  15   900    54.0 %   3151   58.0 %
  2 Houdini 4                      : 3172   19  19   600    51.1 %   3165   55.5 %
  3 Komodo 6                       : 3151   15  15   900    48.8 %   3160   57.1 %
  4 Houdini 3                      : 3128   19  19   600    44.8 %   3165   54.5 %

Laskos · Post by **Laskos** » Tue Dec 03, 2013 12:50 pm

Thanks for the test.

ouachita · Post by **ouachita** » Sat Dec 14, 2013 8:25 pm

Aser,
Does AH_LTC for this event = 90'+30"?

Aser Huerga · Post by **Aser Huerga** » Sat Dec 14, 2013 11:33 pm

ouachita wrote:Aser,
Does AH_LTC for this event = 90'+30"?

Yes, all my AH_LTC games are played at 90'+30".
