CCRL 40/4 lists updated (11th August 2012)


ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: CCRL 40/4 lists updated (11th August 2012).

Post by ernest »

Ajedrecista wrote: this is why I consider the worst case.
OK, now you make it clear.
But I would still find 3 results more useful: for 40%, 50% and 60% draw rates.

Of course it's very simple to multiply your "worst case" result by sqrt (1-d), d being the draw rate.
Ajedrecista
Posts: 2126
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Rule of thumb posted by Kai.

Post by Ajedrecista »

Hi again!
ernest wrote:
Ajedrecista wrote: this is why I consider the worst case.
OK, now you make it clear.
But I would still find 3 results more useful: for 40%, 50% and 60% draw rates.

Of course it's very simple to multiply your "worst case" result by sqrt (1-d), d being the draw rate.
You were faster than me. Taking pencil and paper, I arrived at an expression valid for small Elo gains (that is, scores very near 50%-50%). If I call K = gain · sqrt(n) (the rule of thumb posted by Kai):

Code:

z: parameter of the confidence interval in a normal distribution.
D: the draw ratio.

K(z, D) = 800z·sqrt(1 - D)/ln(10)
For 95% confidence ~ 1.96-sigma confidence and different draw ratios:

Code:

K(z = 1.96, D = 0) ~ 681
...
K(z = 1.96, D = 0.2) ~ 609.1
K(z = 1.96, D = 0.3) ~ 569.7
K(z = 1.96, D = 0.4) ~ 527.5
K(z = 1.96, D = 0.5) ~ 481.5
K(z = 1.96, D = 0.6) ~ 430.7
...
K(z = 1.96, D = 1) = 0 (bad model for high draw ratios).

n = [K(z, D)/gain]²
I slightly differ from Kai's numbers (in numerators) but they are all good overall. Sorry for these off-topic posts! Thanks for your understanding.
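As a rough sketch of the formula above (not Ajedrecista's own code), the following Python reproduces the K values and the resulting number of games. The function names `k_factor` and `games_needed` are made up for illustration. One way to arrive at the expression, assuming wins and losses are equally likely: the per-game score variance is (1 - D)/4 and the slope of the Elo curve at a 50% score is 1600/ln(10) Elo per unit of score, so the z-sigma margin after n games is 800z·sqrt(1 - D)/(ln(10)·sqrt(n)). Like the formula itself, the sketch only applies to scores near 50%.

```python
import math


def k_factor(z: float, draw_ratio: float) -> float:
    """K(z, D) = 800*z*sqrt(1 - D)/ln(10); approximation for scores near 50%."""
    return 800.0 * z * math.sqrt(1.0 - draw_ratio) / math.log(10.0)


def games_needed(gain_elo: float, z: float = 1.96, draw_ratio: float = 0.0) -> int:
    """n = [K(z, D)/gain]^2, rounded up to a whole number of games."""
    return math.ceil((k_factor(z, draw_ratio) / gain_elo) ** 2)


if __name__ == "__main__":
    # Tabulate K at 95% confidence (z = 1.96) for the draw ratios in the post,
    # plus the games needed to resolve a hypothetical 10 Elo gain.
    for d in (0.0, 0.2, 0.3, 0.4, 0.5, 0.6):
        k = k_factor(1.96, d)
        n = games_needed(10.0, draw_ratio=d)
        print(f"D = {d:.1f}: K ~ {k:.1f}, games for a 10 Elo gain: {n}")
```

The 10 Elo gain in the loop is only an example value; any other gain can be passed to `games_needed`.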

Regards from Spain.

Ajedrecista.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: Rule of thumb posted by Kai.

Post by Modern Times »

Well, for what it is worth, I ran just the Komodo 5 40/40 games through EloStat (because I'm not sure how to use bayeselo) with AMD and Intel separated out, and got this:

Code:

Program                            Elo    +    -   Games    Score   Av.Op.    Draws

Komodo 5 64-bit Intel Non-SSE  : 2462   21   21     457   58.4 %     2403   56.5 %
Komodo 5 64-bit AMD-SSE4       : 2422   19   19     600   62.5 %     2333   51.7 %
Quite a big difference, but very big error margins too. Can't draw any conclusions, but it tells me that the AMD factor is worth exploring more.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Rule of thumb posted by Kai.

Post by Laskos »

Modern Times wrote: Well, for what it is worth, I ran just the Komodo 5 40/40 games through EloStat (because I'm not sure how to use bayeselo) with AMD and Intel separated out, and got this:

Code:

Program                            Elo    +    -   Games    Score   Av.Op.    Draws

Komodo 5 64-bit Intel Non-SSE  : 2462   21   21     457   58.4 %     2403   56.5 %
Komodo 5 64-bit AMD-SSE4       : 2422   19   19     600   62.5 %     2333   51.7 %
Quite a big difference, but very big error margins too.
The difference is significant at the >95% confidence level, in fact something like 99%. Could you detail the test conditions?
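The "something like 99%" figure can be checked with a back-of-the-envelope calculation. The sketch below is not Kai's actual method; it assumes that the +/- columns are 95% margins (1.96 sigma), that the two ratings are independent and normally distributed, and that both sit on the same rating scale. The function name `difference_confidence` is hypothetical.

```python
import math


def difference_confidence(elo_a, margin_a, elo_b, margin_b, z_margin=1.96):
    """Return (z, two-sided confidence) that two ratings really differ.

    Assumes each margin is a z_margin-sigma interval and that the two
    estimates are independent and normally distributed.
    """
    sigma_a = margin_a / z_margin
    sigma_b = margin_b / z_margin
    # Standard deviation of the difference of two independent estimates.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z = abs(elo_a - elo_b) / sigma_diff
    confidence = math.erf(z / math.sqrt(2.0))
    return z, confidence


if __name__ == "__main__":
    # Figures from the EloStat table quoted above.
    z, conf = difference_confidence(2462, 21, 2422, 19)
    print(f"z = {z:.2f}, two-sided confidence = {conf:.1%}")
```

With the table's numbers this gives z of roughly 2.8, which lands above 99% under those assumptions. It says nothing about whether the two gauntlets are actually comparable.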
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Rule of thumb posted by Kai.

Post by lkaufman »

Modern Times wrote: Well, for what it is worth, I ran just the Komodo 5 40/40 games through EloStat (because I'm not sure how to use bayeselo) with AMD and Intel separated out, and got this:

Code:

Program                            Elo    +    -   Games    Score   Av.Op.    Draws

Komodo 5 64-bit Intel Non-SSE  : 2462   21   21     457   58.4 %     2403   56.5 %
Komodo 5 64-bit AMD-SSE4       : 2422   19   19     600   62.5 %     2333   51.7 %
Quite a big difference, but very big error margins too. Can't draw any conclusions, but it tells me that the AMD factor is worth exploring more.
What ratings are you using for the opponents? They show averages of 2403 and 2333, so obviously they are not normal CCRL ratings. Are you sure that the opponents are rated consistently between these two runs?

Larry
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: Rule of thumb posted by Kai.

Post by Modern Times »

Of course they aren't normal CCRL ratings; they use EloStat's default 2400 start rating. The numbers come from a single pgn of just the K5 games.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Rule of thumb posted by Kai.

Post by lkaufman »

Modern Times wrote: Of course they aren't normal CCRL ratings; they use EloStat's default 2400 start rating. The numbers come from a single pgn of just the K5 games.
So just to be clear, the same opposing engine has the same rating on both lists, right? Then this is highly significant, though not in agreement with our own observations. It is not only 99% significant, but even more so because the Intel machines did not have SSE4 and the AMD machines did. Could you perhaps do the same thing with Komodo 4 data to confirm your finding? If confirmed, we need to investigate this further.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: Rule of thumb posted by Kai.

Post by Modern Times »

One list of 1057 games. It simply may not be valid to do an Elo calculation on that. Few or no common opponents, just two gauntlets in one pgn. But what I know is this: on the 40/40 list, Komodo 5 only started to show good Elo performance once the Intel games were added to the mix. It is worth you investigating further, doing some proper tests. I'm not spending any more time on it.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: Rule of thumb posted by Kai.

Post by Modern Times »

The only way to do this is to run an off-line calculation on the entire CCRL database, with Komodo 5 separated out between AMD and Intel. But with just 500 games roughly for each, the statistical error margins will be huge. If I get time I will post the result here, but in any case there is no substitute for some proper, controlled testing on this issue.
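To put a number on "huge", the formula Ajedrecista posted earlier in this thread can be turned around to give the margin for a fixed number of games. This is only a sketch: the name `elo_margin` is hypothetical, and the formula is an approximation for scores near 50%, so it is only indicative for the 58% to 62% scores in the EloStat table.

```python
import math


def elo_margin(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate +/- Elo margin: 800*z*sqrt(1 - D) / (ln(10)*sqrt(n))."""
    return 800.0 * z * math.sqrt(1.0 - draw_ratio) / (math.log(10.0) * math.sqrt(games))


if __name__ == "__main__":
    # Game counts and draw rates taken from the EloStat table in this thread.
    print(f"457 games, 56.5% draws: +/- {elo_margin(457, 0.565):.1f} Elo")
    print(f"600 games, 51.7% draws: +/- {elo_margin(600, 0.517):.1f} Elo")
    # A round figure for "roughly 500 games" at an assumed 50% draw rate.
    print(f"500 games, 50.0% draws: +/- {elo_margin(500, 0.5):.1f} Elo")
```

For the two table rows the approximation lands close to the 21 and 19 that EloStat reported, so around 500 games per platform leaves about 20 Elo of uncertainty on each rating at 95% confidence.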
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Rule of thumb posted by Kai.

Post by Sven »

Modern Times wrote:One list of 1057 games. It simply may not be valid to do an Elo calculation on that. Few or no common opponents, just two gauntlets in one pgn.
No common opponents would indeed invalidate any Elo comparison between the two K5 versions, since their games would not be connected: you would have two disjoint rating pools.
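Whether a set of games forms one rating pool or several can be checked before running any Elo calculation. The sketch below treats each game as an edge between two players and collects the connected components. The function name `rating_pools` and the opponent names are hypothetical, and the game lists are invented purely to illustrate the two cases.

```python
from collections import defaultdict


def rating_pools(games):
    """Group players into connected components of the 'played each other' graph.

    `games` is an iterable of (player_a, player_b) pairs, one per game.
    """
    neighbours = defaultdict(set)
    for a, b in games:
        neighbours[a].add(b)
        neighbours[b].add(a)

    pools, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from `start`.
        pool, stack = set(), [start]
        while stack:
            player = stack.pop()
            if player in pool:
                continue
            pool.add(player)
            stack.extend(neighbours[player] - pool)
        seen |= pool
        pools.append(pool)
    return pools


if __name__ == "__main__":
    # Two gauntlets with no common opponents (invented example data).
    disjoint = [
        ("K5 Intel", "Opponent A"), ("K5 Intel", "Opponent B"),
        ("K5 AMD", "Opponent C"), ("K5 AMD", "Opponent D"),
    ]
    print(len(rating_pools(disjoint)))   # number of separate pools

    # Adding a single shared opponent connects the two gauntlets.
    linked = disjoint + [("K5 AMD", "Opponent A")]
    print(len(rating_pools(linked)))
```

If the function returns more than one pool, ratings from different pools share no common scale, which is the situation described above. A single shared opponent connects the pools formally, but with few connecting games the comparison would still be weak.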

Sven