First run, new data

bob · Post by **bob** » Tue Aug 12, 2008 8:46 am

Here is the first cluster run with the new positions. I have three more queued up so that I can compare the Elo ratings from each and see how closely they agree...

```
Tue Aug 12 00:49:44 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   108    7    7  7782   67%   -21   20%
   2 Fruit 2.1               62    7    6  7782   61%   -21   23%
   3 opponent-21.7           25    6    6  7780   57%   -21   33%
   4 Glaurung 1.1 SMP        10    6    6  7782   54%   -21   20%
   5 Crafty-22.2            -21    4    4 38908   46%     4   23%
   6 Arasan 10.0           -185    7    7  7782   29%   -21   19%
```

It takes almost exactly 12 hours to run this test, which was 38,908 games in total. I'll make the starting positions available on my ftp machine if anyone wants to see them. But my first plan is to see if the 4 runs are close. If so, it should be possible to reduce the size of the test somewhat. I had thought about running the odd vs. even positions, since that would be reasonably random, and seeing how they compare to the original complete set. But first, the cluster needs another 36 hours or so to finish the other three runs, assuming we don't run into another A/C problem where the thing shuts itself down quickly.
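A minimal sketch of the odd/even split described above, assuming the starting positions are stored one FEN per line in a plain text file. The helper names and the file format are hypothetical, not bob's actual setup. Positions are numbered from 1, so the "odd" half is positions 1, 3, 5, ...

```python
from pathlib import Path


def load_positions(path):
    """Read one FEN per line, skipping blank lines (hypothetical file format)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def split_odd_even(positions):
    """Split starting positions into odd- and even-numbered halves.

    Positions are numbered from 1, so list index 0 is position 1 (odd).
    """
    odd = positions[0::2]   # positions 1, 3, 5, ...
    even = positions[1::2]  # positions 2, 4, 6, ...
    return odd, even
```

Each half would then be played as its own run, and its BayesElo ratings compared with those from the complete set; with a fixed position order, the split is reproducible from run to run.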

Uri Blass · Post by **Uri Blass** » Tue Aug 12, 2008 8:54 am

I hope that this time you save the PGN of all the games.

Uri

hgm · Post by **hgm** » Tue Aug 12, 2008 11:17 am

For comparison, the earlier 25,000-game runs were:

```
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   123    8    8  5120   66%     2   15%
   2 Fruit 2.1               38    8    7  5119   55%     2   19%
   3 opponent-21.7           28    7    7  5119   54%     2   34%
   4 Crafty-22.2              2    4    4 25597   50%     0   19%
   5 Glaurung 1.1 SMP         2    8    8  5120   50%     2   14%
   6 Arasan 10.0           -193    8    9  5119   26%     2   15%

Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   118    8    8  5120   67%   -19   13%
   2 Fruit 2.1               42    8    8  5120   58%   -19   17%
   3 opponent-21.7           32    7    7  5115   58%   -19   36%
   4 Glaurung 1.1 SMP        20    8    8  5120   55%   -19   12%
   5 Crafty-22.2            -19    4    4 25595   47%     4   19%
   6 Arasan 10.0           -193    8    8  5120   28%   -19   16%
```

If we compare the new result to that of the old second run, we get:

```
Rank Name                   Elo    +    -   2nd diff (2nd - new)
   1 Glaurung 2-epsilon/5   108    7    7   118  +10
   2 Fruit 2.1               62    7    6    42  -20
   3 opponent-21.7           25    6    6    32   +7
   4 Glaurung 1.1 SMP        10    6    6    20  +10
   5 Crafty-22.2            -21    4    4   -19   +2
   6 Arasan 10.0           -185    7    7  -193   -8
```

We see that the differences, listed in the last column, are not far from sqrt(2) = 1.41 times the uncertainties quoted by BayesElo. The standard deviation of the differences between the two runs, over the 5 Crafty opponents, is sqrt((100+400+49+100+64)/5) ~12.
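The arithmetic can be checked with a short Python sketch. The Elo numbers are copied from the tables above; the variable names are only illustrative. Note that the formula is a root-mean-square about zero (it does not subtract the mean difference), which is the natural choice if the two runs are expected to agree on average.

```python
import math

# Elo values copied from the tables: the new run and the older second run.
new_run = {
    "Glaurung 2-epsilon/5": 108,
    "Fruit 2.1": 62,
    "opponent-21.7": 25,
    "Glaurung 1.1 SMP": 10,
    "Crafty-22.2": -21,
    "Arasan 10.0": -185,
}
old_second_run = {
    "Glaurung 2-epsilon/5": 118,
    "Fruit 2.1": 42,
    "opponent-21.7": 32,
    "Glaurung 1.1 SMP": 20,
    "Crafty-22.2": -19,
    "Arasan 10.0": -193,
}

# The "diff" column: old second run minus new run.
diffs = {name: old_second_run[name] - new_run[name] for name in new_run}

# Root-mean-square over the five Crafty opponents (Crafty itself excluded),
# i.e. sqrt((100+400+49+100+64)/5).
opponent_diffs = [d for name, d in diffs.items() if name != "Crafty-22.2"]
rms = math.sqrt(sum(d * d for d in opponent_diffs) / len(opponent_diffs))
print(f"{rms:.1f}")  # 11.9
```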

This is about twice what is expected (the quoted BayesElo uncertainties are 95% confidence intervals, i.e. ~2 sigma). The largest contribution to this variance comes from Fruit 2.1 (400 of the 713 in the sum).
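A rough sketch of where the "about twice" comes from. Two assumptions here are not stated in the tables: the opponents' typical 95% half-widths are eyeballed as about 7 Elo (new run) and 8 Elo (older run), and the runs are treated as independent, so the variances of the two ratings add. Reading a 95% interval as roughly 2 sigma follows the post.

```python
import math

# Eyeballed typical 95% half-widths for the opponents (assumption, from the tables).
HALF_WIDTH_NEW = 7.0
HALF_WIDTH_OLD = 8.0

# Treat a 95% interval as roughly +/- 2 sigma, as in the post.
sigma_new = HALF_WIDTH_NEW / 2
sigma_old = HALF_WIDTH_OLD / 2

# For independent runs, the variance of the difference is the sum of variances.
expected_sd = math.sqrt(sigma_new**2 + sigma_old**2)

observed_sd = 11.9  # RMS of the opponent differences from the comparison table
ratio = observed_sd / expected_sd
print(f"expected ~{expected_sd:.1f}, observed ~{observed_sd}, ratio ~{ratio:.1f}")
```

With equal half-widths this reduces to sqrt(2) times a single run's sigma, which is where the sqrt(2) factor in the previous paragraph comes from.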

The first of the two old runs is way off.

bob · Post by **bob** » Tue Aug 12, 2008 6:34 pm

hgm wrote: [...] The first of the two old runs is way off.

If so, that's an issue, and one that has to be addressed, since the two runs, regardless of opinion, were played back-to-back with everything kept the same as far as that can be guaranteed.

bob · Post by **bob** » Tue Aug 12, 2008 6:34 pm

Uri Blass wrote: I hope that this time you save the PGN of all the games.

Uri

yep...
