First run, new data

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

First run, new data

Post by bob »

Here is the first cluster run with the new positions. I have three more queued up so that I can compare the elo ratings from each to see how closely they fall...

Code: Select all


Tue Aug 12 00:49:44 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   108    7    7  7782   67%   -21   20%
   2 Fruit 2.1               62    7    6  7782   61%   -21   23%
   3 opponent-21.7           25    6    6  7780   57%   -21   33%
   4 Glaurung 1.1 SMP        10    6    6  7782   54%   -21   20%
   5 Crafty-22.2            -21    4    4 38908   46%     4   23%
   6 Arasan 10.0           -185    7    7  7782   29%   -21   19%
This test takes almost exactly 12 hours to run, for 38,908 games in total. I'll make the starting positions available on my ftp machine if anyone wants to see them. But my first plan is to see if the 4 runs are close. If so, then it should be possible to reduce the size of this somewhat. I had thought about running odd vs even positions, since that would be a reasonably random split, and seeing how each half compares to the original complete set. But first, the thing needs another 36 hours or so to finish the other three runs. All assuming we don't run into another A/C problem where the thing auto-shuts down quickly.
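The odd vs even idea mentioned above could be sketched roughly as follows. This is only an illustration: the function name `split_odd_even` and the placeholder position labels are made up for the example, and the real starting positions would come from the position file.

```python
def split_odd_even(positions):
    """Split a list of starting positions into odd- and even-numbered halves.

    Positions are numbered from 1, so the first entry counts as odd.
    """
    odd = positions[0::2]   # positions 1, 3, 5, ...
    even = positions[1::2]  # positions 2, 4, 6, ...
    return odd, even


# Placeholder labels only (hypothetical), standing in for real positions.
positions = [f"position-{i}" for i in range(1, 11)]
odd, even = split_odd_even(positions)
print(len(odd), len(even))
```

Each half could then be run as its own test and its ratings compared with those from the complete set.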
Uri Blass
Posts: 10816
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: First run, new data

Post by Uri Blass »

I hope that this time you save the pgn of all the games.

Uri
hgm
Posts: 28356
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: First run, new data

Post by hgm »

For comparison, the earlier 25,000-game runs were:

Code: Select all

Rank Name                   Elo    +    - games score oppo. draws 
   1 Glaurung 2-epsilon/5   123    8    8  5120   66%     2   15% 
   2 Fruit 2.1               38    8    7  5119   55%     2   19% 
   3 opponent-21.7           28    7    7  5119   54%     2   34% 
   4 Crafty-22.2              2    4    4 25597   50%     0   19% 
   5 Glaurung 1.1 SMP         2    8    8  5120   50%     2   14% 
   6 Arasan 10.0           -193    8    9  5119   26%     2   15% 
Rank Name                   Elo    +    - games score oppo. draws 
   1 Glaurung 2-epsilon/5   118    8    8  5120   67%   -19   13% 
   2 Fruit 2.1               42    8    8  5120   58%   -19   17% 
   3 opponent-21.7           32    7    7  5115   58%   -19   36% 
   4 Glaurung 1.1 SMP        20    8    8  5120   55%   -19   12% 
   5 Crafty-22.2            -19    4    4 25595   47%     4   19% 
   6 Arasan 10.0           -193    8    8  5120   28%   -19   16% 
If we compare the new result to that of the old second run, we get:

Code: Select all

Rank Name                   Elo    +    -   2nd diff 
   1 Glaurung 2-epsilon/5   108    7    7   118  +10
   2 Fruit 2.1               62    7    6    42  -20
   3 opponent-21.7           25    6    6    32   +7
   4 Glaurung 1.1 SMP        10    6    6    20  +10
   5 Crafty-22.2            -21    4    4   -19   +2
   6 Arasan 10.0           -185    7    7  -193   -8
We see that the differences, listed in the last column, are not far from sqrt(2) = 1.41 times the uncertainties quoted by BayesElo. The standard deviation of the differences between the two runs for the 5 Crafty opponents is sqrt((100+400+49+100+64)/5) ~ 12.

This is about twice what is expected (the quoted BayesElo uncertainties are 95% confidence intervals, i.e. ~2 sigma, so one sigma for the difference of two runs would be roughly sqrt(2) * 3.5 ~ 5). The main contribution to this variance is due to Fruit 2.1.
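The arithmetic above can be checked with a short script. This is a sketch only: the Elo values are copied from the tables in this thread, while the single margin of 7 and the assumption that the two runs are independent are simplifications made for the example.

```python
import math

# Elo values copied from the tables in this thread (Crafty itself excluded).
new_run = {
    "Glaurung 2-epsilon/5": 108,
    "Fruit 2.1": 62,
    "opponent-21.7": 25,
    "Glaurung 1.1 SMP": 10,
    "Arasan 10.0": -185,
}
old_second_run = {
    "Glaurung 2-epsilon/5": 118,
    "Fruit 2.1": 42,
    "opponent-21.7": 32,
    "Glaurung 1.1 SMP": 20,
    "Arasan 10.0": -193,
}

# Difference "old second run minus new run" for each opponent.
diffs = {name: old_second_run[name] - new_run[name] for name in new_run}

# Root-mean-square of the differences, i.e. sqrt((100+400+49+100+64)/5).
rms_diff = math.sqrt(sum(d * d for d in diffs.values()) / len(diffs))

# Assumption: a quoted margin of about +/-7 is a 95% interval (~2 sigma),
# and the two runs are independent, so a difference has sigma * sqrt(2).
margin_95 = 7
expected_sigma_diff = math.sqrt(2) * margin_95 / 2

print(f"RMS difference:          {rms_diff:.1f}")
print(f"expected sigma of diff:  {expected_sigma_diff:.1f}")
print(f"ratio:                   {rms_diff / expected_sigma_diff:.1f}")
```

Dropping the Fruit 2.1 entry from both dictionaries shows how much of the spread comes from that one opponent.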

The first of the two old runs is way off.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: First run, new data

Post by bob »

hgm wrote: [...] The first of the two old runs is way off.
And if so, that's an issue. But it is an issue that has to be addressed since the two runs, regardless of opinion, were back-to-back with everything the same as far as that can be guaranteed.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: First run, new data

Post by bob »

Uri Blass wrote: I hope that this time you save the pgn of all the games.

Uri
yep...