reducing the testing workload

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob »

Here are the first three runs I have tried. The only significant thing is that these positions are still producing results that are sane with respect to SD overlap. These runs use almost exactly 1/2 of the initial positions; I simply skipped every other one to cut the workload in half. I have a total of 4 runs set up, and three have completed. BTW these are all the same version. I have to use different "names" so that each match produces a directory with that name containing all the PGN, which I am keeping for the moment.

The next thing I am interested in is the original positions again, but playing only one game each. It is hard in the current testing approach to alternate colors at one game per position, so I'll have to modify the referee if I decide to try that. But I am going to make 2 runs with Crafty as white in every position, then two more with Crafty as black, which will be interesting data as well. I'll run those once the current test is finished. Here are the partial results so far (three total runs):

Tue Aug 19 10:12:42 CDT 2008
time control = 1+1
crafty-22.2R4e
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   112    9    9  3894   67%   -17   21%
   2 Fruit 2.1               67    8    8  3894   62%   -17   24%
   3 opponent-21.7           20    8    8  3894   56%   -17   33%
   4 Glaurung 1.1 SMP        13    9    9  3894   54%   -17   21%
   5 Crafty-22.2            -17    5    5 19470   47%     3   23%
   6 Arasan 10.0           -193    9    9  3894   27%   -17   18%
Tue Aug 19 21:23:56 CDT 2008
time control = 1+1
crafty-22.2R4f
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   105    9    9  3894   67%   -20   21%
   2 Fruit 2.1               70    9    9  3894   63%   -20   25%
   3 opponent-21.7           15    8    8  3894   55%   -20   34%
   4 Glaurung 1.1 SMP        10    9    9  3894   54%   -20   20%
   5 Crafty-22.2            -20    5    4 19470   46%     4   24%
   6 Arasan 10.0           -179    9    9  3894   29%   -20   19%
Wed Aug 20 08:55:58 CDT 2008
time control = 1+1
crafty-22.2R4g
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   110    9    9  3894   68%   -21   21%
   2 Fruit 2.1               67    8    8  3886   62%   -21   22%
   3 opponent-21.7           17    8    8  3894   56%   -21   34%
   4 Glaurung 1.1 SMP        15    9    9  3894   55%   -21   20%
   5 Crafty-22.2            -21    4    4 19462   46%     4   23%
   6 Arasan 10.0           -187    9    9  3894   28%   -21   19%
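
As a rough illustration of the "SD overlap" sanity check mentioned above, here is a minimal Python sketch. It takes the Elo, + and - columns for Crafty-22.2 from the three tables and tests whether every pair of intervals overlaps. The function names are made up for this example, and treating the +/- columns as a simple interval around the Elo value is an assumption about how to read the tables, not a description of how the rating tool computes them.

```python
from itertools import combinations

# (Elo, plus, minus) for Crafty-22.2, copied from the three runs above.
runs = {
    "R4e": (-17, 5, 5),
    "R4f": (-20, 5, 4),
    "R4g": (-21, 4, 4),
}

def interval(elo, plus, minus):
    """Return (low, high) for one rating line (hypothetical helper)."""
    return (elo - minus, elo + plus)

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def all_pairs_overlap(run_table):
    """True if the intervals of every pair of runs overlap."""
    ivs = {name: interval(*vals) for name, vals in run_table.items()}
    return all(overlaps(ivs[x], ivs[y]) for x, y in combinations(ivs, 2))

if __name__ == "__main__":
    print(all_pairs_overlap(runs))
```

The same check can be pointed at any other engine's row by swapping in its three (Elo, +, -) triples.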

Re: reducing the testing workload

Post by bob »

4 runs have finished:

Tue Aug 19 10:12:42 CDT 2008
time control = 1+1
crafty-22.2R4e
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   112    9    9  3894   67%   -17   21%
   2 Fruit 2.1               67    8    8  3894   62%   -17   24%
   3 opponent-21.7           20    8    8  3894   56%   -17   33%
   4 Glaurung 1.1 SMP        13    9    9  3894   54%   -17   21%
   5 Crafty-22.2            -17    5    5 19470   47%     3   23%
   6 Arasan 10.0           -193    9    9  3894   27%   -17   18%
Tue Aug 19 21:23:56 CDT 2008
time control = 1+1
crafty-22.2R4f
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   105    9    9  3894   67%   -20   21%
   2 Fruit 2.1               70    9    9  3894   63%   -20   25%
   3 opponent-21.7           15    8    8  3894   55%   -20   34%
   4 Glaurung 1.1 SMP        10    9    9  3894   54%   -20   20%
   5 Crafty-22.2            -20    5    4 19470   46%     4   24%
   6 Arasan 10.0           -179    9    9  3894   29%   -20   19%
Wed Aug 20 08:55:58 CDT 2008
time control = 1+1
crafty-22.2R4g
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   110    9    9  3894   68%   -21   21%
   2 Fruit 2.1               67    8    8  3886   62%   -21   22%
   3 opponent-21.7           17    8    8  3894   56%   -21   34%
   4 Glaurung 1.1 SMP        15    9    9  3894   55%   -21   20%
   5 Crafty-22.2            -21    4    4 19462   46%     4   23%
   6 Arasan 10.0           -187    9    9  3894   28%   -21   19%
Wed Aug 20 20:46:27 CDT 2008
time control = 1+1
crafty-22.2R4h
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5    99    9    9  3894   66%   -21   21%
   2 Fruit 2.1               66    9    9  3894   62%   -21   23%
   3 opponent-21.7           25    8    8  3894   57%   -21   33%
   4 Glaurung 1.1 SMP        12    9    9  3894   54%   -21   20%
   5 Crafty-22.2            -21    4    4 19470   46%     4   23%
   6 Arasan 10.0           -180    9    9  3894   29%   -21   18%
The error bar increased by roughly +/- 1 for 1/2 the work, and again the results look very stable. I still plan on another couple of tests. The first is two rounds just playing white in all positions, then two rounds just playing black, which will be interesting to look at. Then another round with 1/4 the work rather than 1/2, to see how 4 runs like that will look...
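
For anyone wanting to see why halving the games costs only about +/- 1 Elo here, the following is a back-of-the-envelope Python sketch. It is not the method the rating tool uses; it assumes independent games, a normal approximation at 95% (z = 1.96), and the standard logistic Elo curve. The inputs (46% score, 23% draws, 19470 games) are the rounded Crafty-22.2 figures from the tables, and the "full" game count of 38940 is simply double that, which is an assumption based on the runs using almost exactly 1/2 of the positions.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(score, draw_rate, games, z=1.96):
    """Approximate +/- Elo margin for a given score, draw rate and game count.

    Hypothetical helper: uses the per-game variance of a win/draw/loss
    outcome and a normal approximation, then maps score to Elo.
    """
    wins = score - draw_rate / 2.0
    losses = 1.0 - wins - draw_rate
    variance = (wins * (1.0 - score) ** 2
                + draw_rate * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2)
    std_err = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_err)
    high = elo_from_score(score + z * std_err)
    return (high - low) / 2.0

if __name__ == "__main__":
    half = elo_margin(0.46, 0.23, 19470)   # games actually played per run
    full = elo_margin(0.46, 0.23, 38940)   # assumed: twice as many games
    print(f"half workload: +/- {half:.1f}  full workload: +/- {full:.1f}")
```

Under these assumptions the margin shrinks only with the square root of the number of games, so half the work widens it by a factor of about 1.4 rather than 2, which is in line with the roughly +/- 1 difference seen above.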