Stopped typing too soon. My main issue with the above is that each run was attempting to determine whether 21.7 is better than 22.2 (which we knew already, since 22.2 has parts of the evaluation completely missing). But the numbers came out as -46, -45 and -39. That 7 Elo difference (not to mention the error bar, of course) for three versions that are exactly the same would seem to make it very difficult to detect a small improvement...

bob wrote:
Third run is complete and given above. I also have a 15-minute "snapshot" program running that grabs the Elo data every 15 minutes, so that I can see whether the numbers stabilize in any sort of usable way before the test finishes...
Tue Aug 12 00:49:44 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                    Elo    +    -  games score oppo. draws
   1 Glaurung 2-epsilon/5    108    7    7   7782   67%   -21   20%
   2 Fruit 2.1                62    7    6   7782   61%   -21   23%
   3 opponent-21.7            25    6    6   7780   57%   -21   33%
   4 Glaurung 1.1 SMP         10    6    6   7782   54%   -21   20%
   5 Crafty-22.2             -21    4    4  38908   46%     4   23%
   6 Arasan 10.0            -185    7    7   7782   29%   -21   19%

Tue Aug 12 11:36:10 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                    Elo    +    -  games score oppo. draws
   1 Glaurung 2-epsilon/5    110    6    7   7782   67%   -19   21%
   2 Fruit 2.1                63    6    7   7782   61%   -19   23%
   3 opponent-21.7            26    6    6   7782   57%   -19   33%
   4 Glaurung 1.1 SMP          7    6    7   7782   54%   -19   20%
   5 Crafty-22.2             -19    4    3  38910   47%     4   23%
   6 Arasan 10.0            -187    6    7   7782   28%   -19   19%

Wed Aug 13 00:53:43 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                    Elo    +    -  games score oppo. draws
   1 Glaurung 2-epsilon/5    109    7    6   7782   67%   -16   20%
   2 Fruit 2.1                63    6    7   7782   61%   -16   24%
   3 opponent-21.7            23    6    6   7781   56%   -16   32%
   4 Glaurung 1.1 SMP          3    6    7   7782   53%   -16   21%
   5 Crafty-22.2             -16    4    3  38909   47%     3   23%
   6 Arasan 10.0            -182    7    7   7782   28%   -16   19%
More tomorrow when test 4 is done. But this certainly does look better, with one minor issue...
Measuring small changes appears to be a difficult task. Crafty's rating varies from -16 to -21 across the three runs, a 5 Elo spread, which is more noise than the change I am actually hoping to measure... It might be a hopeless idea to try to measure very small changes in strength.
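As a rough sanity check on that spread, here is a minimal sketch of the sampling error one would expect from a run of this size. It is not how the rating tool computes its error bars; it assumes independent games, a plain normal approximation, and the usual logistic Elo curve, and the function name `elo_margin` is made up for illustration. The inputs are the Crafty-22.2 line from the third run.

```python
import math

def elo_margin(score, draw_rate, games, z=1.96):
    """Approximate 95% margin (in Elo) for a score measured over `games` games.

    Assumption: games are independent and scored win=1, draw=0.5, loss=0.
    """
    # Per-game variance of the result for the given score and draw rate.
    variance = score * (1.0 - score) - draw_rate / 4.0
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve, d(Elo)/d(score), at this score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_err * slope

# Crafty-22.2, third run: 47% score, 23% draws, 38909 games.
single = elo_margin(0.47, 0.23, 38909)
# Margin on the difference between two independent runs of that size.
between_runs = math.sqrt(2.0) * single
print(f"one run: +/-{single:.1f} Elo, two-run difference: +/-{between_runs:.1f} Elo")
```

Under these assumptions a single run comes out at roughly +/-3 Elo, in line with the +4/-3 reported above, and the difference between two such runs at a little over +/-4 Elo, so a 5 Elo spread among three identical versions is not far outside what sampling noise alone would produce.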
Comments?
More games at a shorter time control? I can certainly do that...
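To get a feel for how many more games that would take, here is a back-of-the-envelope sketch. It uses the same simplifying assumptions as any normal-approximation estimate (independent games, logistic Elo curve), the helper name `games_needed` is hypothetical, and the default score and draw rate are simply taken from the Crafty-22.2 line in the tables above.

```python
import math

def games_needed(target_margin, score=0.47, draw_rate=0.23, z=1.96):
    """Games required for an approximate 95% margin of `target_margin` Elo.

    Assumption: independent games scored win=1, draw=0.5, loss=0.
    """
    # Per-game variance of the result for the given score and draw rate.
    variance = score * (1.0 - score) - draw_rate / 4.0
    # Slope of the logistic Elo curve, d(Elo)/d(score), at this score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return math.ceil(variance * (z * slope / target_margin) ** 2)

for margin in (3.0, 2.0, 1.0):
    print(f"+/-{margin:.0f} Elo needs about {games_needed(margin)} games")
```

The thing to notice is the square law: halving the error bar takes about four times as many games, so resolving a change of only a few Elo means going well beyond the roughly 39,000 games per run used here.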