If testing was done like this ...

Discussion of chess software programming and technical issues.

Moderator: Ras

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

If testing was done like this ...

Post by Michael Sherwin »

ver_A plays an 80-game match against each of 5 stronger opponents, and the number of distinct positions won is counted. So any number of wins from one side of one position is only counted as one. Pick the opponent engines so that anywhere from 40 to 60 points are acquired.

Then ver_B plays the same.

If ver_B scores more points (based on getting at least one win in each position), could that be a good indication that ver_B is better? How many more points are needed?

The idea is to ignore the random accumulation of pure score in favor of seeing if a new version can win positions that the earlier version could not win.

My gut feeling is that fewer games might be needed in a test like this.
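The counting rule proposed above can be sketched in a few lines of Python. The record format (position id, side, result) is just an assumption for illustration, not anything a particular testing tool produces:

```python
# Hypothetical sketch of the "distinct positions won" metric.
# Each game record is assumed to be (position_id, side, result), where
# result is "win", "loss", or "draw" from the tested engine's viewpoint.

def distinct_positions_won(games):
    """Count (position, side) pairs with at least one win.
    Multiple wins from the same side of the same position count once."""
    won = set()
    for position_id, side, result in games:
        if result == "win":
            won.add((position_id, side))
    return len(won)

games = [
    (1, "white", "win"),
    (1, "white", "win"),   # repeat win of the same position/side, counted once
    (1, "black", "loss"),
    (2, "white", "draw"),
    (2, "black", "win"),
]
print(distinct_positions_won(games))  # 2: (1, white) and (2, black)
```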
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If testing was done like this ...

Post by bob »

Michael Sherwin wrote:ver_A plays an 80-game match against each of 5 stronger opponents, and the number of distinct positions won is counted. So any number of wins from one side of one position is only counted as one. Pick the opponent engines so that anywhere from 40 to 60 points are acquired.

Then ver_B plays the same.

If ver_B scores more points (based on getting at least one win in each position), could that be a good indication that ver_B is better? How many more points are needed?
The first thing to try is to run a few of these 80-game matches and measure the variability of the results, which will be _amazingly_ large. That is where I started in late 2006/early 2007.


The idea is to ignore the random accumulation of pure score in favor of seeing if a new version can win positions that the earlier version could not win.

My gut feeling is that fewer games might be needed in a test like this.
Unfortunately, once you test this, reality sets in, and you will discover you need to add three more zeros or so to the total number of games required, unless the changes being tested are _huge_ improvements.
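The variability bob describes is easy to see with a rough Monte Carlo sketch. The win/draw probabilities below are assumed for illustration (roughly a 50% expected score); with them, repeated 80-game matches swing by many points from run to run:

```python
import random

# Illustrative only: simulate repeated 80-game matches with assumed
# per-game win and draw probabilities and look at the score spread.
random.seed(1)

def match_score(n_games=80, p_win=0.30, p_draw=0.40):
    """Score of one simulated match (1 per win, 0.5 per draw)."""
    score = 0.0
    for _ in range(n_games):
        r = random.random()
        if r < p_win:
            score += 1.0
        elif r < p_win + p_draw:
            score += 0.5
    return score

scores = [match_score() for _ in range(1000)]
print(min(scores), max(scores))  # spread of many points across identical setups
```

With these parameters the per-match standard deviation is about 3.5 points, so individual 80-game results routinely differ by 10+ points even though nothing about the engines changed.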