Testing methodology

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Testing methodology

Post by Milos »

Here is a thought that might need a statistics expert to answer.
When testing engines we know that the underlying stochastic process is both stationary and ergodic.
Now let's assume we are testing only 2 engines against each other, trying to determine the eventual winning percentage.
Let's say these 2 engines played 1000 games and the result is known.
From these 1000 games alone we get a certain error margin, and we can say with a given confidence (e.g. 95%) that the true result lies within that margin.
However, we are only using a small portion of available data. Actually, after any number of games (up to 1000) we know a current winning percentage and error margins (or more precisely the distribution of the expected winning percentage).
Now, if we pick up those winning percentages and make a distribution (in time) of them, this will be again a normal distribution with expected value equal to the real value and standard deviation that is much smaller than what we have if we take only a single value (e.g. after 1000 games).
In that way we could get a much smaller error margin with fewer games.
However, it might be really complex to calculate the error margins.
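For comparison, the plain single-sample margin from the finished match is simple to compute. Below is a minimal sketch, assuming games are scored win=1, draw=0.5, loss=0 and that a normal approximation is acceptable. The function name `score_margin` and the win/draw/loss counts are made up for illustration and do not come from any real match.

```python
import math

def score_margin(wins, draws, losses, z=1.96):
    """Return (score, margin) for a match, scoring win=1, draw=0.5, loss=0.

    z=1.96 corresponds to a two-sided 95% interval under a normal approximation.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score around the match average.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    # The standard error of the average shrinks as 1/sqrt(n).
    return score, z * math.sqrt(var / n)

if __name__ == "__main__":
    # Hypothetical 1000-game result, used only as an example.
    score, margin = score_margin(wins=400, draws=300, losses=300)
    print(f"score = {score:.3f} +/- {margin:.3f}")
```

With these made-up counts the score comes out at about 0.55 with a margin of roughly 0.026, which is the baseline any cleverer scheme would have to beat.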

Any thoughts?
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Testing methodology

Post by abulmo »

Milos wrote: However, we are only using a small portion of available data. Actually, after any number of games (up to 1000) we know a current winning percentage and error margins (or more precisely the distribution of the expected winning percentage).
Now, if we pick up those winning percentages and make a distribution (in time) of them, this will be again a normal distribution with expected value equal to the real value and standard deviation that is much smaller than what we have if we take only a single value (e.g. after 1000 games).
I am not sure what you want to do. Do you want to compute winning percentages and errors from subsamples, as in the bootstrap resampling method?
Richard
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Testing methodology

Post by Milos »

abulmo wrote: I am not sure what you want to do. Do you want to compute winning percentages and errors from subsamples, as in the bootstrap resampling method?
Yes exactly, to perform resampling in order to reduce error margins.
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Testing methodology

Post by Aaron Becker »

Any information you extract in this way is derived from the order of your results. But for a stationary, ergodic process there's no order dependence--your sequence of N trials is statistically the same as N trials performed simultaneously. So any information you pull out using this method that can't be derived from looking at the unordered whole data set is pure noise.

Of course this isn't true if your test data isn't ergodic, but in that case your test data itself is the problem.
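
A small simulation can illustrate the point. This is only a sketch under stated assumptions: games are independent with fixed, made-up win/draw probabilities (0.4 and 0.3), and all function names are hypothetical. It compares the final winning percentage with the average of the running winning percentages taken after every game. Shuffling the same games leaves the final percentage unchanged but changes the running average, and over many simulated matches the running average should show a larger spread, since it weights early games more heavily than late ones.

```python
import random
import statistics

def simulate_match(rng, n, p_win=0.4, p_draw=0.3):
    """Draw n independent game scores (1, 0.5 or 0) with fixed probabilities."""
    games = []
    for _ in range(n):
        u = rng.random()
        games.append(1.0 if u < p_win else 0.5 if u < p_win + p_draw else 0.0)
    return games

def final_score(games):
    """Winning percentage after all games; independent of game order."""
    return sum(games) / len(games)

def mean_of_running_scores(games):
    """Average of the winning percentages observed after game 1, 2, ..., n."""
    total, acc = 0.0, 0.0
    for k, g in enumerate(games, start=1):
        total += g          # points scored in the first k games
        acc += total / k    # running winning percentage after k games
    return acc / len(games)

def compare_spread(trials=2000, n=200, seed=1):
    """Standard deviation of both estimators over many simulated matches."""
    rng = random.Random(seed)
    finals, runnings = [], []
    for _ in range(trials):
        games = simulate_match(rng, n)
        finals.append(final_score(games))
        runnings.append(mean_of_running_scores(games))
    return statistics.pstdev(finals), statistics.pstdev(runnings)

if __name__ == "__main__":
    rng = random.Random(0)
    games = simulate_match(rng, 200)
    shuffled = games[:]
    rng.shuffle(shuffled)
    # Same games in a different order: the final score is identical.
    print(final_score(games), final_score(shuffled))
    # The running average depends on the order, which carries no information.
    print(mean_of_running_scores(games), mean_of_running_scores(shuffled))
    print(compare_spread())
```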
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Testing methodology

Post by abulmo »

Milos wrote:Yes exactly, to perform resampling in order to reduce error margins.
Resampling methods (bootstrapping, jackknife, etc.) are useful for easily computing the error margins of parameters when those margins are otherwise too complex to derive. In no way will resampling reduce the error margin; if it appears to, that is a defect of the method.
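
To make this concrete, here is a rough sketch that bootstraps the standard error of the winning percentage and compares it with the usual analytic value. It assumes games scored win=1, draw=0.5, loss=0; the 400/300/300 result and the function names are made up for illustration. The two numbers should come out close to each other: the bootstrap estimates the same error, it does not shrink it.

```python
import math
import random
import statistics

def analytic_se(games):
    """Standard error of the average score: sqrt(per-game variance / n)."""
    n = len(games)
    mean = sum(games) / n
    var = sum((g - mean) ** 2 for g in games) / n
    return math.sqrt(var / n)

def bootstrap_se(games, resamples=1000, seed=1):
    """Standard deviation of the average score over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(games)
    means = []
    for _ in range(resamples):
        # Resample n games with replacement from the observed games.
        sample = rng.choices(games, k=n)
        means.append(sum(sample) / n)
    return statistics.pstdev(means)

if __name__ == "__main__":
    # Hypothetical 1000-game result: 400 wins, 300 draws, 300 losses.
    games = [1.0] * 400 + [0.5] * 300 + [0.0] * 300
    print("analytic SE: ", analytic_se(games))
    print("bootstrap SE:", bootstrap_se(games))
```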

--
Richard