Testing methodology

Milos · Post by **Milos** » Wed Apr 21, 2010 2:43 pm

One thought that might need some statistics experts to answer.
When testing engines we know that the appropriate stochastic process is both stationary and ergodic.
Now let's assume we are testing only 2 engines against each other, trying to determine the eventual winning percentage.
Let's say these 2 engines played 1000 games and the result is known.
Now, just by taking these 1000 games there is a certain error margin, and we can be sure with a given certainty (like 95%) that the final result will be within these margins.
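
For reference, a minimal sketch of how such a margin is usually obtained with the normal approximation, treating each game as a score of 1, 0.5 or 0. The function name `score_margin` and the win/draw/loss counts are hypothetical, chosen only for illustration; they are not results from any real match.

```python
import math

def score_margin(wins, draws, losses, z=1.96):
    """Return (mean score, half-width of the ~95% interval) for a match.

    Assumes independent games and the normal approximation; z=1.96
    corresponds to roughly 95% coverage.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score around the observed mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    std_err = math.sqrt(var / n)  # standard error of the mean score
    return mean, z * std_err

# Hypothetical 1000-game result: 400 wins, 350 draws, 250 losses.
mean, margin = score_margin(400, 350, 250)
print(f"score = {mean:.3f} +/- {margin:.3f}")
```

The margin shrinks only as the square root of the number of games, which is why narrowing it is expensive.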
However, we are only using a small portion of available data. Actually, after any number of games (up to 1000) we know a current winning percentage and error margins (or more precisely the distribution of the expected winning percentage).
Now, if we pick up those winning percentages and make a distribution (in time) of them, this will be again a normal distribution with expected value equal to the real value and standard deviation that is much smaller than what we have if we take only a single value (e.g. after 1000 games).
In that way we could get a much smaller error margin with fewer games.
However, it might be really complex to calculate the error margins.

Any thoughts?

abulmo · Post by **abulmo** » Wed Apr 21, 2010 2:57 pm

Milos wrote: However, we are only using a small portion of available data. Actually, after any number of games (up to 1000) we know a current winning percentage and error margins (or more precisely the distribution of the expected winning percentage).
Now, if we pick up those winning percentages and make a distribution (in time) of them, this will be again a normal distribution with expected value equal to the real value and standard deviation that is much smaller than what we have if we take only a single value (e.g. after 1000 games).

I am not sure what you want to do? Do you want to compute winning percentages & errors from subsamples, like the bootstrapping statistical method?

Milos · Post by **Milos** » Wed Apr 21, 2010 3:09 pm

abulmo wrote: I am not sure what you want to do? Do you want to compute winning percentages & errors from subsamples, like the bootstrapping statistical method?

Yes exactly, to perform resampling in order to reduce error margins.

Aaron Becker · Post by **Aaron Becker** » Wed Apr 21, 2010 6:13 pm

Any information you extract in this way is derived from the order of your results. But for a stationary, ergodic process there's no order dependence--your sequence of N trials is statistically the same as N trials performed simultaneously. So any information you pull out using this method that can't be derived from looking at the unordered whole data set is pure noise.

Of course this isn't true if your test data isn't ergodic, but in that case your test data itself is the problem.
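
This can be checked with a small simulation. The sketch below assumes independent games with a fixed win probability and no draws (a simplification), plays many simulated matches, and compares the spread of two estimators: the final winning percentage, and the average of all running winning percentages over the course of the match. The names `simulate`, `p_win` and the parameter values are hypothetical.

```python
import random
import statistics

def simulate(n_games=200, n_matches=2000, p_win=0.55, seed=1):
    """Return (stdev of final percentage, stdev of mean running percentage)."""
    rng = random.Random(seed)
    finals = []
    running_means = []
    for _ in range(n_matches):
        wins = 0
        running_sum = 0.0
        for k in range(1, n_games + 1):
            if rng.random() < p_win:
                wins += 1
            running_sum += wins / k  # winning percentage after k games
        finals.append(wins / n_games)
        running_means.append(running_sum / n_games)
    return statistics.stdev(finals), statistics.stdev(running_means)

sd_final, sd_running = simulate()
print(f"stdev of final percentage:        {sd_final:.4f}")
print(f"stdev of mean running percentage: {sd_running:.4f}")
```

Under these assumptions the averaged running percentage comes out with the larger spread, not the smaller one: it is just a reweighting of the same games that gives early games more weight than late ones, whereas the plain final percentage weights every game equally.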

abulmo · Post by **abulmo** » Thu Apr 22, 2010 4:31 pm

Milos wrote: Yes exactly, to perform resampling in order to reduce error margins.

Resampling methods (bootstrapping, jackknife, etc.) are useful for easily computing the error margins of parameters that are otherwise too complex to compute analytically. They will in no way reduce the error margin; if they appear to, that is a defect of the method.
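
To illustrate, here is a minimal bootstrap sketch in Python, assuming independent games and a hypothetical 1000-game result (400 wins, 350 draws, 250 losses). It resamples the games with replacement and compares the bootstrap standard error of the mean score with the usual analytic one. The helper name `bootstrap_se` and the number of resamples are arbitrary choices for the example.

```python
import math
import random
import statistics

def bootstrap_se(scores, n_resamples=500, seed=1):
    """Bootstrap estimate of the standard error of the mean score."""
    rng = random.Random(seed)
    n = len(scores)
    # Each resample draws n games with replacement from the observed games.
    means = [sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)]
    return statistics.stdev(means)

# Hypothetical match: 1 = win, 0.5 = draw, 0 = loss.
scores = [1.0] * 400 + [0.5] * 350 + [0.0] * 250

analytic_se = statistics.pstdev(scores) / math.sqrt(len(scores))
boot_se = bootstrap_se(scores)
print(f"analytic standard error:  {analytic_se:.4f}")
print(f"bootstrap standard error: {boot_se:.4f}")
```

The two values should agree closely: the bootstrap reproduces the error margin that the data already imply, it does not shrink it.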

--
Richard
