Testing A against B by playing a pool of others
Posted: Sat Jun 24, 2017 9:02 am
So I've gone through the trouble of writing a nice web-based testing framework. The only part I am missing, or at least am not sure of, is my process for terminating tests.
I don't have anywhere near enough stats knowledge to say whether or not the Z calculation below is right. I question whether I should replace all the stds with variances. Also, should I be dividing by two in (upperstd + lowerstd) / 2?
Originally, this was roughly my process:
I have two engines, one called test and one called base. Both play a number of games against a pool of other engines A, B, C, ....
I decide that I want 95% confidence that there will not be a false positive or false negative. Based on this I get some bounds, [-X, X]. If the Z value I calculate falls outside those bounds, I terminate the test. Otherwise I play more games.
To calculate Z, I do the following:
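As an illustration of how the bounds [-X, X] can be derived from the confidence level, here is a minimal sketch. It assumes Z is treated as a standard normal variable and that the test is two-sided; the names `z_bound` and `should_stop` are made up for this example and are not part of the framework.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def z_bound(confidence=0.95):
    """Return X such that a standard normal Z lies in [-X, X]
    with the given probability (two-sided bound)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def should_stop(z, confidence=0.95):
    """Terminate the test once Z falls outside [-X, X]."""
    return abs(z) > z_bound(confidence)


X = z_bound(0.95)
print(round(X, 2))  # 1.96
```

With 95% confidence this gives X of roughly 1.96, so a Z of 2.5 would end the test and a Z of 1.0 would mean playing more games.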
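from math import log10, sqrt

testmean = 0; teststd = 0
for matchup in test_matchups:
    w, d, l = matchup.results
    n = w + d + l                    # games played against this opponent
    s = w + d / 2                    # points scored, a draw counts as half
    p = s / n                        # score fraction
    diff = -400 * log10(1 / p - 1)   # Elo difference implied by p
    testmean += matchup.opponentsELO + diff
    std = sqrt(p * (1 - p) / n)      # standard error of p (binomial form)
    upperstd = -400 * log10(1 / (p + std) - 1)
    lowerstd = -400 * log10(1 / (p - std) - 1)
    teststd += (upperstd + lowerstd) / 2

# Repeat again for basemean and basestd

# testvar / basevar: the totals accumulated above as teststd / basestd
Z = ((testmean - basemean) / numOpponents) / sqrt((testvar + basevar) / numOpponents)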
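For comparison, here is a self-contained sketch of one variant of this calculation. It is not the code above, and it rests on assumptions that should be checked: the per-opponent spread in Elo is taken as half the distance between the upper and lower conversions, the per-opponent variances (squared spreads) are summed, and the results against different opponents are treated as independent. The tuple layout and the function names are hypothetical.

```python
from math import log10, sqrt


def elo_diff(p):
    """Elo difference implied by a score fraction p, assuming 0 < p < 1."""
    return -400.0 * log10(1.0 / p - 1.0)


def rating_estimate(matchups):
    """matchups: list of (wins, draws, losses, opponent_elo) tuples.
    Returns (mean estimated rating, variance of that mean)."""
    total = 0.0
    var_total = 0.0
    for w, d, l, opp_elo in matchups:
        n = w + d + l
        p = (w + d / 2) / n
        se = sqrt(p * (1 - p) / n)       # same binomial form as the post
        hi = min(p + se, 1 - 1e-9)       # keep p +/- se inside (0, 1)
        lo = max(p - se, 1e-9)
        elo_se = (elo_diff(hi) - elo_diff(lo)) / 2  # half-width in Elo
        total += opp_elo + elo_diff(p)
        var_total += elo_se ** 2         # variances add, stds do not
    k = len(matchups)
    # Variance of an average of k independent terms: sum of variances / k^2
    return total / k, var_total / k ** 2


def z_score(test_matchups, base_matchups):
    """Z for the difference between the two mean rating estimates."""
    test_mean, test_var = rating_estimate(test_matchups)
    base_mean, base_var = rating_estimate(base_matchups)
    return (test_mean - base_mean) / sqrt(test_var + base_var)


# Made-up results against two opponents rated 2500 and 2600
test = [(60, 20, 20, 2500), (50, 30, 20, 2600)]
base = [(50, 20, 30, 2500), (45, 30, 25, 2600)]
print(z_score(test, base))
```

The sketch fails if an engine scores 0% or 100% against an opponent, since the Elo conversion is undefined there, and the binomial form ignores the effect of draws on the variance.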
Any help, or a pointer towards some helpful reading material, would be greatly appreciated.
Thanks,
Andrew Grant