Originally, this was roughly my process:
I have two engines, one called test and one called base. Both play a number of games against engines A, B, C, ....
I decide that I want 95% confidence that there will not be a false positive or a false negative. From this I get some bounds, [-X, X]. If the Z value I calculate falls outside those bounds, I terminate the test. Otherwise I play more games.
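For reference, here is a minimal sketch of how the bound X could be derived from the confidence level. It assumes Z is approximately standard normal when the two engines are equally strong, and it treats the check as a single two-sided test. The function name `two_sided_bound` is made up for illustration.

```python
from statistics import NormalDist


def two_sided_bound(confidence: float) -> float:
    """Return X such that a standard normal variable lies in [-X, X]
    with probability `confidence` (assumption: Z is standard normal)."""
    alpha = 1.0 - confidence
    # Split the remaining probability equally between the two tails.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


X = two_sided_bound(0.95)
print(round(X, 2))  # 1.96
```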
To calculate Z, I do the following:
testmean = 0; teststd = 0
for matchup in test matchups:
    w, d, l = matchup.results
    n = w + d + l
    s = w + d/2
    p = s / n
    diff = -400 * log10(1/p - 1)
    testmean += matchup.opponentsELO + diff
    std = sqrt(p * (1-p) / n)
    upperstd = -400 * log10(1/(p+std) - 1)
    lowerstd = -400 * log10(1/(p-std) - 1)
    teststd += (upperstd + lowerstd) / 2

Repeat again for basemean and basestd

Z = ((testmean - basemean) / numOpponents) / sqrt((testvar + basevar) / numOpponents)
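A runnable Python sketch of the same idea is below. It is not a literal transcription of the pseudocode, and the differences are assumptions on my part: the Elo spread per matchup is taken as half the distance between the upper and lower Elo values, (upper - lower) / 2, rather than their average, and the per-matchup variances are combined as the variance of a mean of independent estimates. The names `elo_from_score`, `matchup_elo` and `z_score` are made up for illustration, and each matchup is represented as a `(wins, draws, losses, opponent_elo)` tuple.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p, with 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def matchup_elo(w: int, d: int, l: int, opponent_elo: float):
    """Return (estimated Elo, approximate Elo standard error) for one matchup."""
    n = w + d + l
    p = (w + d / 2) / n
    std = math.sqrt(p * (1 - p) / n)  # same score spread as in the pseudocode
    # Assumption: half the width of the [p - std, p + std] interval in Elo terms.
    # This requires 0 < p - std and p + std < 1, so lopsided results will fail.
    elo_std = (elo_from_score(p + std) - elo_from_score(p - std)) / 2
    return opponent_elo + elo_from_score(p), elo_std


def mean_and_var(matchups):
    """Average Elo over opponents and the variance of that average."""
    estimates = [matchup_elo(*m) for m in matchups]
    k = len(estimates)
    mean = sum(elo for elo, _ in estimates) / k
    # Assumption: matchups are independent, so Var(mean) = sum(var_i) / k^2.
    var = sum(s * s for _, s in estimates) / k ** 2
    return mean, var


def z_score(test_matchups, base_matchups) -> float:
    test_mean, test_var = mean_and_var(test_matchups)
    base_mean, base_var = mean_and_var(base_matchups)
    return (test_mean - base_mean) / math.sqrt(test_var + base_var)


# Made-up results against two opponents rated 2500 and 2600.
test = [(60, 20, 20, 2500), (45, 30, 25, 2600)]
base = [(40, 20, 40, 2500), (35, 30, 35, 2600)]
print(z_score(test, base))
```

The resulting value would then be compared against the bounds [-X, X].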
Any help, or a pointer towards some helpful reading material, would be greatly appreciated.