## Testing A against B by playing a pool of others

AndrewGrant
Location: U.S.A
Full name: Andrew Grant

### Testing A against B by playing a pool of others

So I've gone through the trouble of writing a nice web-based testing framework. The only part I am missing, or at least unsure of, is my process for terminating tests.

Originally, this was roughly my process:

I have two engines, one called test and one called base. Both play a number of games against the same pool of opponent engines A, B, C, ...

I decide that I want 95% confidence that there will be neither a false positive nor a false negative. Based on this I get some bounds, [-X, X]. If the Z value I calculate falls outside those bounds, I terminate the test; otherwise I play more games.
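For a two-sided test at 95% confidence, X is the 97.5th percentile of the standard normal distribution, about 1.96. A minimal sketch using Python's standard-library `statistics.NormalDist`; the names `two_sided_bound` and `decide` are illustrative and not part of the framework described here. One caveat: this bound assumes Z is checked once after a fixed number of games, so checking it repeatedly after every batch makes the actual false-positive rate larger than 5%.

```python
from statistics import NormalDist


def two_sided_bound(confidence: float) -> float:
    """Return X such that P(-X <= Z <= X) = confidence for a standard normal Z."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def decide(z: float, bound: float) -> str:
    """Hypothetical stopping rule: stop once Z leaves [-bound, bound]."""
    if z > bound:
        return "test better"
    if z < -bound:
        return "test worse"
    return "continue"


X = two_sided_bound(0.95)
print(round(X, 2))  # 1.96
```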

To calculate Z, I do the following


```
testmean = 0; teststd = 0
for matchup in testmatchups:
    w, d, l = matchup.results
    n = w + d + l
    s = w + d/2
    p = s / n

    diff = -400 * log10(1/p - 1)
    testmean += matchup.opponentsELO + diff

    std = sqrt(p * (1-p) / n)
    upperstd = -400 * log10(1/(p+std) - 1)
    lowerstd = -400 * log10(1/(p-std) - 1)
    teststd += (upperstd + lowerstd) / 2

# Repeat again for basemean and basestd

Z = ((testmean - basemean) / numOpponents) / sqrt((testvar + basevar) / numOpponents)
```
I don't have anywhere near enough stats knowledge to say whether or not this is right. I wonder whether I should replace all the stds with variances. Should there be a divide-by-two in (upperstd + lowerstd) / 2?

Any help, or a pointer towards some helpful reading material, would be greatly appreciated.

Thanks,
Andrew Grant
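One way to settle the std-versus-variance question is to work with variances throughout: for independent matchups, variances add, and the variance of an equal-weight mean over k opponents is the summed variance divided by k², so the standard error of the mean is sqrt(sum of variances) / k rather than sqrt(sum / k). The sketch below is a minimal illustration under stated assumptions, not the framework's code: it assumes independent matchups, equal weight per opponent, the trinomial per-game variance, and a delta-method conversion from score variance to Elo variance (dElo/dscore = 400 / (ln 10 · score · (1 - score))). `Matchup`, `elo_estimate`, `pooled`, and `z_score` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Matchup:
    # Hypothetical record: results against one opponent plus its assumed Elo.
    wins: int
    draws: int
    losses: int
    opponent_elo: float


def elo_estimate(m: Matchup) -> tuple[float, float]:
    """Return (Elo estimate, variance of that estimate) for one matchup."""
    n = m.wins + m.draws + m.losses
    p = (m.wins + m.draws / 2) / n  # mean score per game
    if not 0.0 < p < 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo")
    diff = -400.0 * math.log10(1.0 / p - 1.0)  # logistic Elo difference
    # Trinomial per-game variance: E[X^2] - p^2 with X in {1, 0.5, 0}.
    per_game_var = (m.wins + m.draws / 4) / n - p * p
    var_p = per_game_var / n  # variance of the mean score
    # Delta method: Var(Elo) ~= (dElo/dp)^2 * Var(p).
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))
    return m.opponent_elo + diff, slope * slope * var_p


def pooled(matchups: list[Matchup]) -> tuple[float, float]:
    """Equal-weight mean Elo over opponents and the variance of that mean."""
    k = len(matchups)
    estimates = [elo_estimate(m) for m in matchups]
    mean = sum(e for e, _ in estimates) / k
    var = sum(v for _, v in estimates) / (k * k)  # variances add; a mean divides by k^2
    return mean, var


def z_score(test: list[Matchup], base: list[Matchup]) -> float:
    """Z for the difference of two independent pooled Elo estimates."""
    test_mean, test_var = pooled(test)
    base_mean, base_var = pooled(base)
    return (test_mean - base_mean) / math.sqrt(test_var + base_var)
```

If test and base play the same opponents with the same assumed ratings, the `opponent_elo` terms cancel in `test_mean - base_mean`, so errors in the pool's ratings would not shift Z under these assumptions.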

AndrewGrant

### Re: Testing A against B by playing a pool of others

This was my other thought, which gave me slightly better results but still seems wrong, even for only one opponent.


```
for matchup in testmatchups:
    wins = matchup.wins
    draws = matchup.draws
    losses = matchup.losses
    games = wins + draws + losses
    points = wins + draws / 2
    score = points / games

    diff = -400 * log10(1/score - 1)
    elo = matchup.opponent.elo + diff
    testmean += elo

    std = sqrt(score * (1 - score) / games)
    upperstd = (-400 * log10(1/(score+std) - 1)) - diff
    lowerstd = diff - (-400 * log10(1/(score-std) - 1))
    elostd = upperstd + lowerstd
    testvar += elostd ** 2
```
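In this second version, `upperstd` and `lowerstd` are each roughly one standard deviation in Elo, so `elostd = upperstd + lowerstd` is the full width of the ±1σ interval, about 2σ, and squaring it would overstate the variance by roughly a factor of four. A small sketch for a single matchup compares that width with a delta-method σ; the helper names and the 150/150/100 result are made-up illustrations, not data from the thread.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def interval_width(wins: int, draws: int, losses: int) -> float:
    """Reproduce `upperstd + lowerstd` from the post for one matchup."""
    games = wins + draws + losses
    score = (wins + draws / 2) / games
    std = math.sqrt(score * (1 - score) / games)  # binomial form, as in the post
    diff = elo(score)
    upperstd = elo(score + std) - diff
    lowerstd = diff - elo(score - std)
    return upperstd + lowerstd  # Elo distance from score - std to score + std


def delta_sigma(wins: int, draws: int, losses: int) -> float:
    """One-standard-deviation Elo error via the delta method."""
    games = wins + draws + losses
    score = (wins + draws / 2) / games
    std = math.sqrt(score * (1 - score) / games)
    slope = 400.0 / (math.log(10.0) * score * (1 - score))  # dElo/dscore
    return slope * std


width = interval_width(150, 150, 100)
sigma = delta_sigma(150, 150, 100)
print(f"width={width:.1f}  width/2={width / 2:.1f}  delta sigma={sigma:.1f}")
```

Under these assumptions, `(upperstd + lowerstd) / 2` (or the delta-method slope times the score std) would be the quantity to square before adding it to `testvar`.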

Ajedrecista

### Re: Testing A against B by playing a pool of others.

Hello Andrew:

I am not an expert in statistics, but I will give you my opinion anyway.

I have a question: since you are using a trinomial model (wins, draws, losses), why do you use the binomial form of the standard deviation instead of the trinomial form?

``std = sqrt((score * (1 - score) - 0.25 * d / n) / n)``
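A quick numerical check of that formula, with `d` the number of draws and `n` the number of games as in the first post (the helper names are illustrative): it compares the binomial form, the trinomial form, and the population variance of the per-game scores 1, 0.5, 0 computed directly with `statistics.pvariance`. The per-game variance is E[X²] - score² = (wins + d/4)/n - score², which equals score·(1 - score) - 0.25·d/n, so with draws present the trinomial value is smaller, and with no draws the two forms coincide.

```python
import math
from statistics import pvariance


def binomial_std(wins: int, draws: int, losses: int) -> float:
    n = wins + draws + losses
    score = (wins + draws / 2) / n
    return math.sqrt(score * (1 - score) / n)


def trinomial_std(wins: int, draws: int, losses: int) -> float:
    n = wins + draws + losses
    d = draws
    score = (wins + d / 2) / n
    # Per-game variance (wins + d/4)/n - score^2 == score*(1 - score) - 0.25*d/n.
    return math.sqrt((score * (1 - score) - 0.25 * d / n) / n)


def direct_std(wins: int, draws: int, losses: int) -> float:
    """Brute force: population variance of the per-game scores, over n."""
    scores = [1.0] * wins + [0.5] * draws + [0.0] * losses
    return math.sqrt(pvariance(scores) / len(scores))


for w, d, l in [(150, 150, 100), (250, 0, 150)]:
    print(w, d, l, binomial_std(w, d, l), trinomial_std(w, d, l), direct_std(w, d, l))
```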