A question about SPRT

AndrewGrant
Posts: 439
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant

Post by AndrewGrant » Sun Dec 25, 2016 5:02 am

I'm looking at the implementation of SPRT posted here https://chessprogramming.wikispaces.com ... stics#toc7

I thought that the number of games needed would be related to the bounds of the test, elo0 and elo1. So if we ran a test using the bounds [0,3], we would be trying to prove that the new version was at least 3 elo better than the old version. The same idea applies to bounds of, say, [0,5]. Since it seems like it would be easier to prove a 3 elo gain than a 5 elo gain, I expected the test using [0,3] to take fewer games. However, in practice this appears to be wrong. Additionally, looking at that implementation, my assumption is also wrong mathematically.
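
A short sketch of what an SPRT of this kind computes may help with the game-count question. It assumes a normal approximation of the per-game score (win = 1, draw = 0.5, loss = 0), logistic Elo bounds and Wald's stopping bounds; `sprt_llr` and `sprt_bounds` are hypothetical names, and this is not necessarily the exact formula used in the linked wiki implementation. The thing to notice is that the LLR is proportional to the score gap between the two hypotheses, so the width elo1 - elo0 matters, not only where the bounds sit.

```python
import math


def logistic_score(elo: float) -> float:
    """Expected score for a logistic Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0).

    Assumption: normal approximation of the per-game score, with the
    variance estimated from the observed results.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean
    if var <= 0.0:
        return 0.0  # not enough variety in the results yet
    s0, s1 = logistic_score(elo0), logistic_score(elo1)
    # Linear in the score gap (s1 - s0): narrow bounds gather evidence slowly.
    return n * (s1 - s0) * (mean - 0.5 * (s0 + s1)) / var


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Wald's bounds: accept H0 at or below the first, H1 at or above the second."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


lower, upper = sprt_bounds()
print(f"stop bounds: {lower:.3f} / {upper:.3f}")
for elo0, elo1 in ((0, 3), (0, 5)):
    print(f"[{elo0},{elo1}] LLR = {sprt_llr(1200, 1600, 1100, elo0, elo1):.3f}")
```

With a 1200/1600/1100 result (roughly +9 Elo) the LLR against [0,5] is larger than against [0,3], so the wider interval stops sooner. Since the average LLR gain per game scales with the square of the width for a patch at the same relative position inside the bounds, halving the width roughly quadruples the expected number of games.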

So my question is, where am I going wrong?

Is this implementation flawed?
Do I misunderstand the meaning of elo0 and elo1?

Thanks
Andrew

kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 3:48 am

Re: A question about SPRT

Post by kbhearn » Sun Dec 25, 2016 5:33 am

Simply put, what SPRT is testing is 'is it more likely that this is an elo0 patch or an elo1 patch?' Patches whose true elo falls between the bounds will tend to take more games than patches outside the bounds, but variance being what it is, there is a chance that any patch runs long or returns a false positive. Predicting the effect of changing bounds on various elo patches (likelihood they get accepted/rejected, average number of games to reach a conclusion) is difficult; you're best off running some simulations at a few different typical elos.
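
A minimal Monte Carlo sketch of such a simulation. It assumes independent games with a fixed draw ratio (0.4 by default, a hypothetical value), logistic Elo bounds, and an LLR computed with a normal approximation whose variance is estimated from the games played so far; `simulate_sprt` is a hypothetical helper, not taken from any existing tool.

```python
import math
import random


def logistic_score(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def simulate_sprt(true_elo, elo0, elo1, draw_ratio=0.4, alpha=0.05, beta=0.05,
                  runs=200, max_games=200_000, seed=1):
    """Estimate (pass rate, mean games) of an SPRT by Monte Carlo.

    Assumptions: games are independent with a fixed draw ratio; the LLR is a
    normal approximation with variance estimated from the games so far.
    Runs that hit max_games are counted as not passing.
    """
    rng = random.Random(seed)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    s0, s1 = logistic_score(elo0), logistic_score(elo1)
    p_win = logistic_score(true_elo) - draw_ratio / 2.0  # matches the true score
    if p_win < 0.0 or p_win + draw_ratio > 1.0:
        raise ValueError("draw_ratio is inconsistent with true_elo")
    passes, total_games = 0, 0
    for _ in range(runs):
        wins = draws = losses = 0
        llr = 0.0
        while lower < llr < upper and wins + draws + losses < max_games:
            r = rng.random()
            if r < p_win:
                wins += 1
            elif r < p_win + draw_ratio:
                draws += 1
            else:
                losses += 1
            n = wins + draws + losses
            mean = (wins + 0.5 * draws) / n
            var = (wins + 0.25 * draws) / n - mean * mean
            if var > 0.0:
                llr = n * (s1 - s0) * (mean - 0.5 * (s0 + s1)) / var
        if llr >= upper:
            passes += 1
        total_games += wins + draws + losses
    return passes / runs, total_games / runs
```

Comparing, say, `simulate_sprt(2.5, 0, 3)` with `simulate_sprt(2.5, 0, 5)` shows how the bounds change both the pass rate and the game count for the same patch. With narrow bounds each call can take a while in pure Python, so lowering `runs` is a reasonable first step.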

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: A question about SPRT

Post by Michel » Sun Dec 25, 2016 7:22 am

kbhearn wrote: Predicting the effect of changing bounds on various elo patches (likelihood they get accepted/rejected, average number of games to reach a conclusion) is difficult
No, it is not difficult at all. There are standard formulas. They are implemented, for example, in this script:

http://hardy.uhasselt.be/Toga/sprta.py

It has been translated to JavaScript here:

http://chess-sprt-calc.azurewebsites.net/
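
For reference, calculators of this kind usually rest on Wald's classical approximations for the SPRT. A sketch in a normal approximation of the per-game score, assuming logistic Elo bounds, a true mean score μ and a per-game score variance σ² (this notation is an assumption, and these are the textbook formulas, not necessarily the exact ones coded in sprta.py):

```latex
\begin{aligned}
s(e) &= \frac{1}{1 + 10^{-e/400}}, \quad \mu_0 = s(\mathrm{elo}_0), \quad \mu_1 = s(\mathrm{elo}_1), \quad \bar\mu = \tfrac{1}{2}(\mu_0 + \mu_1) \\
A &= \ln\frac{\beta}{1-\alpha}, \qquad B = \ln\frac{1-\beta}{\alpha} \\
h(\mu) &= -\frac{2\,(\mu - \bar\mu)}{\mu_1 - \mu_0} \\
P_{\text{pass}}(\mu) &\approx \frac{1 - e^{hA}}{e^{hB} - e^{hA}} \\
\mathbb{E}[N] &\approx \frac{\sigma^2 \bigl(P_{\text{pass}} B + (1 - P_{\text{pass}}) A\bigr)}{(\mu_1 - \mu_0)(\mu - \bar\mu)}, \qquad \mu \neq \bar\mu \\
\mathbb{E}[N] &\approx -\frac{\sigma^2 A B}{(\mu_1 - \mu_0)^2}, \qquad \mu = \bar\mu
\end{aligned}
```

Near the bounds, μ - μ̄ is of the same order as μ1 - μ0, so E[N] grows roughly like 1/(μ1 - μ0)²: the width of the interval, not its location, sets the cost.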

Note that this script cannot immediately be applied to the OP's problem, since it takes BayesElo inputs. However, it can easily be modified to take logistic elo inputs.
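
A minimal sketch of that conversion, assuming the usual BayesElo draw model with a known drawelo parameter and no colour-advantage term; `elo_curve` and `bayeselo_to_logistic` are hypothetical names. It computes the expected score a BayesElo difference implies under that model, then reads that score back on the logistic scale.

```python
import math


def elo_curve(x: float) -> float:
    """Logistic curve on the Elo scale: 1 / (1 + 10^(-x/400))."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def bayeselo_to_logistic(belo: float, drawelo: float) -> float:
    """Convert a BayesElo difference into a logistic Elo difference.

    Assumed BayesElo model: P(win) = f(belo - drawelo),
    P(loss) = f(-belo - drawelo), P(draw) = 1 - P(win) - P(loss).
    """
    p_win = elo_curve(belo - drawelo)
    p_loss = elo_curve(-belo - drawelo)
    score = p_win + 0.5 * (1.0 - p_win - p_loss)  # expected score per game
    return -400.0 * math.log10(1.0 / score - 1.0)  # invert the logistic curve
```

Each bound is converted separately, and drawelo has to be estimated from the observed draw rate. With a realistic drawelo the logistic value comes out noticeably smaller in magnitude than the BayesElo one, so the two kinds of bounds are not interchangeable.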

AndrewGrant

Re: A question about SPRT

Post by AndrewGrant » Mon Dec 26, 2016 2:52 am

Thanks for the clarification.

So, as a general note, I should be choosing the elo bounds based on how good I think the change might be? I.e., if I tweak a few values, use [-1, 1], but if I make some large change (add a bunch of evaluation terms for pawns), use [0, 20]?
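
The trade-off behind this question can be put in numbers with Wald's standard approximations for pass probability and expected game count. This sketch assumes a normal approximation of the per-game score, a hypothetical fixed draw ratio of 0.4 to set the per-game variance, and alpha = beta = 0.05; `sprt_oc_asn` is a hypothetical name.

```python
import math


def logistic_score(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_oc_asn(true_elo, elo0, elo1, alpha=0.05, beta=0.05, draw_ratio=0.4):
    """Wald approximations: (pass probability, expected number of games).

    Assumptions: normal approximation of the per-game score; the per-game
    variance comes from a hypothetical fixed draw ratio.
    """
    A = math.log(beta / (1.0 - alpha))  # lower LLR bound (accept H0)
    B = math.log((1.0 - beta) / alpha)  # upper LLR bound (accept H1)
    mu0, mu1 = logistic_score(elo0), logistic_score(elo1)
    mu = logistic_score(true_elo)
    mid = 0.5 * (mu0 + mu1)
    var = mu * (1.0 - mu) - draw_ratio / 4.0  # variance of one game's score
    if var <= 0.0:
        raise ValueError("draw_ratio is inconsistent with true_elo")
    h = -2.0 * (mu - mid) / (mu1 - mu0)  # nonzero root of E[exp(h * z)] = 1
    if abs(h) < 1e-6:
        # Zero-drift limits of Wald's formulas.
        return -A / (B - A), -A * B * var / (mu1 - mu0) ** 2
    # (1 - e^{hA}) / (e^{hB} - e^{hA}), rearranged so no exponent is positive.
    if h < 0.0:
        p_pass = math.expm1(-h * A) / math.expm1(h * (B - A))
    else:
        p_pass = (math.exp(-h * B) - math.exp(h * (A - B))) / -math.expm1(h * (A - B))
    drift = (mu1 - mu0) * (mu - mid) / var  # mean LLR change per game
    games = (p_pass * B + (1.0 - p_pass) * A) / drift
    return p_pass, games


for bounds in ((-1.0, 1.0), (0.0, 20.0)):
    p, n = sprt_oc_asn(5.0, *bounds)
    print(f"bounds={bounds}: P(pass)={p:.3f}, E[games]={n:.0f}")
```

Under these assumptions a genuine +5 Elo change almost always passes [-1, 1] but only rarely passes [0, 20], while [0, 20] reaches a verdict in far fewer games. Wide bounds save games at the price of rejecting changes that are real but smaller than the bounds assume.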
