TalkChess.com

Posted: **Sun Dec 25, 2016 6:02 am**

I'm looking at the implementation of SPRT posted here https://chessprogramming.wikispaces.com ... stics#toc7

I thought that the number of games needed would be related to the bounds of the test, elo0 and elo1. So if we ran a test using the bounds [0,3], we would be trying to prove that the new version was at least 3 elo better than the other version. The same idea applies when using bounds of, say, [0,5]. Since it seems like it would be easier to prove a 3 elo gain than a 5 elo gain, I expected the test using [0,3] to take fewer games. However, in practice this appears to be wrong. Additionally, looking at that implementation, my assumption is also wrong mathematically.

So my question is, where am I going wrong?

Is this implementation flawed?
Do I misunderstand the meaning of elo0 and elo1?

Thanks
Andrew

Posted: **Sun Dec 25, 2016 6:33 am**

Simply put, what SPRT is testing is 'is it more likely that this is an elo0 patch or an elo1 patch?' Patches whose true elo falls between the bounds will tend to take more games than patches outside the bounds, but variance being what it is, there is always a chance that any patch runs long or returns a false positive. Predicting the effect of changing bounds on various elo patches (likelihood they get accepted/rejected, average number of games to reach a conclusion) is difficult; you're best off running some simulations at a few different typical elos.
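A minimal sketch of such a simulation might look like the following. It is not the implementation from the wiki page: it assumes logistic elo, a fixed draw ratio (the `draw_ratio` parameter is an assumption chosen for illustration), and a normal approximation to the log-likelihood ratio rather than an exact trinomial model. The names `expected_score`, `simulate_sprt` and `summarize` are hypothetical.

```python
import math
import random


def expected_score(elo):
    """Expected score (0..1) for a logistic elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def simulate_sprt(true_elo, elo0, elo1, alpha=0.05, beta=0.05,
                  draw_ratio=0.6, max_games=500_000, rng=random):
    """Play simulated games until the LLR crosses a bound.

    Returns (passed, games); passed is None if max_games is reached.
    """
    lower = math.log(beta / (1.0 - alpha))   # crossing this accepts elo0
    upper = math.log((1.0 - beta) / alpha)   # crossing this accepts elo1
    s0, s1 = expected_score(elo0), expected_score(elo1)

    # Assumed game model: fixed draw ratio, wins/losses set by the true elo.
    p_win = expected_score(true_elo) - draw_ratio / 2.0
    p_loss = 1.0 - p_win - draw_ratio
    if p_win < 0.0 or p_loss < 0.0:
        raise ValueError("draw_ratio is too high for this true_elo")

    total = total_sq = 0.0
    for n in range(1, max_games + 1):
        r = rng.random()
        x = 1.0 if r < p_win else (0.5 if r < p_win + draw_ratio else 0.0)
        total += x
        total_sq += x * x
        mean = total / n
        var = total_sq / n - mean * mean     # per-game score variance
        if var <= 0.0:
            continue                         # all results identical so far
        # Normal approximation to the log-likelihood ratio of elo1 vs elo0.
        llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
        if llr >= upper:
            return True, n
        if llr <= lower:
            return False, n
    return None, max_games


def summarize(true_elo, elo0, elo1, trials=20, seed=0):
    """Pass rate and average game count over repeated simulated tests."""
    rng = random.Random(seed)
    results = [simulate_sprt(true_elo, elo0, elo1, rng=rng)
               for _ in range(trials)]
    pass_rate = sum(1 for passed, _ in results if passed) / trials
    avg_games = sum(games for _, games in results) / trials
    return pass_rate, avg_games


if __name__ == "__main__":
    for bounds in ((0.0, 3.0), (0.0, 5.0)):
        for true_elo in (0.0, 4.0):
            rate, games = summarize(true_elo, *bounds)
            print(f"bounds={bounds} true_elo={true_elo}: "
                  f"pass rate {rate:.2f}, avg games {games:.0f}")
```

With only 20 trials per setting the averages are noisy; raising `trials` gives steadier numbers at the cost of run time.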

Posted: **Sun Dec 25, 2016 8:22 am**

> Predicting the effect of changing bounds on various elo patches (likelihood they get accepted/rejected, average number of games to reach a conclusion) is difficult

No, it is not difficult at all. There are standard formulas. They are implemented, for example, in this script

http://hardy.uhasselt.be/Toga/sprta.py

It has been translated to javascript here

http://chess-sprt-calc.azurewebsites.net/

Note that this script cannot be applied directly to the OP's problem since it takes BayesElo inputs. However, it can easily be modified to take logistic elo inputs.
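As an illustration of the kind of formula involved, here is a rough sketch that takes logistic elo inputs directly. It is not a port of the script linked above: it uses Wald's approximations with the log-likelihood ratio treated as a random walk under a normal approximation, it ignores overshoot at the bounds, and it assumes a fixed draw ratio. The function names and the `draw_ratio` parameter are hypothetical.

```python
import math


def expected_score(elo):
    """Expected score (0..1) for a logistic elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_characteristics(true_elo, elo0, elo1, alpha=0.05, beta=0.05,
                         draw_ratio=0.6):
    """Approximate (pass probability, expected games) for SPRT(elo0, elo1)."""
    a = math.log(beta / (1.0 - alpha))       # lower LLR bound (negative)
    b = math.log((1.0 - beta) / alpha)       # upper LLR bound (positive)
    s0, s1 = expected_score(elo0), expected_score(elo1)
    s = expected_score(true_elo)

    # Per-game score variance under the assumed fixed draw ratio.
    var = s * (1.0 - s) - draw_ratio / 4.0
    if draw_ratio > 2.0 * min(s, 1.0 - s) or var <= 0.0:
        raise ValueError("draw_ratio is too high for this true_elo")

    # Drift and variance of the per-game LLR increment.
    mu = (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)
    sigma2 = (s1 - s0) ** 2 / var
    theta = (2.0 * s - s0 - s1) / (s1 - s0)  # equals 2 * mu / sigma2
    theta = max(-200.0, min(200.0, theta))   # clamp to avoid exp() overflow

    if abs(theta) < 1e-9:
        # True elo midway between the bounds: driftless random walk.
        p_pass = -a / (b - a)
        games = -a * b / sigma2
    else:
        ea, eb = math.exp(-theta * a), math.exp(-theta * b)
        p_pass = (1.0 - ea) / (eb - ea)
        games = (p_pass * b + (1.0 - p_pass) * a) / mu
    return p_pass, games


if __name__ == "__main__":
    for bounds in ((0.0, 3.0), (0.0, 5.0)):
        for true_elo in (0.0, 1.5, 3.0, 5.0):
            p, n = sprt_characteristics(true_elo, *bounds)
            print(f"bounds={bounds} true_elo={true_elo}: "
                  f"pass {p:.3f}, expected games {n:.0f}")
```

Under this approximation the expected number of games for a patch at a fixed relative position (for example exactly at elo0) scales with 1/(s1 - s0)^2, so [0,3] needs roughly (5/3)^2 times as many games as [0,5], which is consistent with what the OP observed in practice.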

Posted: **Mon Dec 26, 2016 3:52 am**

Thanks for the clarification.

So, as a general rule, I should be choosing the elo bounds based on how large I expect the gain from the change to be? I.e., if I tweak a few values, use [-1, 1], and if I make some large change (add a bunch of evaluation terms for pawns), use [0, 20]?

A question about SPRT
