The SPRT without draw model, elo model or whatever...
Moderators: hgm, Rebel, chrisw
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
The SPRT without draw model, elo model or whatever...
A typical implementation of the SPRT (e.g. cutechess) now runs as follows:
(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.
This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.
However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.
After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....
(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.
(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.
This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.
However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.
After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....
(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: The SPRT without draw model, elo model or whatever...
Good. It always seemed to me that this Bayeselo draw model for SPRT is a useless pain in the neck of using a draw model when we do know the trinomial distribution. Didn't know about GSPRT, and to avoid the pain, in analytical derivations I simply used draw_elo=0, using binomial only. I also used binomial distribution in order to approximate the trinomial distribution, it fits very well in most cases (draw_elo now is not 0, but I get rid of it).Michel wrote:A typical implementation of the SPRT (e.g. cutechess) now runs as follows:
(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.
This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.
However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.
After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....
(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: The SPRT without draw model, elo model or whatever...
Here is some quick code that implements this. I have kept things as simple possible.
Code: Select all
from __future__ import division
import math
bb=math.log(10)/400
def L(x):
return 1/(1+math.exp(-bb*x))
def DrawRatioMLE(s,results):
"""
We compute the MLE for the draw ratio, subject to the condition score=s.
This analytically correct formula was computed with Sage. Probably
D/(W+D+L) works just as well.
"""
L=results[0]
D=results[1]
W=results[2]
return (L*s - W*s + D + W - math.sqrt(4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2))/(W+D+L)
def LL(s,results):
"""
Compute the log likelihood for score=s.
"""
d=DrawRatioMLE(s,results)
w=s-(1/2)*d
l=1-s-(1/2)*d
L=results[0]
D=results[1]
W=results[2]
# sanity
if W==0 or D==0 or L==0:
return 0
else:
return W*math.log(w)+D*math.log(d)+L*math.log(l)
#####################################################################################################
class SimpleSPRT:
"""
This class performs a GSPRT for H0:elo=elo0 versus H1:elo=elo1
See here for a description of the GSPRT as well as theoretical (asymptotic) results.
http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf
To record the outcome of a game use the method self.record(result) where "result" is one of 0,1,2,
corresponding respectively to a loss, a draw and a win.
"""
def __init__(self,alpha=0.05,beta=0.05,elo0=0,elo1=5):
self.score0=L(elo0)
self.score1=L(elo1)
self.LA=math.log(beta/(1-alpha))
self.LB=math.log((1-beta)/alpha)
self.results=3*[0]
self._status=''
def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
def status(self):
return self._status
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: The SPRT without draw model, elo model or whatever...
Try to make use of your class and input my previous sprt records.Michel wrote:Here is some quick code that implements this. I have kept things as simple possible.
Code: Select all
from __future__ import division import math bb=math.log(10)/400 def L(x): return 1/(1+math.exp(-bb*x)) def DrawRatioMLE(s,results): """ We compute the MLE for the draw ratio, subject to the condition score=s. This analytically correct formula was computed with Sage. Probably D/(W+D+L) works just as well. """ L=results[0] D=results[1] W=results[2] return (L*s - W*s + D + W - math.sqrt(4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2))/(W+D+L) def LL(s,results): """ Compute the log likelihood for score=s. """ d=DrawRatioMLE(s,results) w=s-(1/2)*d l=1-s-(1/2)*d L=results[0] D=results[1] W=results[2] # sanity if W==0 or D==0 or L==0: return 0 else: return W*math.log(w)+D*math.log(d)+L*math.log(l) ##################################################################################################### class SimpleSPRT: """ This class performs a GSPRT for H0:elo=elo0 versus H1:elo=elo1 See here for a description of the GSPRT as well as theoretical (asymptotic) results. http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf To record the outcome of a game use the method self.record(result) where "result" is one of 0,1,2, corresponding respectively to a loss, a draw and a win. """ def __init__(self,alpha=0.05,beta=0.05,elo0=0,elo1=5): self.score0=L(elo0) self.score1=L(elo1) self.LA=math.log(beta/(1-alpha)) self.LB=math.log((1-beta)/alpha) self.results=3*[0] self._status='' def record(self,result): self.results[result]+=1 LLR=LL(self.score1,self.results)-LL(self.score0,self.results) if LLR>self.LB: self._status='H1' elif LLR<self.LA: self._status='H0' def status(self): return self._status
My sprt record, formula based from Sf,
Code: Select all
TestEngine: D2015.1.35.239.tuned
BaseEngine: D2015.1.35.239
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 2467, W: 637, L: 559, D: 1271, Score: 51.6%, NetW: +78
Elo: +10, err95: +/-9, LOS: 0.98816
LLR: 1.84, [-2.94, +2.94]
Code: Select all
enter alpha? 0.05
enter beta? 0.05
enter elo0? -1.5
enter elo1? 4.5
enter losses? 559
enter draws? 1271
enter wins? 637
status: H0
Here is the interface,
Code: Select all
from gsprt import SimpleSPRT
import sys
def main(argv):
alpha = float(raw_input('enter alpha? '))
beta = float(raw_input('enter beta? '))
print
elo0 = float(raw_input('enter elo0? '))
elo1 = float(raw_input('enter elo1? '))
print
losses = int(raw_input('enter losses? '))
draws = int(raw_input('enter draws? '))
wins = int(raw_input('enter wins? '))
print
data = SimpleSPRT(alpha, beta, elo0, elo1)
# Losses
for _ in range(losses):
data.record(0)
# Draws
for _ in range(draws):
data.record(1)
# Wins
for _ in range(wins):
data.record(2)
print 'status: %s' % data.status()
if __name__ == "__main__":
main(sys.argv[1:])
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: The SPRT without draw model, elo model or whatever...
I have not yet checked your computation yet but the input to SimpleSPRT is in standard elo (as in cutechess-cli).So how to interpret that?
The input of the SPRT as used by Stockfish is in Bayes Elo, which is really a historical accident. It makes the interpretation of the paramaters needlessly difficult.
The conversion between Bayes Elo and standard elo can be done using a scale factor given by the following formula
(4*math.exp(-bb*de))/(1+math.exp(-bb*de))**2
where bb=math.log(10)/400 and de is draw_elo.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: The SPRT without draw model, elo model or whatever...
Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.# Losses
for _ in range(losses):
data.record(0)
# Draws
for _ in range(draws):
data.record(1)
# Wins
for _ in range(wins):
data.record(2)
I thought this was clear from the way the code is written, but it seems not. Sorry about that.
Here is some code that uses the class to do simulations
Code: Select all
import random,sys
def pick(w,d,l):
s=random.random()
if s<=w:
return 2
elif s<=w+d:
return 1
else:
return 0
def simulate(draw_ratio,epsilon,alpha,elo):
"""
We simulate the test H0:elo_diff=0 versus H1:elo_diff=epsilon with elo_diff equal
to elo.
"""
sp=SimpleSPRT(alpha=alpha,beta=alpha,elo0=0,elo1=epsilon)
s=L(elo)
d=draw_ratio
w=s-(1/2)*d
l=1-s-(1/2)*d
while True:
r=pick(w,d,l)
sp.record(r)
status=sp.status()
if status!='':
return status
if __name__=='__main__':
epsilon=5
a={'H0':0,'H1':0}
cc=0
while True:
status=simulate(draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0)
a[status]+=1
cc+=1
print a['H0']/cc, a['H1']/cc
sys.stdout.flush()
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: The SPRT without draw model, elo model or whatever...
You are doing simulation, and I am inputing the values. But as I undertand the use of class is the same.Michel wrote:Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.# Losses
for _ in range(losses):
data.record(0)
# Draws
for _ in range(draws):
data.record(1)
# Wins
for _ in range(wins):
data.record(2)
I thought this was clear from the way the code is written, but it seems not. Sorry about that.
Here is some code that uses the class to do simulations
Code: Select all
import random,sys def pick(w,d,l): s=random.random() if s<=w: return 2 elif s<=w+d: return 1 else: return 0 def simulate(draw_ratio,epsilon,alpha,elo): """ We simulate the test H0:elo_diff=0 versus H1:elo_diff=epsilon with elo_diff equal to elo. """ sp=SimpleSPRT(alpha=alpha,beta=alpha,elo0=0,elo1=epsilon) s=L(elo) d=draw_ratio w=s-(1/2)*d l=1-s-(1/2)*d while True: r=pick(w,d,l) sp.record(r) status=sp.status() if status!='': return status if __name__=='__main__': epsilon=5 a={'H0':0,'H1':0} cc=0 while True: status=simulate(draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0) a[status]+=1 cc+=1 print a['H0']/cc, a['H1']/cc sys.stdout.flush()
The pick() return 0, 1, 2, and you do sp.record(r).
Code: Select all
r=pick(w,d,l)
sp.record(r)
Code: Select all
# Losses
for _ in range(losses):
data.record(0)
# Draws
for _ in range(draws):
data.record(1)
# Wins
for _ in range(wins):
data.record(2)
I don't know if it was intended for simulation.
But still my input is valid, just consider that my last input is the last call to
self.record(). Then you see I print the status after that given L/D/W stats.
From the sample that I use of L/D/W it returns H0. So that would mean that I should have exited my test, but using SPRT is should continue.
Code: Select all
TestEngine: D2015.1.35.239.tuned
BaseEngine: D2015.1.35.239
Elo0: -1.50, Elo1: 4.50, alpha: 0.05, beta: 0.05
T: 2467, W: 637, L: 559, D: 1271, Score: 51.6%, NetW: +78
Elo: +10, err95: +/-9, LOS: 0.98816
LLR: 1.84, [-2.94, +2.94]
Code: Select all
enter alpha? 0.05
enter beta? 0.05
enter elo0? -1.5
enter elo1? 4.5
enter losses? 559
enter draws? 1271
enter wins? 637
status: H0
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: The SPRT without draw model, elo model or whatever...
I still do not see what your problem is. If you are worried that the log likelihood ratio (LLR) reported by SF is different from the LLR reported by SimpleSPRT then that is simply due to the fact that SF uses _Bayes Elo's_ as input. 4.5 elo for SF is perhaps 2.8 real elo.
=====================
Something else: I do not think your use of the SPRT is theoretically correct. The error probabilities for the SPRT are based on the assumption that the test is stopped when the LLR crosses one of the boundaries. If you truncated it after a fixed amount of time then the computation is different. For one thing, the LLR can be in the neutral region.
I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.
=====================
Something else: I do not think your use of the SPRT is theoretically correct. The error probabilities for the SPRT are based on the assumption that the test is stopped when the LLR crosses one of the boundaries. If you truncated it after a fixed amount of time then the computation is different. For one thing, the LLR can be in the neutral region.
I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: The SPRT without draw model, elo model or whatever...
I don't have a problem, I am just trying to use your posted class (check my first post). Then I get H0 based from the L/D/W that I use, and I let you interpret that.Michel wrote:I still do not see what your problem is.
You were saying my use of class is not really right, and I explain that it is just the same, you are using simulation updating record per run, and I am updating at a given sum of games with L/D/W.
Now I see the limitation of this function.Michel wrote: I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.
An else should be added to cover the empty status. This is also becausedef record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
that LLR changes as results changes affecting the status.
Tried again with that change and I got empty.def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
else:
self._status=''
Code: Select all
enter alpha? 0.05
enter beta? 0.05
enter elo0? -1.5
enter elo1? 4.5
enter losses? 559
enter draws? 1271
enter wins? 637
status:
-
- Posts: 1968
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: The SPRT without draw model, Elo model or whatever...
Hello:
I wrote a LLR calculator a while ago and I use Bayeselo units in SPRT bounds. My LLR calculator agrees with Ferd's LLR result of 1.84:
I never disagree with SF testing framework. I know that LLR ~ 1.84 means nothing because SPRT is sequential and LLR could cross bounds an even number of times to reach a neutral region where lower_bound < LLR < upper_bound.
I do not know how you got LLR ~ 2.4, Michel. If I do upper_bound/lower_bound = -3 and I use Bayeselo units, I get this LLR ~ 2.3996 with SPRT(-2.0409, 6.1227), using +637 =1271 -559, of course. With my drawelo result of 198.2956, then 1.5/{4*10^(198.2956/400)/[1 + 10^(198.2956/400)]²} ~ 2.044030841, which is similar to 2.0409. With SPRT(-2.0440, 6.1321) I obtain LLR ~ 2.4027.
OTOH I think that I understand you when you said that Ferd inputs loses first. My thought is the following one:
In my LLR calculator I only input the state +637 =1271 -559 without looking at the previous states.
I wonder what would happen if draws or wins are first input in Ferd's code. If wins are the first input, the code will return 'H1'? If draws are the first input, will the report depend on the second input (wins or losses)? The best thing is calculate LLR after each WDL state, just like every SPRT simulator.
I am glad that Ferd solved the issue.
Regards from Spain.
Ajedrecista.
I wrote a LLR calculator a while ago and I use Bayeselo units in SPRT bounds. My LLR calculator agrees with Ferd's LLR result of 1.84:
Code: Select all
Parameters found at LLR_parameters.txt file:
alpha: 0.0500
beta: 0.0500
bayeselo_0: -1.5000
bayeselo_1: 4.5000
----------------------------
Lower bound for LLR: -2.9444
Upper bound for LLR: 2.9444
----------------------------
Games: 2467
Wins: 637 (25.82 %).
Loses: 559 (22.66 %).
Draws: 1271 (51.52 %).
bayeselo: 14.9710
drawelo: 198.2956
----------------------------
LLR(wins): 16.6408
LLR(loses): -14.6643
LLR(draws): -0.1391
LLR: 1.8374
I do not know how you got LLR ~ 2.4, Michel. If I do upper_bound/lower_bound = -3 and I use Bayeselo units, I get this LLR ~ 2.3996 with SPRT(-2.0409, 6.1227), using +637 =1271 -559, of course. With my drawelo result of 198.2956, then 1.5/{4*10^(198.2956/400)/[1 + 10^(198.2956/400)]²} ~ 2.044030841, which is similar to 2.0409. With SPRT(-2.0440, 6.1321) I obtain LLR ~ 2.4027.
OTOH I think that I understand you when you said that Ferd inputs loses first. My thought is the following one:
Code: Select all
LLR computed after each state i):
1) +0 =0 -0
2) +0 =0 -1
3) +0 =0 -2
[...] // LLR could have crossed lower bound somewhere here, hence 'H0' report.
// IIRC, I had problems in the past when wins*draws*loses = 0 (that is, I needed non-zero values).
559) +0 =0 -558
560) +0 =0 -559
561) +0 =1 -559 // Probably here LLR < lower_bound according with my last comment.
562) +0 =2 -559
[...]
1830) +0 =1270 -559
1831) +0 =1271 -559
1832) +1 =1271 -559
1833) +2 =1271 -559
[...]
2467) +636 =1271 -559
2468) +637 =1271 -559
I wonder what would happen if draws or wins are first input in Ferd's code. If wins are the first input, the code will return 'H1'? If draws are the first input, will the report depend on the second input (wins or losses)? The best thing is calculate LLR after each WDL state, just like every SPRT simulator.
I am glad that Ferd solved the issue.
Regards from Spain.
Ajedrecista.