The SPRT without draw model, elo model or whatever...

Michel · Post by **Michel** » Tue Sep 01, 2015 9:29 am

A typical implementation of the SPRT (e.g. cutechess) now runs as follows:

(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.

This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.

However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.

After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....

(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.

Laskos · Post by **Laskos** » Tue Sep 01, 2015 11:06 am

Michel wrote:A typical implementation of the SPRT (e.g. cutechess) now runs as follows:

(1) Estimate the draw_elo parameter of the BayesElo model from the sample.
(2) Use a "scale" given by a standard formula to convert logistic elo's (input to the SPRT) to Bayes elo's.
(3) Perform the SPRT using the BayesElo model.

This is partially the result of my own suggestion long ago to use the Bayes Elo model for the SPRT.

However I have now come to realize that this procedure is ridiculously circuitous! My suggestion was based on a lack of experience in statistics.

After all a match between two engines just follows a trinomial distribution. Thus we may perform directly a GSPRT(*) for H0:expected_score=score0 against H1:expected_score=score1 with unknown parameter the draw ratio. This is mathematically much cleaner and also much easier to implement....

(*) The GSPRT is a variant on the SPRT when there are unknown parameters. One replaces the log likelihood used in the SPRT by its maximum over the parameter space subject to the conditions H0 and H1. The GSPRT satisfies the same optimality properties as the SPRT, at least asymptotically.

Good. It always seemed to me that this Bayeselo draw model for SPRT is a useless pain in the neck of using a draw model when we do know the trinomial distribution. Didn't know about GSPRT, and to avoid the pain, in analytical derivations I simply used draw_elo=0, using binomial only. I also used binomial distribution in order to approximate the trinomial distribution, it fits very well in most cases (draw_elo now is not 0, but I get rid of it).

Michel · Post by **Michel** » Tue Sep 01, 2015 1:19 pm

Here is some quick code that implements this. I have kept things as simple possible.

Code: Select all

from __future__ import division

import math

bb=math.log&#40;10&#41;/400

def L&#40;x&#41;&#58;
    return 1/&#40;1+math.exp&#40;-bb*x&#41;)

def DrawRatioMLE&#40;s,results&#41;&#58;
    """ 
We compute the MLE for the draw ratio, subject to the condition score=s.
This analytically correct formula was computed with Sage. Probably
D/&#40;W+D+L&#41; works just as well.
"""
    L=results&#91;0&#93;
    D=results&#91;1&#93;
    W=results&#91;2&#93;
    return &#40;L*s - W*s + D + W - math.sqrt&#40;4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2&#41;)/&#40;W+D+L&#41;

def LL&#40;s,results&#41;&#58;
    """
Compute the log likelihood for score=s.
"""
    d=DrawRatioMLE&#40;s,results&#41;
    w=s-&#40;1/2&#41;*d
    l=1-s-&#40;1/2&#41;*d
    L=results&#91;0&#93;
    D=results&#91;1&#93;
    W=results&#91;2&#93;
# sanity
    if W==0 or D==0 or L==0&#58;
        return 0
    else&#58;
        return W*math.log&#40;w&#41;+D*math.log&#40;d&#41;+L*math.log&#40;l&#41;

#####################################################################################################

class SimpleSPRT&#58;
    """
    This class performs a GSPRT for H0&#58;elo=elo0 versus H1&#58;elo=elo1 
    See here for a description of the GSPRT as well as theoretical &#40;asymptotic&#41; results.
    
    http&#58;//stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf
    
    To record the outcome of a game use the method self.record&#40;result&#41; where "result" is one of 0,1,2,
    corresponding respectively to a loss, a draw and a win.
"""

    def __init__&#40;self,alpha=0.05,beta=0.05,elo0=0,elo1=5&#41;&#58;
        self.score0=L&#40;elo0&#41;
        self.score1=L&#40;elo1&#41;
        self.LA=math.log&#40;beta/&#40;1-alpha&#41;)
        self.LB=math.log&#40;&#40;1-beta&#41;/alpha&#41;
        self.results=3*&#91;0&#93;
        self._status=''

    def record&#40;self,result&#41;&#58;
        self.results&#91;result&#93;+=1
        LLR=LL&#40;self.score1,self.results&#41;-LL&#40;self.score0,self.results&#41;
        if LLR>self.LB&#58;
            self._status='H1'
        elif LLR<self.LA&#58;
            self._status='H0'

    def status&#40;self&#41;&#58;
        return self._status

Ferdy · Post by **Ferdy** » Tue Sep 01, 2015 4:35 pm

Michel wrote:Here is some quick code that implements this. I have kept things as simple possible.

Code: Select all

from __future__ import division

import math

bb=math.log&#40;10&#41;/400

def L&#40;x&#41;&#58;
    return 1/&#40;1+math.exp&#40;-bb*x&#41;)

def DrawRatioMLE&#40;s,results&#41;&#58;
    """ 
We compute the MLE for the draw ratio, subject to the condition score=s.
This analytically correct formula was computed with Sage. Probably
D/&#40;W+D+L&#41; works just as well.
"""
    L=results&#91;0&#93;
    D=results&#91;1&#93;
    W=results&#91;2&#93;
    return &#40;L*s - W*s + D + W - math.sqrt&#40;4*D**2*s**2 + 4*D*L*s**2 + 4*D*W*s**2 + L**2*s**2 - 2*L*W*s**2 + W**2*s**2 - 4*D**2*s - 2*D*L*s - 6*D*W*s + 2*L*W*s - 2*W**2*s + D**2 + 2*D*W + W**2&#41;)/&#40;W+D+L&#41;

def LL&#40;s,results&#41;&#58;
    """
Compute the log likelihood for score=s.
"""
    d=DrawRatioMLE&#40;s,results&#41;
    w=s-&#40;1/2&#41;*d
    l=1-s-&#40;1/2&#41;*d
    L=results&#91;0&#93;
    D=results&#91;1&#93;
    W=results&#91;2&#93;
# sanity
    if W==0 or D==0 or L==0&#58;
        return 0
    else&#58;
        return W*math.log&#40;w&#41;+D*math.log&#40;d&#41;+L*math.log&#40;l&#41;

#####################################################################################################

class SimpleSPRT&#58;
    """
    This class performs a GSPRT for H0&#58;elo=elo0 versus H1&#58;elo=elo1 
    See here for a description of the GSPRT as well as theoretical &#40;asymptotic&#41; results.
    
    http&#58;//stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf
    
    To record the outcome of a game use the method self.record&#40;result&#41; where "result" is one of 0,1,2,
    corresponding respectively to a loss, a draw and a win.
"""

    def __init__&#40;self,alpha=0.05,beta=0.05,elo0=0,elo1=5&#41;&#58;
        self.score0=L&#40;elo0&#41;
        self.score1=L&#40;elo1&#41;
        self.LA=math.log&#40;beta/&#40;1-alpha&#41;)
        self.LB=math.log&#40;&#40;1-beta&#41;/alpha&#41;
        self.results=3*&#91;0&#93;
        self._status=''

    def record&#40;self,result&#41;&#58;
        self.results&#91;result&#93;+=1
        LLR=LL&#40;self.score1,self.results&#41;-LL&#40;self.score0,self.results&#41;
        if LLR>self.LB&#58;
            self._status='H1'
        elif LLR<self.LA&#58;
            self._status='H0'

    def status&#40;self&#41;&#58;
        return self._status

Try to make use of your class and input my previous sprt records.

My sprt record, formula based from Sf,

Code: Select all

TestEngine&#58; D2015.1.35.239.tuned
BaseEngine&#58; D2015.1.35.239
Elo0&#58; -1.50, Elo1&#58; 4.50, alpha&#58; 0.05, beta&#58; 0.05
T&#58; 2467, W&#58; 637, L&#58; 559, D&#58; 1271, Score&#58; 51.6%, NetW&#58; +78
Elo&#58; +10, err95&#58; +/-9, LOS&#58; 0.98816
LLR&#58; 1.84, &#91;-2.94, +2.94&#93;

Using your class,

Code: Select all

enter alpha? 0.05
enter beta? 0.05

enter elo0? -1.5
enter elo1? 4.5

enter losses? 559
enter draws? 1271
enter wins? 637

status&#58; H0

So how to interpret that?

Here is the interface,

Code: Select all

from gsprt import SimpleSPRT
import sys

def main&#40;argv&#41;&#58;

    alpha = float&#40;raw_input&#40;'enter alpha? '))
    beta = float&#40;raw_input&#40;'enter beta? '))
    print
    elo0 = float&#40;raw_input&#40;'enter elo0? '))
    elo1 = float&#40;raw_input&#40;'enter elo1? '))
    print
    losses = int&#40;raw_input&#40;'enter losses? '))
    draws = int&#40;raw_input&#40;'enter draws? '))
    wins = int&#40;raw_input&#40;'enter wins? '))
    print

    data = SimpleSPRT&#40;alpha, beta, elo0, elo1&#41;

    # Losses
    for _ in range&#40;losses&#41;&#58;
        data.record&#40;0&#41;

    # Draws
    for _ in range&#40;draws&#41;&#58;
        data.record&#40;1&#41;

    # Wins
    for _ in range&#40;wins&#41;&#58;
        data.record&#40;2&#41;
        
    print 'status&#58; %s' % data.status&#40;)
    

if __name__ == "__main__"&#58;
    main&#40;sys.argv&#91;1&#58;&#93;)

Michel · Post by **Michel** » Tue Sep 01, 2015 4:55 pm

So how to interpret that?

I have not yet checked your computation yet but the input to SimpleSPRT is in standard elo (as in cutechess-cli).

The input of the SPRT as used by Stockfish is in Bayes Elo, which is really a historical accident. It makes the interpretation of the paramaters needlessly difficult.

The conversion between Bayes Elo and standard elo can be done using a scale factor given by the following formula

(4*math.exp(-bb*de))/(1+math.exp(-bb*de))**2

where bb=math.log(10)/400 and de is draw_elo.

Michel · Post by **Michel** » Tue Sep 01, 2015 5:18 pm

# Losses
for _ in range(losses):
data.record(0)

# Draws
for _ in range(draws):
data.record(1)

# Wins
for _ in range(wins):
data.record(2)

Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.

I thought this was clear from the way the code is written, but it seems not. Sorry about that.

Here is some code that uses the class to do simulations

Code: Select all

import random,sys

def pick&#40;w,d,l&#41;&#58;
    s=random.random&#40;)
    if s<=w&#58;
        return 2
    elif s<=w+d&#58;
        return 1
    else&#58;
        return 0

def simulate&#40;draw_ratio,epsilon,alpha,elo&#41;&#58;
    """
    We simulate the test H0&#58;elo_diff=0 versus H1&#58;elo_diff=epsilon with elo_diff equal
    to elo.
"""
    sp=SimpleSPRT&#40;alpha=alpha,beta=alpha,elo0=0,elo1=epsilon&#41;
    s=L&#40;elo&#41;
    d=draw_ratio
    w=s-&#40;1/2&#41;*d
    l=1-s-&#40;1/2&#41;*d
    while True&#58;
        r=pick&#40;w,d,l&#41;
        sp.record&#40;r&#41;
        status=sp.status&#40;)
        if status!=''&#58;
            return status

if __name__=='__main__'&#58;
    epsilon=5
    a=&#123;'H0'&#58;0,'H1'&#58;0&#125;
    cc=0
    while True&#58;
        status=simulate&#40;draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0&#41;
        a&#91;status&#93;+=1
        cc+=1
        print a&#91;'H0'&#93;/cc, a&#91;'H1'&#93;/cc
        sys.stdout.flush&#40;)

Ferdy · Post by **Ferdy** » Tue Sep 01, 2015 5:55 pm

Michel wrote:
# Losses
for _ in range(losses):
data.record(0)

# Draws
for _ in range(draws):
data.record(1)

# Wins
for _ in range(wins):
data.record(2)
Sorry this is not how one uses the class. This is a sequential test. So one should check self.status() after every self.record() and bail out as soon as self.status() returns 'H0' or 'H1'.

I thought this was clear from the way the code is written, but it seems not. Sorry about that.

Here is some code that uses the class to do simulations
Code: Select all
import random,sys

def pick&#40;w,d,l&#41;&#58;
    s=random.random&#40;)
    if s<=w&#58;
        return 2
    elif s<=w+d&#58;
        return 1
    else&#58;
        return 0

def simulate&#40;draw_ratio,epsilon,alpha,elo&#41;&#58;
    """
    We simulate the test H0&#58;elo_diff=0 versus H1&#58;elo_diff=epsilon with elo_diff equal
    to elo.
"""
    sp=SimpleSPRT&#40;alpha=alpha,beta=alpha,elo0=0,elo1=epsilon&#41;
    s=L&#40;elo&#41;
    d=draw_ratio
    w=s-&#40;1/2&#41;*d
    l=1-s-&#40;1/2&#41;*d
    while True&#58;
        r=pick&#40;w,d,l&#41;
        sp.record&#40;r&#41;
        status=sp.status&#40;)
        if status!=''&#58;
            return status

if __name__=='__main__'&#58;
    epsilon=5
    a=&#123;'H0'&#58;0,'H1'&#58;0&#125;
    cc=0
    while True&#58;
        status=simulate&#40;draw_ratio=0.33,epsilon=epsilon,alpha=0.05,elo=0&#41;
        a&#91;status&#93;+=1
        cc+=1
        print a&#91;'H0'&#93;/cc, a&#91;'H1'&#93;/cc
        sys.stdout.flush&#40;)

You are doing simulation, and I am inputing the values. But as I undertand the use of class is the same.
The pick() return 0, 1, 2, and you do sp.record(r).

Code: Select all

r=pick&#40;w,d,l&#41;
        sp.record&#40;r&#41;

Same as what I did, just the L/D/W were accumulated before it was applied to get the status.

Code: Select all

# Losses 
for _ in range&#40;losses&#41;&#58; 
data.record&#40;0&#41; 

# Draws 
for _ in range&#40;draws&#41;&#58; 
data.record&#40;1&#41; 

# Wins 
for _ in range&#40;wins&#41;&#58; 
data.record&#40;2&#41;

If there are 100 loses, I recorded it 100 times by looping thru 0 ... 99, and each time calls the record() too.

I don't know if it was intended for simulation.

But still my input is valid, just consider that my last input is the last call to
self.record(). Then you see I print the status after that given L/D/W stats.

From the sample that I use of L/D/W it returns H0. So that would mean that I should have exited my test, but using SPRT is should continue.

Code: Select all

TestEngine&#58; D2015.1.35.239.tuned 
BaseEngine&#58; D2015.1.35.239 
Elo0&#58; -1.50, Elo1&#58; 4.50, alpha&#58; 0.05, beta&#58; 0.05 
T&#58; 2467, W&#58; 637, L&#58; 559, D&#58; 1271, Score&#58; 51.6%, NetW&#58; +78 
Elo&#58; +10, err95&#58; +/-9, LOS&#58; 0.98816 
LLR&#58; 1.84, &#91;-2.94, +2.94&#93;

Code: Select all

enter alpha? 0.05 
enter beta? 0.05 

enter elo0? -1.5 
enter elo1? 4.5 

enter losses? 559 
enter draws? 1271 
enter wins? 637 

status&#58; H0

Michel · Post by **Michel** » Tue Sep 01, 2015 6:49 pm

I still do not see what your problem is. If you are worried that the log likelihood ratio (LLR) reported by SF is different from the LLR reported by SimpleSPRT then that is simply due to the fact that SF uses _Bayes Elo's_ as input. 4.5 elo for SF is perhaps 2.8 real elo.

=====================

Something else: I do not think your use of the SPRT is theoretically correct. The error probabilities for the SPRT are based on the assumption that the test is stopped when the LLR crosses one of the boundaries. If you truncated it after a fixed amount of time then the computation is different. For one thing, the LLR can be in the neutral region.

I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.

Ferdy · Post by **Ferdy** » Tue Sep 01, 2015 7:53 pm

Michel wrote:I still do not see what your problem is.

I don't have a problem, I am just trying to use your posted class (check my first post). Then I get H0 based from the L/D/W that I use, and I let you interpret that.
You were saying my use of class is not really right, and I explain that it is just the same, you are using simulation updating record per run, and I am updating at a given sum of games with L/D/W.

Michel wrote: I checked the LLR in your example and it was actually in the neutral region (2.39957). So the code should have reported '' instead of 'H0'. But 'H0' had been true before (because you inputted the losses first) and the code never resets the status back to '', as in a correctly used sequential test this is never necessary.

Now I see the limitation of this function.

def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'

An else should be added to cover the empty status. This is also because
that LLR changes as results changes affecting the status.

def record(self,result):
self.results[result]+=1
LLR=LL(self.score1,self.results)-LL(self.score0,self.results)
if LLR>self.LB:
self._status='H1'
elif LLR<self.LA:
self._status='H0'
else:
self._status=''

Tried again with that change and I got empty.

Code: Select all

enter alpha? 0.05
enter beta? 0.05

enter elo0? -1.5
enter elo1? 4.5

enter losses? 559
enter draws? 1271
enter wins? 637

status&#58;

So the sprt that I use agreed with your gsprt, meaning the test continues.

Ajedrecista · Post by **Ajedrecista** » Tue Sep 01, 2015 8:01 pm

Hello:

I wrote a LLR calculator a while ago and I use Bayeselo units in SPRT bounds. My LLR calculator agrees with Ferd's LLR result of 1.84:

Code: Select all

Parameters found at LLR_parameters.txt file&#58;
 
alpha&#58;        0.0500
beta&#58;         0.0500
 
bayeselo_0&#58;  -1.5000
bayeselo_1&#58;   4.5000
 
----------------------------
 
Lower bound for LLR&#58; -2.9444
Upper bound for LLR&#58;  2.9444
 
----------------------------
 
Games&#58;       2467
 
Wins&#58;         637 &#40;25.82 %).
Loses&#58;        559 &#40;22.66 %).
Draws&#58;       1271 &#40;51.52 %).
 
bayeselo&#58;    14.9710
drawelo&#58;    198.2956
 
----------------------------
 
  LLR&#40;wins&#41;&#58;       16.6408
  LLR&#40;loses&#41;&#58;     -14.6643
  LLR&#40;draws&#41;&#58;      -0.1391
 
         LLR&#58;       1.8374

I never disagree with SF testing framework. I know that LLR ~ 1.84 means nothing because SPRT is sequential and LLR could cross bounds an even number of times to reach a neutral region where lower_bound < LLR < upper_bound.

I do not know how you got LLR ~ 2.4, Michel. If I do upper_bound/lower_bound = -3 and I use Bayeselo units, I get this LLR ~ 2.3996 with SPRT(-2.0409, 6.1227), using +637 =1271 -559, of course. With my drawelo result of 198.2956, then 1.5/{4*10^(198.2956/400)/[1 + 10^(198.2956/400)]²} ~ 2.044030841, which is similar to 2.0409. With SPRT(-2.0440, 6.1321) I obtain LLR ~ 2.4027.

OTOH I think that I understand you when you said that Ferd inputs loses first. My thought is the following one:

Code: Select all

LLR computed after each state i&#41;&#58;

1&#41; +0 =0 -0
2&#41; +0 =0 -1
3&#41; +0 =0 -2

&#91;...&#93;  // LLR could have crossed lower bound somewhere here, hence 'H0' report.
// IIRC, I had problems in the past when wins*draws*loses = 0 &#40;that is, I needed non-zero values&#41;.

559&#41; +0 =0 -558
560&#41; +0 =0 -559
561&#41; +0 =1 -559  // Probably here LLR < lower_bound according with my last comment.
562&#41; +0 =2 -559

&#91;...&#93;

1830&#41; +0 =1270 -559
1831&#41; +0 =1271 -559
1832&#41; +1 =1271 -559
1833&#41; +2 =1271 -559

&#91;...&#93;

2467&#41; +636 =1271 -559
2468&#41; +637 =1271 -559

In my LLR calculator I only input the state +637 =1271 -559 without looking at the previous states.

I wonder what would happen if draws or wins are first input in Ferd's code. If wins are the first input, the code will return 'H1'? If draws are the first input, will the report depend on the second input (wins or losses)? The best thing is calculate LLR after each WDL state, just like every SPRT simulator.

I am glad that Ferd solved the issue.

Regards from Spain.

Ajedrecista.

The SPRT without draw model, elo model or whatever...

The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, elo model or whatever...

Re: The SPRT without draw model, Elo model or whatever...