
Re: SPRT question

Posted: Fri Aug 21, 2015 2:55 pm
by Michel
Michel wrote: It is not clear to me if and when the SPRT will terminate with probability one if the number of games of A is kept fixed (it seems like an easy problem, but I have not taken the time to consider it properly). Obviously if eloA is only vaguely known, and the difference between eloA and eloB is small, one will never be able to prove there is a difference no matter how many games B plays (recall that A and B are not playing each other).
A quick back of the envelope calculation shows that under the following conditions

(0) The number of games of A is kept constant.
(1) eloA=eloA' (i.e. the measured elo of A, which stays constant, is equal to the true elo).
(2) eloB=eloA+epsilon/2 (i.e. halfway between H0 and H1)

there is a non-zero probability that the SPRT will not terminate.

Even if H0 or H1 is true _and epsilon is below some easily calculated bound_, depending on the number of games played by A, there will be a non-zero probability that the SPRT does not terminate.

So the conclusion is that when using the SPRT in this fashion one should add games for both A and B (although probably not in the same ratio).

Re: SPRT question

Posted: Tue Aug 25, 2015 12:42 am
by Michel
The results in the paper I quoted are only of asymptotic nature, so I did some simulations to make sure that they also apply in the non-asymptotic cases we are considering. The good news is that they do! At least as far as the Type I/II error probabilities are concerned.

Of course the number of games required to get a sensible elo resolution when playing against foreign opponents is horrible compared with self play.

For those who are interested: below is the code to extract the (Bayes)elo values from matrices with W/D/L counts. The code is inefficient since L is just the transpose of W, but I am too lazy to fix this now.

Code: Select all

from __future__ import division
import math
import scipy
import scipy.optimize
import numpy as np

bb=math.log(10)/400

def L_(x):
    """Logistic function on the elo scale."""
    return 1/(1+np.exp(-bb*x))

def WDL(de,elo):
    """Win/draw/loss probability matrices for the BayesElo model.

    de is the draw elo, elo is a numpy array of elo values.
    Entry (i,j) refers to player i playing with white against player j.
    """
    elo_diff=elo[:,np.newaxis]-elo[np.newaxis,:]   # broadcasting: elo[i]-elo[j]
    W=L_(elo_diff-de)
    L=L_(-elo_diff-de)   # note: L is just the transpose of W
    D=1-W-L
    return (W,D,L)

def LL(de,elo,w,d,l):
    """Log likelihood of the counts w,d,l (square numpy arrays).

    Each game is counted twice (once in each orientation), hence the /2.
    """
    (W,D,L)=WDL(de,elo)
    return np.sum(w*np.log(W)+d*np.log(D)+l*np.log(L))/2

def elo(de,elo_diff,w,d,l):
    """Elo estimates with the difference between players 0 and 1 fixed."""
    return BayesElo(de,np.array([-elo_diff/2,elo_diff/2]),w,d,l)

def LLR(de,elo_diff1,elo_diff2,w,d,l):
    """Generalized log likelihood ratio for elo_diff2 versus elo_diff1."""
    elo1=elo(de,elo_diff1,w,d,l)
    elo2=elo(de,elo_diff2,w,d,l)
    return LL(de,elo2,w,d,l)-LL(de,elo1,w,d,l)

def BayesElo(de,elo,w,d,l):
    """Maximum likelihood elo estimates.

    de is the draw elo, elo is a numpy array of presets,
    w,d,l are square numpy arrays with win/draw/loss counts.
    The elo values of the remaining players are fitted by
    maximizing the log likelihood.
    """
    if elo.shape[0]==0:
        elo=np.array([0])
    free=w.shape[0]-elo.shape[0]   # number of free elo parameters
    ret=scipy.optimize.minimize(
        lambda x:-LL(de,np.concatenate((elo,x)),w,d,l),
        np.zeros(free),options={'disp':False})
    return np.concatenate((elo,ret.x))
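As a sanity check on the model, here is a self-contained toy version for just two players: it fits the elo difference by maximizing the same BayesElo log likelihood. The names win_prob and log_lik are mine, not from the code above; this is a sketch, not part of the posted implementation.

```python
import math
import numpy as np
import scipy.optimize

bb = math.log(10)/400

def win_prob(elo_diff, draw_elo):
    # BayesElo model: P(win) is logistic in elo_diff - draw_elo
    return 1/(1 + np.exp(-bb*(elo_diff - draw_elo)))

def log_lik(elo_diff, draw_elo, w, d, l):
    # log likelihood of w wins, d draws, l losses for the first player
    pw = win_prob(elo_diff, draw_elo)
    pl = win_prob(-elo_diff, draw_elo)
    pd = 1 - pw - pl
    return w*np.log(pw) + d*np.log(pd) + l*np.log(pl)

# fit the elo difference for a 40-30-30 score over 100 games
res = scipy.optimize.minimize_scalar(
    lambda x: -log_lik(x, 100, 40, 30, 30),
    bounds=(-400, 400), method='bounded')
print(res.x)
```

With more wins than losses the fitted difference comes out positive, roughly in line with the 400*log10(0.55/0.45) one would get from the score alone.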

Re: SPRT question

Posted: Tue Aug 25, 2015 12:14 pm
by Michel
OK, for possible future reference, here is an implementation of the GSPRT:

http://hardy.uhasselt.be/Toga/GSPRT.py

Here is the documentation of the main class

Code: Select all

class GSPRT:
    """ 
This class performs a GSPRT for H0:elo(player1)-elo(player0)=elo_diff1 versus H1:elo(player1)-elo(player0)=elo_diff2. 
See here for a description of the GSPRT as well as theoretical (asymptotic) results.

http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf

To record a result use the method self.record(player1,player2,result) where "result" is one of 'w','d','l'. 

To check the status of the test at any time use self.status(). This method returns a tuple whose first entry 
is the number of games so far, whose second entry is a string which is either 'H0', 'H1' or '' 
and whose third entry is a list of estimated elo values. Note that when the test terminates these estimates are
heavily biased, so their main purpose is entertainment.

The granularity parameter to the constructor controls how often the log likelihood ratio is recomputed. 
Making this parameter larger than one will speed up simulations. Note that a high granularity will push 
the Type I/II error probabilities below their design values because of overshooting. It is known how to correct for this (by estimating
the overshoot) but this has not been implemented.

When using this class with actual games one should use the default granularity (=1).
"""
    def __init__(self,alpha=0.05,beta=0.05,elo_diff1=0,elo_diff2=5,draw_elo=100,players=3,granularity=1):
To illustrate the use of this class, the script performs a simulation.

Code: Select all

def simulate(granularity,de,epsilon,alpha,elo1,elo2,elo3):
    """
We simulate testing a new version of an engine with the help of one foreign engine.
Players 0 and 1 are the two versions; player 2 is the foreign engine.
"""
    # Type I and Type II error probabilities are set equal here (beta=alpha)
    g=GSPRT(alpha=alpha,beta=alpha,draw_elo=de,granularity=granularity,elo_diff1=0,elo_diff2=epsilon,players=3)
    elo=np.array([elo1,elo2,elo3])
    W,D,L=WDL(de,elo)
    for i in xrange(0,10000000):   # xrange: this is Python 2 code
        r=i%2                      # alternate between the two versions...
        c=2                        # ...which always play the foreign engine
        # pick() samples 'w','d' or 'l' with the given probabilities
        p=pick(W[r,c],D[r,c],L[r,c])
        g.record(r,c,p)
        if g.status()[1]!='':      # test terminated with 'H0' or 'H1'
            return g.status()
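The helper pick() above lives in the linked GSPRT.py. For readers following along without downloading it, a minimal sketch could look like this; it is my guess at its behavior based on how it is called, not the actual implementation:

```python
import random

def pick(pw, pd, pl):
    # sample 'w', 'd' or 'l' with probabilities pw, pd, pl
    # (assumed to sum to one)
    u = random.random()
    if u < pw:
        return 'w'
    elif u < pw + pd:
        return 'd'
    else:
        return 'l'
```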