sprt tourney manager

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Laskos wrote:
Michel wrote:
Laskos wrote:
Michel wrote:There seems to be a puzzling deviation in the reported 3-nomial LLR's. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probable that it's my fault. This stop (where test in not yet accepted) was my ad-hoc stop, the test was still running (it stops when both 3- and 5-nomial reach desired LLR), but I wanted to see what 3-nomial LLR is when 5-nomial stops. 2 others were stopped regularly (tests were accepted), should be ok to fine details.
Thanks for the explanation. I put a link for the script computing LLR's here.

http://hardy.uhasselt.be/Toga/computeLLR.py

Now that there is an implementation in the Amoebe tourney manager perhaps cutechess-cli can follow? The code is really trivial. Much easier than the current code in cutechess-cli.
Thanks for the script! Yes, this Richard's implementation is the most straightforward use of SPRT for 3- and 5-nomials, maybe Cutechess-Cli will follow. I tried to compute 3-nomial expression for LLR in closed form without substitutions, I hoped it will simplify a bit, but the most I get is still very long, and 5-nomial would be even longer. Just for fun, LLR for 3-nomial:
Image
In fact, without any Elo, expressing elo0 and elo1 as scores S0 and S1, for example elo0=0 => S0=0.5, elo1=5 =>S1=0.5073, one gets a simpler expression for 3-nomial LLR:
Image
LLR = (2 N*(N - 1)*(S1 - S0) [draws (1 - S0 - S1) - (losses + wins) (S0 + S1) + 2 wins])/(draws (losses + wins) + 4 losses wins)
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: sprt tourney manager

Post by Michel »

An analytically correct formula for the trinomial LLR is given in this thread

http://talkchess.com/forum/viewtopic.ph ... simplesprt

(see the highlighted code in the 3d post). Some time ago someone posted the same formula on the fishcooking forum.

I do not know if the formula is equivalent to yours.

I also do not know if there is an analytically correct formula for the 5-nomial model, however the exact value is easy to calculate using a numerical library such as scipy.

I am now experimenting with an approximation which appears to be more accurate for large elo differences and which is still easy to compute.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Michel wrote:An analytically correct formula for the trinomial LLR is given in this thread

http://talkchess.com/forum/viewtopic.ph ... simplesprt

(see the highlighted code in the 3d post). Some time ago someone posted the same formula on the fishcooking forum.

I do not know if the formula is equivalent to yours.

I also do not know if there is an analytically correct formula for the 5-nomial model, however the exact value is easy to calculate using a numerical library such as scipy.

I am now experimenting with an approximation which appears to be more accurate for large elo differences and which is still easy to compute.
Ok, thanks, I was not careful reading at that time that thread. I remember I used some sort of LLR formula stripped of ELO, BayesELO and such things 2-3 years ago when studying my cat (:)), with the substitution: E0=10^(elo0/400), E1=10^(elo1/400), E0=1, E1=1.3 in this thread in Chess Thinkers Forum (you have to be logged in to see it):
http://www.talkchess.com/forum/viewtopic.php?t=54126
I checked now the LLR stop, it is OK with alpha, beta = 0.01.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

I came up with a surprisingly simple result for LLR for trinomial function of N, S0, S1, Win, Loss, where N is the number of tries, H0: S0=score 0, H1: S1=score 1, Win=win rate, Loss=loss rate. It is valid for fairly small differences in scores for hypotheses or Win/Loss rates.
Image
LLR = 2*N*(S1 - S0)*(1 - S0 - S1 + Win - Loss)/(Loss + Win).

It is off from master formula (which is in itself an approximation IIRC) by less than 1% as far as I checked.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: sprt tourney manager

Post by Michel »

I came up with a surprisingly simple result for LLR for trinomial function of N, S0, S1, Win, Loss, where N is the number of tries, H0: S0=score 0, H1: S1=score 1, Win=win rate, Loss=loss rate. It is valid for fairly small differences in scores for hypotheses or Win/Loss rates.

LLR = 2*N*(S1 - S0)*(1 - S0 - S1 + Win - Loss)/(Loss + Win).

It is off from master formula (which is in itself an approximation IIRC) by less than 1% as far as I checked.
It is indeed a simple formula. I have been looking at the 5-nomial model again and it appears that there are several reasonable approximations of the LLR. Of course they are all asymptotically equivalent.

I believe I know a method to identify the objectively best one but I have not worked it out yet (busy).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: sprt tourney manager

Post by Michel »

A nice alternative (asymptotically equivalent) approximation is

Code: Select all

(1/2) N*ln((Var_s+(s-s0)^2/N)/(Var_s+(s-s1)^2/N))   (*)
where s is the score and Var_s is the empiric variance for the score.

You get it by assuming that the game outcomes form a normal distribution (instead of a 3/5-nomial one) with unknown mean and variance and then you perform a GSPRT with the unknown variance as a nuissance parameter.

So instead of approximating the LLR for the original problem (a 3/5-nomial distribution), you change the original problem (3/5-nomial->normal).

Using simulation I have some tentative evidence (not conclusive) that (*) behaves better than the original approximation for short tests although I see no theoretical reason for this. I need to write a fast simulator (not Python!).

Note: to accurately evaluate SPRT variants for short tests it is mandatory to perform overshoot correction (since time is discrete the LLR will not be precisely on the boundary of the continuation interval when the test stops, i.e. there is overshoot). It is known how to do this.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Michel wrote:A nice alternative (asymptotically equivalent) approximation is

Code: Select all

(1/2) N*ln((Var_s+(s-s0)^2/N)/(Var_s+(s-s1)^2/N))   (*)
where s is the score and Var_s is the empiric variance for the score.

You get it by assuming that the game outcomes form a normal distribution (instead of a 3/5-nomial one) with unknown mean and variance and then you perform a GSPRT with the unknown variance as a nuissance parameter.

So instead of approximating the LLR for the original problem (a 3/5-nomial distribution), you change the original problem (3/5-nomial->normal).

Using simulation I have some tentative evidence (not conclusive) that (*) behaves better than the original approximation for short tests although I see no theoretical reason for this. I need to write a fast simulator (not Python!).

Note: to accurately evaluate SPRT variants for short tests it is mandatory to perform overshoot correction (since time is discrete the LLR will not be precisely on the boundary of the continuation interval when the test stops, i.e. there is overshoot). It is known how to do this.
Nice! So, it means it could be better for larger "Elo" (or whatever we call it) differences? Observing these Amoeba runs with the original "master formula", the LLR behaves so much better in short runs compared to LOS (p-value), being expressed in simple algebraic expressions, compared to Erf for p-value and undefined Type I,II errors. Generally, your work would be useful to wider audience, as even the American Statistical Association recently issued a warning in the journal "Nature" pointing out to the misuse of p-value as reliance estimator and stopping rule, with contradictory papers appearing apparently backed by serious statistics.

Meanwhile, from your "master formula" I came up with a reasonable expression for 5-nomial LLR for small "Elo" differences: N=number of pairs of tries (possibly correlated), P00,P05,P10,P15,P20 the rates of pairs with 0,0.5,1,1.5,2.0 outcomes, S0, S1 the value of scores for hypotheses.
Image
LLR = 4*N*(S1 - S0)*(P15 - P05 + 2*P20 - 2*P00 + 2*(1 - S0 - S1))/(4*P00 + 4*P20 + P05 + P15).
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Laskos wrote:I came up with a surprisingly simple result for LLR for trinomial function of N, S0, S1, Win, Loss, where N is the number of tries, H0: S0=score 0, H1: S1=score 1, Win=win rate, Loss=loss rate. It is valid for fairly small differences in scores for hypotheses or Win/Loss rates.
Image
LLR = 2*N*(S1 - S0)*(1 - S0 - S1 + Win - Loss)/(Loss + Win).

It is off from master formula (which is in itself an approximation IIRC) by less than 1% as far as I checked.
I made plots for 1,000 simulations of LLR and t-value. T-value stop introduces an unbounded Type I error for N -> infinity, but over a chosen specific range of N, the errors can be set to be similar to LLR stop case, so the t-value can be vaguely compared to LLR. The plots are interesting also to see the dispersion of LLRs.

LLR = 2*N*(S1 - S0)*(1 - S0 - S1 + Win - Loss)/(Loss + Win)
t-value = (wins-losses)/sqrt(wins+losses).


First, the scaling of LLR is ~ N, of t-value ~ sqrt(N), so for sufficiently large number of games LLR will go much further. Draw rate in all plots is 60%. Here is the plot for small difference of 5 ELO to 40,000 games.

Image
We see that t-value stop is only comparable to LLR stop in the intermediate range. For small N up to 400, small ELO difference of 5 ELO the plot is:


Image
T-value is all over the place here, confirming my impression that for small N and small differences (the most common case when one wants to test fast small patches), LLR behaves much better. For small N up to 400 and large ELO difference of 50 ELO points, the plot is:


Image
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

With 3-nomial and 5-nomial LLRs expressed here (small ELO differences):

LLR 3-nomial = 2*N*(S1 - S0)*(Win - Loss + 1 - S0 - S1)/(Win + Loss),

LLR 5-nomial = 4*N *(S1 - S0)*(2*P20 - 2*P00 + P15 - P05 + 2*(1 - S0 - S1))/(4 *P00 + 4*P20 + P05 + P15)
,

introducing moderate correlation between games (in practice realized by the same and reversed opening), I get the following LLR distribution in 1000 simulated runs:

Image
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Michel wrote:A nice alternative (asymptotically equivalent) approximation is

Code: Select all

(1/2) N*ln((Var_s+(s-s0)^2/N)/(Var_s+(s-s1)^2/N))   (*)
where s is the score and Var_s is the empiric variance for the score.

You get it by assuming that the game outcomes form a normal distribution (instead of a 3/5-nomial one) with unknown mean and variance and then you perform a GSPRT with the unknown variance as a nuissance parameter.

So instead of approximating the LLR for the original problem (a 3/5-nomial distribution), you change the original problem (3/5-nomial->normal).

Using simulation I have some tentative evidence (not conclusive) that (*) behaves better than the original approximation for short tests although I see no theoretical reason for this. I need to write a fast simulator (not Python!).

Note: to accurately evaluate SPRT variants for short tests it is mandatory to perform overshoot correction (since time is discrete the LLR will not be precisely on the boundary of the continuation interval when the test stops, i.e. there is overshoot). It is known how to do this.
Interesting. Without any overshoot correction, I numerically verified your new expression for LLR, it seems correct, and very close to the "master" LLR. On the graph of 1 run they overlap to high degree, one of the plots is partially invisible:

Image

In 1000 runs, the dots come in very close pairs (if not exact matches):

Image