Hello Kai:
Laskos wrote: Hello Jesus, I think you might be able to do something that interests me. IIRC you have an SPRT simulator; correct me if I am wrong. I also got that the Type II error is smaller for unbalanced openings, and I would like to check whether SPRT with the same H0, H1, alpha, beta stops after fewer games on average with unbalanced openings than with balanced ones.
I took two Stockfishes, a recent SF_strong and an older SF_weak, the difference between them being of the order of 50 Elo points. I let them play 2,000 games at 30''+0.3'' in each match (balanced openings/unbalanced openings) and got the following performances:
 1/ Balanced openings:
 
 SF_strong as White:
 +30% =58% -12%
 
 SF_strong as Black:
 +27% =58% -15%
 
 And the opposite:
 
 SF_weak as White:
 +15% =58% -27%
 
 SF_weak as Black:
 +12% =58% -30%
 
 
 2/ Unbalanced openings:
 
 SF_strong as White:
 +59% =39% -2%
 
 SF_strong as Black:
 +5% =65% -30%
 
 And the opposite:
 
 SF_weak as White:
 +30% =65% -5%
 
 SF_weak as Black:
 +2% =39% -59%
Can you input these outcome probabilities into 2 separate SPRT simulations (balanced/unbalanced) to get the average (over many runs) number of games needed to stop in each case? With, say, H0 = 0, H1 = 30, alpha = beta = 0.05. In terms of significance the matches were very similar, but in terms of Type I and II errors I guess the unbalanced openings will favor a faster stop (a smaller number of games for SPRT to stop, picking hypothesis H1).
 
I programmed a SPRT simulator almost three years ago and I recently added parameter estimation to fit the cumulative distribution function of the length of simulations (number of games) to a log-normal distribution, which fits reasonably well.
My simulator works in a slightly different way than the one you propose: I input alpha, beta, the lower and upper bounds of SPRT (in Bayeselo units) and two parameters: expected Elo gain (in Bayeselo units, but there is a known conversion relationship) and drawelo (which is related to the draw ratio).
So the overall input reduces to prob.(A wins), prob.(B wins) and prob.(draw) = 1 - prob.(A wins) - prob.(B wins), instead of the colour-specific set you propose: prob.(A wins with white), prob.(A wins with black), prob.(B wins with white), prob.(B wins with black), prob.(draw of A-B) = 1 - prob.(A wins with white) - prob.(B wins with black), prob.(draw of B-A) = 1 - prob.(A wins with black) - prob.(B wins with white).
I see that your scores with balanced and unbalanced openings are not the same (µ_SF_strong = 57.5% and µ_SF_strong = 58% respectively), although that is expected given the error bars. OTOH, draw_ratio(balanced) = 58% and draw_ratio(unbalanced) = 52%. More draws usually translate into a larger average number of games.
I would need to make changes because my usual way to proceed is game after game with the same set of probabilities, not round after round (A-B, B-A and repeat) with two sets of probabilities.
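The round-after-round idea can be sketched in a few lines of Python. This is only an illustrative sketch and not the simulator described above: the function names (`bayeselo_probs`, `sprt_run`, `simulate`) are hypothetical, the log-likelihood ratio is assumed to come from a plain win/draw/loss BayesElo model with a single drawelo, and the stopping bounds are assumed to be Wald's usual ones. A list with one probability triple reproduces the game-after-game scheme; a list with two triples alternates colours.

```python
import math
import random

def bayeselo_probs(elo, drawelo):
    """(win, draw, loss) probabilities under the BayesElo model (assumed here)."""
    win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return win, 1.0 - win - loss, loss

def sprt_run(true_probs_cycle, elo0, elo1, drawelo, alpha, beta, rng):
    """Play games until the LLR leaves Wald's bounds; return (accepted_h1, games)."""
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # LLR increment for a win, a draw and a loss, in that order.
    step = [math.log(b / a) for a, b in zip(p0, p1)]
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr, games = 0.0, 0
    while lower < llr < upper:
        # Cycle through the probability sets: A as White, A as Black, repeat.
        probs = true_probs_cycle[games % len(true_probs_cycle)]
        outcome = rng.choices((0, 1, 2), weights=probs)[0]
        llr += step[outcome]
        games += 1
    return llr >= upper, games

def simulate(true_probs_cycle, elo0, elo1, drawelo,
             alpha=0.05, beta=0.05, runs=300, seed=1):
    """Return (fraction of runs accepting H1, average number of games)."""
    rng = random.Random(seed)
    results = [sprt_run(true_probs_cycle, elo0, elo1, drawelo, alpha, beta, rng)
               for _ in range(runs)]
    pass_rate = sum(ok for ok, _ in results) / runs
    avg_games = sum(n for _, n in results) / runs
    return pass_rate, avg_games

# (win, draw, loss) for SF_strong as White, then as Black, balanced openings.
balanced = [(0.30, 0.58, 0.12), (0.27, 0.58, 0.15)]
print(simulate(balanced, 0.0, 30.0, 241.2287))
```

With only 300 runs the averages are noisy; a serious comparison would need many more runs and a decision on the units of H1.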
First of all, averaging white and black probabilities (I know that you do not want this, but it is to get a rough idea):
bayeselo = 200*log10{W*(1 - L)/[L*(1 - W)]}
drawelo = 200*log10[(1 - L)*(1 - W)/(L*W)]  // Estimated from the sample of games.
From SF_strong POV:
Balanced:
W = (0.30 + 0.27)/2 = 0.285
L = (0.12 + 0.15)/2 = 0.135
Elo      ~  52.5116
bayeselo ~  81.4442
drawelo  ~ 241.2287
Unbalanced:
W = (0.59 + 0.05)/2 = 0.32
L = (0.02 + 0.30)/2 = 0.16
Elo      ~  56.0715
bayeselo ~  78.5601
drawelo  ~ 209.5036
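The figures above can be recomputed with a short Python sketch. The bayeselo and drawelo expressions are the two formulas quoted above; the logistic Elo line is an assumption (the usual 400*log10[score/(1 - score)] with score = W + D/2), since the post does not spell it out, and the function name `ratings_from_wl` is hypothetical.

```python
import math

def ratings_from_wl(w, l):
    """Return (logistic Elo, bayeselo, drawelo) from win and loss fractions."""
    d = 1.0 - w - l                      # draw fraction
    score = w + d / 2.0                  # expected score per game
    # Assumed standard logistic Elo formula (not stated explicitly in the post).
    elo = 400.0 * math.log10(score / (1.0 - score))
    # The two formulas quoted in the post.
    bayeselo = 200.0 * math.log10(w * (1.0 - l) / (l * (1.0 - w)))
    drawelo = 200.0 * math.log10((1.0 - l) * (1.0 - w) / (l * w))
    return elo, bayeselo, drawelo

for label, w, l in (("Balanced", 0.285, 0.135), ("Unbalanced", 0.32, 0.16)):
    elo, bayeselo, drawelo = ratings_from_wl(w, l)
    print(f"{label}: Elo ~ {elo:.4f}, bayeselo ~ {bayeselo:.4f}, drawelo ~ {drawelo:.4f}")
```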
I input these values into a SPRT tool by Michel van den Bergh that gives theoretical results (not simulations). I have the following doubt: I suppose that H0 and H1 are the bounds of a SPRT(H0, H1) test, but are they expressed in Bayeselo or in logistic Elo? There is a conversion formula between Bayeselo and logistic Elo that works for small numbers (let us say |value| < 10 Bayeselo, for example), but I am not so sure about larger values. Anyway:
x = 10^(-drawelo/400)
bayeselo_to_Elo_scale = 4*x/(1 + x)²  // Elo = (bayeselo_to_Elo_scale)*bayeselo.
H0 = 0 Bayeselo = 0 Elo
H1?
H1 = 30 Bayeselo
or
H1(30 Elo, drawelo ~ 241.2287) ~ 46.9406 Bayeselo
or
H1(30 Elo, drawelo ~ 209.5036) ~ 42.2962 Bayeselo
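The two converted H1 values follow from the scale formula above. A minimal Python sketch, with hypothetical function names, and keeping in mind that the linear conversion is only an approximation whose validity for values as large as 30 Elo is uncertain:

```python
def bayeselo_to_elo_scale(drawelo):
    """Scale factor such that Elo = scale * bayeselo (small-value approximation)."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2

def elo_to_bayeselo(elo, drawelo):
    """Invert the linear relationship Elo = scale * bayeselo."""
    return elo / bayeselo_to_elo_scale(drawelo)

for drawelo in (241.2287, 209.5036):
    print(f"H1(30 Elo, drawelo ~ {drawelo}) ~ {elo_to_bayeselo(30.0, drawelo):.4f} Bayeselo")
```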
------------------------
C:\[...]\sprta>sprt_w32
Usage: sprta.py elo0 elo1 draw_elo elo
elo0,elo1 are expressed in BayesElo
elo is expressed in LogisticElo
Balanced:
C:\[...]\sprta>sprt_w32 0 30 241.2287 52.5116
elo0     =     0.00
elo1     =    30.00
draw_elo =   241.23
elo      =    52.51
pass probability:      100.03%
avg running time:        172
Unbalanced:
C:\[...]\sprta>sprt_w32 0 30 209.5036 56.0715
elo0     =     0.00
elo1     =    30.00
draw_elo =   209.50
elo      =    56.07
pass probability:      100.00%
avg running time:        170
************************
Balanced:
C:\[...]\sprta>sprt_w32 0 46.9406 241.2287 52.5116
elo0     =     0.00
elo1     =    46.94
draw_elo =   241.23
elo      =    52.51
pass probability:      99.94%
avg running time:        125
Unbalanced:
C:\[...]\sprta>sprt_w32 0 42.2962 209.5036 56.0715
elo0     =     0.00
elo1     =    42.30
draw_elo =   209.50
elo      =    56.07
pass probability:      99.97%
avg running time:        133
'avg running time' is the average number of games. It is funny to see 'pass probability: 100.03%' in one of the outputs.
Please note that avg_games(balanced) > avg_games(unbalanced) with H1 = 30 Bayeselo (172 vs. 170), but the opposite holds with H1 = 30 Elo (125 vs. 133). It could be due to the changing ratio (elo - H0)/(H1 - H0) (all written in the same units).
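To put numbers on that ratio, here is a small Python sketch that evaluates (elo - H0)/(H1 - H0) in Bayeselo units for the four runs, using the bayeselo estimates and converted H1 values quoted earlier. The helper name `progress_ratio` is hypothetical, and the sketch only computes the ratio; whether it explains the ordering of the averages is left open.

```python
def progress_ratio(elo, h0, h1):
    """(elo - H0)/(H1 - H0), with all three arguments in the same units."""
    return (elo - h0) / (h1 - h0)

# (estimated bayeselo, H0, H1 in Bayeselo) for each run of the tool.
cases = {
    "balanced,   H1 = 30 Bayeselo": (81.4442, 0.0, 30.0),
    "unbalanced, H1 = 30 Bayeselo": (78.5601, 0.0, 30.0),
    "balanced,   H1 = 30 Elo":      (81.4442, 0.0, 46.9406),
    "unbalanced, H1 = 30 Elo":      (78.5601, 0.0, 42.2962),
}

for label, (elo, h0, h1) in cases.items():
    print(f"{label}: ratio ~ {progress_ratio(elo, h0, h1):.4f}")
```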
Last but not least, please remember that there is a chess SPRT online calculator with theoretical results (not simulations), based on Michel's tool if I am not wrong.
------------------------
I am not running simulations in your proposed way right now for two reasons:
a) I would like to know if you want H1 = 30 Elo (logistic Elo) or H1 = 30 Bayeselo.
b) I have not made the changes yet and I am not sure that I will have enough time and skill to make it work properly. Sorry.
So, given b), it is unlikely that I can do what you requested, though the odds may change (like the great comeback in the 1999 Champions League Final).
Regards from Spain.
Ajedrecista.