Hello Kai:
Laskos wrote: Hello Jesus, I think you might do something interesting to me. IIRC you have a SPRT simulator, correct me if I am wrong. I got also that Type II error is smaller too for unbalanced openings, and I would like to check if SPRT with the same H0, H1, alpha, beta will stop with fewer number of games on average with unbalanced openings compared to balanced ones.
I took two Stockfishes, a recent SF_strong and an older SF_weak, the difference between them being of the order of 50 ELO points. I let them play for 2,000 games at 30''+0.3'' each match (balanced openings/unbalanced openings) and got the following performances:
1/ Balanced openings:
SF_strong as White:
+30% =58% -12%
SF_strong as Black:
+27% =58% -15%
And the opposite:
SF_weak as White:
+15% =58% -27%
SF_weak as Black:
+12% =58% -30%
2/ Unbalanced openings:
SF_strong as White:
+59% =39% -2%
SF_strong as Black:
+5% =65% -30%
And the opposite:
SF_weak as White:
+30% =65% -5%
SF_weak as Black:
+2% =39% -59%
Can you input these outcome probabilities for 2 separate SPRT simulations (balanced/unbalanced) to get the average (in many runs) number of games needed to stop for each of these cases? With say H0=0, H1=30, alpha, beta=0.05. As significance the matches were very similar, but as Type I,II errors I guess the unbalanced openings will favor a faster stop (smaller number of games for SPRT to stop, picking the hypothesis H1).
I programmed a SPRT simulator almost three years ago and I recently added parameter estimation to fit the cumulative distribution function of the length of simulations (number of games) to a log-normal distribution, which fits reasonably well.
My simulator works in a slightly different way than the one you propose: I input alpha, beta, the lower and upper bounds of the SPRT (in Bayeselo units) and two parameters: the expected Elo gain (in Bayeselo units, but there is a known conversion relationship) and drawelo (which is related to the draw ratio).
So, the overall input is summarized in prob.(A wins), prob.(B wins) and prob.(draw) = 1 - prob.(A wins) - prob.(B wins); instead of prob.(A wins with white), prob.(A wins with black), prob.(B wins with white), prob.(B wins with black), prob.(draw of A-B) = 1 - prob.(A wins with white) - prob.(B wins with black), prob.(draw of B-A) = 1 - prob.(A wins with black) - prob.(B wins with white).
I see that your scores with balanced and unbalanced openings are not the same (µ_SF_strong = 57.5% and µ_SF_strong = 58% respectively) although it is expected due to error bars. OTOH, draw_ratio(balanced) = 58% and draw_ratio(unbalanced) = 52%. More draws usually translate into a larger average value in the number of games.
I would need to make changes because my usual way to proceed is game after game with the same set of probabilities, not round after round (A-B, B-A and repeat) with two sets of probabilities.
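The round-after-round idea (A-B, B-A and repeat) could be sketched roughly as below. This is only an illustration and not the code of my simulator: it assumes that the likelihood ratio is built from the usual colour-blind Bayeselo win/draw/loss model, that the stopping bounds are Wald's, and that H0 and H1 are given in Bayeselo units. The function names are made up for the example, and the drawelo values are the sample estimates computed further below.

```python
import math
import random


def bayeselo_probs(elo, drawelo):
    """(win, draw, loss) probabilities under the Bayeselo model."""
    win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return win, 1.0 - win - loss, loss


def simulate_sprt(round_probs, elo0, elo1, drawelo, alpha, beta, rng):
    """One SPRT run; returns (number of games, True if H1 was accepted).

    round_probs holds one (win, draw, loss) tuple per game of a round,
    e.g. [A as White, A as Black], and is cycled until the test stops.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # Log-likelihood ratio increment for a win, a draw and a loss.
    steps = [math.log(q1 / q0) for q0, q1 in zip(p0, p1)]
    llr, games = 0.0, 0
    while lower < llr < upper:
        probs = round_probs[games % len(round_probs)]
        outcome = rng.choices((0, 1, 2), weights=probs)[0]
        llr += steps[outcome]
        games += 1
    return games, llr >= upper


def average_length(round_probs, runs=1000, seed=1, **sprt_args):
    """Average number of games and fraction of runs that accepted H1."""
    rng = random.Random(seed)
    results = [simulate_sprt(round_probs, rng=rng, **sprt_args)
               for _ in range(runs)]
    avg_games = sum(g for g, _ in results) / runs
    pass_rate = sum(1 for _, ok in results if ok) / runs
    return avg_games, pass_rate


if __name__ == "__main__":
    args = dict(elo0=0.0, elo1=30.0, alpha=0.05, beta=0.05)
    # (win, draw, loss) of SF_strong as White, then as Black.
    balanced = [(0.30, 0.58, 0.12), (0.27, 0.58, 0.15)]
    unbalanced = [(0.59, 0.39, 0.02), (0.05, 0.65, 0.30)]
    print("balanced:  ", average_length(balanced, drawelo=241.2287, **args))
    print("unbalanced:", average_length(unbalanced, drawelo=209.5036, **args))
```

Whether the test itself should also know about colours (a model with five outcomes per round instead of three per game) is a separate design choice that this sketch does not address.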
First of all, averaging white and black probabilities (I know that you do not want this, but it is to get a rough idea):
bayeselo = 200*log10{W*(1 - L)/[L*(1 - W)]}
drawelo = 200*log10[(1 - L)*(1 - W)/(L*W)] // Estimated from the sample of games.
From SF_strong POV:
Balanced:
W = (0.30 + 0.27)/2 = 0.285
L = (0.12 + 0.15)/2 = 0.135
Elo ~ 52.5116
bayeselo ~ 81.4442
drawelo ~ 241.2287
Unbalanced:
W = (0.59 + 0.05)/2 = 0.32
L = (0.02 + 0.30)/2 = 0.16
Elo ~ 56.0715
bayeselo ~ 78.5601
drawelo ~ 209.5036
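For anyone who wants to reproduce the numbers above, here is a small sketch. The Bayeselo and drawelo expressions are the ones written above; the logistic Elo line uses the standard 400*log10[score/(1 - score)] formula, which is my assumption about how the 'Elo' figures are obtained, and the function name is only illustrative.

```python
import math


def elo_stats(win, loss):
    """Return (logistic Elo, bayeselo, drawelo) from win and loss ratios."""
    draw = 1.0 - win - loss
    score = win + draw / 2.0
    elo = 400.0 * math.log10(score / (1.0 - score))
    bayeselo = 200.0 * math.log10(win * (1.0 - loss) / (loss * (1.0 - win)))
    drawelo = 200.0 * math.log10((1.0 - loss) * (1.0 - win) / (loss * win))
    return elo, bayeselo, drawelo


if __name__ == "__main__":
    # Averages of the White and Black results from SF_strong's point of view.
    print("balanced:  ", elo_stats((0.30 + 0.27) / 2, (0.12 + 0.15) / 2))
    print("unbalanced:", elo_stats((0.59 + 0.05) / 2, (0.02 + 0.30) / 2))
```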
I input these values into a SPRT tool by Michel van den Bergh that gives theoretical results (not simulations). I have the following question: I suppose that H0 and H1 have the meaning of a SPRT(H0, H1) test, but are H0 and H1 expressed in Bayeselo or in logistic Elo? There is a conversion formula between Bayeselo and logistic Elo that works for small numbers (let us say |value| < 10 Bayeselo for example), but I am not so sure about larger values. Anyway:
x = 10^(-drawelo/400)
bayeselo_to_Elo_scale = 4*x/(1 + x)² // Elo = (bayeselo_to_Elo_scale)*bayeselo.
H0 = 0 Bayeselo = 0 Elo
H1?
H1 = 30 Bayeselo
or
H1(30 Elo, drawelo ~ 241.2287) ~ 46.9406 Bayeselo
or
H1(30 Elo, drawelo ~ 209.5036) ~ 42.2962 Bayeselo
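The conversion above can be checked with a few lines. This is a sketch of the small-value approximation only (function names are illustrative), so the same caveat about larger values applies:

```python
def bayeselo_to_elo_scale(drawelo):
    """Scale factor such that Elo = scale * bayeselo (small-value approximation)."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def elo_to_bayeselo(elo, drawelo):
    """Invert the linear approximation to express a logistic Elo value in Bayeselo."""
    return elo / bayeselo_to_elo_scale(drawelo)


if __name__ == "__main__":
    print(elo_to_bayeselo(30.0, 241.2287))  # H1 for the balanced case
    print(elo_to_bayeselo(30.0, 209.5036))  # H1 for the unbalanced case
```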
------------------------
C:\[...]\sprta>sprt_w32
Usage: sprta.py elo0 elo1 draw_elo elo
elo0,elo1 are expressed in BayesElo
elo is expressed in LogisticElo
Balanced:
C:\[...]\sprta>sprt_w32 0 30 241.2287 52.5116
elo0 = 0.00
elo1 = 30.00
draw_elo = 241.23
elo = 52.51
pass probability: 100.03%
avg running time: 172
Unbalanced:
C:\[...]\sprta>sprt_w32 0 30 209.5036 56.0715
elo0 = 0.00
elo1 = 30.00
draw_elo = 209.50
elo = 56.07
pass probability: 100.00%
avg running time: 170
************************
Balanced:
C:\[...]\sprta>sprt_w32 0 46.9406 241.2287 52.5116
elo0 = 0.00
elo1 = 46.94
draw_elo = 241.23
elo = 52.51
pass probability: 99.94%
avg running time: 125
Unbalanced:
C:\[...]\sprta>sprt_w32 0 42.2962 209.5036 56.0715
elo0 = 0.00
elo1 = 42.30
draw_elo = 209.50
elo = 56.07
pass probability: 99.97%
avg running time: 133
'avg running time' is the average number of games. It is funny to see 'pass probability: 100.03%' in one of the outputs.
Please note that avg_games(balanced) > avg_games(unbalanced) with H1 = 30 Bayeselo (172 vs. 170) but the opposite holds with H1 = 30 Elo (125 vs. 133). It could happen due to the changing ratio (elo - H0)/(H1 - H0) (all written in the same units).
Last but not least, please remember that there is a chess SPRT online calculator with theoretical results (not simulations), based on Michel's tool if I am not wrong.
------------------------
I am not running simulations in your proposed way right now for two reasons:
a) I would like to know if you want H1 = 30 Elo (logistic Elo) or H1 = 30 Bayeselo.
b) I have not made the changes yet and I am not sure that I will have enough time and skills to make it work properly. Sorry.
So, given b), it is unlikely that I can do what you requested, though the odds may change (like the great comeback in the 1999 Champions League Final).
Regards from Spain.
Ajedrecista.