Re: Type I error for p-value stopping: balanced and unbalanc
Posted: Tue Jun 21, 2016 1:52 pm
Hello Jesus, thanks for your time. The Elo difference is not exactly equal in these matches (and it is not a rounding error); what is nearly equal is the significance (W-L)/sigma. The difference is a little larger for the unbalanced openings, but so is the error margin, so the significance (W-L)/sigma is about the same in the two cases.

Ajedrecista wrote:
I wrote an SPRT simulator almost three years ago, and I recently added parameter estimation to fit the cumulative distribution function of the simulation length (number of games) to a log-normal distribution, which fits reasonably well.
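A maximum-likelihood fit of that kind can be sketched in Python. This is an illustration under assumptions, not the simulator's code: the post does not say which estimator is used, and `fit_lognormal`, `lognormal_cdf` and `max_cdf_gap` are hypothetical names. The fit takes the mean and the population standard deviation of log(games), and the gap function gives a Kolmogorov-Smirnov-style measure of how well the fitted CDF matches the empirical one.

```python
import math
from statistics import fmean, pstdev


def fit_lognormal(lengths):
    """MLE fit of a log-normal to positive game counts: (mu, sigma) of log(n)."""
    logs = [math.log(n) for n in lengths]
    mu = fmean(logs)
    sigma = pstdev(logs, mu)  # MLE uses the population (1/N) standard deviation
    return mu, sigma


def lognormal_cdf(n, mu, sigma):
    """P(length <= n) under the fitted log-normal distribution."""
    return 0.5 * (1.0 + math.erf((math.log(n) - mu) / (sigma * math.sqrt(2.0))))


def max_cdf_gap(lengths, mu, sigma):
    """Largest distance between the empirical and fitted CDFs at the sample points."""
    xs = sorted(lengths)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = lognormal_cdf(x, mu, sigma)
        # Compare against the empirical CDF just before and just after the step at x.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap
```

Here `lengths` would be the number of games of each simulated SPRT run; a small `max_cdf_gap` is one way to quantify "fits reasonably well".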
My simulator works in a slightly different way than the one you propose: I input alpha, beta, the lower and upper bounds of the SPRT (in Bayeselo units) and two parameters: the expected Elo gain (in Bayeselo units, but there is a known conversion formula) and drawelo (which is related to the draw ratio).
So the overall input reduces to prob.(A wins), prob.(B wins) and prob.(draw) = 1 - prob.(A wins) - prob.(B wins), instead of prob.(A wins with White), prob.(A wins with Black), prob.(B wins with White), prob.(B wins with Black), prob.(draw of A-B) = 1 - prob.(A wins with White) - prob.(B wins with Black) and prob.(draw of B-A) = 1 - prob.(A wins with Black) - prob.(B wins with White).
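For reference, the usual BayesElo model maps (bayeselo, drawelo) to those three probabilities. The sketch below assumes the standard BayesElo formulas with no White-advantage term; it is the exact inverse of the bayeselo and drawelo formulas used later in this post, and `wdl_from_bayeselo` is a hypothetical name.

```python
def wdl_from_bayeselo(bayeselo, drawelo):
    """Win/draw/loss probabilities for side A under the BayesElo model (no colour term)."""
    # P(A wins) = 1 / (1 + 10^((drawelo - bayeselo)/400))
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - bayeselo) / 400.0))
    # P(A loses) = 1 / (1 + 10^((drawelo + bayeselo)/400))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + bayeselo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss
```

Feeding it the balanced estimates from the averaged probabilities in this post (bayeselo ~ 81.4442, drawelo ~ 241.2287) gives back W ≈ 0.285, D ≈ 0.58, L ≈ 0.135.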
I see that your scores with balanced and unbalanced openings are not the same (µ_SF_strong = 57.5% and µ_SF_strong = 58% respectively), although that is expected given the error bars. On the other hand, draw_ratio(balanced) = 58% and draw_ratio(unbalanced) = 52%. More draws usually translate into a larger average number of games.
I would need to make changes, because my simulator normally proceeds game after game with a single set of probabilities, not round after round (A-B, B-A, and repeat) with two sets of probabilities.
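A round-based simulation of the kind described could look like the Python sketch below. It is not Ajedrecista's simulator and all names are hypothetical. Assumptions: the LLR uses the per-game BayesElo trinomial model with a fixed drawelo supplied by the caller (a simplification), games are treated as independent in the LLR, and the stopping rule is checked only after complete A-B, B-A pairs. The pairing therefore affects only the sequence of simulated outcomes, not the LLR formula. It includes a small BayesElo helper so that it runs on its own.

```python
import math
import random


def wdl_from_bayeselo(bayeselo, drawelo):
    """Win/draw/loss probabilities under the BayesElo model (no colour term)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - bayeselo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + bayeselo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss


def play_game(rng, p_win, p_loss):
    """One outcome from A's point of view: 0 = win, 1 = draw, 2 = loss."""
    u = rng.random()
    if u < p_win:
        return 0
    if u < p_win + p_loss:
        return 2
    return 1


def simulate_paired_sprt(first, second, elo0, elo1, drawelo,
                         alpha=0.05, beta=0.05, runs=1000, seed=1,
                         max_rounds=100_000):
    """Simulate SPRT(elo0, elo1) played in colour-reversed pairs.

    first, second: (P(A wins), P(A loses)) for the two games of each round.
    elo0, elo1 and drawelo are in BayesElo units.
    Returns (fraction of runs accepting H1, average number of games).
    """
    if not elo0 < elo1:
        raise ValueError("need elo0 < elo1")
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p0 = wdl_from_bayeselo(elo0, drawelo)
    p1 = wdl_from_bayeselo(elo1, drawelo)
    # LLR increment for a win, draw and loss (index matches play_game).
    inc = [math.log(b / a) for a, b in zip(p0, p1)]
    rng = random.Random(seed)
    passes = 0
    total_games = 0
    for _ in range(runs):
        llr = 0.0
        games = 0
        # Stop only between rounds; runs hitting max_rounds count as not passed.
        while lower < llr < upper and games < 2 * max_rounds:
            for p_win, p_loss in (first, second):
                llr += inc[play_game(rng, p_win, p_loss)]
                games += 1
        if llr >= upper:
            passes += 1
        total_games += games
    return passes / runs, total_games / runs
```

A possible usage takes the per-colour win/loss rates from the averaging step of this post, assuming they pair up in the order written: `simulate_paired_sprt((0.30, 0.12), (0.27, 0.15), 0.0, 30.0, 241.2287)` for balanced and `simulate_paired_sprt((0.59, 0.02), (0.05, 0.30), 0.0, 30.0, 209.5036)` for unbalanced. Comparing the average game counts with those of a run where both games use the averaged probabilities isolates the effect of the pairing within this model.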
First of all, averaging the White and Black probabilities (I know that you do not want this, but it gives a rough idea):

```
bayeselo = 200*log10{W*(1 - L)/[L*(1 - W)]}
drawelo = 200*log10[(1 - L)*(1 - W)/(L*W)]
// Estimated from the sample of games. From SF_strong POV:

Balanced:
W = (0.30 + 0.27)/2 = 0.285
L = (0.12 + 0.15)/2 = 0.135
Elo ~ 52.5116
bayeselo ~ 81.4442
drawelo ~ 241.2287

Unbalanced:
W = (0.59 + 0.05)/2 = 0.32
L = (0.02 + 0.30)/2 = 0.16
Elo ~ 56.0715
bayeselo ~ 78.5601
drawelo ~ 209.5036
```

I input these values into an SPRT tool by Michel van den Bergh that gives theoretical results (not simulations). I have one doubt: I suppose that H0 and H1 have their usual meaning in an SPRT(H0, H1) test, but are H0 and H1 (which share the same units) expressed in Bayeselo or in logistic Elo? There is a conversion formula between Bayeselo and logistic Elo that works for small values (say |value| < 10 Bayeselo), but I am not so sure about larger values. Anyway:

```
x = 10^(-drawelo/400)
bayeselo_to_Elo_scale = 4*x/(1 + x)²
// Elo = (bayeselo_to_Elo_scale)*bayeselo.

H0 = 0 Bayeselo = 0 Elo

H1?
H1 = 30 Bayeselo
or
H1(30 Elo, drawelo ~ 241.2287) ~ 46.9406 Bayeselo
or
H1(30 Elo, drawelo ~ 209.5036) ~ 42.2962 Bayeselo

------------------------

C:\[...]\sprta>sprt_w32
Usage: sprta.py elo0 elo1 draw_elo elo
elo0,elo1 are expressed in BayesElo
elo is expressed in LogisticElo

Balanced:
C:\[...]\sprta>sprt_w32 0 30 241.2287 52.5116
elo0 = 0.00
elo1 = 30.00
draw_elo = 241.23
elo = 52.51
pass probability: 100.03%
avg running time: 172

Unbalanced:
C:\[...]\sprta>sprt_w32 0 30 209.5036 56.0715
elo0 = 0.00
elo1 = 30.00
draw_elo = 209.50
elo = 56.07
pass probability: 100.00%
avg running time: 170

************************

Balanced:
C:\[...]\sprta>sprt_w32 0 46.9406 241.2287 52.5116
elo0 = 0.00
elo1 = 46.94
draw_elo = 241.23
elo = 52.51
pass probability: 99.94%
avg running time: 125

Unbalanced:
C:\[...]\sprta>sprt_w32 0 42.2962 209.5036 56.0715
elo0 = 0.00
elo1 = 42.30
draw_elo = 209.50
elo = 56.07
pass probability: 99.97%
avg running time: 133
```

'avg running time' is the average number of games. It is funny to see 'pass probability: 100.03%' in one of the outputs, since a probability cannot exceed 100%; presumably it is a small numerical inaccuracy of the tool.
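The Elo, bayeselo and drawelo estimates and the H1 conversions in the two code boxes can be reproduced with a few lines of Python. This is a sketch, assuming score = W + D/2 for the logistic Elo; the function names are hypothetical and not part of Michel's tool.

```python
import math


def estimates_from_wl(w, l):
    """Logistic Elo, bayeselo and drawelo from win/loss rates (draws = 1 - w - l)."""
    score = w + (1.0 - w - l) / 2.0
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    bayeselo = 200.0 * math.log10(w * (1.0 - l) / (l * (1.0 - w)))
    drawelo = 200.0 * math.log10((1.0 - l) * (1.0 - w) / (l * w))
    return elo, bayeselo, drawelo


def bayeselo_to_elo_scale(drawelo):
    """Scale s with Elo ~ s * bayeselo (the small-value approximation)."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def elo_to_bayeselo(elo, drawelo):
    """Convert a small logistic-Elo value (e.g. H1) to BayesElo with the same scale."""
    return elo / bayeselo_to_elo_scale(drawelo)


for name, w, l in (("balanced", 0.285, 0.135), ("unbalanced", 0.32, 0.16)):
    elo, be, de = estimates_from_wl(w, l)
    print(name, round(elo, 4), round(be, 4), round(de, 4),
          round(elo_to_bayeselo(30.0, de), 4))
```

The printed values should match the Elo, bayeselo, drawelo and H1(30 Elo) figures in the code boxes up to rounding. Note that the scale is only a small-value approximation: for the balanced case, scale * bayeselo ≈ 52.05, not the 52.51 obtained from the score.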
Note that avg_games(balanced) > avg_games(unbalanced) with H1 = 30 Bayeselo, but the opposite holds with H1 = 30 Elo. This may be due to the change in the ratio (elo - H0)/(H1 - H0) (all expressed in the same units).
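For reference, here is that ratio written out in BayesElo units, taking H0 = 0, the bayeselo estimates of the averaged probabilities as elo, and the Bayeselo values of H1 listed in the code box. Its ordering between balanced and unbalanced flips with the choice of H1, as does the ordering of the average game counts; the ratio alone does not tell which effect dominates, since the draw ratios (58% vs 52%) differ too.

$$
\begin{aligned}
r &= \frac{\mathrm{elo} - H_0}{H_1 - H_0}, \qquad H_0 = 0 \\
H_1 = 30\ \text{BayesElo:}\quad r_{\mathrm{bal}} &= \frac{81.4442}{30} \approx 2.715, \qquad r_{\mathrm{unb}} = \frac{78.5601}{30} \approx 2.619 \\
H_1 = 30\ \text{Elo:}\quad r_{\mathrm{bal}} &= \frac{81.4442}{46.9406} \approx 1.735, \qquad r_{\mathrm{unb}} = \frac{78.5601}{42.2962} \approx 1.857
\end{aligned}
$$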
Last but not least, remember that there is an online chess SPRT calculator that gives theoretical results (not simulations), based on Michel's tool if I am not mistaken.
------------------------
I am not running simulations the way you proposed right now, for two reasons:
a) I would like to know if you want H1 = 30 Elo (logistic Elo) or H1 = 30 Bayeselo.
b) I have not made the changes yet, and I am not sure I will have enough time and skill to make it work properly. Sorry.
Given b), it is unlikely that I can do what you requested, though the odds may change (like the great comeback in the 1999 Champions League Final).
Regards from Spain.
Ajedrecista.
I forgot about this mess with logistic Elo and BayesElo; take H1 as 30 BayesElo. If the White and Black scores are averaged, the number of games needed to stop is expected to be almost equal in both cases, since it is determined mostly by the significance. The effect I am after concerns very different White/Black results compared with very similar White/Black results, and I think keeping the games in sequential color-reversed pairs is important. I am not sure how this is done in the simulator, but if you manage to keep the White/Black outcomes separated pairwise, that would be great. I do not expect a small effect of, say, 1% fewer games; I would expect more than 20% fewer games for the unbalanced openings.
Thanks, and hoping for your Great Comeback.