Type I error for p-value stopping: Balanced and Unbalanced
Posted: Thu Jun 16, 2016 9:23 am
Several months ago I looked at the significance, measured by the t-value t = (W-L)/(W+L)^(1/2) (W wins, L losses), for unbalanced opening positions compared with balanced ones. The results were inconclusive, with unbalanced opening positions being at least on par with balanced ones in revealing ELO differences. What we really need to know is how good a stopping rule based on the t-value is. The quantity of interest is the Type I error (false positive rate) when one stops at a result of some number of standard deviations, say 2 or 3. For the same t-value, the Type I error differs somewhat between unbalanced and balanced opening positions. Games from unbalanced (and balanced) positions are played twice, with colors reversed, so for identical adversaries (in our case two recent Stockfishes) we have the needed empirical data:
60s+0.1s games between two identical Stockfishes
Balanced (-30cp, 30cp) (white advantage below 20 ELO): White wins 17%, Black wins 16%, draws 67%
Unbalanced (70cp, 100cp) (white advantage above 120 ELO): White wins 42.5%, Black wins 4%, draws 53.5%
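For reference, the t-value defined above can be computed with a few lines of Python. This is only a minimal sketch: the function name `t_value` and the sample score are mine, chosen for illustration, and are not part of the data above.

```python
import math

def t_value(wins: int, losses: int) -> float:
    """t = (W - L) / sqrt(W + L); draws do not enter the formula."""
    decisive = wins + losses
    if decisive == 0:
        return 0.0  # no decisive games yet, so no evidence either way
    return (wins - losses) / math.sqrt(decisive)

# Hypothetical match score, for illustration only (not from the data above).
print(t_value(wins=60, losses=40))  # 20 / sqrt(100) = 2.0
```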
With this empirical data I ran 100,000 simulations to estimate the Type I error as a function of the t-value (number of standard deviations) used as the stopping rule. Whenever one sees a difference of, say, 2.5 or 3.0 standard deviations, one stops and declares the leader the stronger engine. This stopping rule has no upper bound on the Type I error as the number of games grows, but the error stays manageable for any reasonable number of games; very few people test beyond, say, 100,000 games. The Type I error for balanced and unbalanced openings is shown in the table below.
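A simulation of this kind could look roughly like the sketch below. It is not the exact program behind the numbers in this post, and it makes several assumptions of its own: games are independent, the two identical engines alternate colors every game, the t-value is checked after every game with no minimum number of games, and stopping in favor of either engine counts as a false positive. The function name and the small run counts are illustrative, so the output should not be expected to match the table digit for digit.

```python
import math
import random

# Per-game (white win, black win) probabilities from the data above.
BALANCED = (0.17, 0.16)
UNBALANCED = (0.425, 0.04)

def false_positive_rate(p_white, p_black, threshold, max_games, runs, seed=0):
    """Fraction of runs in which |W - L| reaches threshold * sqrt(W + L)
    within max_games, although both engines are identical."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(runs):
        wins = losses = 0  # counted from engine A's point of view
        for game in range(max_games):
            a_is_white = (game % 2 == 0)  # assumption: colors alternate
            u = rng.random()
            if u < p_white:
                white_won = True
            elif u < p_white + p_black:
                white_won = False
            else:
                continue  # draw: the t-value is unchanged
            if white_won == a_is_white:
                wins += 1
            else:
                losses += 1
            if abs(wins - losses) >= threshold * math.sqrt(wins + losses):
                stopped += 1  # one engine is wrongly declared stronger
                break
    return stopped / runs

if __name__ == "__main__":
    # Small illustrative run; the post used 100,000 simulations.
    for name, (pw, pb) in (("balanced", BALANCED), ("unbalanced", UNBALANCED)):
        for sd in (2.5, 3.0):
            rate = false_positive_rate(pw, pb, sd, max_games=100, runs=2000)
            print(f"{name:10s} {sd}sd: {rate:.2%}")
```

Pure Python gets slow at 100,000 games per run, so a vectorized or compiled version would be needed to go that far.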
The table below shows that the Type I error for the same t-value (2.5 and 3 here) is significantly smaller for unbalanced opening positions. In fact, the t=2.5 stopping rule for unbalanced positions is pretty much equivalent to the t=3.0 stopping rule for balanced ones. This amounts to about 40% fewer games needed to stop with unbalanced openings at a given Type I error. As a rule of thumb, the Type I error is about 5% when stopping at 3 standard deviations with balanced opening positions, or at 2.5 standard deviations with unbalanced ones.
Number           Type I error         Type I error
of games           Balanced            Unbalanced
rule:           2.5sd    3.0sd       2.5sd    3.0sd
=======================================================
100              5.6%    0.95%        1.6%    0.21%
1000            13.6%    3.4%         3.9%    0.66%
10000           21.7%    6.1%         6.5%    0.9%
100000          30.6%    8.6%         9.2%    1.2%
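One possible way to see why a 2.5 sd rule on unbalanced openings behaves like a 3.0 sd rule on balanced ones is the back-of-the-envelope calculation below. It is my own reading, not something established by the simulations, and it assumes independent games with the engines swapping colors on each position. The t-value treats W+L as the variance of W-L, but with colors fixed a single game's result (+1, 0, -1 for White) has variance (pw + pb) - (pw - pb)^2, which is noticeably smaller when White is a heavy favorite. The function name `sd_inflation` is hypothetical.

```python
import math

def sd_inflation(p_white: float, p_black: float) -> float:
    """Ratio of the sd assumed by t = (W-L)/sqrt(W+L) to the per-game sd
    of W - L when colors alternate between two identical engines."""
    decisive = p_white + p_black   # variance the t-value implicitly assumes
    bias = p_white - p_black       # mean result of one game for White
    actual = decisive - bias ** 2  # variance of one game with colors fixed
    return math.sqrt(decisive / actual)

for name, (pw, pb) in {"balanced": (0.17, 0.16),
                       "unbalanced": (0.425, 0.04)}.items():
    k = sd_inflation(pw, pb)
    print(f"{name}: nominal 2.5 sd corresponds to about {2.5 * k:.2f} actual sd")
```

Under these assumptions the factor is essentially 1 for the balanced set and about 1.2 for the unbalanced set, so a nominal 2.5 sd on unbalanced openings is close to 3 actual standard deviations.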