Some properties of the Type I error in p-value stopping rules
Posted: Tue Mar 01, 2016 1:15 pm
Let's take the important t-value in Chess match results: t = (W - L)/sqrt(W + L) = (2*W - N)/sqrt(N), where N = W + L is the scaling parameter, an integer. If the null hypothesis is assumed true (W = L = N/2), then under the normal approximation this quantity is normally distributed with expectation value zero and standard deviation 1, so it is convertible to a p-value. We would like a p-value stopping rule that rejects the null hypothesis H0: W = L with a certain Type I error (incorrect rejection of a true H0). While the closest thing to an optimal solution is the SPRT, many people still use p-values (confidence intervals) to determine "superiority" (W <> L, i.e. rejection of H0). People using Bayeselo's 2 standard deviations (t = 2), say, are in fact stopping at a certain p-value. Methodologically, I also assume that people stop as soon as they see the p-value drop below some threshold. This stopping rule has UNBOUNDED Type I error: under H0 the random walk W - L crosses any fixed t threshold sooner or later (by the law of the iterated logarithm), so in the limit N -> infinity the Type I error is 100%.
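To make the conversion concrete, here is a minimal Python sketch of the t-value and its two-sided p-value under the normal approximation (the function names and the example numbers are mine, for illustration only):

Code: Select all
from math import erf, sqrt

def t_value(wins: int, losses: int) -> float:
    """t = (W - L) / sqrt(W + L), approximately N(0, 1) under H0: W = L."""
    return (wins - losses) / sqrt(wins + losses)

def p_value(t: float) -> float:
    """Two-sided p-value under the normal approximation: 2 * (1 - Phi(|t|))."""
    return 1.0 - erf(abs(t) / sqrt(2.0))

# Hypothetical match: 560 wins, 440 losses in N = 1000 decisive games.
t = t_value(560, 440)      # 120 / sqrt(1000) ~ 3.79
print(t, p_value(t))       # p ~ 0.00015; a t = 2 rule would have stopped long before

The p-value column in the second table below is exactly this two-sided conversion (e.g. t = 2 gives 4.55%).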
Practically, theoretical considerations aside, the scale of the problem (N) is bounded, and this stopping rule can still be considered on some finite range. But then it's important to control the Type I error of the rule. My experiment starts here. I don't know the theoretical derivation for this case, so I performed simulations. First observation: doubling the scale N adds a sensibly constant amount of Type I error:
Code: Select all
N       Type I error for t=2,
        from N to 2*N
500     7.92%
1000    7.96%
2000    7.87%
4000    7.96%
8000    7.90%
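Here is a minimal Monte Carlo sketch of the kind of simulation behind these numbers (a reconstruction for illustration, assuming decisive games only with draws ignored, |t| >= 2 checked after every game, and each doubling error conditional on not having stopped earlier):

Code: Select all
import numpy as np

rng = np.random.default_rng(1)
T = 2.0                       # stopping threshold on |t|
N0, DOUBLINGS = 500, 5        # windows 500->1000, ..., 8000->16000
TRIALS, CHUNK = 20_000, 1_000
N_MAX = N0 * 2**DOUBLINGS

# For each simulated match, find the first game n with |t| >= T, where
# t = (W - L) / sqrt(n) and each game is +1/-1 with p = 1/2 under H0.
first = np.full(TRIALS, N_MAX + 1, dtype=np.int64)
for i in range(0, TRIALS, CHUNK):
    steps = rng.integers(0, 2, size=(CHUNK, N_MAX), dtype=np.int32) * 2 - 1
    t = np.abs(np.cumsum(steps, axis=1)) / np.sqrt(np.arange(1, N_MAX + 1))
    hit = t >= T
    any_hit = hit.any(axis=1)
    first[i:i + CHUNK][any_hit] = hit.argmax(axis=1)[any_hit] + 1

# Per-doubling Type I error, conditional on not having stopped earlier;
# each line should come out near 7.9%, the total near 34%.
survivors = first > N0
for k in range(DOUBLINGS):
    hi = N0 * 2**(k + 1)
    crossed = survivors & (first <= hi)
    print(f"{hi // 2:5d} -> {hi:5d}: {crossed.sum() / survivors.sum():6.2%}")
    survivors &= first > hi
total = ((first > N0) & (first <= N_MAX)).sum() / (first > N0).sum()
print(f"total {N0} -> {N_MAX}: {total:.2%}")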
The per-doubling error is constant within error margins. Therefore, the Type I error is logarithmic in N, which confirms that it is unbounded. The total Type I error for N going from 500 to 16000 in this case (t = 2) is 33.8% (consistent with compounding the five conditional doubling errors: 1 - (1 - 0.079)^5 ~ 33.7%). But the Type I error being logarithmic in N, for finite N there is some use for the stopping rule. If the error per doubling times log2(N) is sensibly smaller than 1, then the Type I error is controlled, though there is a balance to strike between a smaller error and the necessary effort. The stopping rule is far from optimal, but at least it can be soundly used.

Second observation: the Type I error per doubling seems to follow closely the quantity Exp(-t^2/2). So the error goes to small values pretty quickly as the t-value increases:
Code: Select all
t-value   Type I error    Exp(-t^2/2)   p-value
          per doubling
5.0       0.00040%        0.00037%      0.000057%
4.5       0.0044%         0.0040%       0.00068%
4.0       0.033%          0.034%        0.0063%
3.5       0.18%           0.219%        0.046%
3.0       0.81%           1.11%         0.27%
2.5       2.66%           4.39%         1.24%
2.0       7.92%           13.53%        4.55%
1.5       20.23%          32.47%        13.36%
1.0       49.08%          60.65%        31.73%

On practical grounds: in the physical sciences N is at most, say, 2^300, so a stopping rule for this quantity with the usual t = 5 or more gives less than 1% Type I error. In Chess testing using games, N is at most 2^20, and a stopping rule based on t = 3.5 can be safely used with less than 5% error. Again, this p-value stopping rule is far from optimal with regard to effort. On the other hand, the often used 2 standard deviations stopping rule (p = 0.05) is virtually impossible to apply beyond a single doubling and is hardly of any use as a serious stopping rule for this quantity.
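As a rough check of these claims, a small Python sketch of the compounding arithmetic (my approximation: doublings treated as independent, with exp(-t^2/2) as the per-doubling error, which the table shows is conservative at small t):

Code: Select all
from math import exp

def total_type1(t: float, doublings: int) -> float:
    """Approximate total Type I error over `doublings` doublings of N,
    compounding a per-doubling error of ~exp(-t^2 / 2)."""
    per_doubling = exp(-t * t / 2.0)
    return 1.0 - (1.0 - per_doubling) ** doublings

# Physical sciences: N up to 2^300, t = 5   -> ~0.11% (< 1%)
print(f"{total_type1(5.0, 300):.2%}")
# Chess testing:     N up to 2^20,  t = 3.5 -> ~4.3%  (< 5%)
print(f"{total_type1(3.5, 20):.2%}")
# t = 2 blows the 5% budget after even two doublings (~25% here):
print(f"{total_type1(2.0, 2):.2%}")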