Some properties of the Type I error in p-value stopping ru

Laskos
Posts: 8023
Joined: Wed Jul 26, 2006 8:21 pm

Post by Laskos » Tue Mar 01, 2016 12:15 pm

Let's take the important t-value for chess match results: t = (W-L)/sqrt(W+L) = (2*W-N)/sqrt(N), where W and L are the numbers of wins and losses and N = W+L is an integer that sets the scale of the problem. If the null hypothesis is true (expected W = L = N/2), then under the normal approximation this quantity is normally distributed with mean zero and standard deviation 1, so it converts directly to a p-value. We would like a p-value stopping rule that rejects the null hypothesis H0: W=L with a controlled Type I error (incorrect rejection of a true H0). Although SPRT comes closest to the best solution, many people still use p-values (confidence intervals) to determine "superiority" (W<>L, i.e. rejection of H0). People who stop at 2 standard deviations in Bayeselo (t=2), say, are in fact stopping at a certain p-value. Methodologically, I also assume that people stop as soon as they see the p-value drop below some threshold. This stopping rule has UNBOUNDED Type I error: in the limit N->infinity, the Type I error is 100%.
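
To make the conversion above concrete, here is a minimal Python sketch that computes the t-value from wins and losses and turns it into a two-sided p-value under the normal approximation. The function names `t_value` and `two_sided_p` and the example score are made up for illustration, and draws are assumed to be left out, so that N = W + L.

```python
import math

def t_value(wins, losses):
    """t = (W - L) / sqrt(W + L); draws are assumed to be excluded."""
    return (wins - losses) / math.sqrt(wins + losses)

def two_sided_p(t):
    """Two-sided p-value of a standard normal variable at |t|."""
    return math.erfc(abs(t) / math.sqrt(2.0))

if __name__ == "__main__":
    # Hypothetical match result, used only as an example.
    t = t_value(540, 460)
    print(f"t = {t:.3f}, p = {two_sided_p(t):.4f}")
```

A "stop as soon as p drops below the threshold" rule amounts to evaluating `two_sided_p(t_value(W, L))` after every game and stopping at the first value below the threshold.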

In practice, theoretical considerations aside, the scale of the problem (N) is bounded, so this stopping rule can still be considered on some finite range. But it is important to control the Type I error for this quantity, and this is where my experiment starts. I don't know a theoretical derivation for this case, so I ran simulations. First observation: doubling the scale N adds a roughly constant additional Type I error:

   N     Type I error for t=2
           from N to 2*N

  500       7.92%
 1000       7.96%
 2000       7.87%
 4000       7.96%
 8000       7.90%

Constant within error margins. Therefore, the Type I error is logarithmic in N, which confirms that it is unbounded. The total Type I error for N: 500->16000 in this case (t=2) is 33.8%. But because the Type I error is logarithmic in N, the stopping rule is of some use for finite N. If the error per doubling times log2(N) is well below 1, then the Type I error is controlled, though there is a balance to strike between a smaller error and the necessary effort. The stopping rule is far from optimal, but at least it can be used soundly.
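
The post does not show how the simulations were set up, so the following Python sketch is only one plausible reading, not the original code. It assumes that the t-value is checked after every game from a hypothetical starting point `n_start`, and that the "error from N to 2*N" is the fraction of runs that cross the threshold in (N, 2*N] among the runs that had not crossed it up to N. The function name, `n_start`, the trial count and the seed are all my own choices.

```python
import math
import random

def doubling_error(t, n, trials=2000, n_start=50, seed=0):
    """Estimate P(|t-value| >= t somewhere in (n, 2n]), given no crossing
    in [n_start, n], for a fair +1/-1 random walk (H0 true, no draws)."""
    rng = random.Random(seed)
    survived = 0  # runs that reached n without crossing the threshold
    stopped = 0   # of those, runs that crossed somewhere in (n, 2n]
    for _ in range(trials):
        s = 0       # running W - L
        hit = None  # game number of the first crossing, if any
        for k in range(1, 2 * n + 1):
            s += 1 if rng.random() < 0.5 else -1
            if k >= n_start and abs(s) >= t * math.sqrt(k):
                hit = k
                break
        if hit is None or hit > n:
            survived += 1
            if hit is not None:
                stopped += 1
    return stopped / survived if survived else float("nan")

if __name__ == "__main__":
    for n in (500, 1000):
        print(n, f"{doubling_error(2.0, n):.2%}")
```

With only a few thousand trials the estimates are noisy, and the result depends on where the checking starts, so the numbers from this sketch should not be expected to match the table to two decimals.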

Second observation: the Type I error per doubling seems to follow the quantity Exp(-t^2/2), closely for t of about 4 and above and staying below it for smaller t. So the error drops to small values quite quickly as the t-value increases:

t-value  Type I error   Exp(-t^2/2)     p-value
         per doubling

5.0         0.00040%      0.00037%      0.000057%
4.5         0.0044%       0.0040%       0.00068%
4.0         0.033%        0.034%        0.0063%
3.5         0.18%         0.219%        0.046%
3.0         0.81%         1.11%         0.27%
2.5         2.66%         4.39%         1.24%
2.0         7.92%        13.53%         4.55%
1.5        20.23%        32.47%        13.36%
1.0        49.08%        60.65%        31.73%

On practical grounds: in the physical sciences N is at most, say, 2^300, so a stopping rule for this quantity with the usual t=5 or more gives less than 1% Type I error. In chess testing with games, N is at most 2^20, and a stopping rule based on t=3.5 can be used safely with less than 5% error. Again, this p-value stopping rule is far from optimal with regard to effort. On the other hand, the often used 2 standard deviations stopping rule (p=0.05) is virtually impossible to apply beyond a sole doubling and is hardly of any use as a serious stopping rule for this quantity.
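
The arithmetic behind these practical bounds can be checked with a few lines of Python. This is a sketch under one assumption: each doubling contributes the same error among the runs still going, so the errors compound. The per-doubling values are copied from the table above; the names `PER_DOUBLING` and `total_error` are mine.

```python
import math

# Per-doubling Type I errors from the table above, as fractions.
PER_DOUBLING = {5.0: 0.0000040, 3.5: 0.0018, 2.0: 0.0792}

def total_error(per_doubling, doublings):
    """Chance of at least one false stop over the given number of doublings,
    assuming every doubling contributes the same conditional error."""
    return 1.0 - (1.0 - per_doubling) ** doublings

if __name__ == "__main__":
    # (t, number of doublings): physics up to 2^300, chess up to 2^20,
    # and the 500 -> 16000 range (5 doublings) from the first table.
    for t, d in [(5.0, 300), (3.5, 20), (2.0, 5)]:
        print(f"t={t}: Exp(-t^2/2) = {math.exp(-t * t / 2):.5%}, "
              f"total over {d} doublings = {total_error(PER_DOUBLING[t], d):.2%}")
```

For t=2 over five doublings this reproduces the 33.8% quoted earlier, and the t=5 and t=3.5 cases stay below the 1% and 5% bounds mentioned above.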

Re: Some properties of the Type I error in p-value stopping

Post by Laskos » Tue Mar 08, 2016 8:23 pm

Laskos wrote:On the other hand, the often used stopping rule p=0.05 is virtually impossible to apply beyond a sole doubling and is hardly of any use as a serious stopping rule for this quantity.
A similar point is made by the American Statistical Association (ASA), as reported in "Nature News":
Nature 531, 151 (10 March 2016) doi:10.1038/nature.2016.19503
http://www.nature.com/news/statistician ... NatureNews

Re: Some properties of the Type I error in p-value stopping

Post by Laskos » Fri Jul 28, 2017 9:13 am

Laskos wrote:

On practical grounds: in the physical sciences N is at most, say, 2^300, so a stopping rule for this quantity with the usual t=5 or more gives less than 1% Type I error. In chess testing with games, N is at most 2^20, and a stopping rule based on t=3.5 can be used safely with less than 5% error. Again, this p-value stopping rule is far from optimal with regard to effort. On the other hand, the often used 2 standard deviations stopping rule (p=0.05) is virtually impossible to apply beyond a sole doubling and is hardly of any use as a serious stopping rule for this quantity.
Funny, "Big names in statistics" are needed to reach the same conclusions (more than a year later):

Big names in statistics want to shake up much-maligned P value
One of scientists’ favourite statistics — the P value — should face tougher standards, say leading researchers.

http://www.nature.com/news/big-names-in ... ue-1.22375

Re: Some properties of the Type I error in p-value stopping

Post by Laskos » Tue Nov 28, 2017 5:17 pm

Laskos wrote:
Funny, "Big names in statistics" are needed to reach the same conclusions (more than a year later):

Big names in statistics want to shake up much-maligned P value
One of scientists’ favourite statistics — the P value — should face tougher standards, say leading researchers.

http://www.nature.com/news/big-names-in ... ue-1.22375
Another paper in "Nature" on this, which ties in with the thread on priors and Bayesian analysis: http://talkchess.com/forum/viewtopic.php?t=64084


https://www.nature.com/articles/d41586- ... n=20171128
