A simple expression

 Posts: 919
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: A simple expression
I edited my earlier post to ask this question: "Then, do you know how Type I error behaves for a pvalue of 0.05 as stopping rule? I simulated it, this stopping rule is practically worthless in engine testing." You will be surprised to see how the Type I error behaves.

AlvaroBegue wrote: And does it make a difference if the number was 0.99999999993 or 0.99993 that got rounded to "100%"? Chances are it just means "we should accept this change".

Laskos wrote: That sort of LOS appears all the time in Fishtest regressions, they just write it as "100%".

AlvaroBegue wrote: You won't see 0.99999999993 in chess engine tests either, unless you have time for billions of games. Perhaps that's why the pvalue scale is commonly used both in chess and in medical science.

Laskos wrote: Yes, they are the same thing, but I rarely see a pvalue of 0.99999999993 in medical or social sciences, while in physics a tvalue of the same relevance appears all the time (and it's easier to write as "6").

AlvaroBegue wrote: Both pvalues and tvalues are perfectly easy to understand. The pvalue means that, if the change you are exploring were actually a no-op, you would expect this number to be drawn from a uniform distribution in the interval [0,1]. If the number you get is 0.9999997, you might think this is too much of a coincidence. The tvalue means that, if the change you are exploring were actually a no-op, you would expect this number to follow a standard normal distribution. If the number you get is 5.0, you would be just as convinced that this is too much of a coincidence as you were with the pvalue.

Laskos wrote: Thanks for this frequentist perspective, and I actually don't feel like the usage of pvalue in medical and social sciences is very sound, I like more the "five sigma" of physics. A pvalue of 0.05 or 0.01 can give a large Type I error quickly, insomuch that with research groups seeking to prove miracles, some false miracles, out of many possible false miracles, will indeed appear to be proven from time to time with a very high likelihood (the Type I error explodes). The methodology assumes that one doesn't seek to prove miracles. That's why I like this "number of standard deviations" more than the hardly intuitive pvalue. Even knowing that all of these, used as a "stopping rule", have a theoretically unbounded Type I error.

Michel wrote: Nice!
Your formula is in fact statistically correct from a frequentist point of view.
Your formula computes how many standard deviations W-L is from zero under the null hypothesis w=l (equal strength). So sigma is computed under the null hypothesis w=l (using that w+l=1-d, and d is estimated from the sample).
So under the null hypothesis and assuming normal approximation (W-L)/sqrt(W+L) is normally distributed with expectation value zero and standard deviation 1. So you can convert it to a pvalue (which is the frequentist version of LOS).
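
For readers who want to try this, here is a minimal Python sketch of the expression discussed above, plus a rough simulation of the "pvalue as stopping rule" question raised in the reply. It is only an illustration under stated assumptions: the function names, the draw rate of 0.4, the 0.95 LOS threshold, and the rule of checking after every decisive game are all hypothetical choices, not anything taken from Fishtest or from the posters' own code.

```python
import math
import random


def z_score(wins: int, losses: int) -> float:
    """Standard deviations that W - L lies from zero under the null w = l."""
    decisive = wins + losses
    if decisive == 0:
        return 0.0  # no decisive games yet, so no evidence either way
    return (wins - losses) / math.sqrt(decisive)


def los(wins: int, losses: int) -> float:
    """One-sided normal CDF of the z-score (normal approximation)."""
    return 0.5 * (1.0 + math.erf(z_score(wins, losses) / math.sqrt(2.0)))


def false_positive_rate(los_threshold: float, max_games: int, trials: int,
                        draw_rate: float = 0.4, seed: int = 1) -> float:
    """Fraction of equal-strength matches that get stopped as 'improvements'.

    Assumption: the match is stopped the first time LOS exceeds
    los_threshold, with a check after every decisive game.
    """
    rng = random.Random(seed)
    win_cutoff = draw_rate + (1.0 - draw_rate) / 2.0  # w = l under the null
    stopped = 0
    for _ in range(trials):
        wins = losses = 0
        for _ in range(max_games):
            r = rng.random()
            if r < draw_rate:
                continue  # a draw changes neither W nor L
            if r < win_cutoff:
                wins += 1
            else:
                losses += 1
            if los(wins, losses) > los_threshold:
                stopped += 1
                break
    return stopped / trials


print(z_score(60, 40))         # 2.0
print(round(los(60, 40), 4))   # 0.9772

# Nominal one-sided error is 0.05; checking after every game inflates it.
print(false_positive_rate(0.95, max_games=200, trials=500))
```

With W=60 and L=40 the expression gives exactly 2 standard deviations. The last line estimates how often two equal engines would be declared different when the test is stopped at the first LOS above 0.95; the number depends on the assumed match length and checking schedule.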