A simple expression

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: A simple expression

Post by AlvaroBegue » Thu Dec 10, 2015 2:58 pm

Michel wrote: Nice!

Your formula is in fact statistically correct from a frequentist point of view.

Your formula computes how many standard deviations W-L is from zero under the null hypothesis w=l (equal strength). So sigma is computed under the null hypothesis w=l (using that w+l=1-d, and d is estimated from the sample).

So under the null hypothesis, and assuming the normal approximation, (W-L)/sqrt(W+L) is normally distributed with expectation zero and standard deviation 1. So you can convert it to a p-value (which is the frequentist version of LOS).

Laskos wrote: Thanks for this frequentist perspective. I actually don't find the use of p-values in medical and social sciences very sound; I prefer the "five sigma" of physics. A p-value threshold of 0.05 or 0.01 can quickly lead to a large Type I error: with research groups seeking to prove miracles, some of the many possible false miracles will indeed appear to be proven from time to time with very high likelihood (the Type I error explodes). The methodology assumes that one doesn't seek to prove miracles. That's why I prefer this "number of standard deviations" to the hardly intuitive p-value, even knowing that all of these, used as a "stopping rule", have a theoretically unbounded Type I error.

AlvaroBegue wrote: Both p-values and t-values are perfectly easy to understand. The p-value means that, if the change you are exploring were actually a no-op, you would expect this number to be drawn from a uniform distribution in the interval [0,1]. If the number you get is 0.9999997, you might think this is too much of a coincidence. The t-value means that, if the change you are exploring were actually a no-op, you would expect this number to follow a standard normal distribution. If the number you get is 5.0, you would be just as convinced that this is too much of a coincidence as you were with the p-value.

Laskos wrote: Yes, they are the same thing, but I rarely see a p-value of 0.99999999993 in medical or social sciences, while in physics a t-value of the same significance appears all the time (and it's easier to write as "6").

AlvaroBegue wrote: You won't see 0.99999999993 in chess engine tests either, unless you have time for billions of games. Perhaps that's why the p-value scale is commonly used both in chess and in medical science.

Laskos wrote: That sort of LOS appears all the time in Fishtest regressions; they just write it as "100%".

And does it make a difference if the number was 0.99999999993 or 0.99993 that got rounded to "100%"? Chances are it just means "we should accept this change".
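
For readers who want to try the numbers, here is a minimal Python sketch of the statistic Michel describes, (W-L)/sqrt(W+L), and its conversion to the p-value scale via the standard normal CDF. The function names and the match result are hypothetical, chosen only for illustration, and the conversion relies on the normal approximation mentioned in the quote.

```python
import math


def z_score(wins, losses):
    """Number of standard deviations W-L is from zero under the null w=l."""
    decisive = wins + losses
    if decisive == 0:
        raise ValueError("need at least one decisive game")
    return (wins - losses) / math.sqrt(decisive)


def los_from_z(z):
    """Standard normal CDF at z, i.e. the statistic on the p-value/LOS scale."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical match result (not taken from the thread). Draws do not
# enter the statistic directly; they only reduce W+L.
wins, losses, draws = 1100, 1000, 1900
z = z_score(wins, losses)
print(f"z = {z:.3f}, LOS = {los_from_z(z):.6f}")

# The same conversion applied to the "five sigma" and "six sigma" scale.
for sigmas in (5.0, 6.0):
    print(f"{sigmas} sigma -> {los_from_z(sigmas):.12f}")
```

Both scales carry the same information; the sketch only shows how one maps onto the other.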

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: A simple expression

Post by Laskos » Thu Dec 10, 2015 3:02 pm

AlvaroBegue wrote: And does it make a difference if the number was 0.99999999993 or 0.99993 that got rounded to "100%"? Chances are it just means "we should accept this change".

I edited my earlier post to ask this question: "Then, do you know how the Type I error behaves with a p-value of 0.05 as a stopping rule? I simulated it; this stopping rule is practically worthless in engine testing." You will be surprised to see how the Type I error behaves.
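
The kind of simulation mentioned here can be sketched roughly as follows. This is not the simulation from the post, only an illustration under stated assumptions: two engines of exactly equal strength, a hypothetical draw rate of 0.5, a check after every decisive game, and a two-sided rule that stops as soon as |W-L| > 1.96*sqrt(W+L) (nominal p = 0.05). The function name and all parameter values are made up for the example.

```python
import math
import random


def false_positive_rate(max_games, trials, draw_rate=0.5, z_stop=1.96,
                        min_decisive=10, seed=1):
    """Estimate how often equal-strength engines trip the stopping rule.

    Every stop is a Type I error, because the null hypothesis w=l is
    true by construction.
    """
    rng = random.Random(seed)
    p_win = (1.0 - draw_rate) / 2.0  # equal win and loss probabilities
    false_positives = 0
    for _ in range(trials):
        diff = 0      # W - L
        decisive = 0  # W + L
        for _ in range(max_games):
            r = rng.random()
            if r < p_win:
                diff += 1
            elif r < 2.0 * p_win:
                diff -= 1
            else:
                continue  # a draw changes neither W-L nor W+L
            decisive += 1
            # Stopping rule: look at the z-score after each decisive game.
            if decisive >= min_decisive and abs(diff) > z_stop * math.sqrt(decisive):
                false_positives += 1
                break
    return false_positives / trials


# The estimated error rate depends on how long the test is allowed to run.
for horizon in (100, 1000):
    rate = false_positive_rate(horizon, trials=2000)
    print(f"max games {horizon}: estimated Type I error {rate:.3f}")
```

Comparing the printed rates against the nominal 0.05 for different horizons shows how repeated looks at the data affect the Type I error.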
