margin of error

michiguel · Post by **michiguel** » Mon Sep 24, 2012 6:05 am

Daniel Shawul wrote:
HGM's explanation is correct.

A plays B and the difference (deltaAB) has an error Eab.
Then, with the same number of games, we can calculate deltaAC, and it will have error Eac = Eab (since the number of games is the same).
Also, with the same number of games, we can calculate deltaCB, and it will have error Ecb = Eab (since the number of games is the same).

So, we can calculate indirectly
deltaAB = deltaAC + deltaCB

Here we can already see that the error of this indirect calculation is bigger than Eab, no matter what, and we are already playing twice as many games.

deltaAC and deltaCB are independent, so the error for the indirect calculation is
IndirectError_ab = sqrt(Eac^2 + Ecb^2)
IndirectError_ab = sqrt(Eac^2 + Eac^2)
IndirectError_ab = sqrt(2*Eac^2)
IndirectError_ab = sqrt(2) * Eac

If we want IndirectError_ab to equal Eab, we have to make Eac = Eab/sqrt(2). We can do that by playing twice as many games in each of the two matches, which makes the total 4x.
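
The arithmetic above can be checked with a small sketch. It assumes, as the argument does, that the two matches are independent and that the error shrinks as 1/sqrt(games); the function names and the 10 Elo starting error are made up for illustration.

```python
import math

def indirect_error(e_ac: float, e_cb: float) -> float:
    """Error of deltaAC + deltaCB, assuming the two matches are independent."""
    return math.sqrt(e_ac ** 2 + e_cb ** 2)

def games_factor(error_ratio: float) -> float:
    """Factor by which the number of games must grow to shrink the error
    by error_ratio, assuming error scales as 1/sqrt(games)."""
    return error_ratio ** 2

e_ab = 10.0                              # hypothetical error of the direct A vs B match
e_ind = indirect_error(e_ab, e_ab)       # sqrt(2) * e_ab
per_match = games_factor(e_ind / e_ab)   # about 2x games in each indirect match
total = 2 * per_match                    # two matches (A-C and C-B), so about 4x overall
print(e_ind, per_match, total)
```
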

Miguel
No, you are completely missing the covariance. In the first A vs B test you have a big covariance, and that affects the variance of A - B big time. Even HGM agreed that for the example I gave, with two standard errors of 5 elo each, std(A-B) = 10, which your calculation ignores.

No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same.

Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 6:12 am

No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same.

But the covariances Cov(A,C) and Cov(B,C) do not contribute when you calculate Var(A-B). That is the trick there.

Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.

This is wrong. See above. But to summarize:
Var(A-B) = Var(A) + Var(B) - 2Cov(A,B)
Now when you play A vs C and B vs C,
Var(A-B) = Var(A) + Var(B), since Cov(A,B) = 0 even if we have Cov(A,C) and Cov(B,C). So you see, both covariances have no effect.
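
As a rough illustration of the variance formula, here is a minimal sketch that evaluates the standard deviation of A - B for a given correlation between the two ratings. The function name and the 5 Elo inputs are illustrative. The correlation of -1 is an assumption standing in for a two-player match whose ratings are centered on the pool average, so that A = -B.

```python
import math

def std_of_difference(sd_a: float, sd_b: float, corr: float) -> float:
    """sqrt(Var(A) + Var(B) - 2*Cov(A,B)), with Cov(A,B) = corr * sd_a * sd_b."""
    cov = corr * sd_a * sd_b
    return math.sqrt(sd_a ** 2 + sd_b ** 2 - 2.0 * cov)

uncorrelated = std_of_difference(5.0, 5.0, 0.0)      # Cov(A,B) = 0: 5 * sqrt(2)
anticorrelated = std_of_difference(5.0, 5.0, -1.0)   # A = -B: the errors add up
print(uncorrelated, anticorrelated)
```

With zero covariance the two 5 Elo errors combine to about 7.07; with perfect anti-correlation they add up to 10.
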

michiguel · Post by **michiguel** » Mon Sep 24, 2012 6:24 am

Daniel Shawul wrote:
No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same.
But the covariances Cov(A,C) and Cov(B,C) do not contribute when you calculate Var(A-B). That is the trick there.
Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.
This is wrong. See above. But to summarize:
Var(A-B) = Var(A) + Var(B) - 2Cov(A,B)
Now when you play A vs C and B vs C,
Var(A-B) = Var(A) + Var(B), since Cov(A,B) = 0 even if we have Cov(A,C) and Cov(B,C). So you see, both covariances have no effect.

There are no covariances; all measures are direct measures: DeltaAB, DeltaAC, and DeltaCB.

There is no such thing as a variable A, because what you actually measure is its difference with B when A and B face each other. BTW, that is not the error that BayesElo reports, which is the error of A compared to the average of the pool. I am not talking about that error; I am talking about the error of DeltaAB.

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 6:34 am

michiguel wrote:
Daniel Shawul wrote:
No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same.
But the covariances Cov(A,C) and Cov(B,C) do not contribute when you calculate Var(A-B). That is the trick there.
Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.
This is wrong. See above. But to summarize:
Var(A-B) = Var(A) + Var(B) - 2Cov(A,B)
Now when you play A vs C and B vs C,
Var(A-B) = Var(A) + Var(B), since Cov(A,B) = 0 even if we have Cov(A,C) and Cov(B,C). So you see, both covariances have no effect.
There are no covariances; all measures are direct measures: DeltaAB, DeltaAC, and DeltaCB.

There is no such thing as a variable A, because what you actually measure is its difference with B when A and B face each other. BTW, that is not the error that BayesElo reports, which is the error of A compared to the average of the pool. I am not talking about that error; I am talking about the error of DeltaAB.

Miguel

Like I said already, even HG agreed that the error will be twice as much for the example I gave with 180 +/- 5 errors. Why do you think he said it would be 10 elo? Now we are changing the subject.

michiguel · Post by **michiguel** » Mon Sep 24, 2012 6:37 am

Daniel Shawul wrote:
michiguel wrote:
Daniel Shawul wrote:
No, I am not missing anything. Whatever you do, Eab is always the same as Eac and Ecb, since you do exactly the same.
But the covariances Cov(A,C) and Cov(B,C) do not contribute when you calculate Var(A-B). That is the trick there.
Miguel
PS: Anyway, if you calculate Eab correctly, there is no covariance; it is a direct measure. But this is not relevant.
This is wrong. See above. But to summarize:
Var(A-B) = Var(A) + Var(B) - 2Cov(A,B)
Now when you play A vs C and B vs C,
Var(A-B) = Var(A) + Var(B), since Cov(A,B) = 0 even if we have Cov(A,C) and Cov(B,C). So you see, both covariances have no effect.
There are no covariances; all measures are direct measures: DeltaAB, DeltaAC, and DeltaCB.

There is no such thing as a variable A, because what you actually measure is its difference with B when A and B face each other. BTW, that is not the error that BayesElo reports, which is the error of A compared to the average of the pool. I am not talking about that error; I am talking about the error of DeltaAB.

Miguel
Like I said already, even HG agreed that the error will be twice as much for the example I gave with 180 +/- 5 errors. Why do you think he said it would be 10 elo? Now we are changing the subject.

It *IS* 10 elo. That is Eab in my example.

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 6:47 am

Well then, what are you saying? That result is impossible without *covariance*. You said there is no covariance, didn't you?

michiguel · Post by **michiguel** » Mon Sep 24, 2012 6:58 am

Daniel Shawul wrote: Well then, what are you saying? That result is impossible without *covariance*. You said there is no covariance, didn't you?

Yes, if you have

            Elo   Error +/- 
Engine_A   +100    10
Engine_B   -100    10

That is one way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.

+100 is the elo compared to the average of the pool (zero), but that is a conversion after you actually found that the difference is 200. You can't calculate one elo without the other.

You are treating +100 and -100 as if they were separate but not independent measures. Fine, they are correlated of course if you do it that way, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you play a similar match with the same number of games. From that point on, you can easily see that you need 4x games.

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 7:12 am

michiguel wrote:
Daniel Shawul wrote: Well then, what are you saying? That result is impossible without *covariance*. You said there is no covariance, didn't you?
Yes, if you have
            Elo   Error +/-
Engine_A   +100    10
Engine_B   -100    10
That is one way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.

+100 is the elo compared to the average of the pool (zero), but that is a conversion after you actually found that the difference is 200. You can't calculate one elo without the other.

You are treating +100 and -100 as if they were separate but not independent measures. Fine, they are correlated of course, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you play a similar match with the same number of games. From that point on, you can easily see that you need 4x games.

Miguel

Well then, the reported margins of error are wrong, because both elostat and default bayeselo report a margin of error of 20 (not 10) for your example. When we have multiple opponents, elostat still calculates the variance for each individual by looking at all scores combined (+1, 0, 0.5), so it completely disregards the opponent.

michiguel · Post by **michiguel** » Mon Sep 24, 2012 7:15 am

Daniel Shawul wrote:
michiguel wrote:
Daniel Shawul wrote: Well then, what are you saying? That result is impossible without *covariance*. You said there is no covariance, didn't you?
Yes, if you have
            Elo   Error +/-
Engine_A   +100    10
Engine_B   -100    10
That is one way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.

+100 is the elo compared to the average of the pool (zero), but that is a conversion after you actually found that the difference is 200. You can't calculate one elo without the other.

You are treating +100 and -100 as if they were separate but not independent measures. Fine, they are correlated of course, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you play a similar match with the same number of games. From that point on, you can easily see that you need 4x games.

Miguel
Well then, the reported margins of error are wrong, because both elostat and default bayeselo report a margin of error of 20 (not 10) for your example. When we have multiple opponents, elostat still calculates the variance for each individual by looking at all scores combined (+1, 0, 0.5), so it completely disregards the opponent.

We don't have multiple opponents here.

If BE reports +/-20 in a match between A and B (for each engine), then the error of A-B is 40.

Are you saying that when you measure the elo between A and B in a direct match, that is not a direct measure?

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 7:21 am

Here is the result for 200-200-100 (wins-losses-draws) using elostat, default bayeselo, and exactdist:

ResultSet-EloRating>mm 1 1
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   27   27   500   50%     0   20%
   2 Player1     0   27   27   500   50%     0   20%
ResultSet-EloRating>elostat
1 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   27   27   500   50%     0   20%
   2 Player1     0   27   27   500   50%     0   20%
ResultSet-EloRating>exactdist
00:00:00,06
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   15   15   500   50%     0   20%
   2 Player1     0   15   15   500   50%     0   20%
ResultSet-EloRating>

If you remember, last time I noted that exactdist gives half as much variance.
Calculate the variance like you did, for a 200-200-100 result, and tell me whether you get 27 or 15 elo. The 27 elo is just the raw calculated s.e.; it didn't get divided by 2, unlike your suggestion.
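
One way to do that hand calculation is sketched below. It rests on several assumptions that are not taken from any of the tools: a normal approximation for the mean score, the logistic Elo curve, and a 95% interval (z = 1.96). It is not a description of what elostat, bayeselo, or exactdist do internally, and the function name `elo_margin` is made up for this example.

```python
import math

def elo_margin(wins: int, losses: int, draws: int, z: float = 1.96) -> float:
    """Approximate margin (in Elo) on the rating difference implied by a match score.

    Assumes a normal approximation and the logistic Elo curve; a score of
    exactly 0% or 100% is not handled.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, with outcomes scored 1, 0.5 and 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se_score = math.sqrt(var / n)
    # Slope of elo = -400*log10(1/s - 1) with respect to the score s.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * slope * se_score

margin = elo_margin(200, 200, 100)
print(round(margin, 1))
```

Under these assumptions the 200-200-100 result comes out at roughly 27 elo for the difference between the two players.
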
