margin of error

michiguel · Post by **michiguel** » Mon Sep 24, 2012 7:28 am

michiguel wrote:
Daniel Shawul wrote:
michiguel wrote:
Daniel Shawul wrote: Well then, what are you saying? That result is impossible without *covariance*. You said there is no covariance, didn't you?
Yes, if you have
Code:
            Elo   Error +/- 
Engine_A   +100    10
Engine_B   -100    10
That is a way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.

+100 is the elo compared to the average of the pool (zero), but that is a conversion after you actually found that the difference is 200. You can't calculate one elo without the other.

You are taking +100 and -100 as if they were separate but not independent measures. Fine, they are correlated of course, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you do a similar match with the same number of games. From that point on, you can easily see that you need 4x games.

Miguel
Well then, the reported margins of error are wrong, because both elostat and bayeselo default do report a margin of error of 20 (not 10) for your example. When we have multiple opponents, elostat still calculates variances for each individual by looking at all scores combined (+1, 0, 0.5), so it completely disregards the opponent.
We don't have multiple opponents here.

If BE reports +/-20 in a match between A and B (for each engine), then the error of A-B is 40.

Are you saying that when you measure the elo between A and B in a direct match, that is not a direct measure?

Miguel

You are choosing to calculate the errors in a way that you need to realize there is a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: They were only one in the first place!

But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.

You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?
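The comparison above can be checked with a few lines of arithmetic. This is only a sketch under simple assumptions: games are independent, the margin of error shrinks as 1/sqrt(games), and the errors of two separate matches add in quadrature. The function names and the per-game standard deviation (taken from a 40% win, 40% loss, 20% draw split) are illustrative and do not come from any rating tool.

```python
import math

# Assumed per-game standard deviation of the score for a
# 40% win / 40% loss / 20% draw split: sqrt(0.4*0.25 + 0.4*0.25) = sqrt(0.2).
PER_GAME_SD = math.sqrt(0.2)

def match_margin(games, z=1.96):
    """95% margin of error (as a score fraction) for one match of independent games."""
    return z * PER_GAME_SD / math.sqrt(games)

def combined_margin(m1, m2):
    """Margin of a difference obtained from two independent matches (quadrature sum)."""
    return math.hypot(m1, m2)

# A vs B directly, 1000 games.
direct = match_margin(1000)
# A vs C and C vs B, 250 games each, then subtract.
via_c_small = combined_margin(match_margin(250), match_margin(250))
# A vs C and C vs B, 2000 games each, then subtract.
via_c_equal = combined_margin(match_margin(2000), match_margin(2000))

print(f"direct, 1000 games:       {direct:.4f}")
print(f"via C, 250 + 250 games:   {via_c_small:.4f}")
print(f"via C, 2000 + 2000 games: {via_c_equal:.4f}")
```

Under these assumptions the indirect route needs 2000 games per match, 4000 in total, to reach the precision of the 1000-game direct match, which is the 4x figure mentioned earlier.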

Miguel

michiguel · Post by **michiguel** » Mon Sep 24, 2012 7:30 am

Daniel Shawul wrote: Here is a result for 200-200-100 using elostat, bayeselo default and exactdist
Code:
ResultSet-EloRating>mm 1 1
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   27   27   500   50%     0   20%
   2 Player1     0   27   27   500   50%     0   20%
ResultSet-EloRating>elostat
1 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   27   27   500   50%     0   20%
   2 Player1     0   27   27   500   50%     0   20%
ResultSet-EloRating>exactdist
00:00:00,06
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 Player0     0   15   15   500   50%     0   20%
   2 Player1     0   15   15   500   50%     0   20%
ResultSet-EloRating>
If you remember, last time I noted that exactdist gives half as much variance...
Calculate the variance like you did for a 200-200-100 and tell me if you get 27 or 15 elo. The 27 elo is just the raw calculated s.e. and didn't get divided by 2, unlike your suggestion...

I explained to you already that default and exactdist are not the way to do it properly.

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 7:37 am

You are choosing to calculate the errors in a way that you need to realize there is a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: They were only one in the first place!

But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.

You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?

Miguel

You are avoiding the issue. Either elostat and bayeselo default report a wrong result or you are right. Both can't be true at the same time. I based my calculation on the fact that error margins are reported for elos A +- D1 and B +- D2; changing this of course changes the result. Anyway, since there are covariances, reporting a single margin of error is wrong to begin with...

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 7:43 am

That is funny. I go by the default setups, and this is not about accuracy at all, since they are 2x larger. The normality assumption has a barely measurable effect anyway.

michiguel · Post by **michiguel** » Mon Sep 24, 2012 7:50 am

Daniel Shawul wrote:
You are choosing to calculate the errors in a way that you need to realize there is a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: They were only one in the first place!

But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.

You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?

Miguel
You are avoiding the issue. Either elostat and bayeselo default report a wrong result or you are right. Both can't be true at the same time. I based my calculation on the fact that error margins are reported for elos A +- D1 and B +- D2; changing this of course changes the result. Anyway, since there are covariances, reporting a single margin of error is wrong to begin with...

Of course, the default is wrong. You get a correct measure of the error only if you use "covariance", and the error you obtain then relates to the Elo value measured against the elo-average of the pool. You multiply both by 2 and you get DeltaAB and Eab.

When you do a match between A and B and, say, A scores 76% against B, this percentage is a direct measure. That measure is translated to 200 elo. What elo programs do is split this in two and report +100 and -100. So, if you go along with the original 200 and the respective error that could be calculated, that is a direct measure that involves no covariance of any kind, since it is a single measure. If you split it and then combine it, you are doing it in a convoluted way (like the elo programs). The result will be the same, but now you need to take into account that the values are (perfectly) correlated. Hence, you need to introduce the concept of covariance.
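The 76% to 200 elo conversion and the split into +100 and -100 can be sketched as follows. This assumes the standard logistic Elo curve; the rating programs discussed here may use a somewhat different model (for example one with an explicit draw parameter), and `score_to_elo` is a name made up for this illustration.

```python
import math

def score_to_elo(score):
    """Elo difference implied by an expected score, using the logistic Elo curve."""
    return 400.0 * math.log10(score / (1.0 - score))

# The single directly measured quantity: A's score against B.
delta_ab = score_to_elo(0.76)

# What a rating program reports: the same number split around a pool average of zero.
elo_a = +delta_ab / 2
elo_b = -delta_ab / 2

print(f"DeltaAB = {delta_ab:.1f}, A = {elo_a:+.1f}, B = {elo_b:+.1f}")
```

Both reported ratings are derived from one number, so any error in `delta_ab` moves `elo_a` and `elo_b` in exactly opposite directions.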

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 8:08 am

michiguel wrote:
Daniel Shawul wrote:
You are choosing to calculate the errors in a way that you need to realize there is a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: They were only one in the first place!

But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.

You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?

Miguel
You are avoiding the issue. Either elostat and bayeselo default report a wrong result or you are right. Both can't be true at the same time. I based my calculation on the fact that error margins are reported for elos A +- D1 and B +- D2; changing this of course changes the result. Anyway, since there are covariances, reporting a single margin of error is wrong to begin with...
Of course, the default is wrong. You get a correct measure of the error only if you use "covariance", and the error you obtain then relates to the Elo value measured against the elo-average of the pool. You multiply both by 2 and you get DeltaAB and Eab.

When you do a match between A and B and, say, A scores 76% against B, this percentage is a direct measure. That measure is translated to 200 elo. What elo programs do is split this in two and report +100 and -100. So, if you go along with the original 200 and the respective error that could be calculated, that is a direct measure that involves no covariance of any kind, since it is a single measure. If you split it and then combine it, you are doing it in a convoluted way (like the elo programs). The result will be the same, but now you need to take into account that the values are (perfectly) correlated. Hence, you need to introduce the concept of covariance.

Miguel

Given a score of WWWWLLLLDD, both elostat and bayeselo default simply calculate variances (without using covariance). You don't need covariance if you are not calculating A-B. So their assumption is basically different from assuming the result is a measure of A-B. They assume the following:
A's result: WWWWLLLLDD.. 27 elo for 500 games
B's result: LLLLWWWWDD.. 27 elo for the same games
This is the basic assumption they made. You are saying the result is A-B's, which contradicts their assumption:
A-B : WWWWLLLLDD.. 27 elo, so A's and B's should have been 27/2=14
So if that is the case, they should have divided by 2 before comparison, and it is not the normal assumption that is causing this big difference. If the default method differs from the exactdist method, it should be solely due to the normal assumption. It can't be that bad unless there is a mistake like dividing by 2.
BTW, if we were talking directly about percentages, you would see that the way I do it is the correct one. The difference of winning percentages would require twice the number of games in the case of A vs B. You don't measure percentage differences directly (unlike the claim of measuring eloA-eloB directly) but percent A and percent B separately.

michiguel · Post by **michiguel** » Mon Sep 24, 2012 8:22 am

Daniel Shawul wrote:
michiguel wrote:
Daniel Shawul wrote:
You are choosing to calculate the errors in a way that you need to realize there is a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: They were only one in the first place!

But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.

You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?

Miguel
You are avoiding the issue. Either elostat and bayeselo default report a wrong result or you are right. Both can't be true at the same time. I based my calculation on the fact that error margins are reported for elos A +- D1 and B +- D2; changing this of course changes the result. Anyway, since there are covariances, reporting a single margin of error is wrong to begin with...
Of course, the default is wrong. You get a correct measure of the error only if you use "covariance", and the error you obtain then relates to the Elo value measured against the elo-average of the pool. You multiply both by 2 and you get DeltaAB and Eab.

When you do a match between A and B and, say, A scores 76% against B, this percentage is a direct measure. That measure is translated to 200 elo. What elo programs do is split this in two and report +100 and -100. So, if you go along with the original 200 and the respective error that could be calculated, that is a direct measure that involves no covariance of any kind, since it is a single measure. If you split it and then combine it, you are doing it in a convoluted way (like the elo programs). The result will be the same, but now you need to take into account that the values are (perfectly) correlated. Hence, you need to introduce the concept of covariance.

Miguel
Given a score of WWWWLLLLDD, both elostat and bayeselo default simply calculate variances (without using covariance). You don't need covariance if you are not calculating A-B. So their assumption is basically different from assuming the result is a measure of A-B. They assume the following:
A's result: WWWWLLLLDD.. 27 elo for 500 games
B's result: LLLLWWWWDD.. 27 elo for the same games
This is the basic assumption they made. You are saying the result is A-B's, which contradicts their assumption:
A-B : WWWWLLLLDD.. 27 elo, so A's and B's should have been 27/2=14
So if that is the case, they should have divided by 2 before comparison, and it is not the normal assumption that is causing this big difference. If the default method differs from the exactdist method, it should be solely due to the normal assumption. It can't be that bad unless there is a mistake like dividing by 2.

No, both default and covariance assume Gaussian. That has nothing to do with this.
http://www.talkchess.com/forum/viewtopi ... &start=199

The point is that one assumes the opponent's rating is the true rating (a gross approximation that could be valid when there are many opponents and lots of games), and the other does not. That is why you get a difference that is close to a factor of 2.

BTW, if we were talking directly about percentages, you would see that the way I do it is the correct one. The difference of winning percentages would require twice the number of games in the case of A vs B. You don't measure percentage differences directly (unlike the claim of measuring eloA-eloB directly) but percent A and percent B separately.

When you measure A vs B, you measure only one percentage.

Miguel

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 8:37 am

When you measure A vs B, you measure only one percentage.

Miguel

This is where I am saying elostat and bayeselo differ from your assumption. They do assume that they measured two percentages, so for a 50%-50% result they have two winning percentages with a correlation of -1. It is evident because they store separate scores for A and B as WWWWLLLLDD and LLLLWWWWDD. If you calculate the elo margin for 200-200-100 like I did, you get var(A)=1/5, which translates to 27 elo, as displayed by both elostat and bayeselo default. If it were a direct measure of A-B as you claimed, their reported error margins are wrong and should have been 27/2.
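The var(A)=1/5 and 27 elo figures can be reproduced with a short calculation. This is a sketch of the arithmetic only, assuming the population variance of the per-game scores, a normal approximation with z = 1.96, and the logistic Elo curve; it is not taken from the elostat or bayeselo source, and the helper name is made up.

```python
import math

wins, losses, draws = 200, 200, 100
games = wins + losses + draws

# Mean score per game (win = 1, draw = 0.5, loss = 0).
mean = (wins * 1.0 + draws * 0.5) / games

# Variance of the per-game score around that mean.
var_a = (wins * (1.0 - mean) ** 2
         + losses * (0.0 - mean) ** 2
         + draws * (0.5 - mean) ** 2) / games

# Standard error of the mean score over all games.
std_err = math.sqrt(var_a / games)
z = 1.96  # assumed 95% interval

def score_to_elo(score):
    """Elo difference implied by an expected score (logistic curve, an assumption here)."""
    return 400.0 * math.log10(score / (1.0 - score))

# Upper margin in elo: convert the upper end of the score interval.
margin_elo = score_to_elo(mean + z * std_err) - score_to_elo(mean)

print(f"var(A) = {var_a:.3f}")
print(f"margin = {margin_elo:.1f} elo, half of it = {margin_elo / 2:.1f} elo")
```

The full margin rounds to 27 and half of it rounds to 14, matching the two candidate figures in the argument; which one belongs next to each engine's rating is exactly the point in dispute.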

Daniel Shawul · Post by **Daniel Shawul** » Mon Sep 24, 2012 8:44 am

No, both default and covariance assume Gaussian. That has nothing to do with this.
http://www.talkchess.com/forum/viewtopi ... &start=199

The point is that one assumes the opponent's rating is the true rating (a gross approximation that could be valid when there are many opponents and lots of games), and the other does not. That is why you get a difference that is close to a factor of 2.

I am talking about default and exactdist, which is what we are comparing here. One gives 27 and the other 15, and the difference is exactly the Gaussian assumption.

michiguel · Post by **michiguel** » Mon Sep 24, 2012 8:44 am

Daniel Shawul wrote:
When you measure A vs B, you measure only one percentage.

Miguel
This is where I am saying elostat and bayeselo differ from your assumption. They do assume that they measured two percentages, so for a 50%-50% result they have two winning percentages with a correlation of -1. It is evident because they store separate scores for A and B as WWWWLLLLDD and LLLLWWWWDD. If you calculate the elo margin for 200-200-100 like I did, you get var(A)=1/5, which translates to 27 elo, as displayed by both elostat and bayeselo default. If it were a direct measure of A-B as you claimed, their reported error margins are wrong and should have been 27/2.

No, I claimed that the correct way to calculate the errors in BE is in covariance mode, and that is the error over the value (A-average of the pool).

Once you do that, you get DeltaAB = 2*A and Eab = 2*Error that BE reports.
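The factor of 2 follows from the usual variance formula for a difference, assuming the two reported ratings are perfectly anticorrelated (B = -A around a pool average of zero) and share the same standard deviation. The symbols σ and ρ are introduced here only for illustration:

```latex
\begin{aligned}
\Delta_{AB} &= A - B = 2A, \\
\operatorname{Var}(A - B) &= \operatorname{Var}(A) + \operatorname{Var}(B) - 2\operatorname{Cov}(A, B) \\
&= \sigma^2 + \sigma^2 - 2\rho\,\sigma^2, \qquad \rho = -1 \\
&= 4\sigma^2, \\
E_{AB} &= 2\sigma .
\end{aligned}
```

If the covariance were ignored (ρ = 0), the same formula would give √2 σ instead of 2σ.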

Regardless of how you do it, you will end up having a bigger error if you do this process indirectly through a third party.

Miguel
