You are choosing to calculate the errors in a way that forces you to account for a covariance. But that is not needed. You get a direct measure, you split it, and then you combine it. That is why the measurements are correlated: they were only one in the first place!
michiguel wrote:
We don't have multiple opponents here.
Daniel Shawul wrote:
Well then the reported margins of error are wrong, because both elostat and bayeselo by default report a margin of error of 20 (not 10) for your example. When we have multiple opponents, elostat still calculates variances for each individual by looking at all scores combined (+1, 0, 0.5), so it completely disregards the opponent.
michiguel wrote:
Yes, if you have
Daniel Shawul wrote:
Well then what are you saying?? That result is impossible without *covariance*. You said there is no covariance, didn't you?
That is a way to represent the results, but the direct measure is Engine_A - Engine_B = 200 +/- 20. These are the numbers I am talking about: DeltaAB and Eab.
Code:
          Elo    Error +/-
Engine_A  +100   10
Engine_B  -100   10
+100 is the elo compared to the average of the pool (zero), but that is a conversion after you actually found that the difference is 200. You can't calculate one elo without the other.
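To make the split-and-recombine point concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the correlation of -1 is my reading of "they were only one in the first place" (each rating is just plus or minus half of the same measured difference); it is not something elostat or bayeselo is claimed to compute this way.

```python
import math

def split_about_pool_mean(delta, delta_err):
    """Turn one measured difference into two ratings centred on zero.

    Hypothetical helper: each engine gets half the difference and
    half the error, with opposite signs on the rating.
    """
    half = delta / 2.0
    half_err = delta_err / 2.0
    return (half, half_err), (-half, half_err)

def difference_error(err_a, err_b, rho):
    """Error of A - B given both errors and their correlation rho."""
    var = err_a ** 2 + err_b ** 2 - 2.0 * rho * err_a * err_b
    return math.sqrt(var)

# The direct measure from the post: Engine_A - Engine_B = 200 +/- 20.
(a, a_err), (b, b_err) = split_about_pool_mean(200.0, 20.0)

# Both ratings come from the same number, so rho = -1 (assumption).
recombined = difference_error(a_err, b_err, rho=-1.0)

# Treating them as independent (rho = 0) loses part of the error.
naive = difference_error(a_err, b_err, rho=0.0)

print(a, a_err, b, b_err)   # the table above
print(recombined, naive)
```

With rho = -1 the recombined error is the original +/- 20, which is why the covariance only shows up if you insist on starting from the two split numbers instead of from the difference itself.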
You are taking +100 and -100 as if they were separate but not independent measures. Fine, they are correlated of course, but whatever you do to obtain the error, you will get +/- 20. That is Eab, which will be the same as Eac and Ecb if you play a similar match with the same number of games. From that point on, you can easily see that you need 4x the games.
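The 4x figure can be checked with a rough sketch, under two assumptions of mine that are not stated in the thread: the error of a match shrinks as 1/sqrt(games), and the A-C and C-B matches are independent so their errors add in quadrature. The reference point of +/- 20 at 1000 games is purely illustrative.

```python
import math

def match_error(n_games, ref_error=20.0, ref_games=1000):
    """Error of a direct match, assuming it scales as 1/sqrt(n_games).

    ref_error and ref_games are illustrative calibration values.
    """
    return ref_error * math.sqrt(ref_games / n_games)

def combined_error(err_ac, err_cb):
    """Error of (A-C) + (C-B) for two independent matches."""
    return math.hypot(err_ac, err_cb)

# Direct A-B match, 1000 games.
direct = match_error(1000)

# Same error through C needs 2000 games per match: 4x the games in total.
via_c_4x = combined_error(match_error(2000), match_error(2000))

# 250 games A-C plus 250 games C-B: much worse than the direct match.
via_c_small = combined_error(match_error(250), match_error(250))

print(direct, via_c_4x, via_c_small)
```

Under these assumptions the route through C only matches the direct error once each leg has twice the games of the direct match, i.e. four times as many games overall.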
Miguel
If BE reports +/-20 in a match between A and B (for each engine), then the error of A-B is 40.
Are you saying that when you measure the elo between A and B in a direct match, that is not a direct measure?
Miguel
But like I said, whatever you do, you apply the same procedure to the three matches and you will get the same error for each of them. When you combine two of them, of course the error will be bigger.
You are saying that A-B playing 1000 games is as accurate as playing 250 games between A and C and 250 games between C and B, and subtracting the results of the last two matches. Don't you find that really odd?
Miguel