IWB wrote:
Hello Uri,
I know that your mathematical skills are far beyond my capabilities, but I also know that the calculation done by the Shredder-Classic-GUI is always right for Elostat and differs only slightly for BayesElo. In the current discussion it is even right for both: when I take the games and run them through BayesElo or Elostat, I get 3016.
Whether this is "right" or not is an interesting discussion, but it is the way Elo ratings are calculated. Nonetheless, I still hope for something better that will be accepted ... !
Bye
Ingo
Maybe I can clear things up a bit. I know something about ratings as I was a long-time chairman of the USCF ratings committee.
Uri is right that it is mathematically wrong to compute a rating from the average rating of the opponents. That is in fact what Elostat does, and I think this was a major reason for the creation of BayesElo. Using the Elostat averaging causes all the ratings to "contract" toward their average value, with the percentage of contraction depending on the spread of the players' ratings. This is what we observe in the present example.
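To make the contraction concrete, here is a minimal Python sketch. It assumes the standard logistic Elo expectation E = 1/(1 + 10^(-(R - R_opp)/400)) and models the averaging approach as "average opponent rating plus the rating difference implied by the overall score". That is an assumption about its general shape, not Elostat's actual code, and the function names are hypothetical. A player whose true strength is 3000 plays one game each against a 2400 and a 2800 opponent and scores exactly the expected amount. Solving game by game for the rating that explains the score recovers 3000, while the averaging shortcut lands well below it.

```python
import math

def expected_score(r: float, r_opp: float) -> float:
    """Logistic Elo expectation for a player rated r against r_opp."""
    return 1.0 / (1.0 + 10.0 ** (-(r - r_opp) / 400.0))

def average_opponent_performance(score: float, opponents: list[float]) -> float:
    """Hypothetical model of the averaging shortcut: average opponent
    rating plus the Elo difference implied by the overall score."""
    avg = sum(opponents) / len(opponents)
    return avg + 400.0 * math.log10(score / (1.0 - score))

def per_game_performance(score: float, opponents: list[float]) -> float:
    """Find R such that the mean expected score over the individual games
    equals `score`, by bisection (expected score increases with R)."""
    lo, hi = min(opponents) - 4000.0, max(opponents) + 4000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        mean_e = sum(expected_score(mid, o) for o in opponents) / len(opponents)
        if mean_e < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# One game against each opponent; a true 3000 player scores exactly
# the expected amount (about 86.5%).
opponents = [2400.0, 2800.0]
score = sum(expected_score(3000.0, o) for o in opponents) / len(opponents)

print(round(per_game_performance(score, opponents)))          # 3000
print(round(average_opponent_performance(score, opponents)))  # 2922
```

The gap comes from averaging the ratings before applying a nonlinear curve; with opponents closer together, the two numbers move closer.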
I believe BayesElo handles this issue properly. However, due to its use of an assumed "prior" result, and perhaps also its special treatment of draws, its ratings also contract toward the mean, for entirely different reasons. It just so happens that, given the spread of ratings in your field, the draw percentage, and the sample size, the two methods produce very similar ratings. If you played a million games, or if you included engines a thousand points weaker than most, the two methods might not be close at all.
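The prior effect can be illustrated in the simplest two-player case. The sketch below assumes the prior behaves like a fixed number of virtual draws added to the real results. The parameter `virtual_draws` is hypothetical, and BayesElo's actual draw model is more elaborate than the plain logistic curve used here. With 10 games, the prior pulls an 80% score noticeably toward 50%. With a million games it has almost no effect, whereas the Elostat averaging contraction depends on the spread of ratings, not on the number of games.

```python
import math

def rating_difference(points: float, games: int, virtual_draws: float = 0.0) -> float:
    """Elo difference implied by a score after adding `virtual_draws`
    half-point games (a hypothetical stand-in for a prior)."""
    p = (points + 0.5 * virtual_draws) / (games + virtual_draws)
    return 400.0 * math.log10(p / (1.0 - p))

# 80% score over a short match: the prior shrinks the difference noticeably.
print(round(rating_difference(8, 10), 1))                   # 240.8
print(round(rating_difference(8, 10, virtual_draws=2), 1))  # 190.8

# Same 80% score over a million games: the prior barely matters.
print(round(rating_difference(800_000, 1_000_000), 1))                   # 240.8
print(round(rating_difference(800_000, 1_000_000, virtual_draws=2), 1))  # 240.8
```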
I think the fact that you have all the engines play the same field makes the BayesElo calculations fair. The resulting ratings are somewhat "contracted" compared with what they might be if BayesElo worked like Elostat but without using the average of the field. I regard that as a good thing, since computer-vs-computer ratings clearly overstate rating differences anyway. The contraction is fair to all; it doesn't distort the rankings. I think BayesElo causes more problems for testing organizations whose engines have widely varying sample sizes and opposition strength, but Elostat is worse.
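Why a shared field keeps the contraction fair can be checked with the same virtual-draw simplification (again a hypothetical model, not BayesElo's code; the engines "A", "B" and "C" are made up). When every engine plays the same number of games, adding the same virtual draws is a monotonic transformation of the score, so the ranking cannot change. When sample sizes differ, an engine with few games is pulled toward 50% harder and can drop below an engine with a slightly lower score over many more games.

```python
def shrunk_score(points: float, games: int, virtual_draws: float = 2.0) -> float:
    """Score fraction after adding `virtual_draws` hypothetical half-point games."""
    return (points + 0.5 * virtual_draws) / (games + virtual_draws)

# Same number of games for everyone: the order by raw score is preserved.
same_field = {"A": (70, 100), "B": (68, 100), "C": (55, 100)}
raw_order = sorted(same_field, key=lambda e: same_field[e][0] / same_field[e][1], reverse=True)
shrunk_order = sorted(same_field, key=lambda e: shrunk_score(*same_field[e]), reverse=True)
print(raw_order, shrunk_order)  # ['A', 'B', 'C'] ['A', 'B', 'C']

# Very different sample sizes: A scores 70% of 10 games, B 68% of 1000 games.
a = shrunk_score(7, 10)      # 8/12, about 0.667
b = shrunk_score(680, 1000)  # 681/1002, about 0.680
print(a > b)  # False: the short-sample engine now ranks below B
```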
Bottom line: don't change anything!
Larry