EloStat, Bayeselo and Ordo


hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: EloStat, Bayeselo and Ordo

Post by hgm »

michiguel wrote:It is what I asked up in the thread
"How is it explained that you copy the same results 4 times and obtained different results? that is not related to draws or anything like that. "

Kai reported that, and it looks like a drawback. I was curious why.
No, it is not a drawback, but the harsh reality of statistical life. If someone scores 3 out of 4, in most cases this is simply due to luck, and attaching too much meaning to it will significantly overrate him. But if he scores 300 out of 400, you can be pretty sure it is because he is stronger.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: EloStat, Bayeselo and Ordo

Post by Rémi Coulom »

hgm wrote:Sorry, I am mixing things up. It was Edmund who gave the link above, and also compiled the data. In particular the graph in

http://talkchess.com/forum/viewtopic.ph ... 76&t=42729
I remember that discussion, but it did not compare the prediction ability of different models.

Rémi

Re: EloStat, Bayeselo and Ordo

Post by hgm »

It compared the correctness of the models, which should be the same thing. If the curve used by a model correctly describes the WDL probabilities, it follows by pure math how well it will predict.
Ozymandias
Posts: 1532
Joined: Sun Oct 25, 2009 2:30 am

Re: EloStat, Bayeselo and Ordo

Post by Ozymandias »

Did you check the Deloitte/FIDE Chess Rating Challenge?

Re: EloStat, Bayeselo and Ordo

Post by hgm »

No, but I don't expect it to be of much interest. Determining ratings of humans is a completely different game, because their ratings vary in time.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: EloStat, Bayeselo and Ordo

Post by michiguel »

hgm wrote:
michiguel wrote:It is what I asked up in the thread
"How is it explained that you copy the same results 4 times and obtained different results? that is not related to draws or anything like that. "

Kai reported that, and it looks like a drawback. I was curious why.
No, it is not a drawback, but the harsh reality of statistical life. If someone scores 3 out of 4, in most cases this is simply due to luck, and attaching too much meaning to it will significantly overrate him. But if he scores 300 out of 400, you can be pretty sure it is because he is stronger.
No, it is a drawback. If you score 3/4, the measure should give exactly the same value as 300/400. The error of the measure should be different.

Miguel

Re: EloStat, Bayeselo and Ordo

Post by hgm »

michiguel wrote:No, it is a drawback. If you score 3/4, the measure should give exactly the same value as 300/400. The error of the measure should be different.
Unfortunately, hard math tells us that is not true, unless you are prepared to accept asymmetric error bars, which is the same thing as admitting you are using a faulty average. Calculate it yourself, if you don't believe it. The likelihood for heads probability p in a coin flip after observing 3 heads in 4 flips is (unnormalized) p^3 * (1-p). The expectation value of this distribution is NOT 3/4, but 2/3.
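The figures above can be checked exactly, as a minimal sketch. It assumes the likelihood p^wins * (1-p)^losses is treated as a distribution over p (which amounts to a uniform prior) and uses the closed form INT from 0 to 1 p^a (1-p)^b dp = a! b! / (a+b+1)!. The function names beta_integral and posterior_mean are made up for this illustration.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact value of INT from 0 to 1 of p^a (1-p)^b dp = a! b! / (a+b+1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def posterior_mean(wins, games):
    """Expectation of p under the normalized likelihood p^wins (1-p)^losses."""
    losses = games - wins
    return beta_integral(wins + 1, losses) / beta_integral(wins, losses)


for wins, games in [(3, 4), (300, 400)]:
    mean = posterior_mean(wins, games)
    print(f"{wins}/{games}: max. likelihood {wins / games:.4f}, "
          f"expectation {float(mean):.4f}")
```

Under these assumptions the expectation works out to (wins+1)/(games+2): 2/3 for 3 out of 4, but 301/402 (about 0.749) for 300 out of 400, so the gap to the maximum-likelihood value 3/4 shrinks as the number of games grows.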
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: EloStat, Bayeselo and Ordo

Post by Laskos »

hgm wrote:
michiguel wrote:No, it is a drawback. If you score 3/4, the measure should give exactly the same value as 300/400. The error of the measure should be different.
Unfortunately, hard math tells us that is not true, unless you are prepared to accept asymmetric error bars, which is the same thing as admitting you are using a faulty average. Calculate it yourself, if you don't believe it. The likelihood for heads probability p in a coin flip after observing 3 heads in 4 flips is (unnormalized) p^3 * (1-p). The expectation value of this distribution is NOT 3/4, but 2/3.
Yes, but this wouldn't explain some large differences for 90 and 360 games. By the way, if I am not wrong, the result is not exactly 2/3, more like 0.69. And the maximum likelihood is still at 3/4. As for 90 and 360, the results are 0.747 and 0.749, which would be hardly visible in ratings, which, it seems, are compressed in Bayeselo by the artificial draw rules and priors.

Kai

Re: EloStat, Bayeselo and Ordo

Post by hgm »

Laskos wrote:Yes, but this wouldn't explain some large differences for 90 and 360 games. By the way, if I am not wrong, the result is not exactly 2/3, more like 0.69. And the maximum likelihood is still at 3/4. As for 90 and 360, the results are 0.747 and 0.749, which would be hardly visible in ratings, which, it seems, are compressed in Bayeselo by the artificial draw rules and priors.
INT from 0 to 1 p^3 (1-p) dp = [ 1/4 p^4 - 1/5 p^5 ] from 0 to 1 = 1/4 - 1/5 = 1/20

INT from 0 to 1 p^4 (1-p) dp = [ 1/5 p^5 - 1/6 p^6 ] from 0 to 1 = 1/5 - 1/6 = 1/30

E(p) = (1/30) / (1/20) = 2/3

The maximum likelihood is indeed at 3/4, but that is not what is significant for making accurate predictions. The prediction error is minimal in the sense of least squares only when you predict the average.
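The least-squares point can be made concrete with a small sketch. It assumes the normalized density 20 p^3 (1-p) from the integrals above and compares the expected squared error of guessing the average (2/3) against guessing the maximum-likelihood value (3/4). The helper names moment and expected_squared_error are illustrative only.

```python
from fractions import Fraction


def moment(k):
    """E[p^k] under the normalized density 20 p^3 (1-p) on [0, 1]."""
    # INT p^(3+k) (1-p) dp = 1/(k+4) - 1/(k+5); the normalization is 1/20.
    return 20 * (Fraction(1, k + 4) - Fraction(1, k + 5))


def expected_squared_error(guess):
    """E[(p - guess)^2] = E[p^2] - 2 * guess * E[p] + guess^2."""
    return moment(2) - 2 * guess * moment(1) + guess * guess


for label, guess in [("average 2/3", Fraction(2, 3)),
                     ("max. likelihood 3/4", Fraction(3, 4))]:
    err = expected_squared_error(guess)
    print(f"{label}: expected squared error {err} = {float(err):.4f}")
```

With this density the average gives 2/63 (about 0.0317) and the maximum-likelihood guess gives 13/336 (about 0.0387); the difference is exactly the squared offset (3/4 - 2/3)^2 = 1/144.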

As for the difference between 90 and 360 games I would not know, without further looking into what games exactly these were. I suppose this was not a plain match between two opponents, because in that case indeed the effect of the prior should be negligible. Note furthermore that the prior can be switched off in BayesElo.

There is nothing 'artificial' about the 'draw rules' (assuming you mean double-counting of draws). This aspect of the model was confirmed by analysis of the actual computer data. The likelihood of a single draw is indeed equal to that of one win plus one loss within a reasonable accuracy. Any analysis that does not take account of that fact just sucks, in the sense that it expands the rating scale, predicting too extreme results between the top and bottom dwellers.
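To illustrate what "one draw equals one win plus one loss" means at the level of likelihoods, here is a hedged sketch. It assumes a logistic win/loss curve with a draw-margin parameter, which is how the BayesElo model is usually described; the thread itself does not spell the model out. The names f, wdl and elo_draw, and the value 100, are hypothetical choices for illustration.

```python
def f(delta):
    """Logistic Elo curve: expected score for a rating advantage of delta."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def wdl(delta, elo_draw):
    """Win/draw/loss probabilities for rating difference delta (assumed model)."""
    win = f(delta - elo_draw)
    loss = f(-delta - elo_draw)
    return win, 1.0 - win - loss, loss


elo_draw = 100.0  # hypothetical draw margin, chosen only for illustration
for delta in (0.0, 100.0, 300.0, 600.0):
    win, draw, loss = wdl(delta, elo_draw)
    ratio = draw / (win * loss)
    print(f"delta={delta:5.0f}  P(draw) / (P(win) * P(loss)) = {ratio:.6f}")
```

In this assumed model the ratio does not depend on the rating difference: algebraically it equals 10^(2 * elo_draw / 400) - 1, so as a function of the ratings a draw contributes the same likelihood factor as one win times one loss. Whether real game data follow that curve is the empirical question being argued here.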

Re: EloStat, Bayeselo and Ordo

Post by Laskos »

hgm wrote:
Laskos wrote:Yes, but this wouldn't explain some large differences for 90 and 360 games. By the way, if I am not wrong, the result is not exactly 2/3, more like 0.69. And the maximum likelihood is still at 3/4. As for 90 and 360, the results are 0.747 and 0.749, which would be hardly visible in ratings, which, it seems, are compressed in Bayeselo by the artificial draw rules and priors.
INT from 0 to 1 p^3 (1-p) dp = [ 1/4 p^4 - 1/5 p^5 ] from 0 to 1 = 1/4 - 1/5 = 1/20

INT from 0 to 1 p^4 (1-p) dp = [ 1/5 p^5 - 1/6 p^6 ] from 0 to 1 = 1/5 - 1/6 = 1/30

E(p) = (1/30) / (1/20) = 2/3

The maximum likelihood is indeed at 3/4, but that is not what is significant for making accurate predictions. The prediction error is minimal in the sense of least squares only when you predict the average.

As for the difference between 90 and 360 games I would not know, without further looking into what games exactly these were. I suppose this was not a plain match between two opponents, because in that case indeed the effect of the prior should be negligible. Note furthermore that the prior can be switched off in BayesElo.

There is nothing 'artificial' about the 'draw rules' (assuming you mean double-counting of draws). This aspect of the model was confirmed by analysis of the actual computer data. The likelihood of a single draw is indeed equal to that of one win plus one loss within a reasonable accuracy. Any analysis that does not take account of that fact just sucks, in the sense that it expands the rating scale, predicting too extreme results between the top and bottom dwellers.
Sorry, it's 2/3 indeed, and the maximum likelihood is at 3/4. I hope the other two numbers are correct; they show that for some 90 to 360 games the effect is pretty irrelevant. Probably the claim that 1 draw equals 1 win plus 1 loss came from a poor analysis of the computer data. How do you explain the blue dots compared to the green line in the second plot of Edmund, and other empirical data? http://talkchess.com/forum/viewtopic.ph ... =&start=10
You have to admit that this draw rule is necessarily an approximation, which may compress the predictions (and does).

Kai