Likelihood of superiority

Rémi Coulom · Post by **Rémi Coulom** » Sun Nov 15, 2009 11:54 pm

mcostalba wrote:
Rémi Coulom wrote:
Note 1: draws do not count

Thanks a lot! I was looking for something like this. But I have a big question: how can draws not count?

I mean, in A vs A' I could have 1 win and 0 losses, or I could have 1 win, 300 draws, and 0 losses. What we can infer in the two cases is totally different.

In the first case we can say almost nothing, while in the second case we can say that the two engines are (with good probability) not far apart.

Could you please explain why the number of draws (which in the end is much the same as the number of games played) does not count?

Thanks
Marco

The two cases you mention are very different in terms of estimating the strength difference between the two programs: in the first case you know very little about their relative playing strength, in the second case you know that they are likely to be very close in strength. But LOS is still the same.

A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

More precisely, here is a mathematical proof. Probabilities of win, draw, and loss are p_1, p_0.5, and p_0. Number of wins, draws, losses are n_1, n_0.5, and n_0. With a uniform prior:
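
(The formula that followed is not reproduced in this copy of the thread. As a sketch only, and not necessarily the derivation originally posted: assuming a uniform Dirichlet prior over (p_1, p_0.5, p_0), one argument consistent with the definitions above runs as follows. The substitution variables s and q are introduced here for illustration.)

```latex
\begin{align*}
f(p_1, p_{0.5}, p_0 \mid n) &\propto p_1^{n_1}\, p_{0.5}^{n_{0.5}}\, p_0^{n_0},
  \qquad p_1 + p_{0.5} + p_0 = 1 \\
s = p_1 + p_0 = 1 - p_{0.5}, \quad q = \frac{p_1}{p_1 + p_0}
  \;&\Rightarrow\; p_1 = qs,\quad p_0 = (1-q)s,\quad
  \left|\frac{\partial(p_1, p_0)}{\partial(q, s)}\right| = s \\
f(q, s \mid n) &\propto q^{n_1} (1-q)^{n_0} \cdot s^{\,n_1 + n_0 + 1} (1-s)^{n_{0.5}} \\
\mathrm{LOS} = P(p_1 > p_0 \mid n) = P\!\left(q > \tfrac{1}{2} \mid n\right)
  &= \frac{\int_{1/2}^{1} q^{n_1} (1-q)^{n_0}\, dq}{\int_{0}^{1} q^{n_1} (1-q)^{n_0}\, dq}
\end{align*}
```

The posterior factorizes into a q part and an s part, and n_0.5 appears only in the s part, which integrates out and cancels. Under these assumptions the result depends on n_1 and n_0 alone.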

Rémi
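
The claim that draws leave LOS unchanged can also be checked numerically. The sketch below is not code from this thread: it assumes a uniform Dirichlet prior over (p_1, p_0.5, p_0), samples the posterior, and estimates P(p_1 > p_0). The function name `los_monte_carlo` and its parameters are made up for illustration.

```python
import random


def los_monte_carlo(wins, draws, losses, samples=100_000, seed=0):
    """Estimate P(p_win > p_loss | data) under a uniform Dirichlet prior.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # A Dirichlet(wins+1, draws+1, losses+1) sample is a triple of
        # independent Gamma variates divided by their sum.
        g = [rng.gammavariate(k + 1, 1.0) for k in (wins, draws, losses)]
        total = sum(g)
        p_win, p_draw, p_loss = (x / total for x in g)
        if p_win > p_loss:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # Same decisive results, very different numbers of draws.
    print(los_monte_carlo(1, 0, 0))
    print(los_monte_carlo(1, 300, 0))
```

Under these assumptions both calls should come out near 0.75, up to sampling noise, even though the second data set pins the two engines much closer together in strength.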

Uri Blass · Post by **Uri Blass** » Sun Nov 15, 2009 11:55 pm

mcostalba wrote:
Uri Blass wrote:
1) You ask the wrong question.
The probability that A is stronger than A' is always 1 or 0
and the result of the match does not change this probability.

I guess that you want to know the probability that A' is better than A once you know the result of a 500-game match between them (where "A' is better than A" means that you can expect A' to beat A in a match).
Yes, I mean that.

Uri Blass wrote: Without knowing a prior probability distribution it is impossible to calculate probability.
I have very big doubts about your last sentence. For instance, suppose you have two engines A and A' and you know nothing about them. You play, say, 100,000 games between them, and at the end the first engine scores 100 Elo points more than the second.

I think you can quite clearly assume that A is stronger than A' even without knowing anything about a prior probability for them.

You can be practically sure that A is stronger than A' because every reasonable prior distribution leads to this conclusion.

In the same way, if you throw a die 100,000 times and it lands on 6 in 90,000 of them, you can say that the die is not fair, but you cannot calculate an exact probability.

You also cannot calculate the exact probability that the die is unfair if you know only that it landed on 6 in each of the first 5 throws (in this case, if your a priori assumption is that the die is almost surely fair, you may continue to believe that it is fair in spite of the fact that it landed on the same side 5 times).
You need some a priori probability about the die to calculate probabilities.

Uri
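
The dependence on the prior can be made concrete with a small sketch. It is not from the thread and rests on a strong simplifying assumption: the only alternative to a fair die is a loaded die that shows 6 with probability `p6_loaded` (0.5 here, an arbitrary illustrative value). The function name `posterior_fair` is likewise hypothetical.

```python
from math import exp, log


def posterior_fair(prior_fair, sixes, throws, p6_loaded=0.5):
    """P(die is fair | data) against a single assumed 'loaded' alternative.

    prior_fair must lie strictly between 0 and 1. Works in log space so
    that very long runs of throws do not underflow.
    """
    others = throws - sixes
    log_fair = log(prior_fair) + sixes * log(1 / 6) + others * log(5 / 6)
    log_loaded = (log(1 - prior_fair)
                  + sixes * log(p6_loaded)
                  + others * log(1 - p6_loaded))
    diff = log_loaded - log_fair
    if diff > 700:  # exp() would overflow; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + exp(diff))


if __name__ == "__main__":
    # Five sixes in five throws: the conclusion depends heavily on the prior.
    print(posterior_fair(0.5, 5, 5))
    print(posterior_fair(0.9999, 5, 5))
    # 90,000 sixes in 100,000 throws: any reasonable prior is overwhelmed.
    print(posterior_fair(0.9999, 90_000, 100_000))
```

With only five throws the answer swings from "almost certainly loaded" to "almost certainly fair" as the prior changes, while with 100,000 throws the prior hardly matters. Both observations match the points made in the post, within this two-hypothesis assumption.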

Rémi Coulom · Post by **Rémi Coulom** » Mon Nov 16, 2009 12:01 am

mcostalba wrote:
Rémi Coulom wrote:
Note 1: draws do not count

Thanks a lot! I was looking for something like this. But I have a big question: how can draws not count?

I mean, in A vs A' I could have 1 win and 0 losses, or I could have 1 win, 300 draws, and 0 losses. What we can infer in the two cases is totally different.

In the first case we can say almost nothing, while in the second case we can say that the two engines are (with good probability) not far apart.

Could you please explain why the number of draws (which in the end is much the same as the number of games played) does not count?

Thanks
Marco

Also, there is another intuitive explanation:

Suppose you change the rules of the game, and call it chess*: whenever two players draw, they have to start playing a new game until the outcome is not a draw.

Being stronger at chess is equivalent to being stronger at chess*. With your data, if we consider chess*, the outcome is the same: 1-0.

Rémi
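
Seen through chess*, only decisive games matter, so LOS can be written as a function of wins and losses alone. The sketch below is not the C++ code mentioned elsewhere in the thread. It assumes a uniform prior on the probability that a decisive game is a win, and uses the standard identity between the Beta distribution function at integer parameters and a binomial tail. The name `los_exact` is made up for illustration.

```python
from fractions import Fraction
from math import comb


def los_exact(wins, losses):
    """Exact P(q > 1/2) for q ~ Beta(wins + 1, losses + 1).

    q is the probability that a decisive game is a win. For integer
    parameters this equals P(X <= wins) with X ~ Binomial(n, 1/2),
    where n = wins + losses + 1.
    """
    n = wins + losses + 1
    favourable = sum(comb(n, j) for j in range(wins + 1))
    return Fraction(favourable, 2 ** n)


if __name__ == "__main__":
    # Draws are not even an argument: 1 win, 0 losses gives the same
    # answer with 0 draws or with 300.
    print(los_exact(1, 0))         # 3/4
    print(float(los_exact(10, 5)))
```

Draws never enter the function, which is the chess* argument in code form: `los_exact(1, 0)` is 3/4 whether the single decisive game was preceded by no draws or by 300.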

mcostalba · Post by **mcostalba** » Mon Nov 16, 2009 12:11 am

Rémi Coulom wrote: A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

Wow, this is an absolutely counterintuitive result (at least for me). I already knew statistics could be very counterintuitive sometimes, but this really surprised me!

Thanks again for the explanation. I have learnt something this evening.

mcostalba · Post by **mcostalba** » Mon Nov 16, 2009 12:14 am

Rémi Coulom wrote: Also, there is another intuitive explanation:

Suppose you change the rules of the game, and call it chess*: whenever two players draw, they have to start playing a new game until the outcome is not a draw.

Being stronger at chess is equivalent to being stronger at chess*. With your data, if we consider chess*, the outcome is the same: 1-0.

Rémi

Yes, but you can get a 1-0 score after only 1 game, or after 1000 games where the first 999 were draws!

Rémi Coulom · Post by **Rémi Coulom** » Mon Nov 16, 2009 12:15 am

mcostalba wrote:
Rémi Coulom wrote: A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

Wow, this is an absolutely counterintuitive result (at least for me). I already knew statistics could be very counterintuitive sometimes, but this really surprised me!

Thanks again for the explanation. I have learnt something this evening.

Well, if you try it in bayeselo, you will see that it is not perfectly true. In particular, bayeselo considers that playing first is an advantage, so a draw with black means you are likely to be stronger.

If we don't consider the advantage of playing first, I hope the chess* explanation can convince your intuition.

Rémi
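
The effect of a first-move advantage can be illustrated with a toy model. This is not bayeselo's code and does not claim to use its parameters: it assumes a generic logistic Elo model with a first-move bonus `adv` and a draw margin `draw_margin` (both values illustrative), and a flat prior on the Elo gap d = A - B over a finite grid. All function names are hypothetical.

```python
def logistic_elo(x):
    """Expected-score curve on the usual 400-point Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def draw_prob(d_white, adv, draw_margin):
    """P(draw) when White is d_white Elo points stronger than Black."""
    p_white = logistic_elo(d_white + adv - draw_margin)
    p_black = logistic_elo(-d_white - adv - draw_margin)
    return 1.0 - p_white - p_black


def los_after_draw_as_black(adv, draw_margin=100.0, span=1000):
    """P(A stronger than B | one draw with A playing Black).

    Flat prior on the integer Elo gap d = A - B over [-span, span].
    """
    above = total = 0.0
    for d in range(-span, span + 1):
        # B has White, so White's rating edge is -d.
        weight = draw_prob(-d, adv, draw_margin)
        total += weight
        if d > 0:
            above += weight
        elif d == 0:
            above += 0.5 * weight  # split the tie evenly
    return above / total


if __name__ == "__main__":
    print(los_after_draw_as_black(adv=0.0))   # no first-move advantage
    print(los_after_draw_as_black(adv=33.0))  # assumed first-move bonus
```

With `adv=0.0` a single draw is uninformative about who is stronger and the result stays at 0.5. With a positive `adv` the draw likelihood peaks at a gap where A is slightly stronger, so the result rises above 0.5, in line with the remark that a draw with Black suggests you are likely to be stronger.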

michiguel · Post by **michiguel** » Mon Nov 16, 2009 12:48 am

mcostalba wrote:
Rémi Coulom wrote: A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

Wow, this is an absolutely counterintuitive result (at least for me). I already knew statistics could be very counterintuitive sometimes, but this really surprised me!

Thanks again for the explanation. I have learnt something this evening.

Because you are analyzing the probability of being better in a head-to-head competition, not how much better. If you think about it, it makes sense. I do not believe this extrapolates to estimating chances against a third player, though.

Miguel

michiguel · Post by **michiguel** » Mon Nov 16, 2009 12:51 am

Rémi Coulom wrote:
mcostalba wrote:
Rémi Coulom wrote: A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

Wow, this is an absolutely counterintuitive result (at least for me). I already knew statistics could be very counterintuitive sometimes, but this really surprised me!

Thanks again for the explanation. I have learnt something this evening.
Well, if you try it in bayeselo, you will see that it is not perfectly true. In particular, bayeselo considers that playing first is an advantage, so a draw with black means you are likely to be stronger.

If we don't consider the advantage of playing first, I hope the chess* explanation can convince your intuition.

Rémi

What exactly do you mean by "playing first is an advantage"? In a match with 10 victories in a row followed by ten losses, do you get a higher Elo?
Is it a time-sensitive procedure?

Miguel

Rémi Coulom · Post by **Rémi Coulom** » Mon Nov 16, 2009 12:56 am

michiguel wrote: What exactly do you mean by "playing first is an advantage"? In a match with 10 victories in a row followed by ten losses, do you get a higher Elo?
Is it a time-sensitive procedure?

No. All results are assumed to be independent. The order of games does not matter. By playing first, I meant playing White, i.e. moving first in a game.

Maybe I used this more general formulation because I am a Go programmer now. In Go, Black plays first.

Rémi

Rein Halbersma · Post by **Rein Halbersma** » Mon Nov 16, 2009 3:03 pm

Rémi Coulom wrote: More precisely, here is a mathematical proof. Probabilities of win, draw, and loss are p_1, p_0.5, and p_0. Number of wins, draws, losses are n_1, n_0.5, and n_0. With a uniform prior:

Rémi

Very interesting! Could you give a reference to the paper this derivation is taken from? I couldn't find anything on your website. And how is this formula related to the piece of C++ code you provided at the beginning of this thread? It wasn't really obvious what that code did. Some explanation would be appreciated!

Rein
