Name for elo without draws?

bob · Post by **bob** » Thu Sep 03, 2015 2:04 am

Dann Corbit wrote:This is clearly wrong.
Ignoring a million draws is lunacy.
If those players play a game, the outcome will be a draw.
It would be utterly unsurprising if after a million and two games the next time, the one who lost won three games.

When the math says something utterly stupid, then the math is wrong.

What is the likelihood that A is better than B if B wins 2 and there are 1 million draws? Very low. B is clearly better. Perhaps only by .001 Elo, but better is better; it is not relative. That was good enough to crown a world champion for years: a player needed to win the match by 2, with no limit on the number of draws. There is no significant confidence, but probability says that he who has the most wins is better. That is perfectly rational when you think about "which is better" and ignore anything about "how MUCH better?"

In fact, fabricate a PGN collection that starts with 2 wins for A. Run it through BayesElo. Then add 1,000 draws. The Elos will be closer, but A will still have a higher Elo. Add 1M draws. Same thing: A is better, but now by a much smaller amount. But better means "better".
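A minimal sketch of the trend bob describes, assuming the plain logistic Elo model with draws counted as half a point. The rating tool he mentions uses its own draw model and a prior, so its numbers will differ; the hypothetical `elo_diff` helper below only shows that piling on draws shrinks A's edge towards zero without ever flipping its sign.

```python
import math


def elo_diff(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score under the plain logistic model.

    Draws count as half a point. A perfect (or zero) score has no finite
    estimate, which is one reason real rating tools add a prior.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score >= 1.0:
        return math.inf
    if score <= 0.0:
        return -math.inf
    return 400.0 * math.log10(score / (1.0 - score))


for d in (0, 1_000, 1_000_000):
    print(f"2 wins for A, {d} draws: {elo_diff(2, d, 0):.6f} Elo")
```

With no draws the naive estimate is infinite; with 1,000 draws it is roughly 0.69 Elo, and with 1,000,000 draws roughly 0.0007 Elo, still in A's favour.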

mvk · Post by **mvk** » Thu Sep 03, 2015 8:57 am

michiguel wrote:
What you then get I now call 'pseudo-elo', but I'm wondering if there is a standard name for that quantity already.
I will use Wilo in the next version of Ordo (I may even change the name of the program not to confuse things) with a different model as I mentioned before.
http://www.talkchess.com/forum/viewtopi ... ilo#593262

Ah yes! That is what I was looking for! Wilo is a perfect name.

The reason I'm encountering this is that I'm toying with a model evaluation function for study purposes with an alternative draw & contempt concept. It now has a draw model which assumes that there is an intrinsic skill difference between the players, which can be expressed as a constant W/L ratio, and that there is an additional draw contribution which depends on the position and can be evaluated just like the score. (The real draw ratio also depends on the time control, but that can be ignored for move selection purposes. It could, however, be accounted for when using PGN game results for eval tuning.)
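One way to write down a draw model of this shape, as a sketch under assumptions: the skill difference is a 'wilo' value giving a W/L ratio r = 10^(wilo/400), the draw rate is a separate number in [0, 1], and the decisive games are split r : 1. The names `outcome_probs` and `expected_score` are hypothetical and not taken from mvk's code.

```python
def outcome_probs(wilo: float, draw_rate: float) -> tuple[float, float, float]:
    """Return (P(win), P(draw), P(loss)) for the side with the given wilo edge.

    Hypothetical model: skill fixes the win/loss ratio r = 10**(wilo/400),
    while the position (and time control) fixes draw_rate independently.
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must lie in [0, 1]")
    r = 10.0 ** (wilo / 400.0)          # constant W/L ratio
    decisive = 1.0 - draw_rate          # share of games that are not drawn
    return decisive * r / (1.0 + r), draw_rate, decisive / (1.0 + r)


def expected_score(wilo: float, draw_rate: float) -> float:
    """Combine skill and draw rate into one expected outcome (draw = 1/2)."""
    p_win, p_draw, _ = outcome_probs(wilo, draw_rate)
    return p_win + 0.5 * p_draw


for d in (0.0, 0.5, 0.9):
    print(f"draw rate {d:.1f}: expected score {expected_score(400, d):.3f}")
```

With wilo = 400 (r = 10) the expected score falls from about 0.91 with no draws to about 0.70 at a 50% draw rate and about 0.54 at 90%, while the W/L ratio stays at 10.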

You can think of it as the resulting elo when individual games are replaced by sudden death rematches. That also doesn't change the W/L ratio, but removes the draws from the equation.
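The rematch picture can be checked with a small simulation, reading 'sudden death rematch' as replaying each drawn game until it is decisive (an interpretation, not mvk's code). The per-game probabilities used below, 0.2 win, 0.7 draw and 0.1 loss for A, are invented; A's share of rematches should then approach 0.2 / (0.2 + 0.1) = 2/3, so the W/L ratio is unchanged and the draw-free rating follows from wins and losses alone as 400 * log10(W/L).

```python
import math
import random


def wilo_from_counts(wins: int, losses: int) -> float:
    """Draw-free Elo difference from decisive results only: 400*log10(W/L)."""
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss")
    return 400.0 * math.log10(wins / losses)


def sudden_death(p_win: float, p_draw: float, rng: random.Random) -> bool:
    """Replay until a game is decisive; return True if A wins it."""
    while True:
        x = rng.random()
        if x < p_win:
            return True
        if x >= p_win + p_draw:
            return False
        # otherwise the game was drawn: replay it


rng = random.Random(1)
n = 100_000
a_wins = sum(sudden_death(0.2, 0.7, rng) for _ in range(n))
share = a_wins / n                                   # should be close to 2/3
print(f"A's share of rematches: {share:.3f}")
print(f"wilo estimate: {wilo_from_counts(a_wins, n - a_wins):.1f}")  # near 400*log10(2)
```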

hgm · Post by **hgm** » Thu Sep 03, 2015 10:53 am

Dann Corbit wrote:When the math says something utterly stupid, then the math is wrong.

Not always. Math can be proven. Stupidity is just an opinion. Math can be wrong, but only if it can be proven wrong.

Dann Corbit wrote:It would be utterly unsurprising if after a million and two games the next time, the one who lost won three games.

That depends on your 'surprise threshold'. It is just as surprising as the loser winning the next three games after a 2-0 start (for 2-3). Perhaps not very surprising, but the likelihood is nowhere near 50%.

The proper math is this: if you know nothing about the players, the probability p that A beats B has a flat ('homogeneous') prior likelihood of 1 (i.e. the distribution P(p) = 1 for all p in [0,1]). The hypothesis "the win probability for A is p" is then favored by a factor equal to the probability with which it predicted the observed outcome. We have two outcomes, both wins, and the predicted probability for a single win is p. So the prior (1) is multiplied by p*p. After normalization the likelihood for p becomes 3*p*p. (The integral of 3*p*p from p=0 to p=1 equals 1, so 3 is the correct normalization factor.)

The probability that the next game will also be a win (i.e. 3-0) is the average of p under this likelihood distribution, the integral of (3*p*p)*p from 0 to 1, which equals 3/4. Of all cases where two players totally unknown to you reach a 2-0 score in their first two games, only 25% will see the score rebound to 2-1 (if draws are ignored), and in 75% of the cases the score will go to 3-0. The chance that it gets to 2-3 is smaller still: each further loss shifts the likelihood back towards p = 1/2, and the three successive chances multiply to 1/4 * 2/5 * 1/2 = 1/20. Whether it is 'utterly unsurprising' that something with a likelihood of 1 in 20 happens is up to you. I for one would not be willing to bet my life savings on it...

The 'likelihood of superiority' (LOS) can be calculated from the posterior likelihood distribution: the likelihood that the losing player is better covers all cases where 0 <= p < 0.5, so it is the integral of 3*p*p from 0 to 0.5. As the antiderivative of 3*p*p is p*p*p (p cubed), this is (1/2)^3 = 1/8. So, as Rein already pointed out, the LOS of the winning player is 7/8. We see that the likelihood that he loses the next game (25%) is larger than the likelihood that he is the weaker player (12.5%). This is of course to be expected: even if it were 100% certain that he is better, he could still lose a substantial fraction of the games (as long as that is less than 50%).
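The normalization factor, the 3/4 for a third win and the LOS of 7/8 can be checked with exact rational arithmetic. This sketch integrates polynomials in p using Python's `fractions.Fraction`; `integrate_poly` is a hypothetical helper written only for this example.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of sum(coeffs[k] * p**k)."""
    a, b = Fraction(a), Fraction(b)
    return sum(
        (Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
         for k, c in enumerate(coeffs)),
        Fraction(0),
    )


# Flat prior (1) times the probability of two wins for A (p*p):
# coefficients of p**0, p**1, p**2.
unnormalized = [0, 0, 1]
norm = 1 / integrate_poly(unnormalized, 0, 1)         # normalization factor
posterior = [norm * c for c in unnormalized]          # 3*p*p

# Chance that A also wins game 3: the mean of p, i.e. integrate posterior * p.
p_third_win = integrate_poly([0] + posterior, 0, 1)   # prepending 0 multiplies by p

# LOS of A: the posterior mass where p > 1/2.
los_a = integrate_poly(posterior, Fraction(1, 2), 1)

print(norm, p_third_win, los_a)  # prints: 3 3/4 7/8
```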

There is nothing wrong with this math. If you don't believe it, just try a simulation: pick a random number P between 0 and 1, representing the probability that A beats B. Pick two random numbers x1 and x2 between 0 and 1, and count game i as a win for A when xi < P. Now pick another random number x3, and count the third game as a win for A when x3 < P.

Repeat this a few million times, taking statistics as follows:
* keep separate histograms, say A[][] for the cases P>0.5 (A superior) and B[][] for the cases P<0.5 (B superior)
* for each case keep a 2-dimensional array of counters, using the number of wins for A in the first two games (0, 1 or 2, i.e. a 0-2, 1-1 or 2-0 score) as first index, and the result of the third game (1 if B wins it, 0 if A wins it) as second index.

At the end, compare:
* the number of times B wins the 3rd game after A won the first two vs the number of times he lost it (A[2][1]+B[2][1] vs A[2][0]+B[2][0]).
* the number of times A was superior when he won the first two games (irrespective of the third), vs the number of times B was superior (A[2][0]+A[2][1] vs B[2][0]+B[2][1]).

Finally, tell us what you observed.
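A sketch of that simulation in Python, following the recipe as stated: a uniform P, two games, then a third. The histograms are named `A_hist` for P > 0.5 and `B_hist` for P <= 0.5, indexed as described above; the run count and seed are arbitrary choices.

```python
import random


def simulate(runs: int, seed: int = 12345):
    """hgm's experiment with a flat prior on P = P(A beats B), draws ignored.

    A_hist counts runs with P > 0.5 (A superior), B_hist runs with P <= 0.5.
    First index: wins for A in games 1-2 (0, 1 or 2).
    Second index: 1 if B wins game 3, 0 if A wins it.
    """
    rng = random.Random(seed)
    A_hist = [[0, 0] for _ in range(3)]
    B_hist = [[0, 0] for _ in range(3)]
    for _ in range(runs):
        p = rng.random()                                   # P, drawn uniformly
        first_two = sum(rng.random() < p for _ in range(2))
        third = 0 if rng.random() < p else 1
        hist = A_hist if p > 0.5 else B_hist
        hist[first_two][third] += 1
    return A_hist, B_hist


A_hist, B_hist = simulate(300_000)
after_2_0 = sum(A_hist[2]) + sum(B_hist[2])
b_takes_third = (A_hist[2][1] + B_hist[2][1]) / after_2_0
a_superior = sum(A_hist[2]) / after_2_0
print(f"B wins game 3 after 2-0: {b_takes_third:.3f}")
print(f"A superior after 2-0:    {a_superior:.3f}")
```

Both ratios should land close to the 1/4 and 7/8 derived from the flat prior; raising `runs` to a few million tightens them further.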

If you think the draw probability matters, you can also simulate a flat prior by calling the PRNG 3 times, for the win, draw and loss probabilities, rejecting the attempt if the sum of those three is larger than one, and normalizing by dividing all three by their sum if it was smaller than one. Each random number drawn for a game should then count as a win for A when it is smaller than Pwin, a loss for A when it is larger than 1-Ploss, and a draw otherwise. You can then discard any run of 1M+2 games that did not end with 2 wins for A and 1M draws, but you would need an astronomical number of runs of a million games to find even a few of those. So perhaps it would be more productive to first consider a case with far fewer draws, such as 10 draws in 12 games (about 1 run in 91 then fits the pattern), so that you quickly collect such runs, and then do a few million tries to get statistically significant numbers. This way you can easily convince yourself that it does not matter at all for the likelihood that the 3rd win goes to B whether it took 100, 1000 or 10,000 draws to get that many wins.
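A sketch of the draw-aware version, assuming the same recipe: a flat prior over (win, draw, loss) built by rejection and normalization (accepting three uniforms only when their sum is at most one and then dividing by that sum gives a uniform point on the simplex), with games scored against Pwin and 1-Ploss. To keep rejection sampling cheap it conditions on small draw counts (0, 4 and 10 draws alongside 2 wins for A); `run_matches` and its parameters are hypothetical. For each kept run it records whether A is superior (Pwin > Ploss) and the chance Ploss/(Pwin+Ploss) that B takes the next decisive game.

```python
import random


def flat_prior_wdl(rng: random.Random) -> tuple[float, float, float]:
    """Uniform point on the (P_win, P_draw, P_loss) simplex via rejection."""
    while True:
        w, d, l = rng.random(), rng.random(), rng.random()
        s = w + d + l
        if 0.0 < s <= 1.0:
            return w / s, d / s, l / s


def run_matches(n_draws: int, runs: int, seed: int = 2015):
    """Keep runs of n_draws + 2 games with 2 wins for A and n_draws draws.

    Returns (kept runs, LOS of A, mean chance that B takes the next decisive game).
    """
    rng = random.Random(seed)
    accepted = a_better = 0
    b_next = 0.0
    for _ in range(runs):
        p_win, _, p_loss = flat_prior_wdl(rng)
        wins = draws = 0
        for _ in range(n_draws + 2):
            x = rng.random()
            if x > 1.0 - p_loss:          # A loses a game: reject this run
                break
            if x < p_win:
                wins += 1
            else:
                draws += 1
            if wins > 2 or draws > n_draws:
                break
        else:                              # no break: exactly 2 wins, n_draws draws
            accepted += 1
            a_better += p_win > p_loss
            b_next += p_loss / (p_win + p_loss)
    if accepted == 0:
        return 0, float("nan"), float("nan")
    return accepted, a_better / accepted, b_next / accepted


results = {d: run_matches(d, 100_000) for d in (0, 4, 10)}
for d, (acc, los, b_next) in results.items():
    print(f"{d:2d} draws: {acc:6d} runs kept, LOS {los:.3f}, B takes next decisive {b_next:.3f}")
```

If draws carry no information on superiority, the LOS should hover around 7/8 and B's chance of the next decisive game around 1/4 for every draw count; only the fraction of kept runs falls as the draw count grows.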

Rebel · Post by **Rebel** » Thu Sep 03, 2015 11:08 am

HGM is right in theory.

One day computers will reach the level of searching 200 plies; then you will get a million draws and 2 or 3 occasional wins, which prove that one engine is stronger.

Michel · Post by **Michel** » Thu Sep 03, 2015 11:24 am

The reason I'm encountering this is that I'm toying with a model evaluation function for study purposes with an alternative draw & contempt concept. It now has a draw model which assumes that there is an intrinsic skill difference between the players, which can be expressed as a constant W/L ratio, and that there is an additional draw contribution which depends on the position and can be evaluated just like the score. (The real draw ratio also depends on the time control, but that can be ignored for move selection purposes. It could, however, be accounted for when using PGN game results for eval tuning.)

Nice approach! Hopefully it will make it possible to understand better the meaning of contempt.

Speaking of contempt and time control: on Fishtest someone once tested a patch which made contempt a function of the relative remaining time. It did not work in the form tested, but there should still be something in the idea, since time is a resource, and if you have more of it than your opponent you should be entitled to use contempt (and also in the other direction, of course).

Kotlov · Post by **Kotlov** » Thu Sep 03, 2015 11:50 am

I think Elo without draw results is a good idea.

Imagine an engine that only ever draws, against any other engine.

If this engine played only in the top league, it would get the average rating of the top engines. But if it played in the bottom league, it would get the average rating of the bottom engines.
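Kotlov's example can be made concrete with the common approximate performance formula, average opponent rating plus 400*log10(s/(1-s)) for score fraction s (an approximation; rating lists fit ratings more carefully). The league ratings below are invented for illustration. An engine that only draws scores s = 1/2 and so inherits the average of whatever league it plays in, while a draw-free rating has no decisive games to work with at all.

```python
import math
from statistics import mean


def performance(opponents, wins, draws, losses):
    """Approximate Elo performance: mean opponent rating plus the Elo gap
    implied by the score fraction s (draws count half)."""
    s = (wins + 0.5 * draws) / (wins + draws + losses)
    if s <= 0.0 or s >= 1.0:
        raise ValueError("a zero or perfect score has no finite performance")
    return mean(opponents) + 400.0 * math.log10(s / (1.0 - s))


top_league = [3300, 3250, 3200]      # invented ratings
bottom_league = [1800, 1750, 1700]   # invented ratings
print(performance(top_league, 0, 30, 0))     # prints 3250.0
print(performance(bottom_league, 0, 30, 0))  # prints 1750.0
# A draw-free rating needs W/L, which is 0/0 here: it is simply undefined.
```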

mvk · Post by **mvk** » Thu Sep 03, 2015 5:40 pm

Michel wrote:Speaking of contempt and time control: on Fishtest someone once tested a patch which made contempt a function of the relative remaining time. It did not work in the form tested, but there should still be something in the idea, since time is a resource, and if you have more of it than your opponent you should be entitled to use contempt (and also in the other direction, of course).

In my draw model "(effective) time control" goes into the draw rate, and "skill difference" goes into the "raw" evaluation. Both then get combined into an expected outcome which becomes the resulting evaluation.

I therefore think the fishtest proposal you describe can work at best when the opponent is pondering, and the effect would then scale with the ponder hit ratio: the player with a time advantage can dictate the effective time control for the game, and with that the draw ratio. If he is ahead on the board, he can choose to play at the opponent's rate and reduce the draw ratio. If he is behind on the board, he can choose to follow his own clock. Or anything in between. But in each case, while doing so, he sacrifices more of his own elo than of the opponent's, because his own thinking time is more important than the opponent's pondering time. So I can imagine that the combined effect (dictating the draw ratio vs losing some "wilo") doesn't make this an interesting proposition.

Dann Corbit · Post by **Dann Corbit** » Thu Sep 03, 2015 8:50 pm

bob wrote:
Dann Corbit wrote:This is clearly wrong.
Ignoring a million draws is lunacy.
If those players play a game, the outcome will be a draw.
It would be utterly unsurprising if after a million and two games the next time, the one who lost won three games.

When the math says something utterly stupid, then the math is wrong.
What is the likelihood that A is better than B if B wins 2 and there are 1 million draws? Very low. B is clearly better. Perhaps only by .001 Elo, but better is better; it is not relative. That was good enough to crown a world champion for years: a player needed to win the match by 2, with no limit on the number of draws. There is no significant confidence, but probability says that he who has the most wins is better. That is perfectly rational when you think about "which is better" and ignore anything about "how MUCH better?"

In fact, fabricate a PGN collection that starts with 2 wins for A. Run it through BayesElo. Then add 1,000 draws. The Elos will be closer, but A will still have a higher Elo. Add 1M draws. Same thing: A is better, but now by a much smaller amount. But better means "better".

You are right. I had not considered "How much better."
I do repent in dust and ashes.

Michel · Post by **Michel** » Fri Sep 04, 2015 9:07 am

Not always. Math can be proven. Stupidity is just an opinion. Math can be wrong, but only if it can be proven wrong.

Of course math doesn't lie. But sometimes things go wrong when applying math to the real world...

In this case the independence of the LOS from the draws is proved under a uniform prior, which is kind of a mathematical fiction. So I believe it cannot be considered literally true.

However, the drawless chess argument shows that the result is true asymptotically(!) for an arbitrary prior.

It would be interesting to compute (some initial terms of) an asymptotic expansion for LOS in case of an arbitrary prior. I think I can do it.

hgm · Post by **hgm** » Fri Sep 04, 2015 9:28 am

Isn't a uniform prior by definition what you have when you know nothing about the players? I would say it is not so much a mathematical fiction as a condition stipulated by the formulation of the problem. I agree that if you were testing coins for doing coin flips the situation would be different. There, 2 heads in a row would not make it significantly less likely that the third flip produces tails, because you know that there exist many more fair coins than loaded coins, and that the mechanics of the flip process make it actually quite hard to produce a bias.
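The difference between players and coins comes down to the choice of prior. As a sketch, with a Beta(a, b) prior on the win (or heads) probability, the chance that the next outcome repeats after two in a row is (a + 2)/(a + b + 2). Beta(1, 1) is the flat prior used for the players; Beta(50, 50) is an arbitrary stand-in for 'almost all coins are fair', and `predictive_repeat` is a hypothetical helper.

```python
from fractions import Fraction


def predictive_repeat(a: int, b: int, successes: int, failures: int) -> Fraction:
    """P(next trial is a success) under a Beta(a, b) prior after the given
    record: the mean of the Beta(a + successes, b + failures) posterior."""
    return Fraction(a + successes, a + b + successes + failures)


flat = predictive_repeat(1, 1, 2, 0)     # flat prior: nothing known about the players
coin = predictive_repeat(50, 50, 2, 0)   # hypothetical prior: coins are nearly always fair
print(flat, float(coin))
```

The flat prior reproduces the 3/4 from earlier in the thread; the fair-coin prior gives 26/51, about 0.51, so two heads barely move the prediction.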

I am not sure what you mean by "drawless chess argument". But I think it is pretty obvious that draws do not affect the LOS. 'Superiority' is defined as (the sign of) the difference between P(win) and P(loss); P(draw) does not enter the definition. So as long as P(draw) does not imply anything about P(win) and P(loss) other than their sum (which of course has to be 1-P(draw)), it is utterly irrelevant: the difference between them could have either sign irrespective of their sum. Only wins and losses can provide you with information on P(win) vs P(loss).
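The same argument as a formula in which draws never appear. Assuming a flat prior on the share of decisive games that A wins, P(win)/(P(win)+P(loss)) (which is what the flat win/draw/loss prior from the simulation recipe implies), that share has a Beta(W+1, L+1) posterior after W wins and L losses, whatever the number of draws. For integer parameters its mass above 1/2 reduces to a binomial sum. The function name `los` is hypothetical.

```python
from fractions import Fraction
from math import comb


def los(wins: int, losses: int) -> Fraction:
    """Likelihood of superiority of A under a flat prior on
    P(win) / (P(win) + P(loss)); the number of draws is not an argument.

    The posterior of that share is Beta(wins + 1, losses + 1), and for
    integer parameters P(Beta > 1/2) = P(Binomial(wins + losses + 1, 1/2) <= wins).
    """
    n = wins + losses + 1
    return Fraction(sum(comb(n, k) for k in range(wins + 1)), 2 ** n)


print(los(2, 0), los(3, 3), los(10, 5))
```

For 2 wins and no losses this gives 7/8 whether the match also contained zero draws or a million.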
