Throwing out draws to calculate Elo

hgm · Post by **hgm** » Fri Jul 03, 2020 1:01 pm

Dann Corbit wrote: ↑Fri Jul 03, 2020 1:16 am My difficulty is this:
If my opponent is better than me, it is difficult even to achieve a draw, especially if he/she/they/it are a lot better than me.
And so, if I see one hundred draws, that seems to be sending me a big signal of "Equality, equality, equality..." and so collecting an enormous amount of this kind of data indicates equality to me. On the other hand, I must admit that the Monty Hall problem and the Birthday paradox were hard for me to understand until I really understood properly the math behind them. But I cannot accept it until I understand it. If I am wrong, I do hope to somehow understand why the equality data does not matter. My problem is that I suspect the model (not the math). So even though the math works, I do not feel convinced that it is right.

But if your opponent is only marginally better than you, it should not be exceptionally difficult to draw him. Especially if you are both playing so well that both of you hardly ever make a losing error. But when you blunder once every 100 games, and he never does, would you be willing to admit that he is still somewhat stronger than you? That you had so many draws doesn't suggest in any way he must blunder as frequently/infrequently as you do.

I don't see anything strange or non-intuitive in this. "We both blunder very infrequently. So we must blunder equally often". That seems pure and obvious nonsense to me. It doesn't seem to me that it would require any mathematical background to recognize that.

syzygy · Post by **syzygy** » Fri Jul 03, 2020 5:22 pm

hgm wrote: ↑Fri Jul 03, 2020 1:01 pm
Dann Corbit wrote: ↑Fri Jul 03, 2020 1:16 am My difficulty is this:
If my opponent is better than me, it is difficult even to achieve a draw, especially if he/she/they/it are a lot better than me.
And so, if I see one hundred draws, that seems to be sending me a big signal of "Equality, equality, equality..." and so collecting an enormous amount of this kind of data indicates equality to me. On the other hand, I must admit that the Monty Hall problem and the Birthday paradox were hard for me to understand until I really understood properly the math behind them. But I cannot accept it until I understand it. If I am wrong, I do hope to somehow understand why the equality data does not matter. My problem is that I suspect the model (not the math). So even though the math works, I do not feel convinced that it is right.
But if your opponent is only marginally better than you, it should not be exceptionally difficult to draw him. Especially if you are both playing so well that both of you hardly ever make a losing error. But when you blunder once every 100 games, and he never does, would you be willing to admit that he is still somewhat stronger than you? That you had so many draws doesn't suggest in any way he must blunder as frequently/infrequently as you do.

I don't see anything strange or non-intuitive in this. "We both blunder very infrequently. So we must blunder equally often". That seems pure and obvious nonsense to me. It doesn't seem to me that it would require any mathematical background to recognize that.

For some reason Dann constructed an example (8 wins, 0 losses, 10^100-8 draws) that, for anyone with some experience with chess engines, clearly is not a realistic outcome of a match between chess engines. Then he somehow uses the strangeness of the example he himself constructed as an argument that LOS does not make sense.

Of course when you take the same example but consider it to be the outcome of a match between Tic Tac Toe engines (all draws except when cosmic radiation interferes), the strangeness falls away completely and it becomes clear that 8 wins and 0 losses strongly suggests that the winning engine is somehow less sensitive to cosmic radiation (perhaps it runs on a system with ECC memory).
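To make the point concrete, here is a minimal sketch of a LOS calculation. It assumes a win/draw/loss model with a uniform prior on the three outcome probabilities; that prior and the function name `los` are illustrative assumptions, not something specified in this thread. Under that assumption the share of decisive games won follows a Beta(wins + 1, losses + 1) distribution, so the number of draws drops out of the result entirely.

```python
from fractions import Fraction
from math import comb


def los(wins: int, losses: int, draws: int = 0) -> Fraction:
    """P(win probability > loss probability | result), assuming a uniform prior.

    `draws` is accepted only to show that it has no influence on the answer.
    """
    n = wins + losses + 1
    # P(Beta(wins + 1, losses + 1) > 1/2) equals P(Binomial(n, 1/2) <= wins).
    return Fraction(sum(comb(n, k) for k in range(wins + 1)), 2 ** n)


no_draws = los(8, 0)
googol_draws = los(8, 0, draws=10**100 - 8)
print(float(no_draws), no_draws == googol_draws)
```

With 8 wins and 0 losses this gives 511/512, about 0.998, whether the match contained no draws or a googol of them.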

hgm · Post by **hgm** » Fri Jul 03, 2020 6:55 pm

Or it could perhaps have more robust program code, which checks and double-checks its calculations for consistency, even where the machine architecture would have guaranteed no error could have developed.

But I think part of Dann's problem is that he doesn't recognize 8-0 as a decisive score under any circumstance. To him it is always 8 games = noise, and he is not able to see how different 8-0 is from 5-3. Even at 99-0 he would probably say "this is less than 1000 games, so the result is meaningless, and the next 99 games could just as easily go in the other direction..."
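The difference between 8-0 and 5-3 can be illustrated with a simple one-sided sign test on the decisive games. This is a sketch under the assumption that, for equally strong engines, each decisive game is an independent fair coin flip; the helper name `tail_probability` is hypothetical.

```python
from math import comb


def tail_probability(wins: int, losses: int) -> float:
    """P(at least `wins` wins in wins + losses decisive games | equal strength)."""
    n = wins + losses
    # Count the outcomes at least as lopsided as the observed one.
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


for w, l in [(5, 3), (8, 0), (99, 0)]:
    print(f"{w}-{l}: {tail_probability(w, l):.3g}")
```

Equal engines would produce a score at least as good as 5-3 about 36% of the time, but 8-0 only once in 256 matches, and 99-0 with a probability of roughly 1.6e-30.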

This already became apparent years ago, when discussing the Shredder-Ginkgo WCCC results. But unfortunately Dann is not able to learn from what he is told.

syzygy · Post by **syzygy** » Fri Jul 03, 2020 7:00 pm

hgm wrote: ↑Fri Jul 03, 2020 6:55 pm But I think part of Dann's problem is that he doesn't recognize 8-0 as a decisive score under any circumstance. To him it is always 8 games = noise, and he is not able to see how different 8-0 is from 5-3. Even at 99-0 he would probably say "this is less than 1000 games, so the result is meaningless, and the next 99 games could just as easily go in the other direction..."

Then I wonder if he considers that 1000-0 with 10^10^1000-1000 draws says anything about which engine is stronger. Surely, 1000 wins in 10^10^1000 games are even more clearly random noise than 8 wins in a measly 10^100 games?

Dann Corbit wrote: So let's do a gedankenexperiment:
Engine A plays Engine B one googol (10^100) times.
There are 10^100 - 8 draws and 8 wins for engine B.
Standard calculation would make engine B much stronger and also give a very large LOS for engine B.
But this is totally absurd.
If we watched games for many lifetimes between engine A and engine B, we would (almost surely) never see anything but a draw, despite engine B's much larger Elo and LOS.
At this point, the 8 wins are clearly random noise.

hgm · Post by **hgm** » Fri Jul 03, 2020 7:17 pm

I would not be surprised at all if he still thinks that.

Dann Corbit · Post by **Dann Corbit** » Fri Jul 03, 2020 9:03 pm

syzygy wrote: ↑Fri Jul 03, 2020 7:00 pm
Then I wonder if he considers that 1000-0 with 10^10^1000-1000 draws says anything about which engine is stronger. Surely, 1000 wins in 10^10^1000 games are even more clearly random noise than 8 wins in a measly 10^100 games?

Yes, in fact I am even more sure that those engines have exactly the same strength.

Dann Corbit · Post by **Dann Corbit** » Fri Jul 03, 2020 9:11 pm

I imagine now that you think I am even more crazy than before. Look! It happened 1000 times that A won and B lost and B never won a single game.
That assumes, of course, that there is no randomness and no margin of error. That 1000 wins is buried under a massive error band, probably a googol games wide.

syzygy · Post by **syzygy** » Fri Jul 03, 2020 11:10 pm

Dann Corbit wrote: ↑Fri Jul 03, 2020 9:11 pm I imagine now that you think I am even more crazy than before. Look! It happened 1000 times that A won and B lost and B never won a single game.
That assumes, of course, that there is no randomness and no margin of error. That 1000 wins is buried under a massive error band, probably a googol games wide.

I also imagine that you did not take the time to read and understand the Tic Tac Toe explanation.

And perhaps you might be sensitive to this analysis of the situation:
http://talkchess.com/forum3/viewtopic.p ... 40#p849781
However, it requires that you really study it until you understand it.

syzygy · Post by **syzygy** » Fri Jul 03, 2020 11:28 pm

Dann Corbit wrote: ↑Fri Jul 03, 2020 9:11 pm I imagine now that you think I am even more crazy than before. Look! It happened 1000 times that A won and B lost and B never won a single game.
That assumes, of course, that there is no randomness and no margin of error. That 1000 wins is buried under a massive error band, probably a googol games wide.

The outcome of the match is that each of the 1000 decided games was won by A and thus lost by B.

How does that "assume that there is no randomness and no margin of error"? The outcome of the match is the outcome of the match, nothing more, nothing less.

I cannot make sense out of what you are saying.

Dann Corbit · Post by **Dann Corbit** » Fri Jul 03, 2020 11:35 pm

Suppose you play ten games. The result is 8 wins 2 draws for engine A. A huge LOS for A. Will you decide that the change you made is worthwhile and incorporate the new code into your engine?

If not, why not? The LOS is enormous.
