Throwing out draws to calculate Elo

Post by **hgm** » Mon Jul 06, 2020 1:52 pm

Dann Corbit wrote (Mon Jul 06, 2020 12:55 pm):

> The only one that really puzzles me is hgm; I mean, you are a physicist. The Heisenberg uncertainty principle, the laws of thermodynamics, quantum physics, Schrödinger's cat. If there is any person on this board who should understand uncertainty, it is hgm, but you have no understanding of uncertainty whatsoever, or pretend not to have it.
>
> Anyone who says that 8 values out of ten to the one hundredth power are significant is either delusional or deliberately dense. I don't see you as either, so I do not understand your responses.
>
> But hey, to each his own.

Indeed, such a thing would worry any intelligent agent. But I guess it is normal for a stone to discard not only the words of a single God but those of the entire Pantheon.

Only a completely deluded person would think that studying 10^100 ducks would teach him anything about the habits of a lion... Even studying one lion would teach you more. Let alone eight.

Post by **Pio** » Mon Jul 06, 2020 2:57 pm

Dann Corbit wrote (Mon Jul 06, 2020 12:55 pm):

> The only one that really puzzles me is hgm; I mean, you are a physicist. The Heisenberg uncertainty principle, the laws of thermodynamics, quantum physics, Schrödinger's cat. If there is any person on this board who should understand uncertainty, it is hgm, but you have no understanding of uncertainty whatsoever, or pretend not to have it.
>
> Anyone who says that 8 values out of ten to the one hundredth power are significant is either delusional or deliberately dense. I don't see you as either, so I do not understand your responses.
>
> But hey, to each his own.

It seems I am delusional as well.

And as I have said before, it won't help us or you if you put 10^100 normal people next to us either. We will be as delusional as before, neither more nor less.

Post by **Alayan** » Mon Jul 06, 2020 3:08 pm

LoS should not be used in patch testing for engines, because the uniform prior is wrong. A correct prior is a distribution curve that says "your patch most likely loses some Elo; if it gains any, it's most likely a small amount". The prior tells us that if your test shows +10 Elo and 99% LoS, this performance is almost certainly a very lucky run by a patch that's far from this good, and you should still run more games.

Using LoS with uniform prior when the uniform prior hypothesis is known to be wrong is a user error, not a problem with the LoS mathematical model.
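To see how much a skeptical prior changes the picture, here is a minimal sketch under stated assumptions: the test result is summarized as an Elo estimate with a normal standard error, and the prior is a normal distribution centred slightly below zero. The function names and all numbers (a +10 Elo result with a 4.3 Elo standard error, a prior of mean -1 Elo and standard deviation 2 Elo) are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def los_uniform_prior(elo_est: float, std_err: float) -> float:
    """P(true Elo gain > 0) with a flat prior and a normal likelihood."""
    return normal_cdf(elo_est / std_err)


def los_normal_prior(elo_est: float, std_err: float,
                     prior_mean: float, prior_sd: float) -> Tuple[float, float]:
    """Conjugate normal-normal update; returns (posterior mean, P(gain > 0))."""
    precision = 1.0 / prior_sd ** 2 + 1.0 / std_err ** 2
    post_mean = (prior_mean / prior_sd ** 2 + elo_est / std_err ** 2) / precision
    post_sd = math.sqrt(1.0 / precision)
    return post_mean, normal_cdf(post_mean / post_sd)


if __name__ == "__main__":
    # Hypothetical result: +10 Elo, standard error picked so the flat-prior LoS is ~99%.
    est, se = 10.0, 4.3
    print(f"flat prior LoS:      {los_uniform_prior(est, se):.3f}")
    # Hypothetical skeptical prior: most patches lose a little, gains are small.
    mean, p_gain = los_normal_prior(est, se, prior_mean=-1.0, prior_sd=2.0)
    print(f"skeptical posterior: mean {mean:+.2f} Elo, LoS {p_gain:.3f}")
```

With these made-up numbers the flat-prior LoS is about 0.99, while the skeptical prior pulls the estimate down to just under +1 Elo with an LoS of roughly 0.70, which is the "lucky run, play more games" situation described above.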

And mixing "likelihood of superiority" (how likely) with "mean estimated superiority" (how much) is just wrong.

The same goes for assuming that more draws make any decisive results more random. Not only is the probability that one of two equal players scores a run of wins independent of the draw rate, but between unequal players with a given strength difference, the higher the draw rate, the more likely it is that the stronger player came out on top.

Say you know engine A scores on average 55% over engine B.

Assuming independent games and a draw rate known to be 0%, if you get a 10-0, a sweep by engine B is only (0.45^10)/(0.55^10) ≈ 13.44% as likely as a sweep by engine A.

But if you add draws, then for a fixed expected score every draw takes away as much win probability from B as from A, and the ratio of wins becomes more favourable to A. If the draw rate is known to be 80% (so A wins 15% and B wins 5% of all games, i.e. 75% versus 25% of the decisive ones) and you get 10-0 plus 40 draws, a sweep by engine B is only (0.25^10)/(0.75^10) ≈ 0.0017% as likely as a sweep by engine A.
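The arithmetic in this example can be reproduced with a short sketch, assuming independent games and a known expected score and draw rate. The helper names `win_probs` and `sweep_ratio` are hypothetical. Draws played alongside the decisive games multiply both sweep probabilities by the same factor, so only the two win probabilities matter.

```python
from typing import Tuple


def win_probs(score: float, draw_rate: float) -> Tuple[float, float]:
    """Return (P(A wins), P(B wins)) per game, given A's expected score and the draw rate."""
    # score = p_a + draw_rate / 2   and   p_a + p_b = 1 - draw_rate
    p_a = score - draw_rate / 2.0
    p_b = 1.0 - draw_rate - p_a
    if p_a < 0.0 or p_b < 0.0:
        raise ValueError("expected score and draw rate are inconsistent")
    return p_a, p_b


def sweep_ratio(score: float, draw_rate: float, n_decisive: int) -> float:
    """P(B wins all n decisive games) / P(A wins all n decisive games).

    Assumes independent games; the draws contribute the same factor to
    both probabilities and cancel out of the ratio.
    """
    p_a, p_b = win_probs(score, draw_rate)
    return (p_b / p_a) ** n_decisive


if __name__ == "__main__":
    for draws in (0.0, 0.8):
        print(f"draw rate {draws:.0%}: B/A sweep ratio {sweep_ratio(0.55, draws, 10):.6f}")
```

With A's expected score at 0.55, the ratio falls from about 0.134 with no draws to (1/3)^10 ≈ 1.7e-5 at an 80% draw rate. At an expected score of exactly 0.5 it stays at 1 for any draw rate, which matches the point about equal players.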

In practice, you won't know the exact expected draw rate and scores and can't do exact computations, but the point remains.

Post by **hgm** » Mon Jul 06, 2020 3:41 pm

Note that you are in fact saying that one should not believe the high LOS or Elo of an 8-0 score (for patch acceptance), because the large Elo increase it implies contradicts the prior knowledge that your patch cannot possibly have gained that much Elo. But had it been 8-0 plus 9992 draws, the same LOS would be unsuspicious, because it corresponds to only about a 0.3 Elo increase, and the prior in such a narrow region around 0 can be considered approximately flat.
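This point can be made concrete by comparing the normal-approximation LoS formula commonly used in engine testing, LoS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), which ignores draws entirely, with the logistic Elo estimate derived from the overall score. The following is a minimal sketch; the function names `los` and `elo_from_score` are illustrative and not taken from any particular testing tool.

```python
import math


def los(wins: int, losses: int) -> float:
    """Normal-approximation likelihood of superiority; draws do not enter at all."""
    if wins + losses == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


def elo_from_score(wins: int, draws: int, losses: int) -> float:
    """Logistic Elo difference implied by the match score."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    if score >= 1.0:
        return math.inf  # a clean sweep implies an unbounded logistic Elo estimate
    if score <= 0.0:
        return -math.inf
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    for w, d, l in [(8, 0, 0), (8, 9992, 0)]:
        print(f"+{w} ={d} -{l}: LoS {los(w, l):.4f}, Elo {elo_from_score(w, d, l):+.2f}")
```

Both results have the same LoS (about 0.998), but the bare 8-0 implies an unbounded Elo gain, while 8-0 plus 9992 draws implies only about +0.28 Elo, a region where a flat prior is a reasonable approximation.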

So the additional draws make the 8-0 more decisive.

Post by **Alayan** » Mon Jul 06, 2020 4:04 pm

Yes, that's correct.
