Throwing out draws to calculate Elo

hgm · Post by **hgm** » Thu Jul 02, 2020 9:45 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:39 pm Here are the first few lines of the raw data:
losses: 50099 wins: 49901 ties: 0 LOS: 0.265615 Elo diff: -0.687897
Look at that, A is not superior to B at all. But that means B has an LOS of 1-0.265615=0.734385
So, would you like to choose that one? It tells us that B looks stronger than A. And we have 100K games to prove it.

losses: 49948 wins: 50052 ties: 0 LOS: 0.628876 Elo diff: 0.361319
This one is only .6, so maybe a candidate you would like.

losses: 50060 wins: 49940 ties: 0 LOS: 0.352168 Elo diff: -0.416907
Again, A is not looking very strong here, but B is. 1-0.352168 =0.647832

losses: 50040 wins: 49960 ties: 0 LOS: 0.400141 Elo diff: -0.277938
B is .6

losses: 49872 wins: 50128 ties: 0 LOS: 0.790899 Elo diff: 0.889403
Not a bad cherry pick here.

losses: 50180 wins: 49820 ties: 0 LOS: 0.127473 Elo diff: -1.25073
Now we are getting somewhere. You should have told me to pick the 6th one, not the first.

losses: 50010 wins: 49990 ties: 0 LOS: 0.474785 Elo diff: -0.0694844
My goodness, even better. Let's choose the 7th.

Well, that is what I said, right? And disproves what you said. 100K games already, and the LOS is still nowhere near 0 or 1. No 0.99999... there. It nicely fluctuates around the range 0.16-0.84, as I predicted.

Now try it for a match of a million games. Or a billion games. It won't go up.
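
The LOS and Elo figures in the quoted rows can be checked with a short sketch. It assumes the usual normal approximation LOS = Phi((wins - losses) / sqrt(wins + losses)) and the logistic Elo formula with draws ignored. The function names `los` and `elo_diff` are made up for illustration and are not taken from Dann's program.

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority, normal approximation, draws ignored."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def elo_diff(wins: int, losses: int) -> float:
    """Elo difference implied by the score fraction (logistic model, no draws)."""
    score = wins / (wins + losses)
    return 400.0 * math.log10(score / (1.0 - score))

if __name__ == "__main__":
    # First row of the quoted data: 49901 wins, 50099 losses.
    print(los(49901, 50099), elo_diff(49901, 50099))
```

For the first row this gives an LOS of about 0.2656 and an Elo difference of about -0.688, in line with the quoted data.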

Pio · Post by **Pio** » Thu Jul 02, 2020 9:52 pm

hgm wrote: ↑Thu Jul 02, 2020 9:45 pm Now try it for a match of a million games. Or a billion games. It won't go up.

It might go up if the number of games in the match is so big that the random function will pass one cycle and starts repeating itself.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:53 pm

Well, obviously the difference in wins and losses that drives LOS will stop getting larger once we get to a large enough set of games.
I guess you have to be a physicist to understand that.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:56 pm

Pio wrote: ↑Thu Jul 02, 2020 9:52 pm It might go up if the number of games in the match is so big that the random function will pass one cycle and starts repeating itself.

I am using the Mersenne Twister PRNG.
https://en.wikipedia.org/wiki/Mersenne_Twister

Pio · Post by **Pio** » Thu Jul 02, 2020 10:10 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:56 pm
Pio wrote: ↑Thu Jul 02, 2020 9:52 pm It might go up if the number of games in the match is so big that the random function will pass one cycle and starts repeating itself.
I am using the Mersenne Twister PRNG.
https://en.wikipedia.org/wiki/Mersenne_Twister

It could very well be a problem. The longer the cycles are, the less random the algorithm will become. For example, if rand(7) returns 7 as the next seed, we will get 7, 7, 7 ..., but preventing something like this will also make the algorithm less random, because you force it not to repeat itself. If it is the 32-bit version, you will most likely get this problem.

I know because “a friend” and absolutely not me did it in a stupid way a long time ago.

Pio · Post by **Pio** » Thu Jul 02, 2020 10:28 pm

Pio wrote: ↑Thu Jul 02, 2020 10:10 pm
It could very well be a problem. The longer the cycles are, the less random the algorithm will become. For example, if rand(7) returns 7 as the next seed, we will get 7, 7, 7 ..., but preventing something like this will also make the algorithm less random, because you force it not to repeat itself. If it is the 32-bit version, you will most likely get this problem.

I know because “a friend” and absolutely not me did it in a stupid way a long time ago.

Apparently I was wrong. The Mersenne Twister has a very long period, but to make that possible it needs a large internal state, and that can make it less random in the first iterations. Try generating a couple of million samples before you start the experiment.
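
A warm-up like that could look roughly as follows. This is only a sketch: it uses Python's `random.Random`, which is a Mersenne Twister, and the helper name `warmed_up_rng` and the default discard count are made up. Whether the warm-up matters in practice depends on how the generator was seeded.

```python
import random

def warmed_up_rng(seed: int, discard: int = 2_000_000) -> random.Random:
    """Return a Mersenne Twister that has already thrown away `discard` outputs."""
    rng = random.Random(seed)
    for _ in range(discard):
        rng.getrandbits(32)  # output is ignored, only the state advance matters
    return rng

if __name__ == "__main__":
    rng = warmed_up_rng(12345)
    print(rng.random())  # first value actually used in the experiment
```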

hgm · Post by **hgm** » Thu Jul 02, 2020 10:30 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:53 pm Well, obviously the difference in wins and losses that drives LOS will stop getting larger once we get to a large enough set of games.
I guess you have to be a physicist to understand that.

A mathematician would suffice.

But are we agreeing now that the LOS you get from a single match typically (i.e. in 68% of the cases) is between 0.16 and 0.84, and only occasionally (i.e. less than 5% of the cases) outside [0.025, 0.975], irrespective of how large the number of games is?
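
The claim is easy to try out with a small simulation. The sketch below assumes two exactly equal engines and no draws, and uses the normal-approximation LOS; the names `los` and `fraction_inside` and the trial counts are made up for illustration.

```python
import math
import random

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority, normal approximation, draws ignored."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def fraction_inside(games: int, trials: int, lo: float = 0.16,
                    hi: float = 0.84, seed: int = 1) -> float:
    """Fraction of simulated matches whose LOS falls inside [lo, hi]."""
    rng = random.Random(seed)  # Python's random module is a Mersenne Twister
    inside = 0
    for _ in range(trials):
        # Each random bit is one decisive game between two equal engines.
        wins = bin(rng.getrandbits(games)).count("1")
        if lo <= los(wins, games - wins) <= hi:
            inside += 1
    return inside / trials

if __name__ == "__main__":
    for games in (1_000, 10_000, 100_000):
        print(games, fraction_inside(games, trials=2_000))
```

If the claim holds, the printed fraction should come out close to 0.68 for every match length, rather than shrinking as the matches get longer.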

BTW, there is absolutely no difference between the error bars on the Elo calculation and the LOS. They are just the same thing, in a different disguise. If you are at the upper bound of a 95%-confidence error bar, you have a LOS of 97.5%, at the lower bound a LOS of 2.5% (the 2 x 2.5% you are removed from 0 and 1, respectively, making the 5% that you were outside the error bar).

Just like you could get a very high LOS in some of the 100K matches, you can also get Elo calculations where the two engines lie very far outside each other's error bars. Try to do the Elo calculation of the result that gave you the highest LOS. The Elo calculation will suggest the Elos are significantly different just as much as the LOS did. (Of course the Elos will be very close, but with such a large number of games you will also know them very precisely.)
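
The link between the 95% error bar and an LOS of 97.5% can be illustrated numerically. The sketch below assumes no draws, a normal approximation for the score fraction, and the logistic Elo formula; `elo_interval` and the example result (50310 wins, 49690 losses) are made up so that the win margin sits at about 1.96 standard deviations.

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority, normal approximation, draws ignored."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for a given score fraction."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins: int, losses: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval on the Elo difference (z=1.96 is 95%)."""
    n = wins + losses
    score = wins / n
    se = math.sqrt(score * (1.0 - score) / n)  # standard error of the score
    return elo_from_score(score - z * se), elo_from_score(score + z * se)

if __name__ == "__main__":
    wins, losses = 50310, 49690
    print(los(wins, losses), elo_interval(wins, losses))
```

For this result the lower end of the Elo interval lands almost exactly on 0 while the LOS is about 0.975, which is the same statement made twice.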

Milos · Post by **Milos** » Thu Jul 02, 2020 10:37 pm

Ajedrecista wrote: ↑Thu Jul 02, 2020 8:23 pm Although I want to advise: if you are measuring Elo with a fixed game match, never stop it before the match is finished or you will introduce a bias. Disgracefully, I can not prove it due to my limited statistics skills, but someone smarter can.

It's a Monty Hall variation. If you stop at a random point without knowing the score you don't introduce bias. If you stop when you already know the score you impact the result.
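
The effect of stopping once you know the score can be seen in a small simulation. This is a sketch under simple assumptions: two equal engines, no draws, the score is checked after every batch of games, and the match is stopped as soon as LOS exceeds a threshold. The function names, batch sizes and the 0.95 threshold are made up for illustration.

```python
import math
import random

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority, normal approximation, draws ignored."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def declared_better(rng, batches, batch_size, threshold, peek):
    """Play one match; return True if engine A ends up declared superior."""
    wins = losses = 0
    for _ in range(batches):
        w = bin(rng.getrandbits(batch_size)).count("1")  # wins in this batch
        wins += w
        losses += batch_size - w
        if peek and los(wins, losses) > threshold:
            return True  # stopped early because the score looked good
    return los(wins, losses) > threshold

def false_positive_rate(peek, trials=2000, batches=100, batch_size=100,
                        threshold=0.95, seed=1):
    """Fraction of matches between equal engines that declare A superior."""
    rng = random.Random(seed)
    hits = sum(declared_better(rng, batches, batch_size, threshold, peek)
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print("fixed length:", false_positive_rate(peek=False))
    print("stop on good score:", false_positive_rate(peek=True))
```

With a fixed match length the rate should stay near the nominal 5%, while stopping on a favourable score should declare a winner noticeably more often, even though the engines are equal.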

Milos · Post by **Milos** » Thu Jul 02, 2020 10:51 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:56 pm
Pio wrote: ↑Thu Jul 02, 2020 9:52 pm It might go up if the number of games in the match is so big that the random function will pass one cycle and starts repeating itself.
I am using the Mersenne Twister PRNG.
https://en.wikipedia.org/wiki/Mersenne_Twister

Mersenne Twister is hardly a very good PRNG. And it is pretty ancient.
Better use something like this:
https://en.wikipedia.org/wiki/Xorshift#xorshift+
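
For reference, a minimal xorshift128+ could be sketched like this. It assumes the shift constants 23, 18 and 5, which is one published parameter choice, and the class name `XorShift128Plus` is made up. Python integers are unbounded, so the 64-bit wraparound is emulated with a mask.

```python
MASK64 = (1 << 64) - 1

class XorShift128Plus:
    """Sketch of xorshift128+ with shifts (23, 18, 5); state must not be all zero."""

    def __init__(self, s0: int, s1: int) -> None:
        self.s0 = s0 & MASK64
        self.s1 = s1 & MASK64
        if (self.s0 | self.s1) == 0:
            raise ValueError("state must not be all zero")

    def next_u64(self) -> int:
        t = self.s0
        s = self.s1
        self.s0 = s
        t ^= (t << 23) & MASK64  # left shift, truncated to 64 bits
        t ^= t >> 18
        t ^= s ^ (s >> 5)
        self.s1 = t
        return (t + s) & MASK64  # sum wraps around at 2**64

    def coin(self) -> int:
        """One win/loss outcome taken from the top bit of the next output."""
        return self.next_u64() >> 63

if __name__ == "__main__":
    rng = XorShift128Plus(0x1234567, 0x89ABCDEF)
    print([rng.coin() for _ in range(10)])
```

The top bit is used for the coin flip because the lowest bits of this family are usually considered the weakest.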

MonteCarlo · Post by **MonteCarlo** » Thu Jul 02, 2020 11:55 pm

Ok, I'll bite, Dann (really important comma!)

Here's the percentage of trials from the data in your Access file that showed LOS greater than various thresholds.

What exactly is the problem, again?

LOS_threshold                           percent_trials_greater_than_LOS_threshold
--------------------------------------- -----------------------------------------
0.100                                   0.900080000000
0.200                                   0.802730000000
0.300                                   0.700140000000
0.400                                   0.602300000000
0.500                                   0.500230000000
0.600                                   0.400240000000
0.700                                   0.301800000000
0.800                                   0.199260000000
0.900                                   0.098940000000
0.950                                   0.049170000000
0.990                                   0.009920000000
0.999                                   0.000870000000
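
A table like this can be produced from a list of per-trial LOS values with a few lines. The sketch below is not the query that was run against the Access file; `fraction_above` is a made-up helper and the evenly spaced sample values stand in for real data. If LOS is uniformly distributed for equal engines, the fraction above a threshold t should be close to 1 - t, which is the pattern the table shows.

```python
def fraction_above(los_values, threshold):
    """Fraction of trials whose LOS is strictly greater than the threshold."""
    return sum(1 for v in los_values if v > threshold) / len(los_values)

if __name__ == "__main__":
    # Stand-in data: 1000 evenly spaced values, i.e. an ideal uniform LOS.
    sample = [(i + 0.5) / 1000 for i in range(1000)]
    for t in (0.1, 0.5, 0.9, 0.95, 0.99):
        print(f"{t:<8.3f}{fraction_above(sample, t):.6f}")
```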
