Name for elo without draws?

Michel · Post by **Michel** » Fri Sep 04, 2015 9:48 am

Isn't a uniform prior by definition what you have when you know nothing about the players?

A prior is the distribution of a parameter, which you generally do not know...

I consider positing that every value of an unknown parameter is equally likely to be indeed a mathematical fiction.

If it were not then LOS based stopping would be valid for sequential testing (stop testing if the LOS for a uniform prior >=95%). And by now everyone knows it is not.

The best you can do is derive results which are, asymptotically, independent of the prior.
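The point about LOS-based stopping can be checked with a small simulation. The sketch below is not from the thread. It assumes two engines of exactly equal strength, ignores draws, checks LOS after every decisive game, and uses the common erf (normal) approximation to the uniform-prior LOS instead of the exact integral. The function names and the values of `max_games`, `trials` and `seed` are illustrative choices.

```python
import math
import random

def los_normal_approx(wins, losses):
    """Approximate uniform-prior LOS with the usual erf formula."""
    n = wins + losses
    if n == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * n)))

def false_positive_rate(max_games=200, trials=2000, threshold=0.95, seed=1):
    """Fraction of equal-strength matches that get stopped with LOS >= threshold."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(trials):
        wins = losses = 0
        for _ in range(max_games):
            # Equal strength: each decisive game is a fair coin flip.
            if rng.random() < 0.5:
                wins += 1
            else:
                losses += 1
            if los_normal_approx(wins, losses) >= threshold:
                stopped += 1
                break
    return stopped / trials

if __name__ == "__main__":
    print("false positive rate: %.3f" % false_positive_rate())
```

Under this approximation three straight wins already push LOS above 0.95, so an engine of equal strength is declared stronger far more often than the nominal 5%.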

Michel · Post by **Michel** » Fri Sep 04, 2015 10:17 pm

I did some numerical computations and for non-uniform priors the LOS depends on the number of draws:

E.g. for the (non-normalized) prior

Code: Select all

(4*score*(1-score))**20

(this is a prior that indicates that two engines are close in strength, a typical situation)

I get

Code: Select all

W=0 D=0 L=2 LOS=0.333
W=0 D=1 L=2 LOS=0.296
W=0 D=2 L=2 LOS=0.267
W=0 D=3 L=2 LOS=0.245
W=0 D=4 L=2 LOS=0.227
W=0 D=5 L=2 LOS=0.213
W=0 D=6 L=2 LOS=0.202
W=0 D=7 L=2 LOS=0.192
W=0 D=8 L=2 LOS=0.185
W=0 D=9 L=2 LOS=0.178
W=0 D=10 L=2 LOS=0.173
W=0 D=11 L=2 LOS=0.168
W=0 D=12 L=2 LOS=0.164
W=0 D=13 L=2 LOS=0.160
W=0 D=14 L=2 LOS=0.157

So stating that draws convey no information about LOS is a bit misleading.

Below is the code

Code: Select all

from __future__ import division
import scipy.integrate

def uniform(w,l):
    return 1

def standard_prior(w,l):
    return w**10*l**10

def close_prior(w,l):
    # score s = P(win) + P(draw)/2; the prior peaks at s = 1/2
    d=1-w-l
    s=w+(1/2)*d
    return (4*s*(1-s))**20

def LOS_integrand(prior,w,l,W,D,L):
    # prior times the trinomial likelihood (up to a constant factor)
    return prior(w,l)*(w**W)*((1.0-w-l)**D)*(l**L)

def LOS_denominator(prior,W,D,L):
    # integrate over the whole simplex w>=0, l>=0, w+l<=1
    return scipy.integrate.dblquad(lambda l,w: LOS_integrand(prior,w,l,W,D,L),0.0,1.0,lambda w:0.0,lambda w:1.0-w)

def LOS_numerator(prior,W,D,L):
    # integrate over the part of the simplex where w>l
    return scipy.integrate.dblquad(lambda l,w: LOS_integrand(prior,w,l,W,D,L),0.0,1.0,lambda w:0.0,lambda w:min(w,1.0-w))

if __name__=='__main__':
    W=0
    L=2
    prior=close_prior
    for D in range(0,15):
        print("W=%d D=%d L=%d LOS=%.3f" % (W,D,L,LOS_numerator(prior,W,D,L)[0]/LOS_denominator(prior,W,D,L)[0]))
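For readers without scipy, the same comparison can be sketched in plain Python. This is an alternative illustration, not the code used for the table: it replaces `dblquad` with a midpoint rule after the substitution v = w + l, t = (w - l)/v, which turns the simplex into a rectangle. The name `los_grid` and the grid size `n` are assumptions made here.

```python
def uniform(w, l):
    return 1.0

def close_prior(w, l):
    # Same prior as in the post: (4*s*(1-s))**20 with s = w + d/2.
    d = 1.0 - w - l
    s = w + 0.5 * d
    return (4.0 * s * (1.0 - s)) ** 20

def los_grid(prior, W, D, L, n=200):
    """LOS = P(w > l | data) by the midpoint rule on an n x n grid.

    Substitution: v = w + l in [0, 1], t = (w - l) / v in [-1, 1],
    so w = v*(1+t)/2, l = v*(1-t)/2, d = 1 - v. The Jacobian is
    proportional to v; constant factors cancel in the ratio.
    n should be even so that no midpoint falls on t = 0.
    """
    num = den = 0.0
    for i in range(n):
        v = (i + 0.5) / n
        d = 1.0 - v
        for j in range(n):
            t = -1.0 + (2.0 * j + 1.0) / n
            w = 0.5 * v * (1.0 + t)
            l = 0.5 * v * (1.0 - t)
            f = prior(w, l) * w**W * d**D * l**L * v
            den += f
            if t > 0.0:  # region where w > l
                num += f
    return num / den

if __name__ == "__main__":
    for D in (0, 5, 10):
        print("D=%d uniform=%.3f close=%.3f"
              % (D, los_grid(uniform, 0, D, 2), los_grid(close_prior, 0, D, 2)))
```

With W=0 and L=2 the uniform column should stay at about 0.125 (exactly 1/8 in the continuous limit) for every D, while the close-prior column should fall as D grows, in line with the table above.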

hgm · Post by **hgm** » Fri Sep 04, 2015 10:48 pm

This is fishy. What is 'score'? P(win) + 0.5*P(draw)? By requiring that score has a near-fixed value you force a correlation between P(win) and P(draw), so that experimental determination of P(draw) gives you information on P(win). In particular, if you require the score to be close to 50%, and there are almost no draws, this forces P(win) to be nearly equal to P(loss).

Suppose I take a prior 1/((P(win)-P(draw))**4+1e-24). That basically forces P(win) = P(draw). And, surprise, surprise, suddenly I can determine P(win) by knowing nothing but the number of draws! Priors that force P(draw) ~ (P(win)*P(loss))**2 will also be pretty good in having draws support the notion that P(win) and P(loss) must be close.

These are not priors that are compatible with the premise that you knew nothing beforehand, and the result of the match is the only evidence. Any reasonable rating model that implies a relation between the three probabilities will obviously be falsified if you score 99.9998% draws. This is exactly the pit Dan fell in: assuming that the players must be of equal strength, and thus have P(win) ~ P(loss), just because they score many draws.

Michel · Post by **Michel** » Fri Sep 04, 2015 11:27 pm

hgm wrote:This is fishy. What is 'score'? P(win) + 0.5*P(draw)? By requiring that score has a near-fixed value you force a correlation between P(win) and P(draw), so that experimental determination of P(draw) gives you information on P(win). In particular, if you require the score to be close to 50%, and there are almost no draws, this forces P(win) to be nearly equal to P(loss).

Suppose I take a prior 1/((P(win)-P(draw))**4+1e-24). That basically forces P(win) = P(draw). And, surprise, surprise, suddenly I can determine P(win) by knowing nothing but the number of draws! Priors that force P(draw) ~ (P(win)*P(loss))**2 will also be pretty good in having draws support the notion that P(win) and P(loss) must be close.

These are not priors that are compatible with the premise that you knew nothing beforehand, and the result of the match is the only evidence. Any reasonable rating model that implies a relation between the three probabilities will obviously be falsified if you score 99.9998% draws. This is exactly the pit Dan fell in: assuming that the players must be of equal strength, and thus have P(win) ~ P(loss), just because they score many draws.

I don't believe my prior is in any way fishy (it does not force the score to be nearly constant, but rather assumes it is in the range [0.35,0.65], which is very reasonable). But my point was just to show that LOS is usually affected by draws, except for the special case of a uniform prior (and some other special priors).

You seem to attach some special value to a uniform prior. I claim that if you know nothing about a parameter you can _certainly not_ conclude that all values are equally likely and then draw probabilistic conclusions from it. If you do you will draw incorrect conclusions (like LOS based stopping).

Note that this discussion is about small samples. Asymptotically there is no problem.

hgm · Post by **hgm** » Sat Sep 05, 2015 9:24 am

Michel wrote:But my point was just to show that LOS is usually affected by draws, except for the special case of a uniform prior (and some other special priors).

What you call 'special priors' are priors that correspond to having no prior knowledge, and what you call 'usual' is priors that describe a correlation between P(draw) and other probabilities.

Obviously the number of draws provides information on P(draw). So as soon as you assume a prior that correlates P(draw) with P(win) and P(loss), the draws of course start to provide information on P(win) and P(loss) too, and hence affect the LOS. But there is nothing usual about that. Usually the ratio of P(win) and P(loss) is an independent quantity from P(draw). Only under that condition can the LOS be expected to be independent of the number of draws. (Note that this does not necessarily require the prior to be uniform.) Your prior does not respect that condition.

Michel · Post by **Michel** » Sat Sep 05, 2015 9:54 am

Well it seems that you put as an axiom that LOS should be independent of draws and then only want to accept priors for which this is true...

This is circular reasoning...

Whether the real (unknown) prior satisfies your condition we cannot know.

Michel · Post by **Michel** » Sat Sep 05, 2015 11:03 am

Anyway for the record. Inspired by your post I checked the proof and it seems that independence of draws is valid for priors of the form

p(w/(l+w))q(d).

So d should be independent of w/(l+w). More or less as you suggested.
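A sketch of why this form gives independence from draws. It is a reconstruction under stated assumptions, not the proof referred to in the post: r denotes w/(w+l), the prior is p(r)q(d), and the `align*` environment assumes amsmath. Whether the prior density is taken with respect to (w,l) or (r,d) does not matter here, because the Jacobian depends on d only and can be absorbed into q.

```latex
% Substitute r = w/(w+l), so w = (1-d) r and l = (1-d)(1-r).
% The Jacobian of (w,l) -> (r,d) is (1-d), a function of d only.
\begin{align*}
\text{posterior}(r,d) &\propto p(r)\,q(d)\; w^{W} d^{D} l^{L}\,(1-d) \\
 &= \underbrace{p(r)\, r^{W}(1-r)^{L}}_{\text{depends on } r \text{ only}}
    \;\cdot\;
    \underbrace{q(d)\, d^{D}(1-d)^{W+L+1}}_{\text{depends on } d \text{ only}} \\
\mathrm{LOS} = P\!\left(r > \tfrac12\right)
 &= \frac{\int_{1/2}^{1} p(r)\, r^{W}(1-r)^{L}\,dr}
         {\int_{0}^{1} p(r)\, r^{W}(1-r)^{L}\,dr}
\end{align*}
```

The d-dependent factor cancels between numerator and denominator, so D drops out of the LOS.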

You cannot require that d is independent of w since you may also want symmetry in w and l and

cov(w,d)=cov(l,d)=0

contradicts d+w+l=1.
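The contradiction can be spelled out in one line, using only w + l = 1 - d and the bilinearity of covariance (again assuming amsmath for `align*`):

```latex
\begin{align*}
\operatorname{cov}(w,d) + \operatorname{cov}(l,d)
  &= \operatorname{cov}(w + l,\, d)
   = \operatorname{cov}(1 - d,\, d)
   = -\operatorname{var}(d)
\end{align*}
```

Both covariances can therefore vanish only if var(d) = 0, that is, if d is known exactly. With symmetry in w and l, each covariance equals -var(d)/2.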

hgm · Post by **hgm** » Sat Sep 05, 2015 12:26 pm

Michel wrote:This is circular reasoning...

Well, you call it circular, I called it obvious. I don't think we have any real disagreement. It is obvious that the number of observed draws affects the distribution of P(draw), and equally obvious that if P(draw) is related to P(win) and P(loss) through prior knowledge, it can also affect those, and thus the LOS.

The p(w/(l+w))q(d) is indeed the precise mathematical formulation of what I meant.

Note that the prior you first coined, which forces 'score' to be near 50%, implies that there can only be a large ratio between wins and losses if they draw a lot. That is not at all what I would call 'usual'. Usually when the win/loss ratio increases, the draw fraction diminishes as well.

Laskos · Post by **Laskos** » Sun Sep 06, 2015 11:14 am

hgm wrote:
Michel wrote:This is circular reasoning...
Well, you call it circular, I called it obvious. I don't think we have any real disagreement. It is obvious that the number of observed draws affects the distribution of P(draw), and equally obvious that if P(draw) is related to P(win) and P(loss) through prior knowledge, it can also affect those, and thus the LOS.

The p(w/(l+w))q(d) is indeed the precise mathematical formulation of what I meant.

Note that the prior you first coined, which forces 'score' to be near 50%, implies that there can only be a large ratio between wins and losses if they draw a lot. That is not at all what I would call 'usual'. Usually when the win/loss ratio increases, the draw fraction diminishes as well.

Nice discussion, I had to refresh my undergrad knowledge, which I had lost. The choice of the prior can be empirical, applied to chess. Empirically, the longer-tailed Davidson model works, with

Code: Select all

Davidson:
d^2 = C*w*(1 - w - d)
w -> 1/2 (1 - d + Sqrt[1 - 2 d + ((-4 + C) d^2)/C])

We take only the w>l solution. It indeed favors a w/l ratio that moves closer to 1 at higher draw rates. [Example plot]

So, empirically: more draws mean w/l closer to 1, engines closer in strength, and LOS closer to 0.5.

That constant C complicates things a bit, as it mildly depends on time control (or strength), but let's forget about it.
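The w>l root above can be evaluated directly to see how w/l behaves along the Davidson curve. This is a minimal sketch: the function name `davidson_win_loss` is made up here, and C = 4.0 is a hypothetical value chosen only for illustration, not a fitted constant.

```python
import math

def davidson_win_loss(d, C):
    """Return (w, l) with w >= l satisfying d**2 == C*w*l and w + d + l == 1."""
    disc = 1.0 - 2.0 * d + ((C - 4.0) * d * d) / C
    if disc < 0.0:
        # No real solution: this draw rate is not reachable for this C.
        raise ValueError("draw rate too high for this C")
    w = 0.5 * (1.0 - d + math.sqrt(disc))
    l = 1.0 - w - d
    return w, l

if __name__ == "__main__":
    C = 4.0  # hypothetical value, for illustration only
    for d in (0.1, 0.2, 0.3, 0.4, 0.45):
        w, l = davidson_win_loss(d, C)
        print("d=%.2f w=%.4f l=%.4f w/l=%.2f" % (d, w, l, w / l))
```

For a fixed C the printed w/l ratio shrinks toward 1 as d approaches the largest draw rate the model allows, which is the trend described above.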

hgm · Post by **hgm** » Sun Sep 06, 2015 11:58 am

Indeed, using an Elo model postulates a relation between P(draw) and the others, so draws start to provide information. The Davidson model equates two draws to one win + one loss, so effectively observing draws has exactly the same consequences for the likelihoods as observing wins and losses.

Note, however, that the observation can sort of falsify the prior in practice, although in theory this is not possible. Observing a million draws out of a million + 2 games is not compatible with the Davidson model, which allows a maximum for P(draw) of ~0.63. Mindlessly cranking through the calculation then would give you a P(draw) of 0.63 as best predicting the observation, but even the probability that this 'maximum-likelihood' value predicts for the observation is infinitesimal. You get something similar when the prior tells you x is normally distributed with a standard deviation of 1.0 but unknown mean, and your only two samples have x=0 and x=100. The prior would suggest that you now know that the mean m of the distribution is 50 +/- 0.7. While the sensible conclusion would be that either the data was somehow corrupted, or the standard deviation was around 50 rather than 1.
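The normal-distribution example can be made concrete with a few lines of arithmetic. The sketch below assumes a flat prior on the mean, so the mean estimate is the sample average with standard error sigma/sqrt(n). It then compares the log-likelihood of the two samples under sigma = 1 and under sigma = 50. The helper name `normal_loglik` is an illustrative choice.

```python
import math

def normal_loglik(samples, mean, sigma):
    """Log-likelihood of i.i.d. normal samples with the given mean and sigma."""
    return sum(-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
               - (x - mean) ** 2 / (2.0 * sigma ** 2) for x in samples)

samples = [0.0, 100.0]
mean_hat = sum(samples) / len(samples)    # sample mean, 50.0
std_err = 1.0 / math.sqrt(len(samples))   # sigma / sqrt(n) with sigma = 1

if __name__ == "__main__":
    print("mean = %.1f +/- %.2f" % (mean_hat, std_err))
    print("log-likelihood with sigma=1:  %.1f" % normal_loglik(samples, mean_hat, 1.0))
    print("log-likelihood with sigma=50: %.1f" % normal_loglik(samples, mean_hat, 50.0))
```

The log-likelihood under sigma = 1 is lower by thousands of units than under sigma = 50, which is the numerical form of the remark that the data effectively falsify the assumed standard deviation.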
