Binomial distribution for chess statistics

lantonov · Post by **lantonov** » Fri Mar 03, 2017 5:45 pm

In statistical tests for chess results (SPRT, Elo and LOS calculations, etc.), most people naturally choose trinomial distribution (wins, draws, losses) or some normal approximation to it. Trinomial distribution has two population parameters (probabilities): win probability and draw probability. For example, in the following formula for trinomial distribution:

Probability(wins, draws, losses) = N! / wins! / draws! / losses! * win_probability^wins * draw_probability^draws * (1 - win_probability - draw_probability)^losses

win_probability and draw_probability are the respective probabilities in the population (all possible chess games). In practice, the true win_probability and draw_probability can never be known as the number of possible chess games is huge. In such cases, which are common across all fields, the population probabilities are substituted with maximal likelihoods which are essentially the best estimates of the population probabilities derived from a sample (i.e. probability statistics).

With binomial distribution,

Probability(successes, failures) = N! / r! / (1-r)! * p^r * (1-p)^(1-r)

it is easy to show that the maximal likelihood for its single parameter (p) is r / N where r is the number of successes (1) and N is the sample size. Namely, we write its likelihood function (with Cr = N! / r! / (1-r)!):

L(p | r) = Product from i=1 to N of Cri * p^ri * (1-p)^(1-ri) = N * Cr * p^(Sum from i=1 to N of ri) * (1-p)^(Sum from i=1 to N of (1-ri)) = N * Cr * p^r * (1-p)^(N-r)

First, we note that according to the Law of Likelihood, if we have two likelihood functions L1, L2 and L1 = k * L2 where k is a coefficient constant wrt p, these two likelihood functions are equivalent. Therefore, if we are comparing different values of p using the same likelihood function, the leading term (N*Cr) becomes irrelevant and can be dropped. But let's leave it for now to see more explicitly how it drops out.

We can take the log (natural) of the likelihood function:

log(N * Cr * p^r * (1-p)^(N-r)) = log(N * Cr) + r * log(p) + (N-r) * log(1-p)

Then we take the derivative wrt p of this and set to 0:

d/dp of log(N * Cr) + r * log(p) + (N-r) * log(1-p) = r/p - (N-r)/(1-p) = 0

From the last equality we derive p = r / N. QED

Let's try the same procedure for trinomial distribution. With Cr = N! / wins! / draws! / losses! the likelihood function is:

L(win_probability, draw_probability | wins, draws) = N * Cr * win_probability^wins * draw_probability^draws * (1 - win_probability - draw_probability)^(N - wins - draws)

Logarithming we obtain:

log(N * Cr) + wins * log(win_probability) + draws * log(draw_probability) + (N - wins - draws) * log(1 - win_probability - draw_probability)

The total derivative of the likelihood function wrt win_probability is:

DL(win_probability, draw_probability) / D(win_probability) = dL / d(win_probability) + dL / d(draw_probability) * D(draw_probability) / D(win_probability)

where D denotes a total derivative and d denotes a partial derivative.

We do not have any idea if the draw_probability depends on the win_probability. In other words, we do not know if the term D(draw_probability) / D(win_probability) is different from zero or not. Intuition tells that the two are dependent: a player who is likely to win is also more likely to draw his opponent and vice versa. For the sake of argument, however, let's assume that the two are independent, i.e. D(draw_probability) / D(win_probability) = 0. Then the above formula simplifies to DL / D(win_probability) = dL / d(win_probability).

Then

dL / d(win_probability) = wins / win_probability - (N-wins-draws) / (1 - win_probability - draw_probability)

from where we obtain

ML(win_probability) = (wins - draw_probability * wins) / (N - draws)

and similarly:

ML(draw_probability) = (draws - win_probability * draws) / (N - wins)

We see that the maximal likelihoods for win_probability and draw_probability in all cases, even with the far-fetched assumption that the two probabilities are independent, are much different than what is commonly thought, i.e. ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.

In an effort to alleviate the problem with this "pesky" draw_probability, as it is perceived, some recommend (https://chessprogramming.wikispaces.com ... Statistics ) determining the draw probability from a large number of games played, e.g. from rating sites and online chess playing sites. However, the draw rate (draws / N) is different for different time controls, for opponents with different strengths, for humans and for engines, and a whole host of other conditions. For the Bayesian treatment (BayesElo) similarly the drawelo is determined from a draw rate. We see that these estimates are to a large extent arbitrary and cannot serve for maximum likelihood estimates of the respective probabilities. This shows that the underlying model is wrong in not taking into account the true relationships between the two parameters.

Hereby I propose another model that naturally leads to binomial distribution without any assumptions. It is based not on results (+, =, -) but on points such as the usual point system adopted in chess. The sample unit for this model is half-point. In each game the players strive to win 2 half-points. A won game is 2 half-points to the winner, a draw is 1 half-point each and a loss is 0 half-points to the loser. The score for each player is just the sum of his half-points. So, if a result from 11 games is +3 =6 -2, for the winning player this is 12 half-points (6 + 3*2), and for the losing player it is 10 half-points (6 + 2*2) from a total of 22 half-points (11*2). This naturally leads to the binomial distribution:

Probability(draws+2*wins, draws+2*losses) = (2*N)! / (draws + 2*wins)! / (draws + 2*losses)! * win_probability^(draws + 2*wins) * (1 - win_probability)^(draws + 2*losses)

As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:

ML(win_probability) = (draws + 2*wins) / (2*N)

abulmo2 · Post by **abulmo2** » Fri Mar 03, 2017 8:15 pm

lantonov wrote:We see that the maximal likelihoods for win_probability and draw_probability in all cases, even with the far-fetched assumption that the two probabilities are independent, are much different than what is commonly thought, i.e. ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.

You are wrong here.
1) The probabilities are not independent but are constrained by p(wins) + p(draws) + p(losses) = 1
2) Maximum Likelihood of p(wins)) = wins / N
An easy demonstration of 2) is to consider only p(wins) & 1-p(wins) as a binomial distribution (wins is the same random variable, being binomial or trinomial).
For another demonstration, you can read, for example p 12-13 of http://www.cs.ubc.ca/~murphyk/Teaching/ ... noulli.pdf

lantonov · Post by **lantonov** » Fri Mar 03, 2017 8:58 pm

You are right that the method using Langrange multiplier includes all the dependencies between the probabilities and this leads to the canonical formulas for the maximal likelihoods.

But my proposal still stands because it avoids estimating (or guessing) the draw probability.

lantonov · Post by **lantonov** » Fri Mar 03, 2017 10:41 pm

In light of the above comment I have to correct the derivation for multinomial (including trinomial) distribution.

Let X be a random variable following multinomial distribution. Then,

P(X=x;n,p)=n! * Product from k=1 to K of pk^xk / xk!
is the multinomial distribution. xi is the number of successes of the k-th category in n random draws, where pk is the probability of success of the k-th category. Note that,

Sum from k=1 to K of xk = n
Sum from k=1 to K of pk = 1

For the estimation problem, we have N samples X1,…,XN drawn independently from above multinomial distribution. The log-likelihood is given as

L(p,n) = Sum from i=1 to N of log(P(xi,n,p))

where

log(P(xi,n,p)) = log(n!/ Product_k of xik!) + Sum_k of xik*log(pk)
Sum_n of log⁡(P(xi,n,p)) = C + Sum_k of Nk*log⁡(pk)
where Nk = Sum_n of xik is the total number of successes of k-th category in N samples.

For MLE estimate of p, assuming n is known, we solve the following optimization problem:
max by p of L(p,n) with the constraint on p ==> Sum_k of pk = 1

Using equality constraint for variable reduction,

pK = 1 − Sum from k=1 to K-1 of pk

We have an unconstrained problem in K−1 variables. Compute gradient for stationary point as,

∂L(p,n)/∂pk = Nk/pk − NK/pK = 0
pk = Nk * pK / NK

Solving, with Sum from k=1 to K of pk = 1 gives MLE estimate for pk,

pk = Nk / (n*N)

Robert Pope · Post by **Robert Pope** » Fri Mar 03, 2017 11:15 pm

lantonov wrote:As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:

ML(win_probability) = (draws + 2*wins) / (2*N)

Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?

lantonov · Post by **lantonov** » Sat Mar 04, 2017 12:00 am

Robert Pope wrote: Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?

I am not aware of such old discussions. Looking at posts here in Talkchess, the oldest posts of 2007-2008 already talk about trinomial. People who turn to binomial advocate it in the form:

N! / wins! / losses! * p(wins)^wins * p(losses)^losses

excluding draws altogether.

I don't see why a model using only one MLE is worse than a model using 2 MLE. Intuitively, these 2 MLE will introduce a bigger error in the model in addition to the fact that for optimality we have to avoid including extra parameters.

Practical results can prove me wrong, however. If you remember some results or theory that speak contrary to the proposed model, I would be glad to direct me there.

lantonov · Post by **lantonov** » Sat Mar 04, 2017 8:07 am

Another problem with the model might be that the sample size is 2N while the actual games are N. However, the terms that decrease the P value: (draws + 2*wins) and (draws + 2*losses) also increase proportionally.
Overall, granularity is increased 2 times which hopefully results in better precision if there isn't any theoretical flaw.

abulmo2 · Post by **abulmo2** » Sun Mar 05, 2017 12:22 am

Robert Pope wrote:
lantonov wrote:As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:

ML(win_probability) = (draws + 2*wins) / (2*N)
Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?

This is indeed similar to what is still used in the elo model, which defines the expected score of a player as:
E(score) = p(wins) + 0.5 p(draws) + 0 p(losses).
(I explicitly write the 0 p(losses) to show the trinomial behind the formula).
I let as an exercise for people to compute the variance of the mean score, and to see what happens for different draw rates and do the same on the Lyudmil Antonov's win_probablity with a binomial distribution.

lantonov · Post by **lantonov** » Sun Mar 05, 2017 11:10 am

Let's compare the 2 models for the Likelihood Ratio Test with the following example:
n = 72772.; w = 12970.; l = 12790.; d = 47012.;
H0 is w=l

Trinomial distribution with draw probability MLE taken from the sample (ignoring the constant coefficients):
pw = w/n; pd = d/n;
L1 = pw^w*pd^d*(1 - pw - pd)^l;
L0 = ((1 - pd)/2)^w*pd^d*((1 - pd)/2)^l;
L0/L1 = 0.53318484952
-2*log(L0/L1) = 1.2577742107

Binomial distribution as above:
pw = (d + 2*w)/2/n;
L1 = pw^(d + 2*w)*(1 - pw)^(d + 2*l);
L0 = 0.5^(d + 2*w)*0.5^(d + 2*l);
L0/L1 = 0.6406790561
-2*Log[L0/L1] = 0.8904532798

The test with trinomial distribution appears to be more powerful.

lantonov · Post by **lantonov** » Mon Mar 06, 2017 12:57 pm

abulmo2 wrote: I let as an exercise for people to compute the variance of the mean score, and to see what happens for different draw rates and do the same on the Lyudmil Antonov's win_probablity with a binomial distribution.

For the variance of the mean, we have to find first the variance and then divide it by N.
For the binomial distribution, this is easy to do:
Variance = N*p(win)*p(loss) = N*p(win)*(1-p(win))
Variance of the mean = p(win)*p(loss) = p(win)*(1-p(win))

For the trinomial distribution, we have to take into account both variances and covariances into a covariance matrix, so direct comparison with binomial is difficult. By your saying to compare for different draw rates, I suppose that you have in mind the so-called generalised variance for multinomial distribution:
N^(k-1)*Product of i=1 to k of p_i where k = 3 for trinomial

So for trinomial, generalised variance will be:
N^2*p(win)*p(draw)*p(loss) = N^2*p(win)*p(draw)*(1-p(win)-p(draw))

Variance of the mean will be N*p(win)*p(draw)*(1-p(win)-p(draw))

Looking at the actual numbers these still seem incomparable to me though.

Note that variances do not depend explicitly on the random variables: (draw+2*win) and (draw+2*loss) for the proposed binomial distribution.

Binomial distribution for chess statistics

Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics

Re: Binomial distribution for chess statistics