Binomial distribution for chess statistics
Moderators: bob, hgm, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Binomial distribution for chess statistics
In statistical tests for chess results (SPRT, Elo and LOS calculations, etc.), most people naturally choose trinomial distribution (wins, draws, losses) or some normal approximation to it. Trinomial distribution has two population parameters (probabilities): win probability and draw probability. For example, in the following formula for trinomial distribution:
Probability(wins, draws, losses) = N! / wins! / draws! / losses! * win_probability^wins * draw_probability^draws * (1  win_probability  draw_probability)^losses
win_probability and draw_probability are the respective probabilities in the population (all possible chess games). In practice, the true win_probability and draw_probability can never be known as the number of possible chess games is huge. In such cases, which are common across all fields, the population probabilities are substituted with maximal likelihoods which are essentially the best estimates of the population probabilities derived from a sample (i.e. probability statistics).
With binomial distribution,
Probability(successes, failures) = N! / r! / (1r)! * p^r * (1p)^(1r)
it is easy to show that the maximal likelihood for its single parameter (p) is r / N where r is the number of successes (1) and N is the sample size. Namely, we write its likelihood function (with Cr = N! / r! / (1r)!):
L(p  r) = Product from i=1 to N of Cri * p^ri * (1p)^(1ri) = N * Cr * p^(Sum from i=1 to N of ri) * (1p)^(Sum from i=1 to N of (1ri)) = N * Cr * p^r * (1p)^(Nr)
First, we note that according to the Law of Likelihood, if we have two likelihood functions L1, L2 and L1 = k * L2 where k is a coefficient constant wrt p, these two likelihood functions are equivalent. Therefore, if we are comparing different values of p using the same likelihood function, the leading term (N*Cr) becomes irrelevant and can be dropped. But let's leave it for now to see more explicitly how it drops out.
We can take the log (natural) of the likelihood function:
log(N * Cr * p^r * (1p)^(Nr)) = log(N * Cr) + r * log(p) + (Nr) * log(1p)
Then we take the derivative wrt p of this and set to 0:
d/dp of log(N * Cr) + r * log(p) + (Nr) * log(1p) = r/p  (Nr)/(1p) = 0
From the last equality we derive p = r / N. QED
Let's try the same procedure for trinomial distribution. With Cr = N! / wins! / draws! / losses! the likelihood function is:
L(win_probability, draw_probability  wins, draws) = N * Cr * win_probability^wins * draw_probability^draws * (1  win_probability  draw_probability)^(N  wins  draws)
Logarithming we obtain:
log(N * Cr) + wins * log(win_probability) + draws * log(draw_probability) + (N  wins  draws) * log(1  win_probability  draw_probability)
The total derivative of the likelihood function wrt win_probability is:
DL(win_probability, draw_probability) / D(win_probability) = dL / d(win_probability) + dL / d(draw_probability) * D(draw_probability) / D(win_probability)
where D denotes a total derivative and d denotes a partial derivative.
We do not have any idea if the draw_probability depends on the win_probability. In other words, we do not know if the term D(draw_probability) / D(win_probability) is different from zero or not. Intuition tells that the two are dependent: a player who is likely to win is also more likely to draw his opponent and vice versa. For the sake of argument, however, let's assume that the two are independent, i.e. D(draw_probability) / D(win_probability) = 0. Then the above formula simplifies to DL / D(win_probability) = dL / d(win_probability).
Then
dL / d(win_probability) = wins / win_probability  (Nwinsdraws) / (1  win_probability  draw_probability)
from where we obtain
ML(win_probability) = (wins  draw_probability * wins) / (N  draws)
and similarly:
ML(draw_probability) = (draws  win_probability * draws) / (N  wins)
We see that the maximal likelihoods for win_probability and draw_probability in all cases, even with the farfetched assumption that the two probabilities are independent, are much different than what is commonly thought, i.e. ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.
In an effort to alleviate the problem with this "pesky" draw_probability, as it is perceived, some recommend (https://chessprogramming.wikispaces.com ... Statistics ) determining the draw probability from a large number of games played, e.g. from rating sites and online chess playing sites. However, the draw rate (draws / N) is different for different time controls, for opponents with different strengths, for humans and for engines, and a whole host of other conditions. For the Bayesian treatment (BayesElo) similarly the drawelo is determined from a draw rate. We see that these estimates are to a large extent arbitrary and cannot serve for maximum likelihood estimates of the respective probabilities. This shows that the underlying model is wrong in not taking into account the true relationships between the two parameters.
Hereby I propose another model that naturally leads to binomial distribution without any assumptions. It is based not on results (+, =, ) but on points such as the usual point system adopted in chess. The sample unit for this model is halfpoint. In each game the players strive to win 2 halfpoints. A won game is 2 halfpoints to the winner, a draw is 1 halfpoint each and a loss is 0 halfpoints to the loser. The score for each player is just the sum of his halfpoints. So, if a result from 11 games is +3 =6 2, for the winning player this is 12 halfpoints (6 + 3*2), and for the losing player it is 10 halfpoints (6 + 2*2) from a total of 22 halfpoints (11*2). This naturally leads to the binomial distribution:
Probability(draws+2*wins, draws+2*losses) = (2*N)! / (draws + 2*wins)! / (draws + 2*losses)! * win_probability^(draws + 2*wins) * (1  win_probability)^(draws + 2*losses)
As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:
ML(win_probability) = (draws + 2*wins) / (2*N)
Probability(wins, draws, losses) = N! / wins! / draws! / losses! * win_probability^wins * draw_probability^draws * (1  win_probability  draw_probability)^losses
win_probability and draw_probability are the respective probabilities in the population (all possible chess games). In practice, the true win_probability and draw_probability can never be known as the number of possible chess games is huge. In such cases, which are common across all fields, the population probabilities are substituted with maximal likelihoods which are essentially the best estimates of the population probabilities derived from a sample (i.e. probability statistics).
With binomial distribution,
Probability(successes, failures) = N! / r! / (1r)! * p^r * (1p)^(1r)
it is easy to show that the maximal likelihood for its single parameter (p) is r / N where r is the number of successes (1) and N is the sample size. Namely, we write its likelihood function (with Cr = N! / r! / (1r)!):
L(p  r) = Product from i=1 to N of Cri * p^ri * (1p)^(1ri) = N * Cr * p^(Sum from i=1 to N of ri) * (1p)^(Sum from i=1 to N of (1ri)) = N * Cr * p^r * (1p)^(Nr)
First, we note that according to the Law of Likelihood, if we have two likelihood functions L1, L2 and L1 = k * L2 where k is a coefficient constant wrt p, these two likelihood functions are equivalent. Therefore, if we are comparing different values of p using the same likelihood function, the leading term (N*Cr) becomes irrelevant and can be dropped. But let's leave it for now to see more explicitly how it drops out.
We can take the log (natural) of the likelihood function:
log(N * Cr * p^r * (1p)^(Nr)) = log(N * Cr) + r * log(p) + (Nr) * log(1p)
Then we take the derivative wrt p of this and set to 0:
d/dp of log(N * Cr) + r * log(p) + (Nr) * log(1p) = r/p  (Nr)/(1p) = 0
From the last equality we derive p = r / N. QED
Let's try the same procedure for trinomial distribution. With Cr = N! / wins! / draws! / losses! the likelihood function is:
L(win_probability, draw_probability  wins, draws) = N * Cr * win_probability^wins * draw_probability^draws * (1  win_probability  draw_probability)^(N  wins  draws)
Logarithming we obtain:
log(N * Cr) + wins * log(win_probability) + draws * log(draw_probability) + (N  wins  draws) * log(1  win_probability  draw_probability)
The total derivative of the likelihood function wrt win_probability is:
DL(win_probability, draw_probability) / D(win_probability) = dL / d(win_probability) + dL / d(draw_probability) * D(draw_probability) / D(win_probability)
where D denotes a total derivative and d denotes a partial derivative.
We do not have any idea if the draw_probability depends on the win_probability. In other words, we do not know if the term D(draw_probability) / D(win_probability) is different from zero or not. Intuition tells that the two are dependent: a player who is likely to win is also more likely to draw his opponent and vice versa. For the sake of argument, however, let's assume that the two are independent, i.e. D(draw_probability) / D(win_probability) = 0. Then the above formula simplifies to DL / D(win_probability) = dL / d(win_probability).
Then
dL / d(win_probability) = wins / win_probability  (Nwinsdraws) / (1  win_probability  draw_probability)
from where we obtain
ML(win_probability) = (wins  draw_probability * wins) / (N  draws)
and similarly:
ML(draw_probability) = (draws  win_probability * draws) / (N  wins)
We see that the maximal likelihoods for win_probability and draw_probability in all cases, even with the farfetched assumption that the two probabilities are independent, are much different than what is commonly thought, i.e. ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.
In an effort to alleviate the problem with this "pesky" draw_probability, as it is perceived, some recommend (https://chessprogramming.wikispaces.com ... Statistics ) determining the draw probability from a large number of games played, e.g. from rating sites and online chess playing sites. However, the draw rate (draws / N) is different for different time controls, for opponents with different strengths, for humans and for engines, and a whole host of other conditions. For the Bayesian treatment (BayesElo) similarly the drawelo is determined from a draw rate. We see that these estimates are to a large extent arbitrary and cannot serve for maximum likelihood estimates of the respective probabilities. This shows that the underlying model is wrong in not taking into account the true relationships between the two parameters.
Hereby I propose another model that naturally leads to binomial distribution without any assumptions. It is based not on results (+, =, ) but on points such as the usual point system adopted in chess. The sample unit for this model is halfpoint. In each game the players strive to win 2 halfpoints. A won game is 2 halfpoints to the winner, a draw is 1 halfpoint each and a loss is 0 halfpoints to the loser. The score for each player is just the sum of his halfpoints. So, if a result from 11 games is +3 =6 2, for the winning player this is 12 halfpoints (6 + 3*2), and for the losing player it is 10 halfpoints (6 + 2*2) from a total of 22 halfpoints (11*2). This naturally leads to the binomial distribution:
Probability(draws+2*wins, draws+2*losses) = (2*N)! / (draws + 2*wins)! / (draws + 2*losses)! * win_probability^(draws + 2*wins) * (1  win_probability)^(draws + 2*losses)
As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:
ML(win_probability) = (draws + 2*wins) / (2*N)
Re: Binomial distribution for chess statistics
You are wrong here.lantonov wrote:We see that the maximal likelihoods for win_probability and draw_probability in all cases, even with the farfetched assumption that the two probabilities are independent, are much different than what is commonly thought, i.e. ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.
1) The probabilities are not independent but are constrained by p(wins) + p(draws) + p(losses) = 1
2) Maximum Likelihood of p(wins)) = wins / N
An easy demonstration of 2) is to consider only p(wins) & 1p(wins) as a binomial distribution (wins is the same random variable, being binomial or trinomial).
For another demonstration, you can read, for example p 1213 of http://www.cs.ubc.ca/~murphyk/Teaching/ ... noulli.pdf
Richard Delorme
Re: Binomial distribution for chess statistics
You are right that the method using Langrange multiplier includes all the dependencies between the probabilities and this leads to the canonical formulas for the maximal likelihoods.
But my proposal still stands because it avoids estimating (or guessing) the draw probability.
But my proposal still stands because it avoids estimating (or guessing) the draw probability.
Re: Binomial distribution for chess statistics
In light of the above comment I have to correct the derivation for multinomial (including trinomial) distribution.
Let X be a random variable following multinomial distribution. Then,
P(X=x;n,p)=n! * Product from k=1 to K of pk^xk / xk!
is the multinomial distribution. xi is the number of successes of the kth category in n random draws, where pk is the probability of success of the kth category. Note that,
Sum from k=1 to K of xk = n
Sum from k=1 to K of pk = 1
For the estimation problem, we have N samples X1,…,XN drawn independently from above multinomial distribution. The loglikelihood is given as
L(p,n) = Sum from i=1 to N of log(P(xi,n,p))
where
log(P(xi,n,p)) = log(n!/ Product_k of xik!) + Sum_k of xik*log(pk)
Sum_n of log⁡(P(xi,n,p)) = C + Sum_k of Nk*log⁡(pk)
where Nk = Sum_n of xik is the total number of successes of kth category in N samples.
For MLE estimate of p, assuming n is known, we solve the following optimization problem:
max by p of L(p,n) with the constraint on p ==> Sum_k of pk = 1
Using equality constraint for variable reduction,
pK = 1 − Sum from k=1 to K1 of pk
We have an unconstrained problem in K−1 variables. Compute gradient for stationary point as,
∂L(p,n)/∂pk = Nk/pk − NK/pK = 0
pk = Nk * pK / NK
Solving, with Sum from k=1 to K of pk = 1 gives MLE estimate for pk,
pk = Nk / (n*N)
Let X be a random variable following multinomial distribution. Then,
P(X=x;n,p)=n! * Product from k=1 to K of pk^xk / xk!
is the multinomial distribution. xi is the number of successes of the kth category in n random draws, where pk is the probability of success of the kth category. Note that,
Sum from k=1 to K of xk = n
Sum from k=1 to K of pk = 1
For the estimation problem, we have N samples X1,…,XN drawn independently from above multinomial distribution. The loglikelihood is given as
L(p,n) = Sum from i=1 to N of log(P(xi,n,p))
where
log(P(xi,n,p)) = log(n!/ Product_k of xik!) + Sum_k of xik*log(pk)
Sum_n of log⁡(P(xi,n,p)) = C + Sum_k of Nk*log⁡(pk)
where Nk = Sum_n of xik is the total number of successes of kth category in N samples.
For MLE estimate of p, assuming n is known, we solve the following optimization problem:
max by p of L(p,n) with the constraint on p ==> Sum_k of pk = 1
Using equality constraint for variable reduction,
pK = 1 − Sum from k=1 to K1 of pk
We have an unconstrained problem in K−1 variables. Compute gradient for stationary point as,
∂L(p,n)/∂pk = Nk/pk − NK/pK = 0
pk = Nk * pK / NK
Solving, with Sum from k=1 to K of pk = 1 gives MLE estimate for pk,
pk = Nk / (n*N)

 Posts: 522
 Joined: Sat Mar 25, 2006 7:27 pm
Re: Binomial distribution for chess statistics
Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?lantonov wrote:As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:
ML(win_probability) = (draws + 2*wins) / (2*N)
Re: Binomial distribution for chess statistics
I am not aware of such old discussions. Looking at posts here in Talkchess, the oldest posts of 20072008 already talk about trinomial. People who turn to binomial advocate it in the form:Robert Pope wrote: Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?
N! / wins! / losses! * p(wins)^wins * p(losses)^losses
excluding draws altogether.
I don't see why a model using only one MLE is worse than a model using 2 MLE. Intuitively, these 2 MLE will introduce a bigger error in the model in addition to the fact that for optimality we have to avoid including extra parameters.
Practical results can prove me wrong, however. If you remember some results or theory that speak contrary to the proposed model, I would be glad to direct me there.
Re: Binomial distribution for chess statistics
Another problem with the model might be that the sample size is 2N while the actual games are N. However, the terms that decrease the P value: (draws + 2*wins) and (draws + 2*losses) also increase proportionally.
Overall, granularity is increased 2 times which hopefully results in better precision if there isn't any theoretical flaw.
Overall, granularity is increased 2 times which hopefully results in better precision if there isn't any theoretical flaw.
Re: Binomial distribution for chess statistics
This is indeed similar to what is still used in the elo model, which defines the expected score of a player as:Robert Pope wrote:Isn't this where we started 20 years ago, and then people moved to trinomial because a binomial representation of a trinomial process didn't give satisfactory results?lantonov wrote:As in every binomial distribution, we have only one parameter, win_probability, whose maximal likelihood, as found by the way outlined above for binomial distribution is:
ML(win_probability) = (draws + 2*wins) / (2*N)
E(score) = p(wins) + 0.5 p(draws) + 0 p(losses).
(I explicitly write the 0 p(losses) to show the trinomial behind the formula).
I let as an exercise for people to compute the variance of the mean score, and to see what happens for different draw rates and do the same on the Lyudmil Antonov's win_probablity with a binomial distribution.
Richard Delorme
Re: Binomial distribution for chess statistics
Let's compare the 2 models for the Likelihood Ratio Test with the following example:
n = 72772.; w = 12970.; l = 12790.; d = 47012.;
H0 is w=l
Trinomial distribution with draw probability MLE taken from the sample (ignoring the constant coefficients):
pw = w/n; pd = d/n;
L1 = pw^w*pd^d*(1  pw  pd)^l;
L0 = ((1  pd)/2)^w*pd^d*((1  pd)/2)^l;
L0/L1 = 0.53318484952
2*log(L0/L1) = 1.2577742107
Binomial distribution as above:
pw = (d + 2*w)/2/n;
L1 = pw^(d + 2*w)*(1  pw)^(d + 2*l);
L0 = 0.5^(d + 2*w)*0.5^(d + 2*l);
L0/L1 = 0.6406790561
2*Log[L0/L1] = 0.8904532798
The test with trinomial distribution appears to be more powerful.
n = 72772.; w = 12970.; l = 12790.; d = 47012.;
H0 is w=l
Trinomial distribution with draw probability MLE taken from the sample (ignoring the constant coefficients):
pw = w/n; pd = d/n;
L1 = pw^w*pd^d*(1  pw  pd)^l;
L0 = ((1  pd)/2)^w*pd^d*((1  pd)/2)^l;
L0/L1 = 0.53318484952
2*log(L0/L1) = 1.2577742107
Binomial distribution as above:
pw = (d + 2*w)/2/n;
L1 = pw^(d + 2*w)*(1  pw)^(d + 2*l);
L0 = 0.5^(d + 2*w)*0.5^(d + 2*l);
L0/L1 = 0.6406790561
2*Log[L0/L1] = 0.8904532798
The test with trinomial distribution appears to be more powerful.
Re: Binomial distribution for chess statistics
For the variance of the mean, we have to find first the variance and then divide it by N.abulmo2 wrote: I let as an exercise for people to compute the variance of the mean score, and to see what happens for different draw rates and do the same on the Lyudmil Antonov's win_probablity with a binomial distribution.
For the binomial distribution, this is easy to do:
Variance = N*p(win)*p(loss) = N*p(win)*(1p(win))
Variance of the mean = p(win)*p(loss) = p(win)*(1p(win))
For the trinomial distribution, we have to take into account both variances and covariances into a covariance matrix, so direct comparison with binomial is difficult. By your saying to compare for different draw rates, I suppose that you have in mind the socalled generalised variance for multinomial distribution:
N^(k1)*Product of i=1 to k of p_i where k = 3 for trinomial
So for trinomial, generalised variance will be:
N^2*p(win)*p(draw)*p(loss) = N^2*p(win)*p(draw)*(1p(win)p(draw))
Variance of the mean will be N*p(win)*p(draw)*(1p(win)p(draw))
Looking at the actual numbers these still seem incomparable to me though.
Note that variances do not depend explicitly on the random variables: (draw+2*win) and (draw+2*loss) for the proposed binomial distribution.