Binomial distribution for chess statistics
Posted: Fri Mar 03, 2017 5:45 pm
In statistical tests for chess results (SPRT, Elo and LOS calculations, etc.), most people naturally choose the trinomial distribution (wins, draws, losses) or some normal approximation to it. The trinomial distribution has two population parameters (probabilities): the win probability and the draw probability. For example, in the following formula for the trinomial distribution:
Probability(wins, draws, losses) = N! / wins! / draws! / losses! * win_probability^wins * draw_probability^draws * (1 - win_probability - draw_probability)^losses
win_probability and draw_probability are the respective probabilities in the population (all possible chess games). In practice, the true win_probability and draw_probability can never be known, as the number of possible chess games is huge. In such cases, which are common across all fields, the population probabilities are replaced by maximum-likelihood estimates, which are essentially the best estimates of the population probabilities derivable from a sample (i.e. sample statistics).
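For concreteness, here is a minimal Python sketch of the trinomial probability above; the counts and probabilities passed in are illustrative values I made up, not taken from any real match:
[code]
# Trinomial probability of a (wins, draws, losses) outcome, as in the
# formula above. Plain Python; no external libraries needed.
from math import factorial

def trinomial_probability(wins, draws, losses, win_probability, draw_probability):
    n = wins + draws + losses
    coefficient = factorial(n) // (factorial(wins) * factorial(draws) * factorial(losses))
    loss_probability = 1.0 - win_probability - draw_probability
    return (coefficient * win_probability ** wins
            * draw_probability ** draws
            * loss_probability ** losses)

# Illustrative values: 3 wins, 6 draws, 2 losses with assumed probabilities.
print(trinomial_probability(3, 6, 2, 0.3, 0.5))  # ~0.078
[/code]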
With the binomial distribution,
Probability(successes, failures) = N! / r! / (N-r)! * p^r * (1-p)^(N-r)
it is easy to show that the maximum-likelihood estimate of its single parameter p is r / N, where r is the number of successes (each trial scored as 1 for a success, 0 for a failure) and N is the sample size. Namely, we write the likelihood function as a product over the N trials (with Cr = N! / r! / (N-r)!):
L(p | r1, ..., rN) = Product from i=1 to N of p^ri * (1-p)^(1-ri) = p^(Sum from i=1 to N of ri) * (1-p)^(Sum from i=1 to N of (1-ri)) = p^r * (1-p)^(N-r)
which, up to the constant factor Cr, is the binomial probability above viewed as a function of p.
First, we note that according to the Law of Likelihood, if we have two likelihood functions L1, L2 with L1 = k * L2, where k is a coefficient constant wrt p, the two likelihood functions are equivalent. Therefore, when comparing different values of p with the same likelihood function, the leading term Cr is irrelevant and can be dropped. But let's keep it for now to see more explicitly how it drops out.
Taking the natural log of the likelihood function:
log(Cr * p^r * (1-p)^(N-r)) = log(Cr) + r * log(p) + (N-r) * log(1-p)
Then we take the derivative wrt p and set it to 0:
d/dp [log(Cr) + r * log(p) + (N-r) * log(1-p)] = r/p - (N-r)/(1-p) = 0
From the last equality we derive p = r / N. QED
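As a sanity check, a small Python sketch (my own illustration, with made-up numbers) confirms numerically that the log-likelihood peaks at p = r / N:
[code]
# Numerical check that r*log(p) + (N-r)*log(1-p) is maximized at p = r/N.
# The constant log(Cr) term is omitted: it does not depend on p.
from math import log

def log_likelihood(p, r, n):
    return r * log(p) + (n - r) * log(1.0 - p)

r, n = 7, 10  # made-up sample: 7 successes in 10 trials
grid = [i / 1000.0 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: log_likelihood(p, r, n))
print(best_p)  # 0.7 = r/N
[/code]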
Let's try the same procedure for the trinomial distribution. With Cr = N! / wins! / draws! / losses!, the likelihood function is:
L(win_probability, draw_probability | wins, draws) = Cr * win_probability^wins * draw_probability^draws * (1 - win_probability - draw_probability)^(N - wins - draws)
Taking the log, we obtain:
log(Cr) + wins * log(win_probability) + draws * log(draw_probability) + (N - wins - draws) * log(1 - win_probability - draw_probability)
The total derivative of this log-likelihood (still denoted L) wrt win_probability is:
DL(win_probability, draw_probability) / D(win_probability) = dL / d(win_probability) + dL / d(draw_probability) * D(draw_probability) / D(win_probability)
where D denotes a total derivative and d denotes a partial derivative.
We have no idea whether draw_probability depends on win_probability. In other words, we do not know whether the term D(draw_probability) / D(win_probability) differs from zero. Intuition suggests that the two are dependent: a player who is likely to win is also more likely to draw against his opponent, and vice versa. For the sake of argument, however, let's assume that the two are independent, i.e. D(draw_probability) / D(win_probability) = 0. Then the above formula simplifies to DL / D(win_probability) = dL / d(win_probability).
Then, setting the partial derivative to zero,
dL / d(win_probability) = wins / win_probability - (N - wins - draws) / (1 - win_probability - draw_probability) = 0
and solving for win_probability, we obtain
ML(win_probability) = wins * (1 - draw_probability) / (N - draws)
and similarly:
ML(draw_probability) = draws * (1 - win_probability) / (N - wins)
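These closed forms are easy to verify numerically. The sketch below (my own check, with made-up counts) holds draw_probability fixed, as the independence assumption above implies, and maximizes the trinomial log-likelihood over win_probability on a grid:
[code]
# Check: with draw_probability held fixed, the trinomial log-likelihood is
# maximized over win_probability at wins*(1 - draw_probability)/(N - draws).
from math import log

def trinomial_log_likelihood(w, d, wins, draws, losses):
    return wins * log(w) + draws * log(d) + losses * log(1.0 - w - d)

wins, draws, losses = 3, 6, 2          # made-up sample, N = 11
n = wins + draws + losses
d_fixed = 0.5                          # draw_probability, held fixed
grid = [i / 10000.0 for i in range(1, int((1.0 - d_fixed) * 10000))]
best_w = max(grid, key=lambda w: trinomial_log_likelihood(w, d_fixed, wins, draws, losses))
closed_form = wins * (1.0 - d_fixed) / (n - draws)
print(best_w, closed_form)  # both 0.3
[/code]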
We see that the maximum-likelihood estimates of win_probability and draw_probability, even under the far-fetched assumption that the two probabilities are independent, are quite different from what is commonly assumed: ML(win_probability) is not wins / N and ML(draw_probability) is not draws / N.
In an effort to alleviate the problem with this "pesky" draw_probability, as it is perceived, some recommend (https://chessprogramming.wikispaces.com ... Statistics ) determining the draw probability from a large number of games played, e.g. from rating sites and online chess-playing sites. However, the draw rate (draws / N) differs across time controls, opponent strengths, humans versus engines, and a whole host of other conditions. In the Bayesian treatment (BayesElo), the drawelo parameter is similarly determined from a draw rate. Such estimates are therefore largely arbitrary and cannot serve as maximum-likelihood estimates of the respective probabilities. This shows that the underlying model is wrong in not taking into account the true relationship between the two parameters.
Hereby I propose another model that naturally leads to a binomial distribution without any assumptions. It is based not on results (+, =, -) but on points, following the usual point system of chess. The sample unit in this model is the half-point. In each game the players strive to win 2 half-points: a won game is 2 half-points to the winner, a draw is 1 half-point to each player, and a loss is 0 half-points to the loser. The score of each player is simply the sum of his half-points. So, if the result of 11 games is +3 =6 -2, the winning player has 12 half-points (6 + 3*2) and the losing player has 10 half-points (6 + 2*2), out of a total of 22 half-points (11*2). This naturally leads to the binomial distribution:
Probability(draws+2*wins, draws+2*losses) = (2*N)! / (draws + 2*wins)! / (draws + 2*losses)! * win_probability^(draws + 2*wins) * (1 - win_probability)^(draws + 2*losses)
As in every binomial distribution, we have only one parameter, win_probability (here, the probability of scoring any single half-point), whose maximum-likelihood estimate, found in the way outlined above for the binomial distribution, is:
ML(win_probability) = (draws + 2*wins) / (2*N)
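A short Python sketch of the half-point model (again with the illustrative +3 =6 -2 sample from above):
[code]
# Half-point model: a match of N games is treated as 2*N Bernoulli trials,
# one per half-point. win_probability is the chance of taking one half-point.
from math import factorial

def half_point_probability(wins, draws, losses, win_probability):
    n = wins + draws + losses
    scored = draws + 2 * wins            # half-points won
    conceded = draws + 2 * losses        # half-points lost
    coefficient = factorial(2 * n) // (factorial(scored) * factorial(conceded))
    return coefficient * win_probability ** scored * (1.0 - win_probability) ** conceded

wins, draws, losses = 3, 6, 2            # +3 =6 -2 over 11 games
n = wins + draws + losses
ml_estimate = (draws + 2 * wins) / (2.0 * n)  # 12/22 ~ 0.545
print(ml_estimate)
print(half_point_probability(wins, draws, losses, ml_estimate))
[/code]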