I've written an informal paper titled "Elo estimation using quasi-Monte Carlo integration". Here is the introduction section:

The entire document can be found here: https://docs.google.com/document/d/11-J ... sp=sharing. I'd like to hear your comments or questions...First, an introduction. In June 2014 there was a mild controversy about Stockish 5 entering the IPON rating list at #2, as scored by Bayeselo, behind Houdini 4. On the other hand, Ordo and Elostat put SF at #1. There was a Talkchess thread named “Empirically 1 win + 1 loss ~ 2 draws” that discussed the finer points of Elo estimation. Which program had it right?

I've always thought none of the above mentioned Elo estimation programs were entirely correct (to be fair, their authors themselves never claimed otherwise), and I've found their approach a bit "dirty" in some respects (granted, the problem seems to be notoriously hard to solve in a "clean" way). In particular, I was quite certain that 1 win + 1 loss is not equal to 2 draws, except perhaps as a rough approximation. So, the event inspired me to try and devise an algorithm which could yield more correct estimates. By August, I had the answer…

In chess, Elo estimation is the problem of determining the relative ratings of players given their respective performances in an n-player tournament. In a two-player match, given its final score, an exact Elo estimate can be obtained by integration over all possible prior probabilities of winning, drawing and losing. Of course, there are two degrees of freedom here - once e.g. pw and pl (prior probabilities of winning and losing) are fixed, pd must be equal to 1-pw-pl, so a double integral suffices. For tournaments with more than two players, the problem is much more difficult. One of the reasons is dimensionality, which is equal to N + M, where N is the number of players in a tournament, while M is the number of matches (pairings) between these players. Since these integrals do not have a closed form, even for comparably small tournaments numerical integration becomes impossible. Monte Carlo integration - or quasi-Monte Carlo integration (QMCI) - offers a way to work around the infamous "curse of dimensionality".

The basic properties of the QMCI algorithm are the following:This document supplies the full working source code and explains the way it works. In the process, I’m going to say a few words about the underlying theory, which may or may not be new, but to my knowledge most of it hasn’t been discussed in online sources. A number of concrete examples are also provided. Interestingly, some results diverge from those of current Elo estimation programs. I'm hoping this document might stimulate some new thinking and better algorithms in the future.

- The algorithm relies on an assumption that prior probabilities of winning, losing and drawing a game between any two players are uniformly distributed.

Hence, prior expected score (s = pw+pd/2) of any given player against the other has a triangular distribution.

For all participants, QMCI calculates the expected Elo, the sampling error, as well as the 95% credible interval.

The QMCI algorithm relies on a large number of (quasi-)randomly generated samples to calculate the expected Elo. Sampling error can be made smaller by increasing the number of iterations.

The calculation takes roughly on the order of minutes, even for rather simple examples.

While MC is necessarily a probabilistic technique, QMC uses quasi-random sequences, not actual random numbers, so Elo estimates are exactly reproducible over repeated runs, and depend only on the number of iterations.

There are no pre-set "draw rates", "draw models", or any kind of special-case draw treatment. The prior probability of drawing is uniform, and is treated in exactly the same way as probability of winning or losing.

The algorithm ignores color, i.e. the advantage of playing first. I believe enhancing it with "color-awareness" shouldn't be too difficult.

The maximum problem size (dimensionality) is unfortunately still very limited, so the QMCI algorithm in its present form is far from replacing any of the above mentioned Elo estimation programs.