## computing elo of multiple chess engines

Hi, all!

Let's say I have a list of results between n engines like this (first engine, second engine, results, count):

engine1 engine2 1-0 10

engine1 engine2 0-1 13

engine2 engine9 1/2-1/2 20

What is the simplest way to compute the Elo (or a similar performance rating) of the engines? I need a formula or method; I don't want to run the results through programs such as Ordo, EloStat or BayesElo.

Regards,

### Re: computing elo of multiple chess engines

One simple option:

1) Start all engines at any initial ratings.

2) Calculate a performance rating for each engine based on the current ratings (the initial ratings on the first pass). Formulas for doing so can be found at https://en.wikipedia.org/wiki/Elo_rating_system

3) Adjust each rating by moving it closer to its performance rating.

4) Repeat steps 2-3 until you reach a steady state.
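The four steps above can be sketched in Python. This is only a minimal sketch under assumptions that are not in the post: the performance rating is taken as the game-weighted average opponent rating plus 400*log10(p/(1-p)) for score fraction p, each pass moves halfway towards the performance rating, and the ratings are re-centred on an arbitrary mean of 2000 after every pass. The names `score_to_diff` and `iterate_ratings` are made up for illustration.

```python
import math

def score_to_diff(p):
    """Rating difference implied by score fraction p (inverse logistic Elo curve)."""
    p = min(max(p, 0.01), 0.99)  # assumption: clamp so 0% or 100% scores stay finite
    return 400.0 * math.log10(p / (1.0 - p))

def iterate_ratings(results, mean=2000.0, step=0.5, tol=1e-6, max_iter=10000):
    """results: list of (engine_a, engine_b, score_of_a_per_game, count)."""
    points, games = {}, {}
    for a, b, score_a, count in results:
        for engine, score in ((a, score_a), (b, 1.0 - score_a)):
            points[engine] = points.get(engine, 0.0) + score * count
            games[engine] = games.get(engine, 0) + count

    # Step 1: start every engine at the same rating.
    ratings = {engine: mean for engine in games}
    for _ in range(max_iter):
        # Step 2: performance rating against the current ratings.
        opp_sum = {engine: 0.0 for engine in games}
        for a, b, _score, count in results:
            opp_sum[a] += ratings[b] * count
            opp_sum[b] += ratings[a] * count
        new = {}
        for engine in games:
            avg_opp = opp_sum[engine] / games[engine]
            perf = avg_opp + score_to_diff(points[engine] / games[engine])
            # Step 3: move part of the way towards the performance rating.
            new[engine] = ratings[engine] + step * (perf - ratings[engine])
        # Ratings are only defined up to a constant, so pin the mean.
        shift = mean - sum(new.values()) / len(new)
        new = {engine: r + shift for engine, r in new.items()}
        # Step 4: stop once no rating moves by more than tol.
        change = max(abs(new[e] - ratings[e]) for e in games)
        ratings = new
        if change < tol:
            break
    return ratings

# The three result lines from the question (1-0 -> 1.0, 0-1 -> 0.0, 1/2-1/2 -> 0.5).
results = [
    ("engine1", "engine2", 1.0, 10),
    ("engine1", "engine2", 0.0, 13),
    ("engine2", "engine9", 0.5, 20),
]
ratings = iterate_ratings(results)
for engine, rating in sorted(ratings.items()):
    print(engine, round(rating, 2))
```

The half step is a design choice for this sketch: on a small, sparsely connected set of results like this one, a full step can oscillate instead of settling.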

- Ajedrecista
**Posts:** 1443 **Joined:** Wed Jul 13, 2011 7:04 pm **Location:** Madrid, Spain

### Re: Computing Elo of multiple chess engines.

Hello Alexandru:

The first, easiest and most obvious step is to compute the number of points and games that each engine played. In your example: engine1 = 10/23 (10 points out of 23 games), engine2 = 23/43, engine9 = 10/20. As usual: 1 point for each win, 0.5 points for each draw and 0 points for each loss.

The next step is the difficult one, and people will surely disagree with me. I wrote my own rating calculator for Round-Robin tournaments, but your example is not a Round-Robin. Here is my own method, which gives results very similar to EloSTAT's in a Round-Robin:

```
i: each one of the n engines. i = 1, 2, ..., n.
delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
In a Round-Robin: games(i) = constant for each engine.
average_delta = (1/n)*SUM[delta(i); i = 1, ..., n]
average_delta_of_opponents(i) = [average_delta - delta(i)]/(n - 1)
rating(i) = average_delta_of_opponents(i) + delta(i)
If you want the mean of all these ratings to be X, then you must add X - average_delta to each rating(i).
```

IMHO, average_delta_of_opponents(i) is useful in a Round-Robin because each engine plays all the other engines and the same number of games.

When you say 'I have a list of results between n engines like this (first engine, second engine, results, count)', I take it to be a Round-Robin, in which case this method should be valid.

If it is not, as in your example, then average_delta_of_opponents(i) makes much less sense than before. Anyway, following my method blindly:

```
delta(1) = 400*log10(10/13) ~ -45.58
delta(2) = 400*log10(23/20) ~ 24.28
delta(9) = 400*log10(10/10) = 0
average_delta ~ (-45.58 + 24.28 + 0)/3 = -7.1
average_delta_of_opponents(1) ~ [-7.1 - (-45.58)]/(3 - 1) = 19.24
average_delta_of_opponents(2) ~ (-7.1 - 24.28)/(3 - 1) = -15.69
average_delta_of_opponents(9) ~ (-7.1 - 0)/(3 - 1) = -3.55
rating(1) ~ 19.24 - 45.58 = -26.34
rating(2) ~ -15.69 + 24.28 = 8.59
rating(9) ~ -3.55 + 0 = -3.55
If you want a mean of ratings of 2000 Elo then you must add 2000 - (-7.1) = 2007.1:
rating'(1) = 2007.1 - 26.34 = 1980.76 Elo
rating'(2) = 2007.1 + 8.59 = 2015.69 Elo
rating'(9) = 2007.1 - 3.55 = 2003.55 Elo
Mean: (1980.76 + 2015.69 + 2003.55)/3 = 6000/3 = 2000 Elo
```

I hope there are no typos. As you will see, this method does not provide error bars. Does a difference of 2015.69 - 1980.76 = 34.93 Elo between the best and the worst engine of your example make sense?

Kevin's suggestion of iterating until convergence is reached may be better. OTOH, I find my method very simple: with around ten lines of code in the rating calculation, it is almost a mirror of EloSTAT in Round-Robin tournaments. You can download it with other programmes from here:

Six Fortran 95 tools.rar (686.26 KB)

The name of the programme is telltale: Rating_performances_for_Round_Robin_tournaments.

Regards from Spain.

Ajedrecista.
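As a rough illustration (this is not the Fortran programme mentioned above), the delta-based formulas from the post could be sketched in Python as follows. The function names `tally` and `delta_ratings` and the input layout are assumptions made for this sketch. It follows the formulas exactly as written, so it inherits their limits: an engine with 0% or 100% score, or a single-engine list, is not handled.

```python
import math

def tally(results):
    """Sum points and games per engine.

    results: list of (engine_a, engine_b, score_of_a_per_game, count),
    with score 1.0 for 1-0, 0.0 for 0-1 and 0.5 for 1/2-1/2.
    """
    points, games = {}, {}
    for a, b, score_a, count in results:
        for engine, score in ((a, score_a), (b, 1.0 - score_a)):
            points[engine] = points.get(engine, 0.0) + score * count
            games[engine] = games.get(engine, 0) + count
    return points, games

def delta_ratings(results, mean=2000.0):
    """Apply the delta formulas from the post and shift the mean rating to `mean`."""
    points, games = tally(results)
    n = len(points)
    # delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
    # (fails if an engine scored 0% or 100%; not handled in this sketch)
    delta = {e: 400.0 * math.log10(points[e] / (games[e] - points[e])) for e in points}
    average_delta = sum(delta.values()) / n
    ratings = {}
    for e in points:
        # average_delta_of_opponents(i) = [average_delta - delta(i)]/(n - 1)
        average_delta_of_opponents = (average_delta - delta[e]) / (n - 1)
        # rating(i), then add X - average_delta so that the mean becomes X
        ratings[e] = average_delta_of_opponents + delta[e] + (mean - average_delta)
    return ratings

# The three result lines from the question.
results = [
    ("engine1", "engine2", 1.0, 10),
    ("engine1", "engine2", 0.0, 13),
    ("engine2", "engine9", 0.5, 20),
]
ratings = delta_ratings(results)
for engine, rating in sorted(ratings.items()):
    print(engine, round(rating, 2))
```

Run on the example from the question, this gives about 1980.76, 2015.69 and 2003.55 Elo for engine1, engine2 and engine9, in line with the worked numbers in the post.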