computing elo of multiple chess engines

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Post by brtzsnr »

Hi, all!

Let's say I have a list of results between n engines like this (first engine, second engine, results, count,):

engine1 engine2 1-0 10
engine1 engine2 0-1 13
engine2 engine9 1/2-1/2 20


What is the simplest way to compute the ELO (or similar performance rating) of the engines? I need a formula / method, I don't want to run through programs such as Ordo, EloStat or BayesElo.

Regards,
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: computing elo of multiple chess engines

Post by kbhearn »

one simple option:

1) start engines at any initial rating state
2) calculate performance rating for each engine based on these initial ratings - formulas for doing so can be found at https://en.wikipedia.org/wiki/Elo_rating_system
3) adjust each rating by moving them closer to their performance rating
4) repeat steps 2-3 until you reach a steady state
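
A minimal Python sketch of the four steps above, as one possible reading of them rather than a reference implementation. The assumptions are mine: the performance rating is taken as the rating whose logistic expected score (the formula on the Wikipedia page) matches the actual score against the opponents' current ratings, it is found by bisection, each rating moves halfway towards it (`damping=0.5`), and the ratings are re-centred on `mean` after every pass. The names `expected`, `performance` and `iterate_ratings` are hypothetical.

```python
from collections import defaultdict

# Points scored by the FIRST engine for each result string.
SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def expected(r_a, r_b):
    """Expected score per game of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def performance(opponents, points, ratings):
    """Rating at which the expected total score equals `points` (bisection).

    An engine with 0% or 100% has no finite answer; the search then just
    stops at the -4000 / +4000 bound.
    """
    lo, hi = -4000.0, 4000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        total = sum(n * expected(mid, ratings[o]) for o, n in opponents.items())
        if total < points:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def iterate_ratings(results, mean=0.0, damping=0.5, tol=1e-6, max_iter=1000):
    """results: iterable of (first_engine, second_engine, result, count)."""
    points = defaultdict(float)
    opponents = defaultdict(lambda: defaultdict(int))
    for first, second, result, count in results:
        points[first] += SCORE[result] * count
        points[second] += (1.0 - SCORE[result]) * count
        opponents[first][second] += count
        opponents[second][first] += count

    ratings = {e: mean for e in opponents}  # step 1: any initial state
    for _ in range(max_iter):
        # step 2: performance rating against the current ratings
        perf = {e: performance(opponents[e], points[e], ratings) for e in ratings}
        # step 3: move each rating part of the way towards its performance
        new = {e: ratings[e] + damping * (perf[e] - ratings[e]) for e in ratings}
        # keep the pool average fixed so the ratings cannot drift as a whole
        shift = mean - sum(new.values()) / len(new)
        new = {e: r + shift for e, r in new.items()}
        change = max(abs(new[e] - ratings[e]) for e in ratings)
        ratings = new
        if change < tol:  # step 4: steady state reached
            break
    return ratings
```

Called with the three lines from the first post as tuples, e.g. `("engine1", "engine2", "1-0", 10)`, it returns a dict of ratings averaging `mean`. Only rating differences are meaningful; the absolute level is whatever `mean` you choose.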
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Computing Elo of multiple chess engines.

Post by Ajedrecista »

Hello Alexandru:
brtzsnr wrote:Hi, all!

Let's say I have a list of results between n engines like this (first engine, second engine, results, count,):

engine1 engine2 1-0 10
engine1 engine2 0-1 13
engine2 engine9 1/2-1/2 20


What is the simplest way to compute the ELO (or similar performance rating) of the engines? I need a formula / method, I don't want to run through programs such as Ordo, EloStat or BayesElo.

Regards,
The first, easiest and most obvious step is to compute the number of points and games that each engine played. In your example: engine 1 = 10/23 (10 points out of 23 games), engine 2 = 23/43, engine 9 = 10/20. As usual: 1 point for each win, 0.5 points for each draw and 0 points for each loss.
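
As a rough sketch of that counting step in Python, assuming each input line has exactly the four whitespace-separated fields from the original post (first engine, second engine, result, count). The function name `tally_results` is made up for illustration.

```python
from collections import defaultdict

# Points scored by the FIRST engine for each result string.
SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def tally_results(lines):
    """Return {engine: (points, games)} from 'first second result count' lines."""
    points = defaultdict(float)
    games = defaultdict(int)
    for line in lines:
        first, second, result, count = line.split()
        count = int(count)
        points[first] += SCORE[result] * count
        points[second] += (1.0 - SCORE[result]) * count
        games[first] += count
        games[second] += count
    return {engine: (points[engine], games[engine]) for engine in games}


totals = tally_results([
    "engine1 engine2 1-0 10",
    "engine1 engine2 0-1 13",
    "engine2 engine9 1/2-1/2 20",
])
for engine, (pts, n) in sorted(totals.items()):
    print(f"{engine}: {pts:g}/{n}")
```

For the three lines of the example this gives the same 10/23, 23/43 and 10/20 totals.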

The next step is the difficult one, and people will surely disagree with me. I wrote my own rating calculator for Round-Robin tournaments, but your example is not a Round-Robin. Here is my own method, which gives results very similar to EloSTAT's in a Round-Robin:

Code:

i: each one of the n engines. i = 1, 2, ..., n.

delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
In a Round-Robin: games(i) = constant for each engine.

average_delta = (1/n)*SUM[delta(i); i = 1, ..., n]

average_delta_of_opponents(i) = [average_delta - delta(i)]/(n - 1)
rating(i) = average_delta_of_opponents(i) + delta(i)

If you want the mean of all these ratings to be X, then you must add X - average_delta to each rating(i).
IMHO average_delta_of_opponents(i) is useful in a Round-Robin because each engine plays all the other engines and the same number of games.
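
The formulas above translate almost line for line into Python. This is only a sketch of them as written in this post, not the Fortran programme itself. The function name `delta_ratings` and the `target_mean` parameter (the X above) are hypothetical, and it assumes at least two engines, each scoring strictly between 0% and 100% (otherwise the logarithm is undefined).

```python
import math


def delta_ratings(scores, target_mean=None):
    """scores: {engine: (points, games)} -> {engine: rating}.

    Follows the formulas of this post term by term.
    """
    # delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
    delta = {e: 400.0 * math.log10(p / (g - p)) for e, (p, g) in scores.items()}
    n = len(delta)
    average_delta = sum(delta.values()) / n

    ratings = {}
    for engine, d in delta.items():
        # average_delta_of_opponents(i) = [average_delta - delta(i)]/(n - 1)
        average_delta_of_opponents = (average_delta - d) / (n - 1)
        ratings[engine] = average_delta_of_opponents + d

    if target_mean is not None:
        # Shift every rating so that the mean of the pool becomes target_mean.
        offset = target_mean - average_delta
        ratings = {e: r + offset for e, r in ratings.items()}
    return ratings


ratings = delta_ratings(
    {"engine1": (10, 23), "engine2": (23, 43), "engine9": (10, 20)},
    target_mean=2000,
)
for engine, r in sorted(ratings.items()):
    print(f"{engine}: {r:.2f}")
```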

When you say 'I have a list of results between n engines like this (first engine, second engine, results, count)', I understand that it is a Round-Robin. Then this method should be valid.

If not, as in your example, then average_delta_of_opponents(i) makes much less sense than before. Anyway, blindly following my method:

Code:

delta(1) = 400*log10(10/13) ~ -45.58
delta(2) = 400*log10(23/20) ~ 24.28
delta(9) = 400*log10(10/10) = 0

average_delta ~ (-45.58 + 24.28 + 0)/3 = -7.1

average_delta_of_opponents(1) ~ [-7.1 - (-45.58)]/(3 - 1) = 19.24
average_delta_of_opponents(2) ~ (-7.1 - 24.28)/(3 - 1) = -15.69
average_delta_of_opponents(9) ~ (-7.1 - 0)/(3 - 1) = -3.55

rating(1) ~ 19.24 - 45.58 = -26.34
rating(2) ~ -15.69 + 24.28 = 8.59
rating(9) ~ -3.55 + 0 = -3.55

If you want a mean of ratings of 2000 Elo then you must add 2000 - (-7.1) = 2007.1:

rating'(1) = 2007.1 - 26.34 = 1980.76 Elo
rating'(2) = 2007.1 +  8.59 = 2015.69 Elo
rating'(9) = 2007.1 -  3.55 = 2003.55 Elo

Mean: (1980.76 + 2015.69 + 2003.55)/3 = 6000/3 = 2000 Elo
I hope there are no typos. As you can see, this method does not provide error bars. Does a difference of 2015.69 - 1980.76 = 34.93 Elo between the best and the worst engine of your example make sense?

Kevin's suggestion of iterating until convergence is reached may be better. OTOH, I find my method very simple: with around ten lines of code in the rating calculation, it almost mirrors EloSTAT in Round-Robin tournaments. You can download it, along with other programmes, from here:

Six Fortran 95 tools.rar (686.26 KB)

The name of the programme is telltale: Rating_performances_for_Round_Robin_tournaments.

Regards from Spain.

Ajedrecista.