Hi, all!
Let's say I have a list of results between n engines like this (first engine, second engine, result, count):
engine1 engine2 1-0 10
engine1 engine2 0-1 13
engine2 engine9 1/2-1/2 20
What is the simplest way to compute the Elo (or a similar performance rating) of the engines? I need a formula or method; I don't want to run the results through programs such as Ordo, EloStat or BayesElo.
Regards,
computing elo of multiple chess engines
Re: computing elo of multiple chess engines
One simple option:
1) Start all engines at any initial rating.
2) Calculate a performance rating for each engine based on the current ratings; formulas for doing so can be found at https://en.wikipedia.org/wiki/Elo_rating_system
3) Adjust each rating by moving it closer to its performance rating.
4) Repeat steps 2-3 until you reach a steady state.
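The four steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not in the post: the performance rating is taken as the game-weighted average rating of the opponents plus 400*log10(points/(games - points)), each rating moves halfway towards its performance rating per pass, and the ratings are re-centred on a chosen mean after every pass so that they cannot drift. The names `tally`, `iterate_ratings`, `damping` and `RESULTS` are made up for the example. An engine that scores 0% or 100% would break the logarithm and needs separate handling.

```python
import math
from collections import defaultdict

# Input format from the first post: (first engine, second engine, result, count).
RESULTS = [
    ("engine1", "engine2", "1-0", 10),
    ("engine1", "engine2", "0-1", 13),
    ("engine2", "engine9", "1/2-1/2", 20),
]
# Score of the FIRST engine for each result string.
SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def tally(results):
    """Return points, games and per-opponent game counts for every engine."""
    points = defaultdict(float)
    games = defaultdict(int)
    opponents = defaultdict(lambda: defaultdict(int))
    for first, second, result, count in results:
        s = SCORE[result]
        points[first] += s * count
        points[second] += (1.0 - s) * count
        for a, b in ((first, second), (second, first)):
            games[a] += count
            opponents[a][b] += count
    return points, games, opponents


def iterate_ratings(results, mean=2000.0, damping=0.5, tol=1e-9, max_iter=1000):
    """Iterate performance ratings until they stop changing (assumed scheme)."""
    points, games, opponents = tally(results)
    ratings = {e: mean for e in games}  # step 1: any initial state
    for _ in range(max_iter):
        new = {}
        for e in ratings:
            # Step 2: performance rating against the current opponent ratings.
            avg_opp = sum(ratings[o] * c for o, c in opponents[e].items()) / games[e]
            perf = avg_opp + 400.0 * math.log10(points[e] / (games[e] - points[e]))
            # Step 3: move part of the way towards the performance rating.
            new[e] = ratings[e] + damping * (perf - ratings[e])
        # Re-centre so that the mean rating stays fixed.
        shift = mean - sum(new.values()) / len(new)
        new = {e: r + shift for e, r in new.items()}
        change = max(abs(new[e] - ratings[e]) for e in new)
        ratings = new
        if change < tol:  # step 4: steady state reached
            break
    return ratings
```

Calling `iterate_ratings(RESULTS)` returns a dictionary of ratings with a mean of 2000. Only rating differences are meaningful, so the choice of mean is arbitrary.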
Re: Computing Elo of multiple chess engines.
Hello Alexandru:
brtzsnr wrote:
Hi, all!
Let's say I have a list of results between n engines like this (first engine, second engine, result, count):
engine1 engine2 1-0 10
engine1 engine2 0-1 13
engine2 engine9 1/2-1/2 20
What is the simplest way to compute the Elo (or a similar performance rating) of the engines? I need a formula or method; I don't want to run the results through programs such as Ordo, EloStat or BayesElo.
Regards,
The first, easiest and most obvious thing is to compute the number of points and games that each engine played. In your example: engine 1 = 10/23 (10 points out of 23 games), engine 2 = 23/43, engine 9 = 10/20. Of course: 1 point for each win, 0.5 points for each draw and 0 points for each loss.
The next step is the difficult one. People will surely differ from me. I wrote my own rating calculator for Round-Robin tournaments, but your example is not a Round-Robin. Here I go with my own method, which gives results very similar to those of EloSTAT in a Round-Robin:
Code:
i: each one of the n engines. i = 1, 2, ..., n.
delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
In a Round-Robin: games(i) = constant for each engine.
average_delta = (1/n)*SUM[delta(i); i = 1, ..., n]
average_delta_of_opponents(i) = [average_delta - delta(i)]/(n - 1)
rating(i) = average_delta_of_opponents(i) + delta(i)
If you want the mean of all these ratings to be X, then you must add X - average_delta to each rating(i).
IMHO average_delta_of_opponents(i) is useful in a Round-Robin because each engine plays all the other engines and the same number of games.
When you say 'I have a list of results between n engines like this (first engine, second engine, results, count)', I understand that it is a Round-Robin. Then this method should be valid.
If not, as in your example, then average_delta_of_opponents(i) makes much less sense than before. Anyway, following my method blindly:
Code:
delta(1) = 400*log10(10/13) ~ -45.58
delta(2) = 400*log10(23/20) ~ 24.28
delta(9) = 400*log10(10/10) = 0
average_delta ~ (-45.58 + 24.28 + 0)/3 = -7.1
average_delta_of_opponents(1) ~ [-7.1 - (-45.58)]/(3 - 1) = 19.24
average_delta_of_opponents(2) ~ (-7.1 - 24.28)/(3 - 1) = -15.69
average_delta_of_opponents(9) ~ (-7.1 - 0)/(3 - 1) = -3.55
rating(1) ~ 19.24 - 45.58 = -26.34
rating(2) ~ -15.69 + 24.28 = 8.59
rating(9) ~ -3.55 + 0 = -3.55
If you want a mean of ratings of 2000 Elo then you must add 2000 - (-7.1) = 2007.1:
rating'(1) = 2007.1 - 26.34 = 1980.76 Elo
rating'(2) = 2007.1 + 8.59 = 2015.69 Elo
rating'(9) = 2007.1 - 3.55 = 2003.55 Elo
Mean: (1980.76 + 2015.69 + 2003.55)/3 = 6000/3 = 2000 Elo
I hope there are no typos. This method does not provide error bars, as you can see. Does a difference of 2015.69 - 1980.76 = 34.93 Elo between the best and the worst engine of your example make sense?
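The calculation above can be reproduced with a few lines of Python. This is a sketch that follows the formulas exactly as posted, not the Fortran 95 programme itself; the function name `round_robin_ratings` and the `RESULTS` list are made up for the example, and the input is assumed to be the (first engine, second engine, result, count) format from the first post. It needs at least two engines and fails for an engine that scores 0% or 100%, because delta(i) is then undefined.

```python
import math
from collections import defaultdict

RESULTS = [
    ("engine1", "engine2", "1-0", 10),
    ("engine1", "engine2", "0-1", 13),
    ("engine2", "engine9", "1/2-1/2", 20),
]
# Score of the FIRST engine for each result string.
SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def round_robin_ratings(results, mean=2000.0):
    """Apply the posted delta / average_delta formulas to a result list."""
    points = defaultdict(float)
    games = defaultdict(int)
    for first, second, result, count in results:
        s = SCORE[result]
        points[first] += s * count
        points[second] += (1.0 - s) * count
        games[first] += count
        games[second] += count

    n = len(games)
    # delta(i) = 400*log10{points(i)/[games(i) - points(i)]}
    delta = {e: 400.0 * math.log10(points[e] / (games[e] - points[e]))
             for e in games}
    average_delta = sum(delta.values()) / n

    ratings = {}
    for e in games:
        average_delta_of_opponents = (average_delta - delta[e]) / (n - 1)
        # Adding (mean - average_delta) shifts the mean rating to `mean`.
        ratings[e] = average_delta_of_opponents + delta[e] + (mean - average_delta)
    return ratings


for engine, rating in sorted(round_robin_ratings(RESULTS).items()):
    print(f"{engine}: {rating:.2f} Elo")
```

With the three result lines of the example, the printed ratings should agree with the hand calculation to two decimals.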
Kevin's suggestion of iterating until convergence is reached may be better. OTOH, I find my method very simple: with around ten lines of code in the rating calculation, it almost mirrors EloSTAT in Round-Robin tournaments. You can download it, along with other programmes, from here:
Six Fortran 95 tools.rar (686.26 KB)
The name of the programme is telltale: Rating_performances_for_Round_Robin_tournaments.
Regards from Spain.
Ajedrecista.