Calculating an engine's Elo rating?

PeterO · Post by **PeterO** » Sat Jan 04, 2020 11:42 pm

Hi friends,

I read Ferdy's tests on TalkChess on which engine (UCI setting Elo 1500) plays THE MOST like a human rated FIDE Elo 1500.
Result: Rodent IV. So for my tests this will be the reference engine.

I have never run an engine tournament or an Elo calculation.

Questions:
1. What is THE MOST SIMPLE tool to calculate the Elo of some engines once I have the PGN?
I want to give ONE engine a FIXED Elo number (Rodent IV at the 1500 setting, fixed at Elo 1500) and calculate the Elo of the other engines from it. If this is possible, I would like to FIX some other engines' Elo later too. I am using Windows 10.
2. How many games are necessary for a hobby tester to get a valid result?
3. Which time setting is a good one - 5 minutes/game?

Peter

mwyoung · Post by **mwyoung** » Sun Jan 05, 2020 1:49 am

PeterO wrote: ↑Sat Jan 04, 2020 11:42 pm Hi friends,

I read Ferdy's tests on TalkChess on which engine (UCI setting Elo 1500) plays THE MOST like a human rated FIDE Elo 1500.
Result: Rodent IV. So for my tests this will be the reference engine.

I have never run an engine tournament or an Elo calculation.

Questions:
1. What is THE MOST SIMPLE tool to calculate the Elo of some engines once I have the PGN?
I want to give ONE engine a FIXED Elo number (Rodent IV at the 1500 setting, fixed at Elo 1500) and calculate the Elo of the other engines from it. If this is possible, I would like to FIX some other engines' Elo later too. I am using Windows 10.
2. How many games are necessary for a hobby tester to get a valid result?
3. Which time setting is a good one - 5 minutes/game?

Peter

1. For me the simplest tool is the ChessBase GUI. If you do not like ChessBase, try this one: https://www.remi-coulom.fr/Bayesian-Elo/
2. It depends on what you are trying to achieve. If you want the highest rating resolution for the programs tested, you can never have too many games. If you just need to know with a high level of certainty which program is better, the number of games varies with the strength difference between the programs. In other words, two programs that are far apart in strength need far fewer games to show which one is better. If the two programs are very close in strength, it can take hundreds to thousands of games to reach high confidence in which program is better.
3. You are not limited to any one time control. If you want to know the 5-minute ratings, save those games to a PGN file. If you also want to know the ratings at 1 minute per game, 10 minutes per game, or any other time control, just save those games to their own PGN file.
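To make the point about game counts concrete, here is a minimal sketch (not taken from any of the tools mentioned) that turns a win/draw/loss record between two engines into an Elo difference with a rough error margin. It assumes the usual logistic Elo curve and a normal approximation for the score; the function name `elo_diff_with_margin` and the default `z=1.96` (about 95% confidence) are choices made for illustration only.

```python
import math

def elo_diff_with_margin(wins, draws, losses, z=1.96):
    """Return (elo_diff, low, high) from one engine's point of view.

    Assumes the standard logistic Elo model and a normal approximation
    of the mean score, so treat the margins as rough guidance only.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(s):
        # Clamp so that 0% or 100% scores do not divide by zero.
        s = min(max(s, 1e-6), 1 - 1e-6)
        return -400 * math.log10(1 / s - 1)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)

if __name__ == "__main__":
    for record in [(10, 10, 10), (100, 100, 100), (1000, 1000, 1000)]:
        diff, low, high = elo_diff_with_margin(*record)
        print(record, f"{diff:+.1f} Elo, interval {low:+.1f} .. {high:+.1f}")
```

Running it on the same score percentage with 30, 300 and 3000 games shows the interval shrinking as the number of games grows, which is why closely matched engines need so many games.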

PeterO · Post by **PeterO** » Sun Jan 05, 2020 10:10 am

Hi,

thanks for your answer.

I have Chessbase 13 + Fritz 13.

1. How can I set FIXED Elo points for ONE engine and calculate all other engines from this?

Peter

Kotlov · Post by **Kotlov** » Sun Jan 05, 2020 4:05 pm

PeterO wrote: ↑Sat Jan 04, 2020 11:42 pm Result: Rodent IV. So for my tests this will be the reference engine.

Depends on hardware and time control.

PK · Post by PK » Sun Jan 05, 2020 5:21 pm

Kotlov wrote: ↑Sun Jan 05, 2020 4:05 pm Depends on hardware and time control.

It depends on the time control and on the opponent's approach to weakening. Weak levels of Rodent run at a low number of nodes per second, so their strength should be the same across a wide range of hardware (as long as you don't run it on a washing machine). But if the other engine approximates Elo by a fixed-nodes search, all comparisons are meaningless.

mwyoung · Post by **mwyoung** » Sun Jan 05, 2020 5:25 pm

PeterO wrote: ↑Sun Jan 05, 2020 10:10 am Hi,

thanks for your answer.

I have Chessbase 13 + Fritz 13.

1. How can I set FIXED Elo points for ONE engine and calculate all other engines from this?

Peter

In the ChessBase GUI, after the games have been played, have the GUI calculate the ratings. You will then have the option to gauge the ratings to one engine as your fixed point.

Ovyron · Post by **Ovyron** » Sun Jan 05, 2020 6:06 pm

PeterO wrote: ↑Sun Jan 05, 2020 10:10 am 1. How can I set FIXED Elo points for ONE engine and calculate all other engines from this?

Just add or subtract the same number from all ratings and you'll get what you want. For instance, if you end up with Rodent IV at 2500 Elo, subtract 1000 from the rating of every participant and you're done.
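The shift described here can be sketched in a few lines of Python. The function name `anchor_ratings` and the ratings for "Engine A" and "Engine B" are made up for illustration; only the Rodent IV 2500 to 1500 example comes from the post.

```python
def anchor_ratings(ratings, anchor_name, anchor_elo):
    """Shift every rating by the same offset so the anchor lands on anchor_elo.

    Rating differences between engines are left unchanged.
    """
    offset = anchor_elo - ratings[anchor_name]
    return {name: round(r + offset, 1) for name, r in ratings.items()}

if __name__ == "__main__":
    # Hypothetical tool output before anchoring (illustrative numbers only).
    raw = {"Rodent IV": 2500.0, "Engine A": 2620.0, "Engine B": 2410.0}
    print(anchor_ratings(raw, "Rodent IV", 1500))
    # {'Rodent IV': 1500.0, 'Engine A': 1620.0, 'Engine B': 1410.0}
```

Because only differences matter in the Elo model, this shift changes nothing about who is stronger or by how much.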

Kotlov · Post by **Kotlov** » Sun Jan 05, 2020 8:40 pm

PK wrote: ↑Sun Jan 05, 2020 5:21 pm
Depends on hardware and time control.
It depends on the time control and on the opponent's approach to weakening. Weak levels of Rodent run at a low number of nodes per second, so their strength should be the same across a wide range of hardware (as long as you don't run it on a washing machine). But if the other engine approximates Elo by a fixed-nodes search, all comparisons are meaningless.

I don't mean Rodent specifically.
Playing strength is a relative value, not an absolute one.
What is 0 Elo in your measurement?

Ferdy · Post by **Ferdy** » Mon Jan 06, 2020 4:14 am

PeterO wrote: ↑Sat Jan 04, 2020 11:42 pm Hi friends,

I read Ferdy's tests on TalkChess on which engine (UCI setting Elo 1500) plays THE MOST like a human rated FIDE Elo 1500.
Result: Rodent IV. So for my tests this will be the reference engine.

I have never run an engine tournament or an Elo calculation.

Questions:
1. What is THE MOST SIMPLE tool to calculate the Elo of some engines once I have the PGN?
I want to give ONE engine a FIXED Elo number (Rodent IV at the 1500 setting, fixed at Elo 1500) and calculate the Elo of the other engines from it. If this is possible, I would like to FIX some other engines' Elo later too. I am using Windows 10.

You can use Ordo from https://github.com/michiguel/Ordo/releases
Download ordo-1.2.6-win.zip and unzip it.
There is a sample batch file inside named ordo_example.bat; you can edit it to use the following command lines.

ordo-win64.exe -p elo_1500_test.pgn -g group_connection.txt
ordo-win64.exe -p elo_1500_test.pgn -o ordo_summary.txt -j ordo_head_to_head.txt -W -a 1500 -A "Rodent IV 022 Elo 1500" -G -s 100 -E
pause

The example uses elo_1500_test.pgn; if your PGN file has a different name, replace it.

The example also uses "Rodent IV 022 Elo 1500" because that player appears in elo_1500_test.pgn. Replace it with the actual player name from your PGN.
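Since the anchor name has to match the player name in the PGN, it can help to list the names the file actually contains. The following is a small sketch that assumes standard PGN tag lines such as [White "..."] and [Black "..."]; the helper name `player_names` is made up for this example, and it is not part of Ordo.

```python
import re
from collections import Counter

# Matches [White "Name"] or [Black "Name"], but not [WhiteElo "1500"].
TAG = re.compile(r'^\[(White|Black)\s+"(.*)"\]\s*$')

def player_names(pgn_text):
    """Count how many games each player name appears in (as White or Black)."""
    counts = Counter()
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts

if __name__ == "__main__":
    # Replace the filename with your own PGN file.
    with open("elo_1500_test.pgn", encoding="utf-8", errors="replace") as f:
        for name, games in sorted(player_names(f.read()).items()):
            print(f"{games:5d}  {name}")
```

Copy the name exactly as printed, including version numbers and spaces, into the anchor option.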

-p is for the input PGN file
-E is for Elo stats output

Read the manual for the meaning of the other options.

ordo_summary.txt would look something like this.

   # PLAYER                              :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Rhetoric 1.4.3 Elo 1500             :  1760.8  224.0    10.0      12    83
   2 Discocheck 5.2 Elo 1500             :  1631.9  217.8     9.0      13    69
   3 Stockfish 2019-12-30 Elo 1500       :  1565.2  209.2     9.0      14    64
   4 Rodent IV 022 Elo 1500              :  1500.0   ----     6.5      13    50
   5 Arasan 21.1 Elo 1500                :  1424.4  211.2     5.0      12    42
   6 Hiarcs 14 Elo 1500                  :  1290.4  240.4     3.0      12    25
   7 Deuterium v2019.2.37.73 Elo 1500    :  1231.5  231.9     2.5      14    18

White advantage = 30.94 +/- 49.87
Draw rate (equal opponents) = 50.00 % +/- 0.00

Rodent IV 022 Elo 1500 is anchored at elo 1500.

Run Ordo once the PGN file already has some games in it.

2. How many games are necessary for a hobby tester to get a valid result?

More games are better, but opening selection is important too. Try to include some popular openings, like the Slav, the Ruy Lopez, the Nimzo, and others. You can use opening lines of 8 plies or less. Twelve different openings are fine; in a round robin, engine1 vs engine2 will then get 24 games with reversed colors.
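As a rough sketch of the schedule described above, the following builds a round robin in which every pair of engines plays each opening twice with colors reversed. The function name `round_robin` and the placeholder opening labels are assumptions for illustration; the engine names are the seven from the sample summary.

```python
from itertools import combinations

def round_robin(engines, openings):
    """Return a list of (white, black, opening) games.

    Every pair of engines plays each opening twice, once with each color.
    """
    games = []
    for a, b in combinations(engines, 2):
        for opening in openings:
            games.append((a, b, opening))
            games.append((b, a, opening))  # same opening, colors reversed
    return games

if __name__ == "__main__":
    engines = ["Rhetoric", "Discocheck", "Stockfish", "Rodent IV",
               "Arasan", "Hiarcs", "Deuterium"]
    openings = [f"opening {i}" for i in range(1, 13)]  # 12 placeholder lines
    games = round_robin(engines, openings)
    print(len(games))  # 21 pairings * 24 games = 504
```

With 12 openings each pairing gets 24 games, so seven engines produce 504 games in total.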

3. Which time setting is a good one - 5 minutes/game?

Depends on your hardware. You can use 3m+2s blitz. Most of these engines hold back their strength by limiting nodes, depth, and positional evaluation knowledge, and by using unoptimized piece values. After 1s or 2s of thinking, most of these engines can already generate a move intended for Elo 1500; others simply delay sending their move.

It is also important to play against these engines yourself and send feedback to the programmers, so they can develop their engines to play closer to humans at a specific Elo range.
