Hi friends,
I read Ferdy's tests on talkchess: which engine (UCI setting 1500 Elo) plays THE MOST like a human with FIDE Elo 1500?
Result: Rodent IV. So for my tests this will be the reference engine.
I have never run an engine tournament or an Elo calculation.
Questions:
1. What is THE MOST SIMPLE tool to calculate the Elo of some engines once I have the PGN?
I want to set ONE engine to a FIXED Elo number (Rodent IV - 1500 with Elo 1500) and calculate the Elo of the other engines. If this is possible, I would like to FIX some other engines' Elo later too. I am using Windows 10.
2. How many games are necessary for a hobby tester to get a valid result?
3. Which time setting is a good one - 5 minutes/game?
Peter
Calculating an engines elo rating?
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Calculating an engines elo rating?
Replying to PeterO's post of Sat Jan 04, 2020 11:42 pm:
1. For me the simplest tool is the ChessBase GUI. If you do not like ChessBase, try Bayesian Elo: https://www.remi-coulom.fr/Bayesian-Elo/
2. It depends on what you are trying to achieve. If you want the highest rating resolution for the programs tested, you can never have too many games. If you just need to know with a high level of certainty which program is better, the number varies with the strength difference between the programs. In other words, two programs that are far apart in strength need far fewer games to show which one is better. If the two programs are very close in strength, it can take hundreds to thousands of games to reach high confidence in which program is better.
3. You are not limited to any one time control. If you want to know the 5-minute ratings, save those games to a PGN. If you want to know the ratings at 1 minute a game, 10 minutes a game, or any other time control, just save those games to their own PGN file.
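To put rough numbers on point 2, here is a minimal Python sketch. It assumes the standard logistic Elo formula and a normal approximation for the error margin; neither is confirmed to be what ChessBase or any other rating tool uses, and the function names and the win/draw/loss counts are made up for illustration.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (logistic model, 400-point scale)."""
    # Clamp so that a 0% or 100% score does not divide by zero.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_estimate(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for one engine against another.

    z=1.96 gives a rough 95% interval under a normal approximation
    (an assumption of this sketch).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return elo_diff(score), elo_diff(score - margin), elo_diff(score + margin)

# Hypothetical results: the same 55% score after 100 games and after 2000 games.
for wins, draws, losses in ((43, 24, 33), (860, 480, 660)):
    elo, low, high = match_estimate(wins, draws, losses)
    games = wins + draws + losses
    print(f"{games:5d} games: {elo:+.1f} Elo, rough 95% range {low:+.1f} to {high:+.1f}")
```

With the same score percentage, the estimated range narrows as the number of games grows, which is why closely matched engines need many more games before you can say which one is better.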
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
- Posts: 215
- Joined: Sun Jul 31, 2016 6:35 pm
Re: Calculating an engines elo rating?
Hi,
thanks for your answer.
I have Chessbase 13 + Fritz 13.
1. How can I set a FIXED Elo for ONE engine and calculate all other engines from it?
Peter
-
- Posts: 266
- Joined: Fri Jul 10, 2015 9:23 pm
- Location: Russia
Re: Calculating an engines elo rating?
Depends on hardware and time control.
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
-
- Posts: 893
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: Calculating an engines elo rating?
"Depend on hardware and time control."
Depends on time control and on the opponent's approach to weakening. Weak levels of Rodent run at a low number of nodes per second, so they should play the same across a wide range of hardware (as long as you don't run it on a washing machine). But if another engine approximates Elo by fixed-node search, all comparisons are meaningless.
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Calculating an engines elo rating?
In the ChessBase GUI, after the games have been played, have the GUI calculate the ratings. You will then have the option to gauge the ratings to one engine as your fixed point.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
Re: Calculating an engines elo rating?
Just add or subtract a number from all ratings and you'll get what you want. For instance, if you end with Rodent IV 2500 ELO, subtract 1000 from the rating of all participants and you're done.
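The add-or-subtract step can be sketched in Python as follows. The engine names other than Rodent IV and all raw ratings are hypothetical, and expected_score uses the standard logistic Elo formula as an assumption, only to show that shifting every rating by the same amount leaves the predicted results unchanged.

```python
def anchor_ratings(ratings, anchor_name, anchor_value):
    """Shift every rating by the same offset so the anchor ends up at anchor_value."""
    offset = anchor_value - ratings[anchor_name]
    return {name: rating + offset for name, rating in ratings.items()}

def expected_score(r_a, r_b):
    """Expected score of A against B (standard logistic Elo formula, assumed here)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Hypothetical raw output of a rating tool.
raw = {"Rodent IV": 2500.0, "Engine A": 2630.0, "Engine B": 2290.0}
shifted = anchor_ratings(raw, "Rodent IV", 1500.0)

for name, rating in shifted.items():
    print(f"{name:10s} {rating:7.1f}")

# Only rating differences matter, so the prediction is the same before and after.
before = expected_score(raw["Engine A"], raw["Rodent IV"])
after = expected_score(shifted["Engine A"], shifted["Rodent IV"])
print(abs(before - after) < 1e-12)
```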
-
- Posts: 266
- Joined: Fri Jul 10, 2015 9:23 pm
- Location: Russia
Re: Calculating an engines elo rating?
PK wrote: ↑Sun Jan 05, 2020 5:21 pm
"Depends on time control and opponent's approach to weakening. Weak levels of Rodent run at low nodes per second, therefore should be the same across wide range of hardware (as long as you don't run it on a washing machine). But if other engine approximates elo by fixed nodes search, all comparisons are meaningless."
I don't mean Rodent specifically.
Playing strength is a relative value, not an absolute one.
What is 0 Elo in your measurement?
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Calculating an engines elo rating?
PeterO wrote: ↑Sat Jan 04, 2020 11:42 pm
"1. What is THE MOST SIMPLE tool to calculate the elo of some engines? When I have the pgn I want to set ONE engine with a FIX elo number (Rodent IV - 1500 with elo 1500) and calculate the elo number of the other engines. If this is possible I would like to FIX some other engines elo later too. I am using Windows 10."
You can use Ordo at https://github.com/michiguel/Ordo/releases
Download ordo-1.2.6-win.zip, and unzip it.
There is a sample batch file inside named ordo_example.bat, you can modify/edit it and use the following command lines.
Code:
ordo-win64.exe -p elo_1500_test.pgn -g group_connection.txt
ordo-win64.exe -p elo_1500_test.pgn -o ordo_summary.txt -j ordo_head_to_head.txt -W -a 1500 -A "Rodent IV 022 Elo 1500" -G -s 100 -E
pause
The command line contains "Rodent IV 022 Elo 1500" because the elo_1500_test.pgn file has a player with that name. Replace it with the actual player name from your PGN.
-p is the input PGN file
-a 1500 together with -A "player name" anchors that player at 1500
-E is for Elo stats output
Read the manual for the meaning of options.
ordo_summary.txt would look something like this.
Code:
   # PLAYER                           :  RATING  ERROR POINTS PLAYED  (%)
   1 Rhetoric 1.4.3 Elo 1500          :  1760.8  224.0   10.0     12   83
   2 Discocheck 5.2 Elo 1500          :  1631.9  217.8    9.0     13   69
   3 Stockfish 2019-12-30 Elo 1500    :  1565.2  209.2    9.0     14   64
   4 Rodent IV 022 Elo 1500           :  1500.0   ----    6.5     13   50
   5 Arasan 21.1 Elo 1500             :  1424.4  211.2    5.0     12   42
   6 Hiarcs 14 Elo 1500               :  1290.4  240.4    3.0     12   25
   7 Deuterium v2019.2.37.73 Elo 1500 :  1231.5  231.9    2.5     14   18
White advantage = 30.94 +/- 49.87
Draw rate (equal opponents) = 50.00 % +/- 0.00
Run Ordo once the PGN file already has some games in it.
"2. How much games are necessary for a hobby tester to get a valid result?"
More games are better, but opening selection is important too. Try to include some popular openings, such as the Slav, Ruy Lopez, Nimzo-Indian, and others. You can use 8 plies or less. Twelve different openings are fine; in a round robin, engine1 vs engine2 then gets 24 games with reversed colors.
"3. Which time setting is a good one - 5 minutes/game?"
Depends on your hardware. You can use 3m+2s blitz. Most of these engines hold back their strength by limiting nodes, depth, and positional evaluation knowledge, and by using unoptimized piece values. After 1s or 2s of thinking, most of these engines can already generate a move intended for Elo 1500. Others just delay sending their move.
It is also important to play against these engines personally and send feedback to the programmers, so they can develop their engines to play close to a human at a specific Elo rating range.
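As a rough check on how many games the suggested round robin produces (twelve openings, each played with both colors), here is a small Python sketch. The engine names are taken from the sample Ordo table; the opening labels are placeholders, and the helper name is made up for illustration.

```python
from itertools import combinations

def round_robin_schedule(engines, openings):
    """List (white, black, opening) for every pairing, each opening played with both colors."""
    games = []
    for a, b in combinations(engines, 2):
        for opening in openings:
            games.append((a, b, opening))  # a has White
            games.append((b, a, opening))  # colors reversed
    return games

engines = ["Rhetoric", "Discocheck", "Stockfish", "Rodent IV",
           "Arasan", "Hiarcs", "Deuterium"]
# Placeholder labels; in practice these would be real opening lines of 8 plies or less.
openings = [f"opening_{i:02d}" for i in range(1, 13)]

schedule = round_robin_schedule(engines, openings)
per_pair = 2 * len(openings)
per_engine = sum(1 for white, black, _ in schedule if "Rodent IV" in (white, black))

print("games per pairing:", per_pair)
print("games per engine:", per_engine)
print("total games:", len(schedule))
```

With seven engines this comes to 24 games per pairing and a few hundred games in total, which gives Ordo much more to work with than the 12 to 14 games per engine in the sample table.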