PeterO wrote: ↑Sat Jan 04, 2020 11:42 pm
Hi friends,
I read Ferdy's tests on talkchess: which engine (UCI setting 1500 Elo) plays THE MOST like a human with FIDE Elo 1500?
Result: Rodent IV. So for my tests this will be the reference engine.
I have never done an engine tournament/Elo calculation.
Questions:
1. What is THE MOST SIMPLE tool to calculate the Elo of some engines once I have the PGN?
I want to set ONE engine to a FIXED Elo number (Rodent IV - 1500 with Elo 1500) and calculate the Elo of the other engines. If this is possible, I would like to FIX some other engines' Elo later too. I am using Windows 10.
You can use ordo at
https://github.com/michiguel/Ordo/releases
Download ordo-1.2.6-win.zip, and unzip it.
There is a sample batch file inside named ordo_example.bat. You can edit it and use the following command lines.
ordo-win64.exe -p elo_1500_test.pgn -g group_connection.txt
ordo-win64.exe -p elo_1500_test.pgn -o ordo_summary.txt -j ordo_head_to_head.txt -W -a 1500 -A "Rodent IV 022 Elo 1500" -G -s 100 -E
pause
The commands use elo_1500_test.pgn as the input file. If your PGN has a different filename, replace it.
They also use "Rodent IV 022 Elo 1500" as the anchor, because the elo_1500_test.pgn file contains a player with that name. Replace it with the exact player name from your PGN.
-p is for input pgn file
-E is for elo stats output
Read the manual for the meaning of options.
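The anchor name has to match the player name in the PGN. As a rough way to check which names your file contains, here is a minimal Python sketch. It assumes each game carries the usual [White "..."] and [Black "..."] tag lines; the function name count_players is only illustrative and is not part of ordo.

```python
import re
from collections import Counter

# Matches PGN tag lines such as [White "Rodent IV 022 Elo 1500"].
# The whitespace after White/Black keeps tags like WhiteElo from matching.
TAG_RE = re.compile(r'^\[(White|Black)\s+"(.*)"\]\s*$')


def count_players(pgn_text):
    """Return a Counter mapping each player name to its number of games."""
    counts = Counter()
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def print_players(path):
    """Print every player name found in the PGN file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        counts = count_players(handle.read())
    for name, games in sorted(counts.items()):
        print(f"{games:5d}  {name}")
```

Calling print_players("elo_1500_test.pgn") lists each name with its game count, and the name can then be copied into the anchor option as is.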
ordo_summary.txt would look something like this.
   # PLAYER                           :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Rhetoric 1.4.3 Elo 1500          :  1760.8  224.0    10.0      12    83
   2 Discocheck 5.2 Elo 1500          :  1631.9  217.8     9.0      13    69
   3 Stockfish 2019-12-30 Elo 1500    :  1565.2  209.2     9.0      14    64
   4 Rodent IV 022 Elo 1500           :  1500.0   ----     6.5      13    50
   5 Arasan 21.1 Elo 1500             :  1424.4  211.2     5.0      12    42
   6 Hiarcs 14 Elo 1500               :  1290.4  240.4     3.0      12    25
   7 Deuterium v2019.2.37.73 Elo 1500 :  1231.5  231.9     2.5      14    18
White advantage = 30.94 +/- 49.87
Draw rate (equal opponents) = 50.00 % +/- 0.00
Rodent IV 022 Elo 1500 is anchored at elo 1500.
Run ordo once the PGN file already has some games in it.
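To get a feel for what a rating gap in the summary means, the conventional logistic Elo formula can be used. This is only a sketch for reading the numbers: it assumes the rating scale is close to the conventional one and it ignores white advantage and draw rate, which ordo estimates separately.

```python
def expected_score(rating_a, rating_b):
    """Expected score (between 0 and 1) of player A against player B,
    using the conventional logistic Elo formula with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Example with two ratings from the summary above:
# Rhetoric (1760.8) against the anchored Rodent IV (1500.0).
print(round(expected_score(1760.8, 1500.0), 2))
```

With these ratings the expected score is about 0.82 for the higher-rated engine. Keep in mind that with only 12 to 14 games per engine the error margins in the table are larger than 200 points, so the gaps themselves are still very uncertain.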
2. How many games are necessary for a hobby tester to get a valid result?
More games are better, but opening selection is important too. Try to include some popular openings, such as the Slav, Ruy Lopez, Nimzo, and others, with opening lines of 8 plies or fewer. Twelve different openings are fine. Play a round robin in which each opening is used twice with colors reversed, so engine1 vs engine2 yields 24 games.
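The size of such a tournament is easy to work out in advance. The sketch below builds the pairing list for a round robin in which every opening is played twice with colors reversed. The opening labels are placeholders, and the function name round_robin_schedule is made up for illustration.

```python
from itertools import combinations


def round_robin_schedule(engines, openings):
    """Return a list of (white, black, opening) games: every pair of
    engines plays every opening twice, once with each color."""
    games = []
    for first, second in combinations(engines, 2):
        for opening in openings:
            games.append((first, second, opening))
            games.append((second, first, opening))
    return games


# Seven engines, as in the summary above (short names used for brevity),
# and twelve placeholder opening labels.
engines = ["Rhetoric", "Discocheck", "Stockfish", "Rodent IV",
           "Arasan", "Hiarcs", "Deuterium"]
openings = [f"opening_{i}" for i in range(1, 13)]

games = round_robin_schedule(engines, openings)
print(len(games))  # 21 pairings x 24 games = 504
```

With seven engines this gives 24 games per pairing, 144 games per engine, and 504 games in total.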
3. Which time setting is a good one - 5 minutes/game?
It depends on your hardware. You can use 3m+2s blitz. Most of these engines hold back their strength by limiting nodes, depth, and positional evaluation knowledge, and by using unoptimized piece values. After 1 or 2 seconds of thinking, most of them can already produce a move intended for Elo 1500; others simply delay sending their move.
It is also important to play against these engines personally and send feedback to the programmers, so they can develop their engines to play closer to a human at a specific Elo rating range.