This thread shows perfectly why UCI_Elo ratings for different engines will never converge. There are several methods described already:
- strict node budget
- limiting nodes per second
- limiting nodes per second + random noise in eval function
Furthermore, a couple of other methods can be added:
- not calculating certain moves within the search (Phalanx)
- multiPV and choosing weaker move from time to time if it is not too weak (Stockfish level command)
Now, a strict node budget and a nodes-per-second limit are equivalent at one specific time control only (though both behave identically on different machines, as long as the declared speed can be reached). This alone makes the two methods incomparable.
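To make the time-control dependence concrete, here is a minimal sketch. The function names and the numbers are invented for illustration. It only assumes that a node budget fixes the nodes searched per move regardless of thinking time, while an NPS limit lets the node count grow with thinking time.

```python
def nodes_with_budget(budget: int, seconds_per_move: float) -> int:
    """Strict node budget: nodes per move do not depend on thinking time."""
    return budget


def nodes_with_nps_limit(nps_limit: int, seconds_per_move: float) -> int:
    """NPS limit: nodes per move grow linearly with thinking time."""
    return int(nps_limit * seconds_per_move)


def matching_move_time(budget: int, nps_limit: int) -> float:
    """The single move time (in seconds) at which both methods search the same nodes."""
    return budget / nps_limit


if __name__ == "__main__":
    budget, nps = 10_000, 2_000  # hypothetical settings
    for t in (1.0, matching_move_time(budget, nps), 30.0):
        print(t, nodes_with_budget(budget, t), nodes_with_nps_limit(nps, t))
```

With these made-up settings the two methods agree only at 5 seconds per move. At faster time controls the NPS-limited engine searches fewer nodes than the budgeted one, and at slower ones it searches more.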
Adding the other ones to the mix only makes matters worse. Using multi-PV brings in computer speed as another variable. Random noise simulates the multi-PV approach, in that a weaker move can replace a stronger one, but with less control over, and less transparency of, the decision. Skipping certain moves is like the multi-PV method applied deeper in the tree.
In short, UCI_Elo comparisons will always compare apples to oranges.
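For readers unfamiliar with the multi-PV method listed above, a rough sketch of the idea follows. This is not Stockfish's actual code. `pick_move`, `max_loss_cp` and `blunder_rate` are hypothetical names, and the input is assumed to be a list of (move, centipawn score) pairs sorted best first, as a multi-PV search might report them.

```python
import random


def pick_move(multipv, max_loss_cp=80, blunder_rate=0.3, rng=random):
    """Return the best move, or occasionally a weaker one that is not too weak.

    multipv: list of (move, score_cp) tuples, best first (assumed format).
    max_loss_cp: largest acceptable score drop versus the best move.
    blunder_rate: probability of deliberately playing a weaker candidate.
    """
    best_move, best_score = multipv[0]
    # Weaker alternatives that stay within the allowed loss.
    candidates = [m for m, s in multipv[1:] if best_score - s <= max_loss_cp]
    if candidates and rng.random() < blunder_rate:
        return rng.choice(candidates)
    return best_move


if __name__ == "__main__":
    lines = [("e4", 35), ("d4", 30), ("Nf3", 10), ("h4", -120)]
    print(pick_move(lines, max_loss_cp=50, blunder_rate=0.5))
```

Both knobs are explicit here, which is the "control/transparency" that eval noise lacks. How many lines the search can afford to compute still depends on machine speed.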
UCI_Elo
-
- Posts: 893
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: UCI_Elo
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
UCI_Elo comparison is fine. The only difference is that each author has their own way of weakening their program, and the user does not care which method an engine author chooses. The goal is to hit the UCI_Elo requested by the user. If the user wants UCI_Elo 1000, then each author should try to get their engine close to that rating.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
Granting that the wish of the UCI designers is for UCI_Elo to be equivalent to FIDE Elo, and using Kai's CCRL and FIDE relationship
Code: Select all
fide = 0.7*ccrl + 840
it is now easier for engine authors to estimate UCI_Elo via CCRL.
If the user wants UCI_Elo 2000, solve for ccrl:
fide = 0.7*ccrl + 840
0.7*ccrl = fide - 840
ccrl = (fide - 840) / 0.7
ccrl = (2000 - 840) / 0.7
ccrl = 1657
Then go to the CCRL page and find engines rated close to 1657 on the CCRL 40/4 list.
Use those as sparring partners in the development of UCI_Elo 2000.
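The same arithmetic as a small helper, in case anyone wants to tabulate targets for several UCI_Elo values. The function names are mine, and the only thing assumed is the linear relationship quoted above.

```python
def fide_from_ccrl(ccrl: float) -> float:
    """Linear CCRL 40/4 -> FIDE relationship quoted in this post."""
    return 0.7 * ccrl + 840


def ccrl_from_fide(fide: float) -> float:
    """Inverse of fide_from_ccrl: the CCRL rating to look for as sparring level."""
    return (fide - 840) / 0.7


if __name__ == "__main__":
    for target in (1500, 2000):
        print(target, round(ccrl_from_fide(target)))
```

For a UCI_Elo target of 2000 this gives 1657, as in the hand calculation. For 1500 it gives about 943.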
-
- Posts: 893
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: UCI_Elo
Quote: "Granting that the wishes of UCI designers is for the UCI_Elo to be equivalent to FIDE Elo..."
There's still a problem with time control. The node budget approach works for a single time control, so in order to approximate a FIDE rating you would need to tune for at least two time controls (let's say 2+1 and 90+30) and then extrapolate (how? linearly? logarithmically?).
The nodes per second approach runs into another roadblock. Part of the difference between a weak engine and a strong engine is scaling. So weakening a 2800 engine so that it performs like a 1600 engine at blitz time control does not guarantee that results will be similar at a longer time control.
With Rodent I ran into another quagmire: setting playing strength and style simultaneously. Exaggerated styles, especially attackers, were weaker at full strength and relatively stronger when slowed down.
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 1056
- Joined: Fri Mar 10, 2006 6:07 am
- Location: Basque Country (Spain)
Re: UCI_Elo
I have a problem with time control. When Ferdy tested my engine, I had the feeling that the engine was playing weaker than expected, so I asked him what time control he was using. I then ran the following quick test (danasah config as ucielo 1800):
Time control: 40/3 (similar to CCRL 40/4, and the time control that I was using)
Code: Select all
Engine              Score    CD
1: DanaSah 7.9 LS   7.5/10   =1111=1=10
2: CDrill 1800      2.5/10   ··········
Time control: 1+1
Code: Select all
Engine              Score    CD
1: CDrill 1800      6.5/10   ··········
2: DanaSah 7.9 LS   3.5/10   0100=00011
Time control: 60/1
Code: Select all
Engine              Score    CD
1: CDrill 1800      9.5/10   ··········
2: DanaSah 7.9 LS   0.5/10   00=0000000
For some reason my engine plays much weaker with little time on the clock (after Ferdy's test I also found that I had an error when playing 1+1, and I've corrected it).
I'm not sure whether this is a problem of adding random noise to the evaluation. My impression is that random noise makes the strength depend on the time control: the longer the engine thinks, the more the evaluation adjusts and stabilizes, and the better it plays.
-
- Posts: 2487
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: UCI_Elo
In my case (CT800), it's basically a hack to get some sort of throttling. I did some tests to figure out Elo vs. speed and settled on about 56 Elo per doubling of speed above 2100 and 80 Elo below that. This is because going from 5 plies of depth to 6 brings more strength than going from 11 to 12. Below 1900, I also introduced increasing eval noise and disabled selective deepening, and even a mate may be overlooked.
What makes this engine especially difficult for Elo throttling tests is that the move time also affects the throttling. The Elo you configure nominally assumes 15 seconds per move. At 80 moves per game, that would be 20 minutes: rapid chess, but not blitz.
However, that is automatically adjusted within a +/- 50 Elo window from 5 seconds per move to 115 seconds per move. The idea is that humans make more mistakes in fast games, so the engine takes that into account for additional throttling.
The whole scale isn't really calibrated to anything and was just meant to let weaker players have fun while still keeping the basic overall playing style. In practice, it seems to work more or less.
The only "calibration" is the fact that the embedded target hardware of the CT800 (with the microcontroller) makes around 30 kNPS and scored around 2100 Elo in the standard tests for dedicated units, so I just took that as starting point.
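Read literally, the numbers in this post give a simple Elo-to-speed mapping. The sketch below is only my reading of the description, not the CT800 source. It assumes a clean exponential with a break at 2100, and it ignores the eval noise below 1900 and the move-time adjustment. All names are hypothetical.

```python
BASE_NPS = 30_000            # about 30 kNPS on the target hardware (from the post)
BASE_ELO = 2100              # rating measured at that speed (from the post)
ELO_PER_DOUBLING_ABOVE = 56  # above 2100 (from the post)
ELO_PER_DOUBLING_BELOW = 80  # below 2100 (from the post)


def target_nps(elo: float) -> float:
    """Node rate that would correspond to the requested Elo under these assumptions."""
    per_doubling = ELO_PER_DOUBLING_ABOVE if elo >= BASE_ELO else ELO_PER_DOUBLING_BELOW
    return BASE_NPS * 2 ** ((elo - BASE_ELO) / per_doubling)


if __name__ == "__main__":
    for elo in (1940, 2020, 2100, 2156):
        print(elo, round(target_nps(elo)))
```

Under these assumptions, 2020 Elo corresponds to half the base speed (15 kNPS) and 2156 to double (60 kNPS).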
Rasmus Althoff
https://www.ct800.net
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
PK wrote: ↑Sun Jul 07, 2019 11:31 am
"'Granting that the wishes of UCI designers is for the UCI_Elo to be equivalent to FIDE Elo...' There's still a problem with time control. Node budget approach works for a single time control, so in order to approximate FIDE rating you would need at least to tune for two time controls (let's say 2+1 and 90+30) and then extrapolate (how? linearly? logarithmically?)."
There is no problem with time control. When the user requests a UCI_Elo of 1500, the engine should try to meet that requirement, even when the time control given by the user is, say, 10 minutes/move. If the user gives the engine very little time, say 5 ms/move, the engine should still try to achieve the requested Elo. If it really cannot reach the 1500 level, because it fails to get the minimum number of nodes needed (or for other reasons), then it is practical for the user to adjust and increase the engine's thinking time. One challenge for the programmer is to deliver the requested Elo as fast as possible.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
Trying to build a rating list (CCRL 40/4 conditions, TC 40 moves / 2 minutes on my PC) of engines set to UCI_Elo 1500. The anchor is NSVChess with a CCRL rating of 946, which is close to FIDE Elo 1500 / UCI_Elo 1500. Another engine, Iota 1.0 with a CCRL rating of 1019, is also added so that the performance of the ucielo 1500 engines is not too dependent on NSVChess. Error bars are still large, but there are clearly already engines rated well above 2000.
Nice Elo estimates from CT800, Arasan and Hiarcs so far.
Code: Select all
 # PLAYER                            :  RATING  ERROR  POINTS  PLAYED  (%)
 1 Amyan 1.72 ucielo 1500            :  2355.2  132.6    96.0     116   83
 2 Cheese 2.1 ucielo 1500            :  2343.8  130.3    95.0     116   82
 3 Cheng 4.39 ucielo 1500            :  2332.6  133.0    94.0     116   81
 4 Fruit reloaded v3.21 ucielo 1500  :  2325.9  129.7    91.5     114   80
 5 Ufim v8.02 ucielo 1500            :  2148.1  115.6    83.5     130   64
 6 Rhetoric 1.4.3 ucielo 1500        :  2114.4  120.6    70.0     114   61
 7 DanaSah 7.9 ucielo 1500           :  2104.2  113.4    79.5     132   60
 8 MadChess 2.2 ucielo 1500          :  2090.1  114.8    76.0     130   58
 9 Houdini 3 ucielo 1500             :  2063.3  127.6    65.5      96   68
10 D2019.2.37.53 ucielo 1500         :  2020.5  115.3    61.5     116   53
11 Discocheck 5.2 ucielo 1500        :  1844.9  112.0    43.5     116   38
12 Iota 1.0 ccrl 1019                :  1824.3  156.5    15.5      46   34
13 CT800 V1.34 ucielo 1500           :  1765.6  110.0    38.5     132   29
14 Arasan 21.3 ucielo 1500           :  1680.2  116.4    28.5     116   25
15 NSVChess v0.14 ccrl 946           :  1500.0   ----    21.0     212   10
16 Hiarcs 14 ucielo 1500             :  1490.9  120.5    14.5     130   11
Played 2 games against NSV to see how it played at this level. I am around 2000 Lichess blitz and around 1900 FIDE Elo.
[pgn]
[Event "Human vs computer"]
[Site "?"]
[Date "2019.07.10"]
[Round "?"]
[White "Ferdy"]
[Black "NSVChess 0.14"]
[Result "1-0"]
[TimeControl "1/5"]
[Termination "Adjudication"]
1. d4 e6 { book } 2. c4 f5 { book } 3. Nc3 Nf6 { book } 4. Nf3 Bb4 { book } 5. Bg5 Nc6 6. Rc1 O-O 7. e3 d6 8. a3 h6
9. Bxf6 Bxc3+ 10. Rxc3 Rxf6 11. Be2 d5 12. b4 dxc4 13. Bxc4 f4 14. e4 g5 15. h3 Rb8 16. O-O Qd6
17. e5 Nxe5 18. dxe5 Qxd1 19. Rxd1 Rf8 20. Rcd3 b5 21. Bb3 Rb6 22. Rd8 Bb7 23. Rxf8+ Kxf8 24. Rd8+ Ke7
25. Rh8 Bxf3 26. gxf3 Rc6 27. Rxh6 Rc1+ 28. Kg2 Rc3 29. Bxe6 Rd3 30. Bf5 Rxa3 31. Rh7+ Kd8 32. Rd7+ Ke8
33. Rxc7 Ra4 34. e6 Ra6 35. e7 Kf7 36. Bg4 Rh6 37. Rxa7 Ke8 38. Bf5 Rd6 39. Be4 Re6 40. Bd5 Rxe7
41. Rxe7+ Kxe7 1-0
[Event "Human vs computer"]
[Site "?"]
[Date "2019.07.10"]
[Round "?"]
[White "NSVChess 0.14"]
[Black "Ferdy"]
[Result "0-1"]
[TimeControl "1/5"]
[Termination "Adjudication"]
1. d4 { book } 1... d5 2. c4 { book } 2... e6 3. Nc3 { book } 3... Nf6 4. cxd5 { book } 4... exd5 5. Bg5 { book } 5... c6
6. Qd3 Be7 7. Qe3 O-O 8. Qe5 Be6 9. Nf3 Nbd7 10. Qe3 Qb6 11. h3 Qxb2 12. Rb1 Qa3 13. Bxf6 Bxf6
14. Nxd5 Qa5+ 15. Nc3 c5 16. Rb5 Qa6 17. Rb2 Rfe8 18. dxc5 Nxc5 19. Qxc5 Rac8 20. e4 Rxc5 21. Bxa6 Bxc3+
22. Rd2 bxa6 23. Ke2 Bc4+ 24. Kd1 Bxd2 25. Nxd2 Rd8 26. Ke1 Bxa2 27. Rh2 Rc1+ 28. Ke2 Rc2 29. g4 Rdxd2+
30. Ke3 g5 31. Kf3 Bc4 32. Kg3 Bf1 33. e5 Rc3+ 34. f3 Rxf3+ 35. Kxf3 Rxh2 36. Kg3 Re2 37. h4 0-1
[/pgn]
-
- Posts: 1056
- Joined: Fri Mar 10, 2006 6:07 am
- Location: Basque Country (Spain)
Re: UCI_Elo
Hi Ferdy,
I believe DanaSah played in engine mode, that is, with the opponent set to "engine" in its UCI configuration. In that case DanaSah plays as on the CCRL list.
If you want to see how it would do on the FIDE list, keep ucielo at 1500 but select "human" as the opponent in the configuration. In that case DanaSah internally plays 400 Elo points lower, and I estimate that its result will be closer to CT800 or Arasan. I hope I explained myself well.
(It is possible that the beta version of the engine that I published the other day plays at a different strength at 40/2 than at 1+1. At 1+1 I realized that my engine was barely using its time. I have corrected that, but before releasing a version with strength regulation I want to calibrate the regulation better, because I ran out of time.)
Perhaps to differentiate between one mode and another you could specify something like:
DanaSah 7.9 ucielo 1500 (engine)
DanaSah 7.9 ucielo 1500 (human)
Thanks for your tests, they are interesting.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: UCI_Elo
pedrox wrote: ↑Fri Jul 12, 2019 2:39 pm
"Hi Ferdy,
I believe that DanaSah will have played in engine mode, that is, it is configured in the uci configuration as an opponent "engine". In this case DanaSah will play as in the CCRL list.
If you want to see how it would in the FIDE list, use that ucielo as 1500 but select in the configuration as opponent "human", in this case DanaSah internally will play with 400 points less than Elo and I estimate that its result will be more similar to CT800 or Arasan. I hope I explained myself well."
Currently running a gauntlet of DanaSah human at TC 40 moves / 2 min.
Results so far.
Code: Select all
Rank Name                               Elo  +/-  Games   Score  Draws
   0 DanaSah 7.9 human ucielo 1500     -545  nan     72    4.2%   0.0%
   1 Ufim v8.02 ucielo 1500             inf  nan      4  100.0%   0.0%
   2 Rhetoric 1.4.3 ucielo 1500         inf  nan      5  100.0%   0.0%
   3 MadChess 2.2 ucielo 1500           inf  nan      4  100.0%   0.0%
   4 Houdini 3 ucielo 1500              inf  nan      6  100.0%   0.0%
   5 Fruit reloaded v3.21 ucielo 1500   inf  nan      6  100.0%   0.0%
   6 Discocheck 5.2 ucielo 1500         inf  nan      6  100.0%   0.0%
   7 D2019.2.37.53 ucielo 1500          inf  nan      6  100.0%   0.0%
   8 Cheng 4.39 ucielo 1500             inf  nan      6  100.0%   0.0%
   9 Cheese 2.1 ucielo 1500             inf  nan      6  100.0%   0.0%
  10 Amyan 1.72 ucielo 1500             inf  nan      6  100.0%   0.0%
  11 CT800 V1.34 ucielo 1500            280  nan      6   83.3%   0.0%
  12 Arasan 21.3 ucielo 1500            280  nan      6   83.3%   0.0%
  13 Hiarcs 14 ucielo 1500              241  nan      5   80.0%   0.0%
72 of 208 games finished.
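A note on reading the Elo column: the values are consistent with the standard logistic conversion from score fraction to Elo difference, which also explains the "inf" entries for 100% scores. That the tournament tool uses exactly this formula is an assumption on my part. The sketch below (with a function name of my own choosing) just checks it against the rows above.

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    if score_fraction <= 0.0:
        return -math.inf  # lost every game: no finite estimate
    if score_fraction >= 1.0:
        return math.inf   # won every game: no finite estimate
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


if __name__ == "__main__":
    # (points, games) taken from the table: DanaSah overall, CT800, Hiarcs.
    for points, games in ((3, 72), (5, 6), (4, 5), (6, 6)):
        print(points, games, elo_diff(points / games))
```

With 3 points out of 72 this gives about -545, with 5/6 about 280, and with 4/5 about 241, matching the table. With so few games per opponent the individual figures carry very little weight yet.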