UCI_Elo

Post by **PK** » Sun Jul 07, 2019 9:36 am

This thread shows perfectly why UCI_Elo ratings for different engines will never converge. Several methods have already been described:

- strict node budget
- limiting nodes per second
- limiting nodes per second + random noise in eval function

Furthermore, a couple of other methods can be added:

- not calculating certain moves within the search (Phalanx)
- multi-PV, choosing a weaker move from time to time if it is not too much weaker (Stockfish level command)

Now, a strict node budget and a nodes-per-second limit are equivalent at only one specific time control (although both behave identically on different machines, as long as the declared speed can be reached). This alone makes the two methods incomparable.
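To make this concrete, here is a minimal Python sketch with made-up numbers. The function names and the 20,000-node / 10,000-NPS settings are hypothetical, not taken from any engine. A strict node budget searches the same number of nodes at every time control, while an NPS cap scales with thinking time, so the two coincide at only one time per move.

```python
def nodes_fixed_budget(budget: int, seconds_per_move: float) -> int:
    """Strict node budget: the node count does not depend on the clock
    (assuming the engine has enough time to finish the budget)."""
    return budget


def nodes_nps_cap(nps: int, seconds_per_move: float) -> int:
    """NPS cap: the node count grows linearly with thinking time."""
    return int(nps * seconds_per_move)


if __name__ == "__main__":
    budget, nps = 20_000, 10_000  # hypothetical settings
    for secs in (0.5, 2.0, 10.0):
        print(f"{secs:>5} s/move: budget={nodes_fixed_budget(budget, secs):>7}, "
              f"nps-cap={nodes_nps_cap(nps, secs):>7}")
```

Only at 2 s/move (budget divided by NPS) do both settings search 20,000 nodes; at faster time controls the NPS-capped engine is weaker, at slower ones stronger.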

Adding the other methods to the mix only makes matters worse. Using multi-PV brings in computer speed as another variable. Random noise imitates the multi-PV approach, in that a weaker move can replace a stronger one, but with less control over, and less transparency of, the decision. Skipping certain moves is like applying the multi-PV method deeper in the tree.
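The two "weaker move" mechanisms can be sketched side by side. This is a hypothetical illustration, not Stockfish's or any other engine's actual code: `pick_move` works on multi-PV output and explicitly chooses among moves within a centipawn margin of the best, while `noisy_eval` perturbs evaluations so that a weaker move may win the search without any explicit decision. The names, the uniform choice and the margin parameters are assumptions.

```python
import random


def pick_move(candidates, max_loss_cp, weaken_prob, rng):
    """Multi-PV style weakening (hypothetical).

    candidates: list of (move, score_cp) pairs, best move first.
    With probability weaken_prob, choose uniformly among moves that lose
    at most max_loss_cp centipawns relative to the best; otherwise play
    the best move.
    """
    best_move, best_score = candidates[0]
    if rng.random() >= weaken_prob:
        return best_move
    acceptable = [m for m, s in candidates if best_score - s <= max_loss_cp]
    return rng.choice(acceptable)  # always contains best_move


def noisy_eval(static_eval_cp, noise_cp, rng):
    """Eval-noise weakening (hypothetical): add uniform integer noise
    in [-noise_cp, +noise_cp] to a static evaluation."""
    return static_eval_cp + rng.randint(-noise_cp, noise_cp)


rng = random.Random(42)
cands = [("e2e4", 30), ("d2d4", 25), ("a2a3", -80)]
print(pick_move(cands, max_loss_cp=10, weaken_prob=0.5, rng=rng))  # e2e4 or d2d4
print(noisy_eval(50, noise_cp=20, rng=rng))                          # between 30 and 70
```

In the first function the candidate list and each move's loss are visible and bounded; in the second, how much strength is given away depends on how the noise interacts with the search, which is the transparency difference described above.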

In short, UCI_Elo comparisons will always compare apples to oranges.

Post by **Ferdy** » Sun Jul 07, 2019 10:16 am

PK wrote: ↑Sun Jul 07, 2019 9:36 am In short, UCI_Elo comparisons will always compare apples to oranges.

UCI_Elo comparison is fine. The only difference is that each author has their own way of weakening their program, and the user does not care which method the author chooses. The goal is to hit the UCI_Elo requested by the user: if the user wants UCI_Elo 1000, each author should try to make their engine play close to that rating.

Post by **Ferdy** » Sun Jul 07, 2019 11:05 am

Granting that the UCI designers want UCI_Elo to be equivalent to FIDE Elo, and using Kai's relationship between CCRL and FIDE ratings,

Code:

fide = 0.7*ccrl + 840

it is now easier for engine authors to estimate UCI_Elo via CCRL.

If the user wants UCI_Elo 2000, solve for ccrl:
fide = 0.7*ccrl + 840
0.7*ccrl = fide - 840
ccrl = (fide - 840) / 0.7
ccrl = (2000 - 840) / 0.7
ccrl ≈ 1657

Then go to the CCRL page and find engines rated close to 1657 on the CCRL 40/4 list.

Use those as sparring partners when developing UCI_Elo 2000.
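The arithmetic above can be checked with a short Python sketch of the quoted linear fit (fide = 0.7*ccrl + 840) and its inverse. The function names are illustrative, and the result is only as good as the CCRL/FIDE relationship itself.

```python
def ccrl_to_fide(ccrl: float) -> float:
    """Linear fit quoted in this thread: fide = 0.7*ccrl + 840."""
    return 0.7 * ccrl + 840


def fide_to_ccrl(fide: float) -> float:
    """Inverse of the same fit: ccrl = (fide - 840) / 0.7."""
    return (fide - 840) / 0.7


print(round(fide_to_ccrl(2000)))  # 1657
print(round(ccrl_to_fide(946)))   # 1502
```

The second line shows why an engine rated 946 on CCRL, such as NSVChess later in this thread, is a reasonable stand-in for FIDE/UCI_Elo 1500.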

Post by **PK** » Sun Jul 07, 2019 11:31 am

Granting that the wishes of UCI designers is for the UCI_Elo to be equivalent to FIDE Elo...

There's still a problem with time control. The node budget approach works for a single time control, so to approximate a FIDE rating you would need to tune for at least two time controls (say 2+1 and 90+30) and then extrapolate (how? linearly? logarithmically?).
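The open question (linear or logarithmic?) can be made concrete with a hypothetical sketch. It assumes a tuned parameter (for example a node budget or an Elo offset) has been calibrated at 2+1 and 90+30, and it measures a time control by estimated game duration, base + 40 × increment. That convention is the one Lichess uses and is an assumption here, not part of the thread. The same two calibration points give very different answers in between, depending on the scale chosen.

```python
import math


def game_seconds(base_min: float, inc_sec: float) -> float:
    """Estimated game duration in seconds: base + 40 * increment (assumed convention)."""
    return base_min * 60 + 40 * inc_sec


def extrapolate(t, t1, v1, t2, v2, log_scale=True):
    """Linear inter/extrapolation of a tuned value v through (t1, v1) and (t2, v2),
    either in log(time) (log_scale=True) or in plain time (log_scale=False)."""
    if log_scale:
        x, x1, x2 = math.log(t), math.log(t1), math.log(t2)
    else:
        x, x1, x2 = t, t1, t2
    return v1 + (v2 - v1) * (x - x1) / (x2 - x1)


t1, t2 = game_seconds(2, 1), game_seconds(90, 30)  # 160 s and 6600 s
t = game_seconds(15, 10)                            # 1300 s
# Hypothetical calibration values: 0 at 2+1, 100 at 90+30.
print(round(extrapolate(t, t1, 0, t2, 100, log_scale=True), 1))   # 56.3
print(round(extrapolate(t, t1, 0, t2, 100, log_scale=False), 1))  # 17.7
```

At 15+10 the log-scale model puts the parameter more than halfway between the calibration values, the linear model less than a fifth of the way, so the choice of scale alone matters a lot.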

The nodes-per-second approach runs into another roadblock. Part of the difference between a weak engine and a strong engine is scaling. So weakening a 2800 engine so that it performs at 1600 at a blitz time control does not guarantee similar results at a longer time control.

With Rodent I ran into another quagmire: setting playing strength and style simultaneously. Exaggerated styles, especially attacking ones, were weaker at full strength and relatively stronger when slowed down.

Post by **pedrox** » Sun Jul 07, 2019 6:07 pm

I have a problem with time control. When Ferdy tested my engine, I had the feeling that it was playing weaker than expected, so I asked him what time control he was using. I then ran the following quick test (DanaSah configured with UCI_Elo 1800):

Time control: 40/3 (similar to CCRL 40/4 and to the time control I was using)

Code:

   Engine         Score       CD
1: DanaSah 7.9 LS 7.5/10      =1111=1=10
2: CDrill 1800    2.5/10      ··········

Time control: 1+1

Code:

   Engine         Score       CD
1: CDrill 1800    6.5/10      ··········
2: DanaSah 7.9 LS 3.5/10      0100=00011

Time control: 60/1

Code:

   Engine         Score       CD
1: CDrill 1800    9.5/10      ··········
2: DanaSah 7.9 LS 0.5/10      00=0000000

For some reason my engine plays much weaker at short time controls (even now, after Ferdy's test, I found that I had a bug when playing 1+1, and I have corrected it).

I'm not sure whether this is caused by adding random noise to the evaluation. My impression is that random noise makes the engine sensitive to the time control: the longer the engine has, the more the evaluation adjusts and stabilizes, and the better it plays.

Post by **Ras** » Sun Jul 07, 2019 6:50 pm

Ferdy wrote: ↑Sat Jul 06, 2019 4:52 am Does anyone know of what is the base engine and rating of UCI_Elo?

In my case (CT800), it's basically a hack to get some sort of throttling. I ran some tests to figure out Elo vs. speed and settled on about 56 Elo per doubling of speed above 2100 and 80 Elo per doubling below that. This is because going from 5 to 6 plies of depth adds more strength than going from 11 to 12 plies. Below 1900, I also introduced increasing eval noise and disabled selective deepening, so even a mate may be overlooked.

What makes this engine especially difficult for Elo throttling tests is that the move time also affects the throttling. The configured Elo nominally assumes 15 seconds per move. At 80 moves per game, that would be 20 minutes: rapid chess, not blitz.

However, that is automatically modified within a +/- 50 Elo window from 5 seconds per move to 115 seconds per move. The idea is that humans make more mistakes at fast games, so the engine takes that into account for additional throttling.
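One possible way to implement such a move-time adjustment is sketched below. This is a guess for illustration, not CT800's actual code: it assumes the shift is -50 Elo at 5 s/move or faster, zero at the nominal 15 s/move, and +50 Elo at 115 s/move or slower, interpolated in log(time) in between. The function and constant names are hypothetical.

```python
import math

NOMINAL_SECS = 15.0              # configured Elo is nominally for 15 s/move
MIN_SECS, MAX_SECS = 5.0, 115.0  # range over which the adjustment applies
WINDOW = 50.0                    # +/-50 Elo adjustment window


def time_adjusted_elo(configured_elo: float, secs_per_move: float) -> float:
    """Hypothetical: shift the effective Elo down for fast moves and up for
    slow ones, interpolating in log(time) and clamping at 5 s and 115 s."""
    t = min(max(secs_per_move, MIN_SECS), MAX_SECS)
    if t <= NOMINAL_SECS:
        adj = -WINDOW * math.log(NOMINAL_SECS / t) / math.log(NOMINAL_SECS / MIN_SECS)
    else:
        adj = WINDOW * math.log(t / NOMINAL_SECS) / math.log(MAX_SECS / NOMINAL_SECS)
    return configured_elo + adj


for secs in (2, 5, 15, 60, 115, 300):
    print(secs, round(time_adjusted_elo(1800, secs)))
```

Interpolating in log(time) rather than linear time is a design choice here: it makes the step from 5 s to 15 s count as much as a proportionally similar step at longer times.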

The whole scale isn't really calibrated to anything and was just meant to let weaker players have fun while still keeping the basic overall playing style. In practice, it seems to work more or less.

The only "calibration" is that the embedded target hardware of the CT800 (with the microcontroller) runs at around 30 kNPS and scored around 2100 Elo in the standard tests for dedicated units, so I just took that as the starting point.
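The numbers in this post (2100 Elo at about 30 kNPS, 56 Elo per speed doubling above 2100 and 80 below) define a simple piecewise speed model. The sketch below is a hypothetical reconstruction of that model, not CT800's code, and it ignores the extra eval noise and the other measures used below 1900.

```python
BASE_ELO = 2100              # dedicated-unit test result quoted in the post
BASE_NPS = 30_000            # approximate speed of the embedded target
ELO_PER_DOUBLING_ABOVE = 56  # above BASE_ELO
ELO_PER_DOUBLING_BELOW = 80  # below BASE_ELO


def nps_for_elo(target_elo: float) -> float:
    """Hypothetical piecewise log-linear model: each doubling of NPS is worth
    56 Elo above BASE_ELO and 80 Elo below it."""
    diff = target_elo - BASE_ELO
    per_doubling = ELO_PER_DOUBLING_ABOVE if diff >= 0 else ELO_PER_DOUBLING_BELOW
    return BASE_NPS * 2 ** (diff / per_doubling)


print(nps_for_elo(2212))         # 120000.0 (two doublings above 2100)
print(nps_for_elo(1940))         # 7500.0 (two halvings below 2100)
print(round(nps_for_elo(1500)))  # 166
```

Under this model a 1500 target would need only about 166 NPS, which illustrates why pure speed throttling is combined with other weakening measures at low levels.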

Post by **Ferdy** » Sun Jul 07, 2019 8:33 pm

PK wrote: ↑Sun Jul 07, 2019 11:31 am
Granting that the wishes of UCI designers is for the UCI_Elo to be equivalent to FIDE Elo...
There's still a problem with time control. Node budget approach works for a single time control, so in order to approximate FIDE rating you would need at least to tune for two time controls (let's say 2+1 and 90+30) and then extrapolate (how? linearly? logarithmically?).

There is no problem with time control. When the user requests UCI_Elo 1500, the engine should try to meet that requirement, even if the user gives it, say, 10 minutes per move. If the user gives the engine very little time, say 5 ms per move, the engine should still try to reach the requested Elo; but if it cannot reach the 1500 level because it fails to search the minimum number of nodes (or for other reasons), then it is practical for the user to adjust by increasing the engine's thinking time. One challenge for the programmer is to deliver the requested Elo with as little thinking time as possible.

Post by **Ferdy** » Fri Jul 12, 2019 8:37 am

Trying to build a rating list (CCRL 40/4 conditions, TC 40 moves/2 minutes on my PC) of engines set to UCI_Elo 1500. The anchor is NSVChess, with a CCRL rating of 946, which is close to FIDE Elo 1500 / UCI_Elo 1500. Another engine, Iota 1.0 with a CCRL rating of 1019, is also included so that the performance of the ucielo 1500 engines does not depend too much on NSVChess. Error bars are still large, but some engines clearly already rate well above 2000.

Code:

   # PLAYER                              :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Amyan 1.72 ucielo 1500              :  2355.2  132.6    96.0     116    83
   2 Cheese 2.1 ucielo 1500              :  2343.8  130.3    95.0     116    82
   3 Cheng 4.39 ucielo 1500              :  2332.6  133.0    94.0     116    81
   4 Fruit reloaded v3.21 ucielo 1500    :  2325.9  129.7    91.5     114    80
   5 Ufim v8.02 ucielo 1500              :  2148.1  115.6    83.5     130    64
   6 Rhetoric 1.4.3 ucielo 1500          :  2114.4  120.6    70.0     114    61
   7 DanaSah 7.9 ucielo 1500             :  2104.2  113.4    79.5     132    60
   8 MadChess 2.2 ucielo 1500            :  2090.1  114.8    76.0     130    58
   9 Houdini 3 ucielo 1500               :  2063.3  127.6    65.5      96    68
  10 D2019.2.37.53 ucielo 1500           :  2020.5  115.3    61.5     116    53
  11 Discocheck 5.2 ucielo 1500          :  1844.9  112.0    43.5     116    38
  12 Iota 1.0 ccrl 1019                  :  1824.3  156.5    15.5      46    34
  13 CT800 V1.34 ucielo 1500             :  1765.6  110.0    38.5     132    29
  14 Arasan 21.3 ucielo 1500             :  1680.2  116.4    28.5     116    25
  15 NSVChess v0.14 ccrl 946             :  1500.0   ----    21.0     212    10
  16 Hiarcs 14 ucielo 1500               :  1490.9  120.5    14.5     130    11

So far, CT800, Arasan and Hiarcs give nice Elo estimates (closest to the requested 1500).

I played 2 games against NSVChess to see how it plays at this level. I am around 2000 in Lichess blitz and around 1900 FIDE Elo.

[pgn]
[Event "Human vs computer"]
[Site "?"]
[Date "2019.07.10"]
[Round "?"]
[White "Ferdy"]
[Black "NSVChess 0.14"]
[Result "1-0"]
[TimeControl "1/5"]
[Termination "Adjudication"]

1. d4 e6 { book } 2. c4 f5 { book } 3. Nc3 Nf6 { book } 4. Nf3 Bb4 { book } 5. Bg5 Nc6
6. Rc1 O-O 7. e3 d6 8. a3 h6 9. Bxf6 Bxc3+ 10. Rxc3 Rxf6 11. Be2 d5 12. b4 dxc4
13. Bxc4 f4 14. e4 g5 15. h3 Rb8 16. O-O Qd6 17. e5 Nxe5 18. dxe5 Qxd1 19. Rxd1 Rf8
20. Rcd3 b5 21. Bb3 Rb6 22. Rd8 Bb7 23. Rxf8+ Kxf8 24. Rd8+ Ke7 25. Rh8 Bxf3
26. gxf3 Rc6 27. Rxh6 Rc1+ 28. Kg2 Rc3 29. Bxe6 Rd3 30. Bf5 Rxa3 31. Rh7+ Kd8
32. Rd7+ Ke8 33. Rxc7 Ra4 34. e6 Ra6 35. e7 Kf7 36. Bg4 Rh6 37. Rxa7 Ke8 38. Bf5 Rd6
39. Be4 Re6 40. Bd5 Rxe7 41. Rxe7+ Kxe7 1-0

[Event "Human vs computer"]
[Site "?"]
[Date "2019.07.10"]
[Round "?"]
[White "NSVChess 0.14"]
[Black "Ferdy"]
[Result "0-1"]
[TimeControl "1/5"]
[Termination "Adjudication"]

1. d4 { book } 1... d5 2. c4 { book } 2... e6 3. Nc3 { book } 3... Nf6 4. cxd5 { book }
4... exd5 5. Bg5 { book } 5... c6 6. Qd3 Be7 7. Qe3 O-O 8. Qe5 Be6 9. Nf3 Nbd7
10. Qe3 Qb6 11. h3 Qxb2 12. Rb1 Qa3 13. Bxf6 Bxf6 14. Nxd5 Qa5+ 15. Nc3 c5 16. Rb5 Qa6
17. Rb2 Rfe8 18. dxc5 Nxc5 19. Qxc5 Rac8 20. e4 Rxc5 21. Bxa6 Bxc3+ 22. Rd2 bxa6
23. Ke2 Bc4+ 24. Kd1 Bxd2 25. Nxd2 Rd8 26. Ke1 Bxa2 27. Rh2 Rc1+ 28. Ke2 Rc2
29. g4 Rdxd2+ 30. Ke3 g5 31. Kf3 Bc4 32. Kg3 Bf1 33. e5 Rc3+ 34. f3 Rxf3+ 35. Kxf3 Rxh2
36. Kg3 Re2 37. h4 0-1
[/pgn]

Post by **pedrox** » Fri Jul 12, 2019 2:39 pm

Hi Ferdy,

I believe DanaSah played in engine mode, that is, with the opponent set to "engine" in its UCI configuration. In that case DanaSah plays at the strength of the CCRL list.

If you want to see how it would rate on the FIDE list, keep UCI_Elo at 1500 but set the opponent to "human" in the configuration. In that case DanaSah internally plays 400 points below the requested Elo, and I estimate that its result will be closer to CT800 or Arasan. I hope I explained myself well.
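The opponent-type setting described here amounts to a fixed offset. A hypothetical sketch (the function names are illustrative; the 400-point offset is the one stated above), together with the standard logistic Elo expectation, shows how large that offset is: the stronger side is expected to score about 91%.

```python
def internal_elo(uci_elo: int, opponent_is_human: bool) -> int:
    """Hypothetical: in "human" mode, play 400 points below the requested Elo."""
    return uci_elo - 400 if opponent_is_human else uci_elo


def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


print(internal_elo(1500, opponent_is_human=True))  # 1100
print(round(expected_score(1500, 1100), 3))        # 0.909
```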

(It is possible that the beta version of the engine I published the other day plays at a different strength at 40/2 than at 1+1. At 1+1 I noticed that my engine was barely using its time; I have corrected that, but before releasing a version with strength regulation I want to calibrate the regulation better, because I ran out of time.)

Perhaps to differentiate between one mode and another you could specify something like:

DanaSah 7.9 ucielo 1500 (engine)
DanaSah 7.9 ucielo 1500 (human)

Thanks for your tests, they are interesting.

Post by **Ferdy** » Fri Jul 12, 2019 7:31 pm

pedrox wrote: ↑Fri Jul 12, 2019 2:39 pm Hi Ferdy,

I believe that DanaSah will have played in engine mode, that is, it is configured in the uci configuration as an opponent "engine". In this case DanaSah will play as in the CCRL list.

If you want to see how it would in the FIDE list, use that ucielo as 1500 but select in the configuration as opponent "human", in this case DanaSah internally will play with 400 points less than Elo and I estimate that its result will be more similar to CT800 or Arasan. I hope I explained myself well.

Currently running a gauntlet of DanaSah in human mode at TC 40 moves / 2 min.
Results so far:

Code:

Rank Name                               Elo     +/-   Games   Score   Draws
   0 DanaSah 7.9 human ucielo 1500     -545     nan      72    4.2%    0.0%
   1 Ufim v8.02 ucielo 1500             inf     nan       4  100.0%    0.0%
   2 Rhetoric 1.4.3 ucielo 1500         inf     nan       5  100.0%    0.0%
   3 MadChess 2.2 ucielo 1500           inf     nan       4  100.0%    0.0%
   4 Houdini 3 ucielo 1500              inf     nan       6  100.0%    0.0%
   5 Fruit reloaded v3.21 ucielo 1500   inf     nan       6  100.0%    0.0%
   6 Discocheck 5.2 ucielo 1500         inf     nan       6  100.0%    0.0%
   7 D2019.2.37.53 ucielo 1500          inf     nan       6  100.0%    0.0%
   8 Cheng 4.39 ucielo 1500             inf     nan       6  100.0%    0.0%
   9 Cheese 2.1 ucielo 1500             inf     nan       6  100.0%    0.0%
  10 Amyan 1.72 ucielo 1500             inf     nan       6  100.0%    0.0%
  11 CT800 V1.34 ucielo 1500            280     nan       6   83.3%    0.0%
  12 Arasan 21.3 ucielo 1500            280     nan       6   83.3%    0.0%
  13 Hiarcs 14 ucielo 1500              241     nan       5   80.0%    0.0%
72 of 208 games finished.
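The Elo column in a partial gauntlet like this follows directly from the score fraction under the logistic Elo model. A minimal Python sketch (the function name is illustrative) reproduces the values shown: DanaSah's 4.2% corresponds to 3 points from 72 games (1 point each against CT800, Arasan and Hiarcs), and a 100% score has no finite Elo difference, hence inf.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo
    model, i.e. the inverse of E = 1 / (1 + 10**(-d/400))."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(elo_diff_from_score(3 / 72)))  # -545 (DanaSah, 3 points from 72 games)
print(round(elo_diff_from_score(5 / 6)))   # 280 (CT800, Arasan)
print(round(elo_diff_from_score(4 / 5)))   # 241 (Hiarcs)
print(elo_diff_from_score(1.0))            # inf (100% scores)
```

With only 4 to 6 games per opponent these numbers are very noisy, which is consistent with the missing error bars in the table.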

I would suggest removing the dependency of UCI_Elo on the opponent type (human or engine) and similar settings. Since UCI_Elo refers to FIDE Elo, it is better to design it for human opponents.
