UCI_Elo

lkaufman · Post by **lkaufman** » Thu Sep 19, 2019 4:03 am

pedrox wrote: ↑Mon Sep 16, 2019 11:50 am
lkaufman wrote: ↑Mon Sep 16, 2019 12:05 am
Ferdy wrote: ↑Sun Sep 15, 2019 11:13 pm
lkaufman wrote: ↑Sun Sep 15, 2019 8:51 pm
Ferdy wrote: ↑Sat Jul 06, 2019 8:38 pm
Patrice Duhamel wrote: ↑Sat Jul 06, 2019 12:29 pm Any advice to tune engine parameters to correspond to the right UCI_Elo value ?

In Cheese I have 2 parameters, one to make very small pauses every N nodes,
and another one to add randomness to the evaluation under 2000 ELO.
If we cannot find a reference engine and a corresponding UCI_Elo value, we might as well use CCRL as the reference. CCRL is better than the other rating lists in this regard because it has tested many weaker programs. Perhaps we can use the single-processor CCRL 40/4 list as the base for UCI_Elo.
If the goal is to make versions that play at the same level as a human with an arbitrary (2300 for example) FIDE rating would play, then of course it is critical to specify what time limit the human is presumed to be playing at. I have done a fair amount of work on the question of how to estimate equivalent human FIDE ratings from engine rating lists. All of the well-known rating lists underrate the engines in general relative to FIDE ratings, with the amount of underrating increasing at faster time limits. This is assuming they are running on whatever they specify as their standard or reference hardware. I agree that the CCRL list is the closest to human FIDE levels of the major lists, in part because its span is artificially contracted by the use of BayesElo, which largely offsets the tendency of engine vs. engine rating differences to be larger than if they were rated vs. humans. If the goal is to predict the rating an engine would earn vs. humans at 40/2 hours running on the reference hardware, then I would use the CCRL 40/40 list and add perhaps 250 Elo. If the goal is to estimate ratings at 40/40 minutes, I would add at least 300. If we're talking about estimating blitz ratings vs. humans, use the 40/4 list and add maybe 500 Elo or so. These are just ballpark numbers based on the limited GM vs. engine data I have; I hope this is of some help to your work.
I am primarily interested in FIDE Elo 2000 and below. The human will play at standard time control. I have started investigating which engines with the UCI_Elo feature, set to UCI_Elo 1500, play as close as possible to a human rated around Elo 1500. So far I have tested these engines at 1 s per position on 5k positions to get some stats.
My best guess is that an engine with a CCRL 40/40 rating of 1500 would be an even match (running on their reference hardware) at 40/2 hours with a human with a FIDE rating of about 1900.
This could be in accordance with Kai's formula:

Elo FIDE = (0.7 x Elo CCRL) + 840 = (0.7 x 1500) + 840 = 1890

But I find that this formula does not work well once the FIDE Elo drops below about 1400 points.

For example, if we use a very low FIDE Elo such as 1000:

Elo CCRL = (Elo FIDE - 840)/0.7 = 229

This is roughly the rating of an engine playing random moves (CCRL 40/4), and I believe a player with a FIDE Elo of 1000 would easily beat such an engine. For this case I find the formula Elo FIDE = (0.8 x Elo CCRL) + 560 or Elo FIDE = (0.85 x Elo CCRL) + 420 somewhat better.

I think that Kai may have overlooked that CCRL uses BayesElo which contracts the ratings. His 0.7 factor is probably quite accurate for CEGT but for CCRL it should probably be something like 0.9. Then to make the FIDE rating come out the same with a starting CCRL rating of 2000 (a very rough guess for the midpoint) the constant to add drops from 840 to 440. This seems much more realistic to me at both ends.
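
The linear conversions discussed so far can be compared side by side. The following is only a sketch: the slope/constant pairs are the ones quoted in this thread, while the function names, dictionary labels, and the sample CCRL values are made up for illustration. It also reproduces the arithmetic behind the 840 to 440 change, assuming the two lines are made to agree at a CCRL rating of 2000.

```python
# Linear CCRL -> FIDE conversions quoted in this thread, as (slope, constant).
# The labels are illustrative only; they are not official names.
CONVERSIONS = {
    "Kai 0.7/840": (0.7, 840),
    "pedrox 0.8/560": (0.8, 560),
    "pedrox 0.85/420": (0.85, 420),
    "lkaufman 0.9/440": (0.9, 440),
}


def ccrl_to_fide(ccrl, slope, constant):
    """Estimated FIDE Elo for a given CCRL Elo under one linear model."""
    return slope * ccrl + constant


def matching_constant(new_slope, old_slope, old_constant, pivot):
    """Constant that makes the new line agree with the old line at `pivot`."""
    return (old_slope - new_slope) * pivot + old_constant


# Print each model at a few sample CCRL ratings (sample values are arbitrary).
for label, (slope, constant) in CONVERSIONS.items():
    estimates = [round(ccrl_to_fide(c, slope, constant)) for c in (500, 1500, 2000, 2800)]
    print(label, estimates)

# Changing the slope from 0.7 to 0.9 while keeping the estimate at CCRL 2000:
print(round(matching_constant(0.9, 0.7, 840, 2000)))
```

The steeper slopes matter mostly at the low end, which is where the disagreement in this thread lies; near CCRL 2000 the 0.7 and 0.9 lines are the same by construction.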

PeterO · Post by **PeterO** » Tue Dec 31, 2019 5:45 pm

Hi Ferdy - I am FASCINATED by your tests!!!

Do I understand this right:

1. Rodent IV - Elo setting 1500 plays THE MOST like a human player 1500 Fide Elo?
2. If Rodent UCI 1500 = Human Fide 1500 - we have a REFERENCE ENGINE - right?

3. Question: Is ONE reference engine enough to calculate the strength of ALL UCI engines (in the 1500-2000 Elo range) - or do we need more reference engines?

Peter

MikeB · Post by **MikeB** » Tue Dec 31, 2019 7:02 pm

Ferdy wrote: ↑Sat Jul 06, 2019 4:52 am Does anyone know of what is the base engine and rating of UCI_Elo?

Collected some engines with UCI_Elo support. Here is the result for UCI_Elo 2300, TC 1m+1s on an i7 3.4 GHz PC. The format is round robin (RR), plus a gauntlet for Lc0 since it was added later. The anchor is CT800 with a CCRL 40/40 rating of around 2300.

UCI_Elo 2300

   # PLAYER                              :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Cheng 4.39 ucielo 2300              :  3187.9  173.9    95.0     102    93
   2 Fruit reloaded v3.21 ucielo 2300    :  2945.0  141.5    78.5     102    77
   3 Amyan 1.72 ucielo 2300              :  2832.9  134.8    68.0     103    66
   4 Cheese 2.0 ucielo 2300              :  2823.7  135.1    66.5     102    65
   5 Lc0 0.21.2 w48x5 blas               :  2789.5  119.1   154.0     243    63
   6 Rhetoric 1.4.3 ucielo 2300          :  2726.9  129.6    56.0     102    55
   7 Discocheck 5.2 ucielo 2300          :  2652.3  129.2    48.0     102    47
   8 Arasan 21.3 ucielo 2300             :  2628.2  126.3    45.5     102    45
   9 Wasp 3.60 ucielo 2300               :  2535.3  125.1    36.5     102    36
  10 CT800 V1.34 ucielo 2300             :  2300.0   ----    19.0     102    19
  11 D2019.2.37.53 ucielo 2300           :  2159.2  143.1    11.5     102    11
  12 Hiarcs 14 ucielo 2300               :  1983.5  188.0     4.5     102     4

Added Lc0 48x5 on blas to see where it stands against UCI_Elo 2300 engines.
I hope more engine authors will support UCI_Elo; this would allow users to easily select an engine opponent that suits their strength.

And here is UCI_Elo 2000; Ordo was run with an average rating of 2000.

   # PLAYER                        :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Cheng 4.39 ucielo 2000        :  2633.9  178.2    74.5      80    93
   2 Cheese 2.0 ucielo 2000        :  2508.2  146.9    69.5      80    87
   3 Amyan 1.72 ucielo 2000        :  2410.2  128.8    65.0      80    81
   4 Rhetoric 1.4.3 ucielo 2000    :  2171.5  108.8    52.5      80    66
   5 Ufim v8.02 ucielo 2000        :  2008.2  101.1    42.5      80    53
   6 Discocheck 5.2 ucielo 2000    :  1977.8  100.2    40.5      80    51
   7 MadChess 2.2 ucielo 2000      :  1817.3  102.8    29.5      80    37
   8 Arasan 21.3 ucielo 2000       :  1759.7  100.1    25.5      80    32
   9 CT800 V1.34 ucielo 2000       :  1624.1  112.6    16.5      80    21
  10 D2019.2.37.53 ucielo 2000     :  1607.9  111.7    15.5      80    19
  11 Hiarcs 14 ucielo 2000         :  1481.1  130.0     8.5      80    11
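
As a rough sanity check on the (%) column, a score fraction can be turned into an Elo gap with the standard logistic Elo expectation. This is only a sketch under that assumption: Ordo fits all games simultaneously, so its ratings will not match this single-number estimate, and the function name is made up for illustration.

```python
import math


def elo_diff_from_score(score):
    """Elo gap implied by a score fraction under the logistic Elo model.

    `score` is points / games, strictly between 0 and 1.
    A positive result means the player is stronger than its average opponent.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Score fractions taken from the (%) column of the UCI_Elo 2000 table above.
for name, pct in (("Cheng 4.39", 93), ("Discocheck 5.2", 51), ("Hiarcs 14", 11)):
    print(name, round(elo_diff_from_score(pct / 100.0)))
```

With 80 games per engine the error bars are wide, so such estimates only indicate how far each UCI_Elo 2000 setting sits from the pool average.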

With Honey, a rating of 1712 or so was anchored to Shallow Blue (CCRL rating of 1712 or so). After it was all done, I realized that CCRL at lower ratings is much stronger than FIDE: CCRL 1700 is 2000+ FIDE. Kai had posted a formula to convert from FIDE to CCRL, so I reversed the formula to go from CCRL to FIDE (so users can play against FIDE Elo and not CCRL Elo). CCRL 2800 is about 2800 FIDE. Above 2800, FIDE is stronger, i.e., 2900 FIDE is stronger than 2900 CCRL.
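
The reversal described here can be sketched with Kai's formula as quoted earlier in the thread (Elo FIDE = 0.7 x Elo CCRL + 840). Whether Honey uses exactly these coefficients is an assumption, and the function names are hypothetical. The sketch maps a requested FIDE Elo to the CCRL-scale strength the engine would have to play at, and shows where the two scales cross.

```python
def fide_to_ccrl(fide, slope=0.7, constant=840.0):
    """Invert Elo FIDE = slope * Elo CCRL + constant.

    Default coefficients are Kai's formula as quoted in this thread;
    treating them as Honey's actual values is an assumption.
    """
    return (fide - constant) / slope


def crossover(slope=0.7, constant=840.0):
    """Rating at which the FIDE and CCRL scales coincide under the linear model."""
    return constant / (1.0 - slope)


# A user asks for a given FIDE Elo; the engine targets this CCRL-scale rating.
for requested_fide in (1000, 1500, 2030, 2900):
    print(requested_fide, round(fide_to_ccrl(requested_fide)))

print(round(crossover()))
```

Under these coefficients the scales meet at 2800 and a FIDE request above that maps to a higher CCRL number, which is consistent with the statements in the post.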

PeterO · Post by **PeterO** » Fri Jan 03, 2020 8:14 pm

Hi Mike,

1. So Honey has no FIDE Elo at the moment - right?
2. Your reference engine was Shallow Blue - right?
- Would it be better (according to Ferdy's tests) to take Rodent IV at Elo 1500 as the reference engine to get REALLY good results?

Peter
