Re: UCI_Elo
Posted: Thu Sep 19, 2019 4:03 am
I think that Kai may have overlooked that CCRL uses BayesElo, which contracts the ratings. His 0.7 factor is probably quite accurate for CEGT, but for CCRL it should probably be something like 0.9. Then, to make the FIDE rating come out the same with a starting CCRL rating of 2000 (a very rough guess for the midpoint), the constant to add drops from 840 to 440. This seems much more realistic to me at both ends.

pedrox wrote: ↑Mon Sep 16, 2019 11:50 am
This could be in accordance with Kai's formula:

lkaufman wrote: ↑Mon Sep 16, 2019 12:05 am
My best guess is that an engine with a CCRL 40/40 rating of 1500 would be an even match (running on their reference hardware) at 40/2 hours with a human with a FIDE rating of about 1900.

Ferdy wrote: ↑Sun Sep 15, 2019 11:13 pm
I am primarily interested in FIDE Elo 2000 and below. The human will play at standard time control. I started by investigating which engines with the UCI_Elo feature, set at UCI_Elo 1500, can play as close as possible to a human at around Elo 1500. So far I have tested these engines at 1 s per position on 5k positions to get some stats.

lkaufman wrote: ↑Sun Sep 15, 2019 8:51 pm
If the goal is to make versions that play at the same level as a human with an arbitrary (2300, for example) FIDE rating would play, then of course it is critical to specify what time limit the human is presumed to be playing at. I have done a fair amount of work on the question of how to estimate equivalent human FIDE ratings from engine rating lists. All of the well-known rating lists underrate the engines in general relative to FIDE ratings, with the amount of underrating increasing at faster time limits. This is assuming they are running on whatever they specify as their standard or reference hardware.
I agree that the CCRL list is the closest to human FIDE levels of the major lists, in part because its span is artificially contracted by the use of BayesElo, which largely offsets the tendency of engine vs. engine rating differences to be larger than if they were rated vs. humans. If the goal is to predict the rating an engine would earn vs. humans at 40/2 hours running on the reference hardware, then I would use the CCRL 40/40 list and add perhaps 250 Elo. If the goal is to estimate ratings at 40/40 minutes, I would add at least 300. If we're talking about estimating blitz ratings vs. humans, use the 40/4 list and add maybe 500 Elo or so. These are just ballpark numbers based on the limited GM vs. engine data I have; I hope this is of some help to your work.

Ferdy wrote: ↑Sat Jul 06, 2019 8:38 pm
If we cannot find a reference engine and corresponding UCI_Elo value, we might as well use the CCRL as a reference. CCRL is better compared to other rating lists in this regard because it has tested lots of weaker programs. Perhaps we can use single processor at CCRL 40/4 as a base for UCI_Elo.

Patrice Duhamel wrote: ↑Sat Jul 06, 2019 12:29 pm
Any advice to tune engine parameters to correspond to the right UCI_Elo value?
In Cheese I have 2 parameters, one to make very small pauses every N nodes,
and another one to add randomness to the evaluation under 2000 ELO.
Elo FIDE = (0.7 x Elo CCRL) + 840 = (0.7 x 1500) + 840 = 1890
But I find that this formula does not work well once the FIDE Elo drops below about 1400.
For example, if we take a very low FIDE Elo such as 1000:
Elo CCRL = (Elo FIDE - 840)/0.7 = (1000 - 840)/0.7 = 229
This is about the rating of an engine playing random moves (CCRL 40/4), and I believe that a player with a FIDE Elo of 1000 would easily beat such an engine. For this case I find the formula Elo FIDE = (0.8 x CCRL) + 560 or Elo FIDE = (0.85 x CCRL) + 420 somewhat better.
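
For anyone who wants to compare the candidate formulas side by side, here is a rough Python sketch. The function names are made up for illustration, and the slope/offset pairs are only the ones mentioned in this thread (0.7/840, 0.8/560, 0.85/420, and the 0.9/440 pair from the reply at the top). None of them is a validated conversion; they are ballpark guesses.

```python
# Hypothetical helpers for the linear CCRL <-> FIDE guesses discussed in this thread.
# The names are illustrative only; the coefficients are the posters' rough estimates.

def fide_from_ccrl(ccrl, slope, offset):
    """Estimated FIDE Elo = slope * CCRL Elo + offset."""
    return slope * ccrl + offset


def ccrl_from_fide(fide, slope, offset):
    """Inverse of the linear map: the CCRL Elo that corresponds to a FIDE Elo."""
    return (fide - offset) / slope


def offset_for_slope(new_slope, pivot_ccrl, old_slope=0.7, old_offset=840):
    """Offset that keeps the FIDE estimate unchanged at pivot_ccrl
    when the slope is changed from old_slope to new_slope."""
    return fide_from_ccrl(pivot_ccrl, old_slope, old_offset) - new_slope * pivot_ccrl


# (slope, offset) pairs quoted in the thread.
FORMULAS = {
    "0.70 x CCRL + 840": (0.70, 840),
    "0.80 x CCRL + 560": (0.80, 560),
    "0.85 x CCRL + 420": (0.85, 420),
    "0.90 x CCRL + 440": (0.90, 440),
}

if __name__ == "__main__":
    # Changing the slope to 0.9 while pinning the estimate at CCRL 2000.
    print("offset for slope 0.9 at pivot 2000:", round(offset_for_slope(0.9, 2000)))

    for name, (slope, offset) in FORMULAS.items():
        fide_at_1500 = fide_from_ccrl(1500, slope, offset)
        ccrl_at_1000 = ccrl_from_fide(1000, slope, offset)
        print(f"{name}: CCRL 1500 -> FIDE {fide_at_1500:.0f}, "
              f"FIDE 1000 -> CCRL {ccrl_at_1000:.0f}")
```

The pivot calculation is just the arithmetic behind "drops from 840 to 440": both lines are forced to agree at CCRL 2000, so the new offset is (0.7 x 2000 + 840) - (0.9 x 2000).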