UCI_Elo

lkaufman
Posts: 3724
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: UCI_Elo

Post by lkaufman » Thu Sep 19, 2019 2:03 am

pedrox wrote:
Mon Sep 16, 2019 9:50 am
lkaufman wrote:
Sun Sep 15, 2019 10:05 pm
Ferdy wrote:
Sun Sep 15, 2019 9:13 pm
lkaufman wrote:
Sun Sep 15, 2019 6:51 pm
Ferdy wrote:
Sat Jul 06, 2019 6:38 pm
Patrice Duhamel wrote:
Sat Jul 06, 2019 10:29 am
Any advice on tuning engine parameters so that they correspond to the right UCI_Elo value?

In Cheese I have 2 parameters: one to make very small pauses every N nodes,
and another to add randomness to the evaluation below 2000 Elo.
If we cannot find a reference engine and a corresponding UCI_Elo value, we might as well use CCRL as a reference. CCRL is better than other rating lists in this regard because it has tested lots of weaker programs. Perhaps we can use the single-processor CCRL 40/4 list as a base for UCI_Elo.
If the goal is to make versions that play at the same level as a human with an arbitrary FIDE rating (2300, for example), then of course it is critical to specify what time limit the human is presumed to be playing at. I have done a fair amount of work on the question of how to estimate equivalent human FIDE ratings from engine rating lists. All of the well-known rating lists underrate the engines in general relative to FIDE ratings, with the amount of underrating increasing as the time limit gets faster. This assumes the engines are running on whatever each list specifies as its standard or reference hardware. I agree that the CCRL list is the closest to human FIDE levels of the major lists, in part because its span is artificially contracted by the use of BayesElo, which largely offsets the tendency of engine vs. engine rating differences to be larger than if they were rated vs. humans. If the goal is to predict the rating an engine would earn vs. humans at 40/2 hours running on the reference hardware, then I would use the CCRL 40/40 list and add perhaps 250 Elo. If the goal is to estimate ratings at 40/40 minutes, I would add at least 300. If we're talking about estimating blitz ratings vs. humans, use the 40/4 list and add maybe 500 Elo or so. These are just ballpark numbers based on the limited GM vs. engine data I have; I hope this is of some help to your work.
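A rough sketch of the ballpark offsets above as a lookup table. The names `OFFSETS` and `estimate_human_rating` and the time-control keys are made up for illustration, and the numbers are only the approximate figures quoted in this post ("perhaps 250", "at least 300", "maybe 500 or so"), not calibrated values.

```python
# Ballpark offsets quoted in the post above; hypothetical names, not a calibrated model.
# Each entry maps the human's time control to (engine list to read from, Elo to add).
OFFSETS = {
    "40/2h": ("CCRL 40/40", 250),     # "add perhaps 250"
    "40/40min": ("CCRL 40/40", 300),  # "add at least 300", so treat as a lower bound
    "blitz": ("CCRL 40/4", 500),      # "add maybe 500 or so"
}

def estimate_human_rating(ccrl_rating, human_tc):
    """Return (source list, rough human-equivalent FIDE rating) for a CCRL rating."""
    source_list, offset = OFFSETS[human_tc]
    return source_list, ccrl_rating + offset

if __name__ == "__main__":
    for tc in OFFSETS:
        print(tc, estimate_human_rating(2000, tc))
```

The rating passed in has to come from the list named in the first element of the result, since the 40/40 and 40/4 lists are not interchangeable.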
I am primarily interested in FIDE Elo 2000 and below. The human will play at standard time control. I have started investigating which engines with the UCI_Elo feature, when set to UCI_Elo 1500, play as close as possible to a human of around Elo 1500. So far I have tested these engines at 1 s per position on 5k positions to get some stats.
My best guess is that an engine with a CCRL 40/40 rating of 1500 would be an even match (running on their reference hardware) at 40/2 hours with a human with a FIDE rating of about 1900.
This could be in accordance with Kai's formula:

Elo FIDE = (0.7 x Elo CCRL) + 840 = (0.7 x 1500) + 840 = 1890

But I find that this formula does not work well once the FIDE Elo drops below about 1400 points.

For example, if we take a very low FIDE Elo such as 1000:

Elo CCRL = (Elo FIDE - 840) / 0.7 = (1000 - 840) / 0.7 = 229

This is roughly the rating of an engine that plays random moves (CCRL 40/4), and I believe that a player with a FIDE Elo of 1000 would easily beat such an engine. For this case I find the formula Elo FIDE = (0.8 x CCRL) + 560 or Elo FIDE = (0.85 x CCRL) + 420 somewhat better.
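To make the comparison concrete, here is a small sketch that inverts each of the three linear formulas at FIDE 1000. The dictionary labels and function names are hypothetical; only the slopes and constants come from the post.

```python
# Linear mappings of the form: Elo FIDE = slope * Elo CCRL + intercept.
# Labels are made up for illustration; coefficients are the ones quoted above.
FORMULAS = {
    "Kai (0.7, 840)": (0.7, 840),
    "alternative (0.8, 560)": (0.8, 560),
    "alternative (0.85, 420)": (0.85, 420),
}

def fide_from_ccrl(ccrl, slope, intercept):
    """Apply the linear formula in the forward direction."""
    return slope * ccrl + intercept

def ccrl_from_fide(fide, slope, intercept):
    """Invert the linear formula to get the matching CCRL rating."""
    return (fide - intercept) / slope

if __name__ == "__main__":
    for name, (slope, intercept) in FORMULAS.items():
        ccrl = ccrl_from_fide(1000, slope, intercept)
        print(f"{name}: FIDE 1000 <-> CCRL {ccrl:.0f}")
```

For FIDE 1000 this gives about CCRL 229, 550 and 682 respectively, so the two alternatives pair a 1000-rated human with a noticeably stronger engine than a random mover.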
I think that Kai may have overlooked that CCRL uses BayesElo, which contracts the ratings. His 0.7 factor is probably quite accurate for CEGT, but for CCRL it should probably be something like 0.9. Then, to make the FIDE rating come out the same with a starting CCRL rating of 2000 (a very rough guess for the midpoint), the constant to add drops from 840 to 440 (0.7 x 2000 + 840 = 2240 = 0.9 x 2000 + 440). This seems much more realistic to me at both ends.
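A quick sketch of that adjustment, assuming both lines are forced to agree at the CCRL 2000 pivot. The function name `rescaled_intercept` is made up for illustration, and the 0.9 slope is only the rough guess given above.

```python
def rescaled_intercept(old_slope, old_intercept, new_slope, pivot_ccrl):
    """Intercept that makes the new line match the old one at pivot_ccrl."""
    fide_at_pivot = old_slope * pivot_ccrl + old_intercept
    return fide_at_pivot - new_slope * pivot_ccrl

def ccrl_from_fide(fide, slope, intercept):
    """Invert Elo FIDE = slope * Elo CCRL + intercept."""
    return (fide - intercept) / slope

if __name__ == "__main__":
    # Kai's line (0.7, 840) re-sloped to 0.9 around a CCRL 2000 midpoint.
    new_intercept = rescaled_intercept(0.7, 840, 0.9, 2000)
    print(f"new constant: {new_intercept:.0f}")
    # Low-end check: the CCRL rating paired with a FIDE 1000 human.
    print(f"FIDE 1000 <-> CCRL {ccrl_from_fide(1000, 0.9, new_intercept):.0f}")
```

With the steeper slope, a FIDE 1000 human is paired with an engine around CCRL 620 rather than a random mover near 229.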
Komodo rules!
