CCRL scaling versus human player

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

CCRL scaling versus human player

Post by xr_a_y »

I guess this is already a well documented subject but let me ask it my way.

I wonder whether some kind (and very good) chess player would agree to play a few games against mid-range engines (2200-2700 CCRL Elo) at the CCRL 40/4 time control, in order to rescale CCRL ratings to FIDE ratings.
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: CCRL scaling versus human player

Post by pedrox »

There is a website that lists Elo ratings for the dedicated chess computers of the 80s and 90s. These machines played in tournaments against humans far more often than current engines do, so the ratings on this list are assumed to be close to FIDE Elo. The old SSDF lists also seem to give ratings that come closest to FIDE.

https://www.schach-computer.info/wiki/i ... -Elo-Liste

I tested one of these machines, the Mephisto Roma32, which is rated 2075 on that wiki list but only reaches about 1575 on the CCRL scale (many of these machines can now be run as UCI engines). That is a difference of 500 points! The gap is not constant across the scale: at roughly 700 Elo the two lists agree.

My Elo list: https://sites.google.com/site/motoresde ... lo-compleo
See the entry for Mephisto Roma32 14 MHz; Roma32 plays as Darky.

In my engine, when it plays at a limited Elo level, I distinguish between playing against another engine and playing against a dedicated machine or a human. Against a human I make it another 100 Elo points weaker than against machines, so that he does not complain.

Surely Larry Kaufman will be able to give good information.
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: CCRL scaling versus human player

Post by xr_a_y »

I'd really love to know a FIDE-like rating for Minic (Danasah is more or less at the same level). I tried putting Minic on Lichess, but it mostly gets games against other engines, and most often only stronger ones, probably trying to grab Elo point by point ... So Minic is "only" 2200 on Lichess. But no human player has ever won against it there.

My plan is to work on a "level" feature, and I'd like to offer a good scaling ...
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL scaling versus human player

Post by Laskos »

xr_a_y wrote: Thu Jun 20, 2019 9:52 am I guess this is already a well documented subject but let me ask it my way.

I wonder whether some kind (and very good) chess player would agree to play a few games against mid-range engines (2200-2700 CCRL Elo) at the CCRL 40/4 time control, in order to rescale CCRL ratings to FIDE ratings.
If you only have a comparison with engines and want a human FIDE rating at tournament time control, and if your engine behaves regularly at longer TC, do roughly the following:

Say you get a computer rating of 2300 on CCRL 40/4. The human FIDE rating at tournament time control is then roughly 2800 - (2800 - 2300)*0.7 ~ 2450 FIDE Elo points. Add 100-150 Elo points (so 2550-2600 FIDE Elo) for blitz ratings, since engines are strong versus humans at blitz. That factor of 0.7 is the "compression factor" of engine ratings when playing against humans.
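
For anyone who wants to plug in their own numbers, the rule of thumb above can be written as a tiny function. This is only a sketch: the function name `ccrl_to_fide` and its parameter names are made up for illustration, and the 2800 anchor, the 0.7 factor and the blitz bonus are simply the rough figures from this post, not measured values.

```python
def ccrl_to_fide(ccrl_elo, anchor=2800.0, compression=0.7, blitz_bonus=0.0):
    """Rough human-scale Elo estimate from an engine-list Elo.

    anchor      -- rating where both scales are assumed to coincide (2800 here)
    compression -- the "compression factor" for engine ratings versus humans
    blitz_bonus -- optional extra points for blitz (the post suggests 100-150)
    """
    return anchor - (anchor - ccrl_elo) * compression + blitz_bonus


# The worked example from the post: 2300 on CCRL 40/4.
tournament = ccrl_to_fide(2300)
blitz_low = ccrl_to_fide(2300, blitz_bonus=100)
blitz_high = ccrl_to_fide(2300, blitz_bonus=150)
print(round(tournament), round(blitz_low), round(blitz_high))
```

With the defaults this reproduces the ~2450 tournament estimate and the 2550-2600 blitz range quoted above.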
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: CCRL scaling versus human player

Post by xr_a_y »

Laskos wrote: Thu Jun 20, 2019 8:30 pm
If you only have a comparison with engines and want a human FIDE rating at tournament time control, and if your engine behaves regularly at longer TC, do roughly the following:

Say you get a computer rating of 2300 on CCRL 40/4. The human FIDE rating at tournament time control is then roughly 2800 - (2800 - 2300)*0.7 ~ 2450 FIDE Elo points. Add 100-150 Elo points (so 2550-2600 FIDE Elo) for blitz ratings, since engines are strong versus humans at blitz. That factor of 0.7 is the "compression factor" of engine ratings when playing against humans.
So if I understand you correctly, an engine at 2800 on CCRL 40/4 and a human at 2800 FIDE are roughly equal.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL scaling versus human player

Post by Laskos »

xr_a_y wrote: Thu Jun 20, 2019 9:12 pm
So if I understand you correctly, an engine at 2800 on CCRL 40/4 and a human at 2800 FIDE are roughly equal.
Yes, roughly.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: CCRL scaling versus human player

Post by jorose »

I feel it is conservative to estimate 2800 CCRL at only 2800 FIDE. That being said, there are a lot of factors that complicate matters.

Humans memorize openings, which means they will play far stronger than their ratings in the opening, until they are out of book. I don't know if the context you are thinking of allows you to handle opening books, but to smooth things out I would consider adding an opening book. The opening book could be based on human games of the specified rating.

Engine strength is hardware dependent. You may want to consider trying to come up with ways to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry Pi, but I would bet on Minic against Carlsen on TCEC hardware.
-Jonathan
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: CCRL scaling versus human player

Post by xr_a_y »

Yes, I think I'll play with the following things to define levels:
- fixed depth search
- enabling or disabling some eval features
- enabling or disabling pruning
- not playing the best move (multi-PV search)

I'll post a table of results soon.
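
To make the last item of the list concrete, here is one possible way to "not play the best move" from a multi-PV result. It is only an illustration and not Minic's actual code: the function `pick_move`, the `temperature_cp` knob and the softmax-style weighting are all assumptions chosen for the example, and the multi-PV lines are hand-written rather than coming from a real search.

```python
import math
import random


def pick_move(multipv, temperature_cp, rng=random):
    """Pick a move from multi-PV lines, weakening play as temperature grows.

    multipv        -- list of (move, score_in_centipawns) pairs
    temperature_cp -- 0 means always play the best move; larger values
                      give worse moves a higher chance of being chosen
    rng            -- random module or random.Random instance
    """
    if temperature_cp <= 0:
        return max(multipv, key=lambda line: line[1])[0]
    best = max(score for _, score in multipv)
    # Weight each move by how far its score is below the best one.
    weights = [math.exp((score - best) / temperature_cp) for _, score in multipv]
    moves = [move for move, _ in multipv]
    return rng.choices(moves, weights=weights, k=1)[0]


# Hypothetical multi-PV output for illustration only.
lines = [("e2e4", 35), ("d2d4", 30), ("a2a3", -20)]
print(pick_move(lines, 0))                      # full strength: best move
print(pick_move(lines, 50, random.Random(1)))   # weakened level: may deviate
```

A level table could then map each level to a fixed depth plus a `temperature_cp` value, with the actual numbers tuned from test games.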
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL scaling versus human player

Post by lkaufman »

jorose wrote: Thu Jun 20, 2019 10:46 pm I feel it is conservative to estimate 2800 CCRL at only 2800 FIDE. That being said, there are a lot of factors that complicate matters.

Humans memorize openings, which means they will play far stronger than their ratings in the opening, until they are out of book. I don't know if the context you are thinking of allows you to handle opening books, but to smooth things out I would consider adding an opening book. The opening book could be based on human games of the specified rating.

Engine strength is hardware dependent. You may want to consider trying to come up with ways to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry Pi, but I would bet on Minic against Carlsen on TCEC hardware.
The CCRL list is indeed conservative by FIDE standards. Let's assume the engine gets a good book which tries to get out of theory early when it doesn't cost too much. Assume also the hardware specified as standard by CCRL, which is way below current PC speeds. Note that the number of threads is not an issue, as they rate 1 and 4 threads separately. Let's use CCRL 40/40, as it is obviously more relevant for tournament chess than 40/4.

In 1998 Junior 5 earned a 2700 FIDE result in 9 games vs. top players, and while it is too old to even appear on CCRL, by extrapolation it would surely not be higher than 2600 on this list. But the hardware in 1998 was way below even the modest hardware specified by CCRL. The same conclusion would be reached by looking at the various matches from around 2003 or the Kramnik vs Fritz match of 2006; even a 2700 engine on CCRL would get a FIDE rating above 2800 with the above assumptions.

Note that Deep Fritz 10, which beat Kramnik in that match, is at 2830 on the list, but it gave several handicaps to Kramnik, such as letting him see the engine's book during the game (!), giving him a copy of the engine to practice against for months, limiting TBs to 5 man, etc. On the CCRL-specified hardware with no special conditions like those, I believe that Fritz 10 on 4 CPUs would easily defeat Carlsen in a match today.

One other point: CCRL uses Bayeselo, which contracts the ratings considerably from normal Elo, so although I totally agree with Kai about scaling engine results down to 70%, this is really correct just for CEGT. For CCRL much of the contraction is already done by Bayeselo, so maybe 85% or so might be the right figure for CCRL 40/40. Of course blitz results are more spread out, so maybe 70% is actually about right for CCRL 40/4.
Komodo rules!
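
To see what the different percentages do in practice, here is a small sketch that applies Kai's formula with a per-list factor. The factors are the tentative ones from the post above ("maybe 85% or so" for CCRL 40/40, 70% for CEGT and CCRL 40/4). Keeping 2800 as the point where both scales meet is an assumption carried over from Kai's formula; the post above argues that CCRL is conservative, so that anchor may well be too low. The names `COMPRESSION` and `human_scale_elo` are invented for the example.

```python
# Tentative compression factors per rating list, as suggested in the thread.
COMPRESSION = {
    "CEGT": 0.70,
    "CCRL 40/4": 0.70,
    "CCRL 40/40": 0.85,
}


def human_scale_elo(list_elo, rating_list, anchor=2800.0):
    """Apply anchor - (anchor - list_elo) * factor for the given rating list.

    anchor is assumed (not established) to be the rating at which the
    engine list and the human scale coincide.
    """
    factor = COMPRESSION[rating_list]
    return anchor - (anchor - list_elo) * factor


# Print estimates for a few engine-list ratings on each list.
for name in COMPRESSION:
    row = [round(human_scale_elo(elo, name)) for elo in (2200, 2500, 2700)]
    print(f"{name:<11} {row}")
```

The larger factor for CCRL 40/40 pulls the estimates less strongly toward the anchor, which is the practical effect of Bayeselo having already done part of the contraction.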
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: CCRL scaling versus human player

Post by xr_a_y »

Thanks a lot for this input.

Let me go back to my initial idea anyway.
Would it be possible to ask some human master to officially play against mid-range engines?
I guess some influential people here know some human masters, or even that some members are masters themselves. It wouldn't be technically difficult to organize, on Lichess for example.