a direct comparison of FIDE and CCRL rating systems

Post by **Volker Pittlik** » Tue Feb 23, 2016 7:23 pm

Laskos wrote:
JJJ wrote: So, the funny thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting 10 games as a minimum.
Maybe 5 games, draw odds, top GM always White, engine with no book, tournament time control. But I doubt a GM would succeed even in this case.

Why not have the top 10 GMs play 10 games each against the top five engines? Then we would have a nice calibration of the rating lists! The only problems may be:

- humans may lose too much

- a sponsor may be necessary.
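With only about 10 games per pairing, the error bars on any calibration would be wide. A minimal sketch of how wide, assuming the standard logistic Elo model and a normal approximation to the mean score (crude for samples this small); the function names and the 0/2/8 result below are hypothetical:

```python
import math


def elo_from_score(s: float) -> float:
    """Elo difference implied by a mean score s under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo_difference, margin) for a match from one side's perspective.

    Uses a normal approximation to the mean per-game score and propagates
    its standard error through the slope of the Elo curve (delta method).
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    if not 0.0 < s < 1.0:
        raise ValueError("all wins or all losses: Elo difference is unbounded")
    # Per-game variance of the score, with outcomes worth 1, 0.5 and 0.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se_score = math.sqrt(var / n)
    # d(Elo)/ds = 400 / (ln 10 * s * (1 - s)).
    slope = 400.0 / (math.log(10.0) * s * (1.0 - s))
    return elo_from_score(s), z * se_score * slope


# Hypothetical 10-game result for the human side: 0 wins, 2 draws, 8 losses.
diff, margin = elo_with_margin(0, 2, 8)
print(f"{diff:.0f} +/- {margin:.0f} Elo")
```

For that made-up result the estimate is roughly -382 Elo with a 95% margin of about 240 points, so many more games per pairing (or pooling across players) would be needed for a useful calibration.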

Volker

Post by **Laskos** » Tue Feb 23, 2016 9:11 pm

Laskos wrote: I find your result interesting and intriguing. Up to now I have used the rule of thumb "one CCRL computer ELO point is about 0.70 FIDE human ELO points", with CCRL and FIDE intersecting at a rating of 2800 for both. Here I plot your result together with my "rule of thumb" linear model:

We see that the larger differences appear at CCRL ELO above 3500 and below 1800, and the models are pretty close otherwise. In the PDF you fit the "errors" with some power functions of ELO. Maybe if you did the fit with a linear a*x+b, our results would look more similar? It is not clear from your plots that a linear fit is so badly rejected, at least for FIDE ELO, although you might have observed a curvature there. I am a bit worried about your tails; for example, a 2-ply engine is most likely below CCRL 1500 but above FIDE 1000, and your plot seems to contradict this. Also, the whole way from current top engines to a perfect engine is worth less than 150 FIDE ELO points. Isn't that a bit weird?
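The "rule of thumb" linear model and the suggested a*x+b fit can be sketched as below. Only the 0.70 slope and the 2800 crossing point come from the post; the function names are hypothetical, and the fit is plain ordinary least squares, not necessarily what either poster used:

```python
def rule_of_thumb_fide(ccrl: float, slope: float = 0.70, pivot: float = 2800.0) -> float:
    """Map a CCRL rating to FIDE: 1 CCRL point ~ 0.70 FIDE points, scales meet at 2800."""
    return pivot + slope * (ccrl - pivot)


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Points generated from the rule of thumb itself, so the fit should recover it.
ccrl_points = [1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
fide_points = [rule_of_thumb_fide(x) for x in ccrl_points]
a, b = fit_linear(ccrl_points, fide_points)
```

Written as a*x+b, the rule of thumb is a = 0.70 and b = 2800 * (1 - 0.70) = 840, which gives a direct way to compare it with a fitted line.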

Having little experience with very weak engines, I checked the left tail by testing Stockfish at depth=2 against Zurichess Bern, a weak but stable chess engine listed on CCRL, at 10''+0.1''. All in all, Stockfish depth=2 seems to have a CCRL 40/40 rating of 900-1000 ELO. At the same time, I estimate its FIDE rating to be above 1000 ELO, probably close to 1200 ELO. So it seems that your left tail is significantly off.
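Rating an engine by playing a single CCRL-listed opponent amounts to computing a performance rating. A minimal sketch under the logistic Elo model; the thread does not give Zurichess Bern's rating or the match score, so the numbers below are hypothetical, and a single-opponent estimate says nothing about its own uncertainty:

```python
import math


def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score of `rating` against `opponent_rating` (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def performance_rating(opponent_rating: float, points: float, games: int) -> float:
    """Rating whose expected score against the opponent equals the observed score."""
    s = points / games
    if not 0.0 < s < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(s / (1.0 - s))


# Hypothetical match: 25 points out of 100 against a 1200-rated opponent.
estimate = performance_rating(1200.0, 25.0, 100)
```

For that made-up match the estimate is about 1009; feeding it back into `expected_score` against the same opponent returns the observed 0.25.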

Anyway, your methodology seems sound; maybe more data points are needed for a better fit. And it's nice to have an objective result based on measurable in-game quantities.

Post by **Guenther** » Wed Feb 24, 2016 12:09 am

Laskos wrote: I fitted the points with FIDE = a + b/(CCRL+c)^3; the fit is almost perfect, with R^2 = 0.99999990. The fit and the points given are here:

At a glance, I don't much like the tails, but I will take a look later in the day.

I don't believe the CCRL ratings in the region between 2200 and, let's say, 1700 are correctly mapped to FIDE ratings.
E.g., IMs rated 2400 must be clearly better than programs rated 2050 in CCRL, but the suggested table says they should be equal.
Moreover, FIDE players under 1500 are extremely weak and should be weaker than their CCRL equivalents, not vice versa as the table shows.
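The quoted model FIDE = a + b/(CCRL+c)^3 is linear in a and b once c is fixed, so one simple way to fit it is a grid search over c with closed-form least squares for a and b. This is a sketch of that approach, not necessarily how the quoted fit was done; the thread gives neither the data points nor the fitted parameters, so the check below uses made-up parameters. Note that as CCRL grows the b term vanishes and FIDE approaches a, which is the "perfect engine" ceiling discussed above.

```python
def fit_inverse_cubic(xs, ys, c_grid):
    """Fit y = a + b / (x + c)**3 by grid search over c.

    For a fixed c the model is linear in (a, b), so each candidate c is
    solved by closed-form least squares; the c with the smallest sum of
    squared errors wins. Returns (a, b, c, r_squared).
    """
    n = len(xs)
    mean_y = sum(ys) / n
    sst = sum((y - mean_y) ** 2 for y in ys)
    best = None
    for c in c_grid:
        if any(x + c <= 0 for x in xs):
            continue  # keep (x + c) positive so the cube is well behaved
        us = [1.0 / (x + c) ** 3 for x in xs]
        mean_u = sum(us) / n
        suu = sum((u - mean_u) ** 2 for u in us)
        if suu == 0.0:
            continue
        b = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / suu
        a = mean_y - b * mean_u
        sse = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    if best is None:
        raise ValueError("no admissible c in the grid")
    a, b, c, sse = best
    return a, b, c, 1.0 - sse / sst


# Synthetic check with made-up parameters (NOT the values from the thread).
A_TRUE, B_TRUE, C_TRUE = 3300.0, -2.0e13, 1000.0
ccrl = [float(x) for x in range(1000, 3601, 100)]
fide = [A_TRUE + B_TRUE / (x + C_TRUE) ** 3 for x in ccrl]
a, b, c, r2 = fit_inverse_cubic(ccrl, fide, range(0, 2001, 50))
```

An almost perfect R^2 only says the curve passes through the given points; it does not validate the tails, which is exactly where the objections above are aimed.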

Guenther

Post by **carldaman** » Wed Feb 24, 2016 12:56 am

Laskos wrote:
JJJ wrote: So, the funny thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting 10 games as a minimum.
Maybe 5 games, draw odds, top GM always White, engine with no book, tournament time control. But I doubt a GM would succeed even in this case.

If the engine has no book, that would play right into the GM's hands. I'm afraid the lack of a book could be worth at least an extra 100 Elo points to the GM. Any skilled player, not just a 2800 Elo SuperGM, can steer the game into familiar and favorable lines/structures if the engine has no book.

CL

Post by **carldaman** » Wed Feb 24, 2016 1:05 am

Guenther wrote:
Laskos wrote: I fitted the points with FIDE = a + b/(CCRL+c)^3; the fit is almost perfect, with R^2 = 0.99999990. The fit and the points given are here:

At a glance, I don't much like the tails, but I will take a look later in the day.
I don't believe the CCRL ratings in the region between 2200 and, let's say, 1700 are correctly mapped to FIDE ratings.
E.g., IMs rated 2400 must be clearly better than programs rated 2050 in CCRL, but the suggested table says they should be equal.
Moreover, FIDE players under 1500 are extremely weak and should be weaker than their CCRL equivalents, not vice versa as the table shows.

Guenther

CCRL-rated programs around 2050 will give human masters a lot of trouble. I'd say 2250 Elo would be more accurate on the human scale. That is still not as strong as 2400 human Elo, so I'm not disagreeing.

CL

Post by **nimh** » Wed Feb 24, 2016 1:10 am

Actual results of course differ from theoretical results, because humans have the ability to use anti-computer strategies. But it's hard to say how big the effect is, because there are three unknown variables:

1) The human's skill in employing anti-computer strategy. Positional players and those more experienced at playing against engines have an advantage.
2) The engine's susceptibility to such strategy. It seems to me that search depth is the most important factor, which means that, with time and hardware kept equal, better search algorithms with a smaller branching factor can cope with it better.
3) The extent to which a given position allows such strategy to be used. In open positions it virtually doesn't matter how you play; the engine outcalculates you. Closed positions, however, offer plenty of opportunities to build up your position without the engine having a clue until it's too late.
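The link between branching factor and depth in point 2 can be made concrete with the common approximation that a search of N nodes with effective branching factor EBF reaches a depth d where EBF^d ≈ N. A minimal sketch under that simplification; the speeds and EBF values are hypothetical, not measurements of any engine:

```python
import math


def nodes_in_time(nodes_per_second: float, seconds: float) -> float:
    """Total nodes searched at a fixed speed."""
    return nodes_per_second * seconds


def reachable_depth(nodes: float, ebf: float) -> float:
    """Depth d with ebf**d == nodes, i.e. log(nodes) / log(ebf)."""
    if nodes <= 1.0 or ebf <= 1.0:
        raise ValueError("nodes and ebf must both exceed 1")
    return math.log(nodes) / math.log(ebf)


# Same time and hardware, two hypothetical branching factors.
budget = nodes_in_time(1_000_000_000, 1.0)
depth_ebf2 = reachable_depth(budget, 2.0)
depth_ebf3 = reachable_depth(budget, 3.0)
```

With a hypothetical budget of 1e9 nodes, an EBF of 2 reaches about 30 plies against about 19 for an EBF of 3, which is the sense in which a smaller branching factor buys depth at equal time and hardware.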

As time passes and search functions and hardware advance, the gap between theoretical and actual results diminishes.

Guenther wrote: Moreover, FIDE players under 1500 are extremely weak and should be weaker than their equivalent in CCRL, not vice versa as the table shows.

What makes you think so?

Post by **drj4759** » Wed Feb 24, 2016 7:56 am

I think adapting the time to fit the published tournament time control is misleading. Saying that 40/40 in CCRL means 40 moves in 40 minutes, without proper disclosure, is a great lie!

To be consistent, you would need a different time control adaptation for each brand/model of CPU that CCRL uses. Not only that, you would also need a different time control for each core count (12 CPU, 8 CPU, 4 CPU, etc.), since each of these produces a different speed in nodes per second, which is related to time. If you are serious, you will have to create a tournament time control for each of these, and that will be a mess!

For the sake of legality, 40/40 should mean 40 moves in 40 minutes. Anything else is an invention.

Post by **Graham Banks** » Wed Feb 24, 2016 8:31 am

drj4759 wrote: I think adapting the time to fit the published tournament time control is misleading. Saying that 40/40 in CCRL means 40 moves in 40 minutes, without proper disclosure, is a great lie!

To be consistent, you would need a different time control adaptation for each brand/model of CPU that CCRL uses. Not only that, you would also need a different time control for each core count (12 CPU, 8 CPU, 4 CPU, etc.), since each of these produces a different speed in nodes per second, which is related to time. If you are serious, you will have to create a tournament time control for each of these, and that will be a mess!

For the sake of legality, 40/40 should mean 40 moves in 40 minutes. Anything else is an invention.

You'll find the following at the top of all our lists:

CCRL 40/40 Rating List: Single-CPU engines
Ponder off, General book (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz)
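"Equivalent to 40 moves in 40 minutes on" a reference machine implies scaling the clock by relative hardware speed. A minimal sketch, assuming speed is measured by some benchmark such as the engine's nodes per second on each machine; the thread does not describe CCRL's exact procedure, and the speeds below are hypothetical:

```python
def scaled_minutes(reference_minutes: float, reference_speed: float, local_speed: float) -> float:
    """Minutes on the local machine giving the same search work as
    `reference_minutes` on the reference machine, assuming work scales
    linearly with the benchmarked speed."""
    if reference_speed <= 0 or local_speed <= 0:
        raise ValueError("speeds must be positive")
    return reference_minutes * reference_speed / local_speed


# Hypothetical benchmark: the local machine searches twice as fast as the reference.
local_tc = scaled_minutes(40.0, 1_000_000, 2_000_000)
```

Here the local machine would get 20 minutes per 40 moves. A single speed ratio is also what makes the approach portable across CPUs and core counts, although multi-core speedups are rarely linear in practice.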

Post by **drj4759** » Wed Feb 24, 2016 11:21 am

Very well then, but it would be proper to put it in the header where it is very visible.

Your method of calculating the time control based on a specific computer (Athlon 64 X2) will easily become obsolete and irrelevant. Not all people have an Athlon 64 X2 and there is no way that your system of time control calculation will become standard or adopted by most rating list producers.
Someday there will be no more Athlons, and those who are new to rating lists will wonder what an Athlon is.

Post by **velmarin** » Wed Feb 24, 2016 11:40 am

drj4759 wrote: Your method of calculating the time control based on a specific computer (Athlon 64 X2) will easily become obsolete and irrelevant. Not all people have an Athlon 64 X2 and there is no way that your system of time control calculation will become standard or adopted by most rating list producers.
Someday there will be no more Athlons, and those who are new to rating lists will wonder what an Athlon is.

It will always be a formula like this; it is the easiest way to reach a consensus across different computers.
Another question is whether engines handle a flat 40 minutes (or 15, for example) the same way as in other tournaments or tests that use an increment, as TCEC does. I think this produces very different results and games.
