a direct comparison of FIDE and CCRL rating systems

nimh
Posts: 46
Joined: Sun Nov 30, 2014 12:06 am

Re: a direct comparison of FIDE and CCRL rating systems

Post by nimh »

Graham Banks wrote:
nimh wrote:The hardware that is used in creating CCRL lists is outdated by today's standards, and time controls are ca. 3x shorter.
That is not the actual hardware that we use.
We just used that machine as our original benchmark to find the adapted time controls to use on our computers.

For example, Nathanael's overclocked i7 Haswell quad uses 40/15, whereas an older Q6600 uses 40/32.
CCRL 40/40 equates to around 40/18 on modern hardware.

Hope that explains things.
Thanks, I see now that I misunderstood :) So the speed difference is not 16x but rather 2.2x?
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: a direct comparison of FIDE and CCRL rating systems

Post by JJJ »

Kai,
Let's assume the true strength of Stockfish 7 is 3150.

What is the probability of Magnus Carlsen (2850) winning a game against it?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: a direct comparison of FIDE and CCRL rating systems

Post by Laskos »

JJJ wrote:Kai,
Let's assume the true strength of Stockfish 7 is 3150.

What is the probability of Magnus Carlsen (2850) winning a game against it?
Well, if FIDE rating in this case denotes some strength relationship between a human and an engine, it is predicted that in 10 games Carlsen will get 1.5 points (say one win and one draw, or more probably 3 draws). That's a bit more than I would think. There are also some issues, for example, by this curve, it is predicted that a 2800 GM stands a better chance against Stockfish 7 than Rybka 2.1 on identical hardware. Somehow I doubt this. There are other issues too, on the low ELO, for example, but I will post later.
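
For reference, the 1.5-points-in-10-games figure is what the standard logistic Elo formula gives for a 300-point gap. The sketch below assumes that formula and the ratings posited in the question (2850 and 3150); the function name is illustrative only. Note that it yields an expected score (draws counting half), not a win probability.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5), standard logistic Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

carlsen = 2850     # rating given in the question
stockfish = 3150   # "true strength" assumed in the question

per_game = expected_score(carlsen, stockfish)
print(round(per_game, 3))        # 0.151 expected points per game
print(round(10 * per_game, 2))   # 1.51 expected points over 10 games
```

The formula alone cannot say how those 1.5 points split between wins and draws; that depends on the draw rate, which Elo does not model.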
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: a direct comparison of FIDE and CCRL rating systems

Post by JJJ »

So, the fun thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting at least 10 games.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: a direct comparison of FIDE and CCRL rating systems

Post by Laskos »

I find your result interesting and intriguing. Up to now I have used the rule of thumb "one CCRL computer ELO point is about 0.70 FIDE human ELO points", with CCRL and FIDE intersecting at a rating of 2800 for both. Here I plot your result against my "rule of thumb" linear model:
Image

We see that the larger differences appear at CCRL ELO above 3500 and below 1800; otherwise the models are pretty close. In the PDF you fit the "errors" with some power functions of ELO. Maybe if you do the fit with a linear a*x+b, our results will look more similar? It is not clear from your plots that a linear fit is so badly rejected, at least for FIDE ELO, although you might have observed a curvature there. I am a bit worried about your tails: for example, a 2-ply engine is most likely below CCRL 1500 but most likely above FIDE 1000, and your plot seems to contradict this. Also, all the way from current top engines to a perfect engine in FIDE ELO is worth less than 150 points. Isn't it a bit weird?
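
The linear rule of thumb can be written down in a few lines. This is a minimal sketch under one reading of the rule: a straight line through (2800, 2800) with a slope of 0.70 FIDE points per CCRL point. The function and constant names are made up for illustration.

```python
ANCHOR = 2800.0   # rating at which CCRL and FIDE are taken to coincide
SLOPE = 0.70      # FIDE points per CCRL point in the rule of thumb

def ccrl_to_fide(ccrl: float, slope: float = SLOPE) -> float:
    """Map a CCRL rating to an estimated FIDE rating with the linear rule."""
    return ANCHOR + slope * (ccrl - ANCHOR)

for ccrl in (1800, 2800, 3500):
    print(ccrl, round(ccrl_to_fide(ccrl)))
```

Under this reading, CCRL 3500 maps to about FIDE 3290 and CCRL 1800 to about FIDE 2100. Passing a different `slope` shows how sensitive the mapping is to that single parameter.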
nimh
Posts: 46
Joined: Sun Nov 30, 2014 12:06 am

Re: a direct comparison of FIDE and CCRL rating systems

Post by nimh »

Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: a direct comparison of FIDE and CCRL rating systems

Post by Laskos »

nimh wrote:Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.
Generally, a linear model has one parameter fewer. Maybe you need several more data points for the errors-ELO fits? I agree that my rule-of-thumb linear model is probably an oversimplification. And I guess today's top engines gain at a factor of less than 0.70, maybe 0.40-0.50, so the linear model breaks down. That's why I find your result interesting.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: a direct comparison of FIDE and CCRL rating systems

Post by Laskos »

JJJ wrote:So, the fun thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting at least 10 games.
Maybe 5 games, draw odds, top GM always White, engine without a book, tournament time control. But I doubt a GM would succeed even in this case.
Nathanael Russell
Posts: 803
Joined: Sat Jan 31, 2015 11:50 pm
Location: Philadelphia, USA

Re: a direct comparison of FIDE and CCRL rating systems

Post by Nathanael Russell »

Frank Quisinsky wrote:Hi Graham,

I think on an i7 Haswell, 40 in 13/14 is more correct, if I compare my 40/10 Haswell results with the CCRL 40/40 ratings.

Best
Frank

But one minute more or less ...
Not important.
If I use the following formula for CCRL, combined with the benchmarked elapsed time of 17 seconds from Crafty v19.17:

T minutes / 40 moves repeated, where T = 40 * <elapsed seconds> / 48 = <elapsed seconds> / 1.2

CCRL 40/40: 40 * 17 / 48 = 14.16666666666667, and 14.16666666666667 / 1.2 = 11.80555555555556

I would equate that to 40 moves in 11 or 12 minutes.

I use 15 minutes because under load, the clock frequency dips 100MHz when all 4 cores are utilized.
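
The formula quoted above can be checked with a short script. This is only a sketch of the arithmetic as written in the post: the names are hypothetical, and only the constants 40, 48, 1.2 and the 17-second Crafty figure come from the text.

```python
# Hypothetical names; the numbers are the ones quoted in the post.
BASE_MINUTES = 40        # CCRL 40/40: 40 moves in 40 minutes
REFERENCE_SECONDS = 48   # divisor used in the quoted formula

def adapted_minutes(elapsed_seconds: float) -> float:
    """T = 40 * <elapsed seconds> / 48, i.e. <elapsed seconds> / 1.2."""
    return BASE_MINUTES * elapsed_seconds / REFERENCE_SECONDS

t = adapted_minutes(17)    # 17 s: the Crafty v19.17 benchmark time from the post
print(round(t, 2))         # 14.17 minutes per 40 moves
print(round(t / 1.2, 2))   # 11.81, the figure after the post's further division by 1.2
```

The formula as stated gives about 14.2 minutes; the 11.8 figure comes from the additional division by 1.2 in the worked line.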
Core_Engine_Tester_CCRL
Image
nimh
Posts: 46
Joined: Sun Nov 30, 2014 12:06 am

Re: a direct comparison of FIDE and CCRL rating systems

Post by nimh »

Laskos wrote:
nimh wrote:Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.
Generally, a linear model has one parameter fewer. Maybe you need several more data points for the errors-ELO fits? I agree that my rule-of-thumb linear model is probably an oversimplification. And I guess today's top engines gain at a factor of less than 0.70, maybe 0.40-0.50, so the linear model breaks down. That's why I find your result interesting.
I have compared FIDE and the accuracy of play on two more occasions.

http://www.chessanalysis.ee/summary450.pdf

http://www.chessanalysis.ee/Quality%20o ... suring.pdf

Both of them exhibit a kind of logarithmic relationship. Perhaps there are some games that exhibit a linear relationship between accuracy and either rating system, but chess certainly isn't one of those.
Laskos wrote:Also, all the way from current top engines to a perfect engine in FIDE ELO is worth less than 150 points. Isn't it a bit weird?
It does seem weird, but I think it can be explained by the fact that my CPU is not strong enough to provide a reliable analysis of play by entities stronger than it. Unfortunately, technology is not yet developed enough to tell what accuracy is needed to play at the level of 3500 FIDE and stronger.

The rule is that the whole package of CPU, engine, and time per move used in the analysis must absolutely surpass that of the entities being analyzed. That's why one cannot analyze contemporary correspondence games yet.