a direct comparison of FIDE and CCRL rating systems

nimh · Post by **nimh** » Tue Feb 23, 2016 9:26 am

Graham Banks wrote:
nimh wrote:The hardware that is used in creating CCRL lists is outdated by today's standards, and time controls are ca. 3x shorter.
That is not the actual hardware that we use.
We just used that machine as our original benchmark to find the adapted time controls to use on our computers.

For example, Nathanael's overclocked i7 Haswell quad uses 40/15, whereas an older Q6600 uses 40/32.
CCRL 40/40 equates to around 40/18 on modern hardware.

Hope that explains things.

Thanks, I see now that I misunderstood.

So the speed difference is not 16x but rather 2.2x?

JJJ · Post by **JJJ** » Tue Feb 23, 2016 12:44 pm

Kai,
Let's assume the true strength of Stockfish 7 is 3150.

What is the probability of Magnus Carlsen (2850) winning a game against it?

Laskos · Post by **Laskos** » Tue Feb 23, 2016 1:41 pm

JJJ wrote:Kai,
Let's assume the true strength of Stockfish 7 is 3150.

What is the probability of Magnus Carlsen (2850) winning a game against it?

Well, if the FIDE rating in this case denotes some strength relationship between a human and an engine, it is predicted that in 10 games Carlsen will get 1.5 points (say one win and one draw, or more probably 3 draws). That's a bit more than I would think. There are also some issues: for example, by this curve, it is predicted that a 2800 GM stands a better chance against Stockfish 7 than against Rybka 2.1 on identical hardware. Somehow I doubt this. There are other issues too, at the low end of the ELO scale, for example, but I will post later.
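The 1.5 points in 10 games figure can be reproduced with a minimal Python sketch, assuming the standard logistic Elo expectancy formula with a 400-point scale. Whether this is exactly the calculation behind the post is an assumption; the function name `expected_score` is illustrative, and the 3150 rating is the hypothetical value from the question above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score per game of player A against player B.

    Assumes the standard logistic Elo model (400-point scale).
    The result counts a draw as half a point, so it is an expected
    score, not a pure win probability.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


human = 2850.0   # Carlsen's rating as given in the thread
engine = 3150.0  # hypothetical "true strength" assumed in the question

per_game = expected_score(human, engine)
ten_games = 10 * per_game

print(f"expected score per game: {per_game:.3f}")
print(f"expected points in 10 games: {ten_games:.2f}")
```

Note that the formula yields an expected score rather than a win probability, which is why the same 1.5 points can come from one win and one draw or from three draws.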

JJJ · Post by **JJJ** » Tue Feb 23, 2016 3:48 pm

So, the fun thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting at least 10 games.

Laskos · Post by **Laskos** » Tue Feb 23, 2016 3:55 pm

I find your result interesting and intriguing. Up to now I have used the rule of thumb "one CCRL computer ELO point is about 0.70 FIDE human ELO points", with CCRL and FIDE intersecting at a rating of 2800 for both. Here I plot your result together with my "rule of thumb" linear model:

We see that the larger differences appear at CCRL ELO above 3500 and below 1800; otherwise the models are pretty close. In the PDF you fit the "errors" with some power functions of ELO. Maybe if you do the fit with a linear a*x+b, our results will look more similar? It is not clear from your plots that a linear fit is so badly rejected, at least for FIDE ELO, although you might have observed a curvature there. I am a bit worried about your tails: for example, a 2 ply engine is most likely below CCRL 1500, but most likely above FIDE 1000. Your plot seems to contradict this. Also, the whole way from current top engines to a perfect engine is worth less than 150 points in FIDE ELO. Isn't that a bit weird?
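The rule-of-thumb linear model described in this post can be written down as a short Python sketch. Only the slope of 0.70 and the 2800 intersection point come from the post; the function and constant names are illustrative, and the sample ratings are arbitrary inputs rather than measured values.

```python
PIVOT = 2800.0  # rating at which CCRL and FIDE are taken to coincide
SLOPE = 0.70    # FIDE points gained per CCRL point (rule of thumb)


def ccrl_to_fide_linear(ccrl: float) -> float:
    """Map a CCRL rating to a FIDE estimate with the linear rule of thumb."""
    return PIVOT + SLOPE * (ccrl - PIVOT)


# Arbitrary sample ratings, chosen only to show how the line behaves.
for ccrl in (1500.0, 2800.0, 3300.0, 3500.0):
    print(f"CCRL {ccrl:.0f} -> FIDE estimate {ccrl_to_fide_linear(ccrl):.0f}")
```

Because the line pivots at 2800, ratings above the pivot are compressed downwards and ratings below it are pulled upwards, which is where this model and a power-function fit would be expected to diverge most.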

nimh · Post by **nimh** » Tue Feb 23, 2016 4:07 pm

Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.

Laskos · Post by **Laskos** » Tue Feb 23, 2016 4:16 pm

nimh wrote:Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.

Generally, a linear model has one parameter fewer. Maybe you need several more data points for the errors-ELO fits? I agree that my rule-of-thumb linear model is probably an oversimplification. And I guess today's top engines have a gain factor below 0.70, maybe 0.40-0.50, so the linear model breaks down. That's why I find your result interesting.

Laskos · Post by **Laskos** » Tue Feb 23, 2016 5:46 pm

JJJ wrote:So, the fun thing to do could be to ask a ~2800 Elo player to play normal chess with draw odds against Stockfish or Komodo. Draw = win for the human.

The problem could be getting at least 10 games.

Maybe 5 games, draw odds, top GM always white, engine no book, tournament time control. But I doubt a GM would succeed even in this case.

Nathanael Russell · Post by **Nathanael Russell** » Tue Feb 23, 2016 6:01 pm

Frank Quisinsky wrote:Hi Graham,

I think that on an i7 Haswell, 40 in 13/14 is more correct, if I compare my 40/10 Haswell results with the CCRL 40/40 ratings.

Best
Frank

But one minute more or less ...
Not important.

If I use the following formula for CCRL combined with the benched elapsed time of 17 seconds from Crafty v19.17:

T minutes / 40 moves repeated, where T = 40 * <elapsed seconds> / 48 = <elapsed seconds> / 1.2

CCRL 40/40: 40 * 17 / 48 = 680 / 48 = 14.17, and 14.17 / 1.2 = 11.81

I would equate 40 moves in 11 or 12 minutes.

I use 15 minutes because under load, the clock frequency dips by 100 MHz when all 4 cores are utilized.
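The arithmetic in the formula above can be checked with a short Python sketch. The function name `adapted_minutes` and the constant name are illustrative and not part of any CCRL tooling; the 48-second reference and the 17-second bench time are the values quoted in this post. With a 17-second bench, the formula as written gives about 14.17 minutes, and the worked line in the post then divides by 1.2 once more to reach about 11.81.

```python
BENCH_REFERENCE_SECONDS = 48.0  # reference bench time used in the quoted formula


def adapted_minutes(elapsed_seconds: float, base_minutes: float = 40.0) -> float:
    """Minutes per 40 moves: T = base_minutes * elapsed_seconds / 48."""
    return base_minutes * elapsed_seconds / BENCH_REFERENCE_SECONDS


elapsed = 17.0  # Crafty v19.17 bench time in seconds, as quoted in the post

t_formula = adapted_minutes(elapsed)  # formula as written: 40 * 17 / 48
t_post = t_formula / 1.2              # extra division by 1.2 from the worked line

print(f"formula as written: 40 moves in {t_formula:.2f} minutes")
print(f"after the further division by 1.2: 40 moves in {t_post:.2f} minutes")
```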

nimh · Post by **nimh** » Tue Feb 23, 2016 7:19 pm

Laskos wrote:
nimh wrote:Why do you assume that the relationship must be linear? I chose power functions because they had the best fit. It's not like I picked them arbitrarily.
Generally, a linear model has one parameter fewer. Maybe you need several more data points for the errors-ELO fits? I agree that my rule-of-thumb linear model is probably an oversimplification. And I guess today's top engines have a gain factor below 0.70, maybe 0.40-0.50, so the linear model breaks down. That's why I find your result interesting.

I have compared FIDE and the accuracy of play on two more occasions.

http://www.chessanalysis.ee/summary450.pdf

http://www.chessanalysis.ee/Quality%20o ... suring.pdf

Both of them exhibit a kind of logarithmic relationship. Perhaps there are some games that exhibit a linear relationship between accuracy and either rating system, but chess certainly isn't one of those.

Laskos wrote:Also, the whole way from current top engines to a perfect engine is worth less than 150 points in FIDE ELO. Isn't that a bit weird?

It indeed seems weird, but I think it can be explained by the fact that my CPU is not strong enough to provide a reliable analysis of play by entities stronger than it. Unfortunately technology is not yet developed enough to tell what accuracy is needed to play at the level of 3500 FIDE and stronger.

The rule is that the whole package of CPU, engine and time per move used in analysis must absolutely surpass those of entities analyzed. That's why one cannot analyze contemporary correspondence games yet.
