a direct comparison of FIDE and CCRL rating systems

Laskos · Post by **Laskos** » Wed Feb 24, 2016 11:56 am

I extracted the data-point values from the PDF and performed a chi-square goodness-of-fit test. The main problem is that there seem to be too few experimental data points. The linear regression was mildly rejected, and the more complicated power fit mildly failed to be rejected, although both results were unreliable and borderline. You probably need about twice as many data points, and less scattered ones.
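For anyone who wants to repeat this kind of check, here is a minimal sketch of a chi-square goodness-of-fit computation for a straight-line fit. The data points and the uncertainty `sigma` below are made up purely for illustration (they are not the values from the PDF), and the function names `fit_line` and `reduced_chi_square` are my own. It reports only the reduced chi-square, not a p-value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a*x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def reduced_chi_square(xs, ys, sigma, model, n_params):
    """Chi-square per degree of freedom for a fitted model.

    Values near 1 are consistent with the fit, values much larger than 1
    speak against the model, and values much smaller than 1 hint at
    overfitting or overestimated errors.
    """
    chi2 = sum(((y - model(x)) / sigma) ** 2 for x, y in zip(xs, ys))
    dof = len(xs) - n_params  # degrees of freedom
    return chi2 / dof


# Hypothetical, made-up data points (NOT the values from the PDF).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
sigma = 0.2  # assumed uniform uncertainty of each y value

a, b = fit_line(xs, ys)
red_chi2 = reduced_chi_square(xs, ys, sigma, lambda x: a * x + b, n_params=2)
print(f"slope={a:.3f} intercept={b:.3f} reduced chi2={red_chi2:.2f}")
```

With only a handful of points the number of degrees of freedom is tiny, which is exactly why the accept/reject verdicts here are borderline.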

Blue curve and dots are your fit
Green line is linear fit - mildly rejected (chi-square)
Orange curve is non-linear fit - mildly "accepted" (chi-square)
Red line is my rule of thumb

Black rectangle - range of experimental data (from the PDF)

It's pretty clear that we are dealing with overfitting, and extrapolating outside the rectangle is a useless endeavor. Your methodology is sound, but more data points are needed over a wider range to be able to make some extrapolations. For now I will stick with my rule of thumb:

FIDE rating = 0.70*CCRL rating + 840 Elo points
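As a quick sketch, the rule of thumb and its inverse can be written as two small functions. The function names are only illustrative, and the formula is nothing more than the linear rule stated above.

```python
def fide_from_ccrl(ccrl):
    """Rule of thumb from this post: FIDE = 0.70 * CCRL + 840."""
    return 0.70 * ccrl + 840


def ccrl_from_fide(fide):
    """Algebraic inverse of the same rule of thumb."""
    return (fide - 840) / 0.70


for ccrl in (2000, 2400, 2800, 3200):
    print(ccrl, round(fide_from_ccrl(ccrl)))
```

Note that the two scales coincide at 2800 under this rule (0.70*2800 + 840 = 2800). Below that the FIDE estimate is higher than the CCRL number, above it lower.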

Guenther · Post by **Guenther** » Wed Feb 24, 2016 3:17 pm

nimh wrote:Actual results of course differ from theoretical results, because humans have the ability to use anti-computer strategy. But it's hard to say how big the effect is, because we have 3 unknown variables.

1) the human's skill in employing anti-computer strategy. Positional players and those who are more experienced with playing against engines have an advantage.
2) the engine's susceptibility to such strategy. It seems to me that search depth is the most important factor. This means that, keeping time and hardware equal, better search algorithms with a smaller branching factor can cope with it better.
3) the extent to which certain positions allow it to be used. In open positions it virtually doesn't matter how you play; the engine simply outcalculates you. But closed positions offer plenty of opportunities to build up your position without the engine having a clue before it's too late.

As time passes and advances in the search function and hardware, the gap between theoretical results and actual results diminishes.

Moreover FIDE players under 1500 are extremely weak and should be weaker than their equivalent in CCRL not vice versa as the table shows.
What makes you think so?

I have been collecting games for my chess club from all our teams, down to the very low leagues, for decades now, and those players blunder wildly in tactical positions, not only under time pressure.
Moreover I know a lot of 'weak' programs from my former RWBC activities and involvement in several CC projects.

A 7 ply search often would be enough to get a huge advantage already.
I highly doubt a 1200 FIDE player stands a chance against a 1600 CCRL rated standard chess program, but your table suggests so.
(e.g. good old Storm or Zotron - from the 40/4 list)

BTW it's already hard to find such weak players ;-) It seems we have none under 1600 if having an Elo at all...
Lower ones will only happen to exist in the future when the permanent
lowering of the Elo entry point reaches the masses.

Guenther

Graham Banks · Post by **Graham Banks** » Wed Feb 24, 2016 7:04 pm

drj4759 wrote:Very well then, but it would be proper to put it in the header where it is very visible.

Your method of calculating the time control based on a specific computer (Athlon 64 X2) will easily become obsolete and irrelevant. Not all people have an Athlon 64 X2 and there is no way that your system of time control calculation will become standard or adopted by most rating list producers.
Someday there will be no more Athlon and those that are new with rating list will wonder what an Athlon is.

Perhaps you should have a read through this:
http://kirill-kryukov.com/chess/discuss ... f=7&t=1486

Chris Tatham · Post by **Chris Tatham** » Wed Feb 24, 2016 9:41 pm

I highly doubt a 1200 FIDE player stands a chance against a 1600 CCRL rated standard chess program, but your table suggests so.

Any time I have for this hobby these days tends to be focused on the weaker engines and from my experience I would agree with Guenther.

My last official elo was 1456 some years ago and I am certainly lower than that now. I have played many games at 40/45 time control with my own engine which is rated at 1441 on CCRL 40/4 list (and earlier versions rated as low as 1325). I know my engine's many weaknesses (it has no strengths!) which is certainly an advantage and my score is only 28% against it.

Whilst not FIDE, after several thousand games against humans on FICS the engine has a consistent rating of between 2000-2100 across bullet to standard game categories which is higher than 90% of players.
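For reference, here is a small sketch of what a 28% score means under the standard Elo logistic model. The function name is only illustrative, and the result is just the rating gap implied by the score on a common scale. It does not by itself say how CCRL and FIDE numbers line up.

```python
import math


def elo_diff_from_score(score):
    """Rating difference implied by an average score (0 < score < 1)
    under the standard Elo logistic model. Negative means the scorer
    performed below the opponent."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


diff = elo_diff_from_score(0.28)
print(round(diff))  # roughly -164: about 164 Elo below the engine
```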

nimh · Post by **nimh** » Thu Feb 25, 2016 12:08 am

Kai, why such obsession with linear relationships?

Given how different engines and humans are, plus the fact that chess is more suitable for engines, it is in my view quite improbable that there is a perfectly linear relationship between the two rating systems. Naturally, the same applies to the relationship between rating and quality of play. I, too, would like to have a wider range, but there simply are not enough games below 1700 and above 2800 FIDE where both players are of equal strength. Therefore I must use extrapolation, a common and accepted technique in statistics.

I appreciate that you think my methodology is sound.

leagues for decades now and those players blunder wildly in tactical positions not only under time pressure.

Even GMs blunder under such conditions; the difference merely lies in the frequency. But chess is more than just wild positions. It may seem unlikely to you, but even 1200-rated players can make accurate moves. Also, I don't believe that such players don't exist; all you need is to score 10% or lower against 1600-players.

Using blunders as a gauge of the level of play is the equivalent of using mountains to estimate the average altitude of a whole country. It won't indicate anything.
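The 10% figure follows from the standard Elo expected-score formula, sketched below. The function name is only illustrative.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the
    standard Elo logistic model (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A 1200-rated player against a 1600-rated player: 1 / (1 + 10) = 1/11.
print(f"{expected_score(1200, 1600):.3f}")
```

A 400-point deficit gives an expected score of 1/11, i.e. about 9%, so scoring around 10% or less against 1600-players is what a rating near 1200 looks like.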

Chris Tatham wrote:My last official elo was 1456 some years ago and I am certainly lower than that now. I have played many games at 40/45 time control with my own engine which is rated at 1441 on CCRL 40/4 list (and earlier versions rated as low as 1325). I know my engine's many weaknesses (it has no strengths!) which is certainly an advantage and my score is only 28% against it.

Whilst not FIDE, after several thousand games against humans on FICS the engine has a consistent rating of between 2000-2100 across bullet to standard game categories which is higher than 90% of players.

This is interesting. Could you provide more information? Openings, hardware, whether you played aggressively or not?

lkaufman · Post by **lkaufman** » Thu Feb 25, 2016 5:35 am

I agree with Kai that although the method is reasonable, the resultant curve is not credible, perhaps as he says due to overfitting or to the insufficient strength of the "judge" as you point out. Rating differences based on computer vs. computer tests do overstate the differences they would have on the human scale, perhaps primarily because these engine lists use randomized opening books whereas humans do not; in general the weaker player often aims for drawish openings. Whatever the reason, a quarter century ago I advocated multiplying engine rating differences by 75%, and now Kai advocates 70%; not a big change for a quarter century of progress! Probably it does diminish with higher elos, but your formula implies that a hundred elo difference on engine lists at the top is only worth a dozen or so on human lists. This is clearly absurd.
Here is some relevant data. I played many matches of Rybka (from Rybka 2.3.2a to Rybka 3) with GMs 8 or 9 years ago with varying handicap conditions, and did so recently with Komodo. In round numbers, based on engine lists Komodo is maybe 300 elo stronger than the Rybkas used (maybe more if we allow for better hardware now), while the results against humans seem to be about 200 elo better, very close to Kai's 70% rule.
The effect of opening books on computer vs. human games is quite large. The main problem with playing normal chess games (with or without White or draw odds) against top players to determine engine elo, aside from the need for sponsors, is that the results depend hugely on the book. GM Joel Benjamin, then rated near 2600 FIDE, managed only two draws in eight games vs. Rybka (closer to 2.3.2a than to Rybka 3) despite having White in every game, tournament-level time limits, and getting paid the same for draws as for wins. This is probably because I made a book to try to avoid easy drawing lines. With a normal book, where Black aims for a draw, he might have gotten several draws. I played a few test games recently with Komodo giving huge time odds to a 2600+ GM. With no book for Komodo and the GM getting the White pieces, he got a big opening plus in the first game without thinking and easily held the draw. But when we started the game from the Exchange French (considered quite drawish) with each taking the White side once, Komodo won twice decisively. So I think that Carlsen could make some draws with White against Komodo if it has no book or a standard one that aims to draw as Black, but with a suitable anti-draw book he would rarely get a draw. Anyway, based on the Benjamin match with Rybka, it would have a rating of nearly FIDE 3000 based on the score and adjusting for Benjamin always getting White, using my book anyway.
If you want datapoints at lower levels, using the SSDF list you can set ratings based on the Fidelity Mach III, Mach II, Par Excellence, and Novag Superconstellation, which got ratings based on 48 official tournament games (with money prizes) each. Those ratings were 2325, 2265, 2100, and 2018 respectively, but they were USCF ratings so subtract 100 for FIDE equivalence. Then you can convert SSDF to CCRL ratings by comparing the lists.
It's a lot of work, I don't have time to do it, but the data is there to provide some benchmarks for how to convert CCRL ratings to FIDE.
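A rough sketch of the arithmetic behind these numbers, using the standard Elo logistic model. The exact 2600 figure for Benjamin is an assumption (the post only says "near 2600"), no adjustment for always having White is applied, and the names are only illustrative.

```python
import math


def elo_diff_from_score(score):
    """Rating difference implied by an average score (0 < score < 1)
    under the standard Elo logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Benjamin match: two draws in eight games = 1.0 point out of 8.
human_score = (2 * 0.5) / 8
human_minus_engine = elo_diff_from_score(human_score)  # negative
benjamin_fide = 2600  # assumed; the post says "near 2600"
engine_performance = benjamin_fide - human_minus_engine
print(round(engine_performance))  # before any adjustment for colour

# Dedicated-computer benchmarks: USCF ratings minus 100, as suggested above.
uscf = {
    "Fidelity Mach III": 2325,
    "Mach II": 2265,
    "Par Excellence": 2100,
    "Novag Superconstellation": 2018,
}
fide_equivalent = {name: rating - 100 for name, rating in uscf.items()}
for name, rating in fide_equivalent.items():
    print(name, rating)
```

The raw match score alone puts the engine roughly 340 Elo above the human. The step from there to "nearly 3000" comes from the correction for Benjamin always having White, which this sketch does not try to quantify.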

Guenther · Post by **Guenther** » Thu Feb 25, 2016 11:00 am

nimh wrote:
leagues for decades now and those players blunder wildly in tactical positions not only under time pressure.
Even GMs blunder under such conditions; the difference merely lies in the frequency. But chess is more than just wild positions. It may seem unlikely to you, but even 1200-rated players can make accurate moves. Also, I don't believe that such players don't exist; all you need is to score 10% or lower against 1600-players. :) Using blunders as a gauge of the level of play is the equivalent of using mountains to estimate the average altitude of a whole country. It won't indicate anything.

You misread my quote above. I said blunder 'wildly', not blunder in wild positions.
It also seems we have different definitions of 'blunder'. For me it need not even cost a full pawn equivalent. It can also be a permanent weakness, like spoiling the pawn shield on the kingside or allowing an outpost without need or whatever...
For those players (rated lower than 1600 FIDE) every position involving more than one consecutive exchange can already be called tactical.
No sacrifices or open kings are needed.
The frequency is horrible. I have no idea why you don't believe me.
May I ask if you are a chess player too? ;-)

There are far too few data points in CCRL, at least in 40/40, for programs under 2100 to make any comparisons to human ratings.
(CCRL only started very late to add programs under 2400 to their scale)

Guenther · Post by **Guenther** » Thu Feb 25, 2016 11:26 am

BTW in that link you practically had it all the other way round for lower
CCRL ratings? ;-)
http://www.chessanalysis.ee/CCRL%20vs%20FIDE.pdf

CCRL 40/40    FIDE 2008 40/90+30
3200          2910
3100          2900
3000          2900
2900          2880
2800          2870
2700          2860
2600          2850
2500          2830
2400          2810
2300          2780
2200          2760
2100          2730
2000          2690
1900          2620
1800          2540
1700          2430
1600          2280
1500          2080

Here you say 1500 CCRL is equivalent to 2080 FIDE...

In your newer(?) table you say now:

1500 CCRL ~ 624

FIDE   CCRL      CCRL   FIDE
3100   3375      3600   3130
3000   2915      3500   3118
2900   2646      3400   3104
2800   2461      3300   3088
2700   2324      3200   3069
2600   2216      3100   3048
2500   2127      3000   3024
2400   2053      2900   2996
2300   1989      2800   2963
2200   1934      2700   2924
2100   1885      2600   2878
2000   1841      2500   2824
1900   1802      2400   2759
1800   1767      2300   2680
1700   1734      2200   2584
1600   1704      2100   2466
1500   1677      2000   2318
1400   1651      1900   2132
1300   1628      1800   1894
1200   1605      1700   1584
1100   1585      1600   1175
1000   1565      1500    624

I must say both tables are way off in most parts.
There is simply not enough data there for doing such a comparison.
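To make the disagreement easier to see, here is a small sketch that puts both tables side by side, together with Kai's rule of thumb from earlier in the thread (FIDE = 0.70*CCRL + 840). The values are copied from the two tables above (CCRL to FIDE direction). The variable and function names are only illustrative.

```python
# Older table: CCRL 40/40 -> FIDE 2008 (40/90+30)
old_table = {
    3200: 2910, 3100: 2900, 3000: 2900, 2900: 2880, 2800: 2870, 2700: 2860,
    2600: 2850, 2500: 2830, 2400: 2810, 2300: 2780, 2200: 2760, 2100: 2730,
    2000: 2690, 1900: 2620, 1800: 2540, 1700: 2430, 1600: 2280, 1500: 2080,
}

# Newer table, right-hand half: CCRL -> FIDE
new_table = {
    3600: 3130, 3500: 3118, 3400: 3104, 3300: 3088, 3200: 3069, 3100: 3048,
    3000: 3024, 2900: 2996, 2800: 2963, 2700: 2924, 2600: 2878, 2500: 2824,
    2400: 2759, 2300: 2680, 2200: 2584, 2100: 2466, 2000: 2318, 1900: 2132,
    1800: 1894, 1700: 1584, 1600: 1175, 1500: 624,
}


def rule_of_thumb(ccrl):
    """Kai's linear rule of thumb from earlier in the thread."""
    return round(0.70 * ccrl + 840)


# One row per CCRL value present in both tables:
# (CCRL, old FIDE, new FIDE, rule of thumb, new minus old)
rows = [
    (c, old_table[c], new_table[c], rule_of_thumb(c), new_table[c] - old_table[c])
    for c in sorted(set(old_table) & set(new_table))
]

print("CCRL   old   new  rule  new-old")
for ccrl, old, new, rule, delta in rows:
    print(f"{ccrl:4d}  {old:4d}  {new:4d}  {rule:4d}  {delta:+6d}")

# Slope at the very top of the newer table: FIDE gain per 100 CCRL points.
top_gain = new_table[3600] - new_table[3500]
print("FIDE gain from CCRL 3500 to 3600:", top_gain)
```

At CCRL 1500 the two tables differ by well over a thousand points, and at the top of the newer table 100 CCRL points translate into only about a dozen FIDE points.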

Guenther · Post by **Guenther** » Thu Feb 25, 2016 1:34 pm

BTW, while doing some research today, inspired by this thread and by some website work for my chess club, I found the youngest FIDE-rated player.

Mikus,Mate (HUN) *2014 rated 1490! since 02/16

I thought what the hell?! and looked into his profile history.
According to it he has already played two tournaments, the first one in May 2014... Assuming he was born in 01/14, he must have been at most four months old when he got a win and three draws in his first tournament! :-)

Well there must be some typo in the birth year...

https://ratings.fide.com/hist.phtml?event=784311
(IIRC you need to be registered for detailed info on FIDE ratings,
but I think you can do an anonymous search for a name)

There are also around 15 players with a probably generic year of birth '1900'. But the real oldest one should be 'Gerhard Vogel' (GER) *1910.
It seems with some luck there are plenty of decades of chess to go.

Edit: Well some more research on the German chess federation revealed he regrettably died at the age of 100 in 2010.
http://www.schachbund.de/news/id-100jae ... orben.html
Shortly before his death he still played in a senior tournament.

Guenther

*If the post seems too OT despite the mention in the title, please move it to another place*

Uri Blass · Post by **Uri Blass** » Thu Feb 25, 2016 6:29 pm

Guenther wrote:
nimh wrote:Actual results of course differ from theoretical results, because humans have the ability to use anti-computer strategy. But it's hard to say how big the effect is, because we have 3 unknown variables.

1) the human's skill in employing anti-computer strategy. Positional players and those who are more experienced with playing against engines have an advantage.
2) the engine's susceptibility to such strategy. It seems to me that search depth is the most important factor. This means that, keeping time and hardware equal, better search algorithms with a smaller branching factor can cope with it better.
3) the extent to which certain positions allow it to be used. In open positions it virtually doesn't matter how you play; the engine simply outcalculates you. But closed positions offer plenty of opportunities to build up your position without the engine having a clue before it's too late.

As time passes and advances in the search function and hardware, the gap between theoretical results and actual results diminishes.

Moreover FIDE players under 1500 are extremely weak and should be weaker than their equivalent in CCRL not vice versa as the table shows.
What makes you think so?
I collect games for my chess club from all our teams down to very low
leagues for decades now and those players blunder wildly in tactical positions not only under time pressure.
Moreover I know a lot of 'weak' programs from my former RWBC activities and involvement in several CC projects.

A 7 ply search often would be enough to get a huge advantage already.
I highly doubt a 1200 FIDE player stands a chance against a 1600 CCRL rated standard chess program, but your table suggests so.
(e.g. good old Storm or Zotron - from the 40/4 list)

BTW it's already hard to find such weak players It seems we have none under 1600 if having an Elo at all...
Lower ones will only happen to exist in the future when the permanent
lowering of the Elo entry point reaches the masses.

Guenther

There are many players under 1600 FIDE Elo, and also many players under 1200 FIDE Elo, if you search in the right country.

India has 3345 active players with a FIDE rating of 1000-1200 out of 10546 active players with a FIDE rating.

https://ratings.fide.com/advaction.phtm ... &line=desc

https://ratings.fide.com/advaction.phtm ... &line=desc
