CCRL scaling versus human player

Chessqueen
Posts: 5576
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: CCRL scaling versus human player

Post by Chessqueen »

Sven wrote: Sun Jun 23, 2019 10:08 pm
Chessqueen wrote: Sun Jun 23, 2019 8:09 pm Here is the top endgame player of all time, Carlsen, blundering a 5-piece endgame ==> https://www.youtube.com/watch?v=5b-1u5XjaFU
This is OT of course but in the video I only see KRBPPP vs KRBPPP positions, that is 12 pieces. What did I miss?
In the game Alekhine, Alexander versus Bogoljubow, Efim (blunder #6) there is a 4-piece blunder; check it on YouTube. And in the game Bronstein, David vs Botvinnik, Mikhail (blunder #4), Bronstein lost a simple endgame with plenty of time left on the clock. Remember, those tournaments were played with much longer time controls than nowadays ==> https://www.youtube.com/watch?v=-3vLdHSuASE
Do NOT worry and be happy, we all live a short life :roll:
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL scaling versus human player

Post by lkaufman »

Chessqueen wrote: Sun Jun 23, 2019 8:09 pm
lkaufman wrote: Fri Jun 21, 2019 8:32 pm
jorose wrote: Thu Jun 20, 2019 10:46 pm I feel it is conservative to estimate 2800 CCRL at only 2800 FIDE. That being said there are a lot of factors that complicate matters.

Humans memorize openings, this means they will play far stronger than their ratings in the opening, until they are out of book. I don't know if the context you are thinking of allows you to handle opening books, but to smooth things out I would consider adding an opening book. The opening book could be based on human games of the specified rating.

Engine strength is hardware dependent. You may want to consider trying to come up with ways to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry PI, but I would bet on Minic against Carlsen on TCEC hardware.
The CCRL list is indeed conservative by FIDE standards. Let's assume the engine gets a good book which tries to get out of theory early when it doesn't cost too much. Assume also the hardware specified as standard by CCRL, which is way below current PC speeds. Note that number of threads is not an issue as they rate 1 and 4 threads separately. Let's use CCRL 40/40 as it is obviously more relevant for tournament chess than 40/4. In 1998 Junior 5 earned a 2700 FIDE result in 9 games vs. top players, and while it is too old to even appear on CCRL, by extrapolation it would surely not be higher than 2600 on this list. But the hardware in 1998 was way below even the modest hardware specified by CCRL. The same conclusion would be reached by looking at the various matches from around 2003 or the Kramnik vs Fritz match of 2006; even a 2700 engine on CCRL would get a FIDE rating above 2800 with the above assumptions. Note that Deep Fritz 10, which beat Kramnik in that match, is at 2830 on the list, but it gave several handicaps to Kramnik such as letting him see the engine's book during the game (!), giving him a copy of the engine to practice against for months, limiting TBs to 5 man, etc. On the CCRL specified hardware with no special conditions like those, I believe that Fritz 10 on 4 cpus would easily defeat Carlsen in a match today.
One other point: CCRL uses Bayeselo which contracts the ratings considerably from normal elo, so although I totally agree with Kai about scaling engine results down to 70%, this is really correct just for CEGT. For CCRL much of the contraction is already done by bayeselo, so maybe 85% or so might be the right figure for CCRL 40/40. Of course blitz results are more spread out, so maybe 70% is actually about right for CCRL 40/4.

You wrote "limiting TBs to 5 man, etc." But how many GMs play all 5-man positions perfectly?
Here is the top endgame player of all time, Carlsen, blundering a 5-piece endgame ==> https://www.youtube.com/watch?v=5b-1u5XjaFU
I don't see how that relates to the topic. Computers can play openings and endgames perfectly by lookup, just as humans can do by using their memories, only computers are better at it. Limiting the use of memory for either openings or endgames by computers is fine as a type of handicap, but when it's done let's call it that, not pretend that having a better memory is somehow unfair. Computers and humans think very differently, and there is no way to even come close to equal conditions for the two.
Komodo rules!
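
The 70% and 85% figures in the post above can be read as contracting rating gaps toward a reference point. Below is a minimal sketch under that reading. The function name, the anchor rating, and the example numbers are illustrative assumptions and do not come from the thread, which gives no exact formula.

```python
def contract_rating(list_rating: float, anchor: float, factor: float) -> float:
    """Shrink the gap between a rating-list value and an anchor rating.

    Assumption: "scaling down to 70%" means multiplying rating
    differences (not absolute ratings) by the factor. The anchor is a
    hypothetical rating at which both scales are taken to agree.
    """
    return anchor + factor * (list_rating - anchor)


# Hypothetical example: an engine 200 points above an assumed anchor of 2800.
for factor in (0.70, 0.85):
    print(factor, contract_rating(3000, 2800, factor))
```

With a smaller factor the same list gap maps to a smaller gap on the target scale, which is the effect the post attributes to Bayeselo having already done part of the contraction for CCRL.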
Chessqueen
Posts: 5576
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: CCRL scaling versus human player

Post by Chessqueen »

lkaufman wrote: Mon Jun 24, 2019 3:17 am
Chessqueen wrote: Sun Jun 23, 2019 8:09 pm
lkaufman wrote: Fri Jun 21, 2019 8:32 pm
jorose wrote: Thu Jun 20, 2019 10:46 pm I feel it is conservative to estimate 2800 CCRL at only 2800 FIDE. That being said there are a lot of factors that complicate matters.

Humans memorize openings, this means they will play far stronger than their ratings in the opening, until they are out of book. I don't know if the context you are thinking of allows you to handle opening books, but to smooth things out I would consider adding an opening book. The opening book could be based on human games of the specified rating.

Engine strength is hardware dependent. You may want to consider trying to come up with ways to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry PI, but I would bet on Minic against Carlsen on TCEC hardware.
The CCRL list is indeed conservative by FIDE standards. Let's assume the engine gets a good book which tries to get out of theory early when it doesn't cost too much. Assume also the hardware specified as standard by CCRL, which is way below current PC speeds. Note that number of threads is not an issue as they rate 1 and 4 threads separately. Let's use CCRL 40/40 as it is obviously more relevant for tournament chess than 40/4. In 1998 Junior 5 earned a 2700 FIDE result in 9 games vs. top players, and while it is too old to even appear on CCRL, by extrapolation it would surely not be higher than 2600 on this list. But the hardware in 1998 was way below even the modest hardware specified by CCRL. The same conclusion would be reached by looking at the various matches from around 2003 or the Kramnik vs Fritz match of 2006; even a 2700 engine on CCRL would get a FIDE rating above 2800 with the above assumptions. Note that Deep Fritz 10, which beat Kramnik in that match, is at 2830 on the list, but it gave several handicaps to Kramnik such as letting him see the engine's book during the game (!), giving him a copy of the engine to practice against for months, limiting TBs to 5 man, etc. On the CCRL specified hardware with no special conditions like those, I believe that Fritz 10 on 4 cpus would easily defeat Carlsen in a match today.
One other point: CCRL uses Bayeselo which contracts the ratings considerably from normal elo, so although I totally agree with Kai about scaling engine results down to 70%, this is really correct just for CEGT. For CCRL much of the contraction is already done by bayeselo, so maybe 85% or so might be the right figure for CCRL 40/40. Of course blitz results are more spread out, so maybe 70% is actually about right for CCRL 40/4.

You wrote "limiting TBs to 5 man, etc." But how many GMs play all 5-man positions perfectly?
Here is the top endgame player of all time, Carlsen, blundering a 5-piece endgame ==> https://www.youtube.com/watch?v=5b-1u5XjaFU
I don't see how that relates to the topic. Computers can play openings and endgames perfectly by lookup, just as humans can do by using their memories, only computers are better at it. Limiting the use of memory for either openings or endgames by computers is fine as a type of handicap, but when it's done let's call it that, not pretend that having a better memory is somehow unfair. Computers and humans think very differently, and there is no way to even come close to equal conditions for the two.
What I meant to say is that limiting the computer to 5-man TBs is not a handicap to the computer, since top GMs have blundered several times in positions with fewer than 6 or 8 pieces on the board. Therefore, when you provide 5-man TBs to the computer, the human is the one at a disadvantage from blundering, not the computer. We know that computers and humans think differently: most of the time humans rely on knowledge and instinct, whereas the computer follows the algorithm it was given, with limited knowledge compared to a top GM. But the computer always looks much more deeply into any line, and that is where the computer gets its advantage over humans, not from having more knowledge.

PS: In the game Alekhine, Alexander versus Bogoljubow, Efim (blunder #6) there is a 4-piece blunder, and in the game Bronstein, David vs Botvinnik, Mikhail (blunder #4) Bronstein lost a simple endgame with plenty of time left on the clock. Remember, those tournaments and championships were played with much longer time controls than nowadays ==> https://www.youtube.com/watch?v=-3vLdHSuASE
Do NOT worry and be happy, we all live a short life :roll:
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: CCRL scaling versus human player

Post by Sven »

Chessqueen wrote: Mon Jun 24, 2019 1:53 am
Sven wrote: Sun Jun 23, 2019 10:08 pm
Chessqueen wrote: Sun Jun 23, 2019 8:09 pm Here is the top endgame player of all time, Carlsen, blundering a 5-piece endgame ==> https://www.youtube.com/watch?v=5b-1u5XjaFU
This is OT of course but in the video I only see KRBPPP vs KRBPPP positions, that is 12 pieces. What did I miss?
You missed the moment when Carlsen blundered which was at the end of the video with 5 pieces :shock:
For me the video ends after 7:09 minutes. If you are talking about the position where Carlsen played Ba3-c1 (in the video at 5:22) and Mamedyarov replied e6-e7, then I suggest that you try to count the pieces on the board. If you are talking about a different position with fewer pieces, then it must be another game; please post the correct link in that case.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL scaling versus human player

Post by lkaufman »

Chessqueen wrote: Mon Jun 24, 2019 6:29 am
lkaufman wrote: Mon Jun 24, 2019 3:17 am
Chessqueen wrote: Sun Jun 23, 2019 8:09 pm
lkaufman wrote: Fri Jun 21, 2019 8:32 pm
jorose wrote: Thu Jun 20, 2019 10:46 pm I feel it is conservative to estimate 2800 CCRL at only 2800 FIDE. That being said there are a lot of factors that complicate matters.

Humans memorize openings, this means they will play far stronger than their ratings in the opening, until they are out of book. I don't know if the context you are thinking of allows you to handle opening books, but to smooth things out I would consider adding an opening book. The opening book could be based on human games of the specified rating.

Engine strength is hardware dependent. You may want to consider trying to come up with ways to normalize engine strength across different hardware. I would bet on Carlsen against Minic on a Raspberry PI, but I would bet on Minic against Carlsen on TCEC hardware.
The CCRL list is indeed conservative by FIDE standards. Let's assume the engine gets a good book which tries to get out of theory early when it doesn't cost too much. Assume also the hardware specified as standard by CCRL, which is way below current PC speeds. Note that number of threads is not an issue as they rate 1 and 4 threads separately. Let's use CCRL 40/40 as it is obviously more relevant for tournament chess than 40/4. In 1998 Junior 5 earned a 2700 FIDE result in 9 games vs. top players, and while it is too old to even appear on CCRL, by extrapolation it would surely not be higher than 2600 on this list. But the hardware in 1998 was way below even the modest hardware specified by CCRL. The same conclusion would be reached by looking at the various matches from around 2003 or the Kramnik vs Fritz match of 2006; even a 2700 engine on CCRL would get a FIDE rating above 2800 with the above assumptions. Note that Deep Fritz 10, which beat Kramnik in that match, is at 2830 on the list, but it gave several handicaps to Kramnik such as letting him see the engine's book during the game (!), giving him a copy of the engine to practice against for months, limiting TBs to 5 man, etc. On the CCRL specified hardware with no special conditions like those, I believe that Fritz 10 on 4 cpus would easily defeat Carlsen in a match today.
One other point: CCRL uses Bayeselo which contracts the ratings considerably from normal elo, so although I totally agree with Kai about scaling engine results down to 70%, this is really correct just for CEGT. For CCRL much of the contraction is already done by bayeselo, so maybe 85% or so might be the right figure for CCRL 40/40. Of course blitz results are more spread out, so maybe 70% is actually about right for CCRL 40/4.

You wrote "limiting TBs to 5 man, etc." But how many GMs play all 5-man positions perfectly?
Here is the top endgame player of all time, Carlsen, blundering a 5-piece endgame ==> https://www.youtube.com/watch?v=5b-1u5XjaFU
I don't see how that relates to the topic. Computers can play openings and endgames perfectly by lookup, just as humans can do by using their memories, only computers are better at it. Limiting the use of memory for either openings or endgames by computers is fine as a type of handicap, but when it's done let's call it that, not pretend that having a better memory is somehow unfair. Computers and humans think very differently, and there is no way to even come close to equal conditions for the two.
What I meant to say is that limiting the computer to 5-man TBs is not a handicap to the computer, since top GMs have blundered several times in positions with fewer than 6 or 8 pieces on the board. Therefore, when you provide 5-man TBs to the computer, the human is the one at a disadvantage from blundering, not the computer. We know that computers and humans think differently: most of the time humans rely on knowledge and instinct, whereas the computer follows the algorithm it was given, with limited knowledge compared to a top GM. But the computer always looks much more deeply into any line, and that is where the computer gets its advantage over humans, not from having more knowledge.

PS: In the game Alekhine, Alexander versus Bogoljubow, Efim (blunder #6) there is a 4-piece blunder, and in the game Bronstein, David vs Botvinnik, Mikhail (blunder #4) Bronstein lost a simple endgame with plenty of time left on the clock. Remember, those tournaments and championships were played with much longer time controls than nowadays ==> https://www.youtube.com/watch?v=-3vLdHSuASE
It's exactly the same situation in the opening; humans can't remember everything in a huge book, computers can. Either we let the computers use their superior calculating and memory abilities, or we handicap them by limiting time, memory, etc. Limiting the computer's use of its memory to some estimate of what a human can remember is handicapping the computer. Of course I'm the number 1 exponent of handicapped matches between engines and humans, I just didn't like that Fritz was handicapped against Kramnik in ways that other engines were not previously handicapped against Kasparov or Kramnik, without acknowledgement that it was in effect a handicapped match.
Komodo rules!
Fritz 0
Posts: 145
Joined: Fri Mar 11, 2022 12:10 pm
Full name: Branislav Đošić

Re: CCRL scaling versus human player

Post by Fritz 0 »

lkaufman wrote: Sun Jun 23, 2019 6:23 pm
xr_a_y wrote: Sun Jun 23, 2019 1:01 pm
lkaufman wrote: Sat Jun 22, 2019 7:37 pm
xr_a_y wrote: Thu Jun 20, 2019 9:52 am I guess this is already a well documented subject but let me ask it my way.

I wonder if some kind (very) good chess player would accept to play some games against middle range engines (2200-2700 CCRL elo rating) at CCRL 40/4 TC in order to rescale CCRL rating to FIDE rating.
One easy way to do this: chess.com has established blitz ratings for all the Komodo levels, and I think their blitz ratings are not too far away from normal human ratings. So you could just run 40/4 games of the engines you are interested in against appropriate Komodo levels to determine human blitz ratings for them. I would avoid the top few levels, as their chess.com ratings are deflated by cheating (yes, players use computers to cheat to beat computers!!), but the first 15 or so levels are not affected by this, cheaters challenge only the top few levels. Note also that these Komodo levels (up thru 19) don't use noticeable time, so their rating is absolute, not based on time control. But for a 40/4 list, no problem.
That would be helpful indeed!
Is there a UCI option in the free Komodo 10 to specify this "level"?
Also, do you have a link to a page with the Komodo level versus Elo rating table? I couldn't find it myself yesterday.
I think that the levels were introduced in the Komodo 11.xx series. Komodo 11 isn't free but is very cheap so if you buy that and send us an email requesting the version with the levels, we'll do that for you. The rough estimate for the chess.com ratings is 700 + level * 100. To get the latest current ratings, go to chess.com, then try to play a game against a computer, and you should see all the levels with their blitz ratings displayed. But the top few levels are better than the displayed (or calculated) ratings. Level 19 scored overwhelmingly in 5' + 2" games against GMs except for 2 of the world's top 3 blitz players (Nakamura and MVL) and Level 18 made a healthy plus score against same, so perhaps by FIDE blitz standards they would be about 2850 and 2750 respectively. Level 20 is full Komodo strength but on just one thread, no ponder, no TBS, and limited opening book. Nakamura scored one out of about 20 games with it, so it would be about 3400 FIDE Blitz.
Larry, is it possible to get the version of Komodo 11 with the levels? I would like to test them against the levels of Komodo 12, 13, 14 and Dragon. I'll pay for it, of course.
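
The rough estimate quoted above (700 + level * 100) can be written as a small helper. This is a minimal sketch: the function name is made up for illustration, and the accepted range of 1 to 20 is an assumption inferred from the mention of levels up through 19 plus a full-strength level 20.

```python
def estimated_blitz_rating(level: int) -> int:
    """Rough chess.com blitz rating for a Komodo level.

    Uses the estimate quoted in the thread: 700 + level * 100.
    The valid range 1..20 is an assumption, not stated explicitly.
    """
    if not 1 <= level <= 20:
        raise ValueError(f"level must be between 1 and 20, got {level}")
    return 700 + level * 100


# Print the estimate for a few of the lower levels.
for level in (1, 10, 15):
    print(level, estimated_blitz_rating(level))
```

As the quoted post notes, the top few levels play better than this estimate suggests (about 2750 and 2850 by FIDE blitz standards for levels 18 and 19), so the formula is best treated as a rough guide for the lower levels.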
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL scaling versus human player

Post by lkaufman »

Fritz 0 wrote: Sat Apr 16, 2022 10:16 am
lkaufman wrote: Sun Jun 23, 2019 6:23 pm
xr_a_y wrote: Sun Jun 23, 2019 1:01 pm
lkaufman wrote: Sat Jun 22, 2019 7:37 pm
xr_a_y wrote: Thu Jun 20, 2019 9:52 am I guess this is already a well documented subject but let me ask it my way.

I wonder if some kind (very) good chess player would accept to play some games against middle range engines (2200-2700 CCRL elo rating) at CCRL 40/4 TC in order to rescale CCRL rating to FIDE rating.
One easy way to do this: chess.com has established blitz ratings for all the Komodo levels, and I think their blitz ratings are not too far away from normal human ratings. So you could just run 40/4 games of the engines you are interested in against appropriate Komodo levels to determine human blitz ratings for them. I would avoid the top few levels, as their chess.com ratings are deflated by cheating (yes, players use computers to cheat to beat computers!!), but the first 15 or so levels are not affected by this, cheaters challenge only the top few levels. Note also that these Komodo levels (up thru 19) don't use noticeable time, so their rating is absolute, not based on time control. But for a 40/4 list, no problem.
That would be helpful indeed!
Is there a UCI option in the free Komodo 10 to specify this "level"?
Also, do you have a link to a page with the Komodo level versus Elo rating table? I couldn't find it myself yesterday.
I think that the levels were introduced in the Komodo 11.xx series. Komodo 11 isn't free but is very cheap so if you buy that and send us an email requesting the version with the levels, we'll do that for you. The rough estimate for the chess.com ratings is 700 + level * 100. To get the latest current ratings, go to chess.com, then try to play a game against a computer, and you should see all the levels with their blitz ratings displayed. But the top few levels are better than the displayed (or calculated) ratings. Level 19 scored overwhelmingly in 5' + 2" games against GMs except for 2 of the world's top 3 blitz players (Nakamura and MVL) and Level 18 made a healthy plus score against same, so perhaps by FIDE blitz standards they would be about 2850 and 2750 respectively. Level 20 is full Komodo strength but on just one thread, no ponder, no TBS, and limited opening book. Nakamura scored one out of about 20 games with it, so it would be about 3400 FIDE Blitz.
Larry, is it possible to get the version of Komodo 11 with the levels? I would like to test them against the levels of Komodo 12, 13, 14 and Dragon. I'll pay for it, of course.
I don't recall which version of 11.x introduced the levels. If it was near the end, as I suspect, it's probably too close to Komodo 12 to be worth testing separately. If someone identifies the version number and you still want it, I can see if we can get it to you. But since we give out Komodo 12 free, measuring the levels on it would be much more interesting than on a version no one cares about now.
Komodo rules!
Fritz 0
Posts: 145
Joined: Fri Mar 11, 2022 12:10 pm
Full name: Branislav Đošić

Re: CCRL scaling versus human player

Post by Fritz 0 »

I found it - it's Komodo 11.3. I would be thankful to have it, for collector's reasons if nothing else.
Fritz 0
Posts: 145
Joined: Fri Mar 11, 2022 12:10 pm
Full name: Branislav Đošić

Re: CCRL scaling versus human player

Post by Fritz 0 »

lkaufman wrote: Sat Apr 16, 2022 5:53 pm
Fritz 0 wrote: Sat Apr 16, 2022 10:16 am
lkaufman wrote: Sun Jun 23, 2019 6:23 pm
xr_a_y wrote: Sun Jun 23, 2019 1:01 pm
lkaufman wrote: Sat Jun 22, 2019 7:37 pm
xr_a_y wrote: Thu Jun 20, 2019 9:52 am I guess this is already a well documented subject but let me ask it my way.

I wonder if some kind (very) good chess player would accept to play some games against middle range engines (2200-2700 CCRL elo rating) at CCRL 40/4 TC in order to rescale CCRL rating to FIDE rating.
One easy way to do this: chess.com has established blitz ratings for all the Komodo levels, and I think their blitz ratings are not too far away from normal human ratings. So you could just run 40/4 games of the engines you are interested in against appropriate Komodo levels to determine human blitz ratings for them. I would avoid the top few levels, as their chess.com ratings are deflated by cheating (yes, players use computers to cheat to beat computers!!), but the first 15 or so levels are not affected by this, cheaters challenge only the top few levels. Note also that these Komodo levels (up thru 19) don't use noticeable time, so their rating is absolute, not based on time control. But for a 40/4 list, no problem.
That would be helpful indeed!
Is there a UCI option in the free Komodo 10 to specify this "level"?
Also, do you have a link to a page with the Komodo level versus Elo rating table? I couldn't find it myself yesterday.
I think that the levels were introduced in the Komodo 11.xx series. Komodo 11 isn't free but is very cheap so if you buy that and send us an email requesting the version with the levels, we'll do that for you. The rough estimate for the chess.com ratings is 700 + level * 100. To get the latest current ratings, go to chess.com, then try to play a game against a computer, and you should see all the levels with their blitz ratings displayed. But the top few levels are better than the displayed (or calculated) ratings. Level 19 scored overwhelmingly in 5' + 2" games against GMs except for 2 of the world's top 3 blitz players (Nakamura and MVL) and Level 18 made a healthy plus score against same, so perhaps by FIDE blitz standards they would be about 2850 and 2750 respectively. Level 20 is full Komodo strength but on just one thread, no ponder, no TBS, and limited opening book. Nakamura scored one out of about 20 games with it, so it would be about 3400 FIDE Blitz.
Larry, is it possible to get the version of Komodo 11 with the levels? I would like to test them against the levels of Komodo 12, 13, 14 and Dragon. I'll pay for it, of course.
I don't recall which version of 11.x introduced the levels. If it was near the end, as I suspect, it's probably too close to Komodo 12 to be worth testing separately. If someone identifies the version number and you still want it, I can see if we can get it to you. But since we give out Komodo 12 free, measuring the levels on it would be much more interesting than on a version no one cares about now.
I found it - it's Komodo 11.3. I would be thankful to get it, for collection reasons if nothing else.