Carlsen vs. CCRL 2850 engines in Rapid?

lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Milos wrote: Fri Aug 20, 2021 4:20 am
lkaufman wrote: Wed Aug 18, 2021 11:11 pm
Chessqueen wrote: Wed Aug 18, 2021 9:14 pm
lkaufman wrote: Wed Aug 18, 2021 4:40 pm
Chessqueen wrote: Wed Aug 18, 2021 12:42 pm
lkaufman wrote: Tue Aug 17, 2021 1:05 am
mehmet123 wrote: Mon Aug 16, 2021 10:53 pm In 1999 Fritz 5.32 beat Judit Polgar 5.5-2.5 in a match at 30 minutes per game. Fritz 5.32's performance in this match was 2814 elo.
Fritz 5.32 played on Pentium II/350 MHz hardware. Fritz 5.32 is ~100 elo weaker than Fritz 8. The rating of Fritz 8 Bilbao is 2700 elo on the CCRL 40/15 rating list.
In 1998 Rebel beat Anand 1.5-0.5 in a match of 2 semi-blitz games (15 minutes), a 2986 elo performance. In 4 blitz games (5 min + 5 sec) Rebel beat Anand 3-1 (also a 2986 elo performance). Rebel played on K6-2 450 MHz hardware. Anand was rated 2795 elo in July 1998.
Fritz 5.32 on a Pentium II/350 MHz and Rebel on a K6-2 450 MHz probably have a rating of around 2400 elo on the CCRL (40/15) scale.

Considering that Magnus Carlsen is the best player in history in rapid games, a 2400-2450 elo chess engine has a low chance of beating him.
But a chess engine around 2600 CCRL elo can beat Magnus Carlsen without much difficulty in rapid games.
The first result you mention is the most relevant, since 5' + 10" Rapid is roughly equivalent to game in 25'. I think that a 2800 player today is a stronger Rapid player than a 2800 player of 20 years ago, due to getting so much more practice at fast play on the internet, but even so I agree with you that somewhere between 2450 and 2600 CCRL Rapid (let's say 2525, the middle) would be an equal opponent for Magnus in Rapid based on this historical evidence. But that's roughly Skill level 22 on Komodo Dragon 2, which was a close opponent for Jorge Sammour, whose FIDE is only 2458, four hundred elo below Carlsen! So I'm having a really tough time reconciling these facts. Obviously the crippled Dragon plays quite differently than a full-strength twenty-year-old engine, but it's not obvious why it would perform much worse vs. humans than an engine of equal strength based on direct play. Some mystery here....
At what time control can you give GM Nakamura knight odds? Can Komodo Dragon 2 beat him at TC 10'+5"?
Sorry, I meant to write TC 5'+10", not the other way around :oops:

If you have been following our events, you should know that this is silly. We were about even giving GM Lenderman knight odds at time controls averaging around 6' + 1", and that was with draws counting as wins for Dragon. Lenderman is a pretty strong GM, but he's no Hikaru Nakamura. The fair time control for Nakamura at knight odds would be something like 2' + 1" or even 1' + 1". I believe that in a few years we'll be able to give knight odds to Nakamura (or another 2800+ FIDE Rapid player) at 15' + 10" rapid, but not by just improving the engine; it will require a breakthrough in terms of setting problems for fallible human opponents, not just playing to postpone losing.
Well, I meant probably 5 years from now, with computers 3 times stronger and much better chess algorithms, and probably at TC 5'+10". Will it ever be possible?
You wrote "Dragon 2", I guess you meant "Dragon 7" or so. We don't need faster computers or better chess algorithms to do this; we need to create a program specifically designed for the goal. I actually think I know how we could do it with current technology, but it would take some time to implement and more time to perfect. Not an easy task, but beating Stockfish is also not an easy task; not sure which is harder! By the way, based on games I've run so far at Rapid (15' + 10") tc against CCRL engines, the CCRL ratings for Dragon 2 Skill levels 21 through 24 would be about 2070, 2331, 2556, and 2668 respectively. So since your friend Jorge appeared to be midway between levels 21 and 22 at that time control, it would seem that about 2200 CCRL is equal to about 2450 FIDE at that Rapid tc. This seems reasonably consistent with opinions expressed here. It's still a bit strange that Dragon can give knight odds to level 23 with fairly even results, but only performed (in 15 Rapid games vs. GM and IM humans) at 2460 FIDE, roughly the same as your friend Jorge's rating.
I recently played an ICCF game against a decent opponent (in terms of ranking, not in terms of actual understanding of the game) where I gave up a bishop for a pawn on move 8 by mistyping a move. The game dragged on for almost a year and 100 moves before I finally lost it (I resigned on move 98 when the SF eval went over +10; on move 8, when I entered the wrong move, it was +5).
Of course, against a top human GM at a FIDE TC I'd probably lose in 40 moves.
That is strange if we assume the opponent was using a top engine; perhaps he didn't bother to consult it anymore after winning the piece and was a rather weak player himself? (just a guess). I had a similar experience myself OTB in a classical tournament shortly before the pandemic; I fell into a very beautiful opening trap against a player rated above myself (over 2400 US Chess, which is like 2300 FIDE) and lost a knight for just a pawn, nothing more. I decided not to resign, played on and on, and somehow managed to reach rook vs rook and bishop, which I defended for the required fifty moves, claiming the draw on the move before I would have been mated or lost my rook!! But of course knight odds is much more than this, there being no pawn as compensation.
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

amanjpro wrote: Fri Aug 20, 2021 5:42 am I was analyzing the Mamedyarov vs. Rapport game today at the Sinquefield Cup with my engine Zahak (dev branch, which is 50 elo stronger than Zahak 5; Zahak 5 has a CCRL rating of 2712).

I used two threads and 1024MB hash, and my engine found all the best moves and avoided all the blunders that happened in the game, all in less than 1 minute per move... I know it is a small sample, but it somehow tells me that, at least for analysis, my engine is way stronger than top GMs.
Here is what I have so far. Jorge Sammour, 2458 FIDE, made an even score in five Rapid games (15' + 10") against Dragon 2 skill levels 21 and 22 (two games with 21, three with 22). Based on 500-game matches with suitable CCRL engines, these levels would have CCRL Rapid ratings of 2107 and 2331, giving a weighted average for the five games of 2241. So based on this tiny sample of human games, 2241 CCRL = 2458 FIDE in Rapid. If we assume that a 100-point CCRL difference = a 75-point human elo difference (my best guess), this means that human FIDE = 3/4 CCRL Rapid + 777. Zahak 5 is 2671 CCRL Rapid (you quote the blitz rating, we're talking Rapid here). Adding 50 for the latest version and perhaps another 40 for two threads puts it at about 2760 CCRL Rapid. My conversion formula would give 2070 + 777 = 2847 estimated FIDE rating, above all the players in the event, but not "way" above them, and not above Magnus Carlsen. But of course the five-game sample could be misleading, or my estimate of 75 FIDE per 100 CCRL could be too low. Certainly top humans make more serious mistakes than a 2800 CCRL engine, but perhaps they play better moves on average when there are no tactics. I'd like to get more human data to refine my conversion formula. If you are right that my formula is unfair to the engines, is it because the low end is wrong or because the 75 per 100 assumption is too low? No way for me to tell right now. We need a match between your Zahak and Magnus Carlsen, but I doubt that you are in a position to make that happen!
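For readers who want to check the arithmetic, the conversion described in this post can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above: the variable and function names are made up for the example, and the 0.75 slope is the poster's own best guess rather than an established conversion.

```python
# Illustrative sketch only: names are hypothetical, figures are those quoted in the post.

# (CCRL Rapid rating of the skill level, number of games played against it)
games_vs_level = [(2107, 2), (2331, 3)]

total_games = sum(n for _, n in games_vs_level)
weighted_ccrl = sum(rating * n for rating, n in games_vs_level) / total_games

SLOPE = 0.75        # assumption from the post: 100 CCRL points ~ 75 FIDE points
HUMAN_FIDE = 2458   # FIDE rating of the human who scored 50% in the sample

# Intercept chosen so that the weighted CCRL average maps onto the human's rating.
intercept = HUMAN_FIDE - SLOPE * weighted_ccrl


def ccrl_rapid_to_fide(ccrl, slope=SLOPE, offset=round(intercept)):
    """Linear CCRL Rapid -> FIDE Rapid estimate, fitted to a single five-game sample."""
    return slope * ccrl + offset


print(round(weighted_ccrl))      # weighted average of the two skill levels
print(round(intercept))          # the additive constant in the formula
print(ccrl_rapid_to_fide(2760))  # estimate for the two-thread dev Zahak
```

With only one data point the intercept is fully determined by the assumed slope, so changing `SLOPE` shifts every estimate at the top of the scale.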
Komodo rules!
Chessqueen
Posts: 5580
Joined: Wed Sep 05, 2018 2:16 am
Location: Moving
Full name: Jorge Picado

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Chessqueen »

lkaufman wrote: Fri Aug 20, 2021 6:28 am
Here is what I have so far. Jorge Sammour, 2458 FIDE, made an even score in five Rapid games (15' + 10") against Dragon 2 skill levels 21 and 22 (2 games with 21, three with 22). Based on 500 game matches with suitable CCRL engines, these levels would have CCRL Rapid ratings of 2107 and 2331, giving a weighted average for the five games of 2241. So based on this tiny sample of human games, 2241 CCRL = 2458 FIDE in Rapid. If we assume that 100 ccrl point difference = 75 human elo difference (my best guess), this means that human FIDE = 3/4 CCRL Rapid + 777. Zahak 5 is 2671 CCRL Rapid (you quote the blitz rating, we're talking Rapid here). Adding 50 for latest version and perhaps another 40 for two threads puts it about 2760 CCRL Rapid. My conversion formula would give 2070 + 777 = 2847 estimated FIDE rating, above all the players in the event, but not "way" above them, and not above Magnus Carlsen. But of course the five game sample could be misleading, or my estimate of 75 FIDE per 100 CCRL could be too low. Certainly top humans make more serious mistakes than a 2800 CCRL engine, but perhaps they play better moves on average when there are no tactics. I'd like to get more human data to refine my conversion formula. If you are right that my formula is unfair to the engines, is it because the low end is wrong or because the 75 per 100 assumption is too low? No way for me to tell right now. We need a match between your Zahak and Magnus Carlsen, but I doubt that you are in a position to make that happen!
Jorge reported yesterday that he played another 5 games (15' + 10") against Dragon 2 skill level 22, and he was happy since he won 2, drew 2, and lost 1. He said that he kept pressing to trade at every chance until Komodo was forced to trade in some position, and he kept up the pressure until the very end. I asked him again to send me the games, and he told me that he would prefer to keep them to himself since he is learning how to benefit from playing against Komodo. He has not been an active player for a long time; he told me that the moment he realized that he was not going to be on the list of the top 100 players by the age of 26, he lost interest in competing. He is considering competing in the U.S. Senior Championship once he reaches senior age, but that would be in another 7.5 years or so.
Do NOT worry and be happy, we all live a short life :roll:
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Chessqueen wrote: Sat Aug 21, 2021 3:02 pm
Jorge reported yesterday that he played another 5 games (15' + 10") against Dragon 2 skill level 22, and he was happy since he won 2, drew 2, and lost 1. He said that he kept pressing to trade at every chance until Komodo was forced to trade in some position, and he kept up the pressure until the very end. I asked him again to send me the games, and he told me that he would prefer to keep them to himself since he is learning how to benefit from playing against Komodo. He has not been an active player for a long time; he told me that the moment he realized that he was not going to be on the list of the top 100 players by the age of 26, he lost interest in competing. He is considering competing in the U.S. Senior Championship once he reaches senior age, but that would be in another 7.5 years or so.

So now he has an overall even score with Skill level 22, which is 2331 CCRL Rapid per my testing. I'm starting to think that there is no real need to contract the CCRL ratings to fit the human scale now; it is already somewhat contracted by use of BayesElo, and the human list has widened greatly over the years. GM Sam Shankland wrote in NIC that he scored better against 2200 players when he was rated 2400 (about 15 years ago) than he does now at 2700, because they just play much better now than then. So much of what we thought was contraction of ratings for humans was really just progress. The explanation is simple: thirty years ago 2200 was the lowest rating FIDE would award to a man (they allowed lower ratings for women, but there were too few playing against men to affect anything much), and even rather weak players could get 2200 just by having two short lucky tournaments with 2200 performances. Now, with the minimum at 1000, kids have to climb a long way to reach 2200 and are almost always very strong players when they reach this level. So maybe all we have to do to convert CCRL Rapid ratings to FIDE Rapid ratings is to add a constant. Based only on Jorge, that constant is (2458 - 2331) = 127 elo points. My own experience is that Skill 21 is the closest match to me in Rapid (though I need to play more games with it to say who is favored), and since I have that at 2107 CCRL, 2107 + 127 = 2234, which I think is roughly what my current FIDE strength is (my actual FIDE rating is lower due to playing primarily underrated kids, but my USCF rating, which is more realistic, is 2311, which I think is roughly comparable to FIDE 2234 give or take a bit; the USCF-FIDE gap is somewhat under 100 elo, not sure just how much under). So this seems about right to me.
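The constant-offset conversion suggested in this post is simple enough to write down directly. The snippet below is just a sketch of that arithmetic; the names are invented for the example and the offset rests on a single player's results, so it should be read as a rough guess rather than a calibrated conversion.

```python
# Illustrative sketch only: a constant-offset conversion fitted to one data point.

JORGE_FIDE = 2458      # FIDE rating quoted in the thread
SKILL_22_CCRL = 2331   # CCRL Rapid estimate for Dragon 2 Skill level 22

OFFSET = JORGE_FIDE - SKILL_22_CCRL


def ccrl_rapid_to_fide_const(ccrl, offset=OFFSET):
    """Add a fixed constant to a CCRL Rapid rating to estimate a FIDE Rapid rating."""
    return ccrl + offset


print(OFFSET)                          # the constant derived from Jorge's result
print(ccrl_rapid_to_fide_const(2107))  # estimate for an even opponent of Skill 21
```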
Komodo rules!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Ferdy »

lkaufman wrote: Mon Aug 16, 2021 8:21 pm How would Magnus Carlsen (FIDE Classical and Rapid ratings both close to 2850) perform in a serious Rapid (15' + 10") match with engines rated around 2850 on the CCRL Rapid (40/15) rating list (assuming they used a good but varied modern opening book), and ran on the reference i7 computer used by that list? Some example engines running on just one thread include Deep Shredder 11, Stockfish 1.4 (not 14!), Minic 1.39, Fritz 11, Toga II 3.0, Naum 3, Gaviota 1.0, and Arasan 18.0. I realize that there is probably no data on this exactly, opinions must be based on results of engines against other top players with a bit of extrapolation. We know that the rating lists spread out the ratings relative to human scale, but I'm trying to determine at least whether this list is accurate relative to FIDE ratings of the best human player. What do you think?
Here is one method to estimate the perf of Magnus.
1. Get some positions where Magnus is to move from the rapid games.
2. Evaluate those positions with a strong engine like SF.
3. If the engine's best move is not the same as Magnus's move, calculate the score of Magnus's move using SF. Get the rating difference from the score difference. If the move is the same, the error is zero and the rating difference is zero.
4. Do the same for a 2850 CCRL engine, say engine1. Let it analyze all of Magnus's positions. Use SF to get the rating difference.

Magnus begins with a start rating. If there is an error as determined by SF, get the rating difference and update Magnus's rating for that position.
Example:
start rating = 3000
pos: 1, Magnus move: Qg5, Magnus move score: -158, SF move: 0-0, SF move score: -136, error: -136 - (-158) or 22, rating diff: 10
Magnus perf rating: 3000-10 or 2990
pos: 2, Magnus and SF moves are the same,
Magnus perf rating: 2990 (just take the last perf rating since rating diff is zero)
pos: 3 ...
...

After a game take Magnus's average perf rating.

Do the same for engine1. Since engine1 did not play these games, let it analyze each of Magnus's positions. Compare its move with SF's too, calculate the rating difference, and finally take the average rating.

You can now compare the average rating of Magnus and engine1 from the positions of Magnus.
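The bookkeeping described above can be sketched roughly as follows. Note the loud assumption: the post does not say how a centipawn error is turned into a rating difference, so `linear_penalty` below is a purely hypothetical stand-in, scaled only so that it reproduces the single worked example (22 centipawns giving 10 rating points). It is not the mapping actually used for the results in this thread, and all function names are invented for the illustration.

```python
from statistics import mean


def move_error(best_score, played_score):
    """Centipawn loss of the played move relative to the reference engine's best move."""
    return best_score - played_score


def linear_penalty(error_cp):
    # HYPOTHETICAL mapping from centipawn error to rating points.
    # Calibrated only to the worked example (22 cp -> 10 points); the real
    # mapping used in the thread is not stated.
    return round(error_cp * 10 / 22)


def running_perf(start_rating, errors_cp, penalty=linear_penalty):
    """Return the perf rating after each position, lowering it by the penalty per error."""
    perf = start_rating
    history = []
    for err in errors_cp:
        perf -= penalty(err)
        history.append(perf)
    return history


# Worked example from the post: position 1 has an error, position 2 matches SF.
errors = [move_error(-136, -158), 0]
history = running_perf(3000, errors)
print(history)        # perf after each position
print(mean(history))  # average perf over the positions
```

Running the same function over engine1's moves on the same positions gives a second average, and only the difference between the two averages is meant to be interpreted.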

I took 6 games from the Skilling Open prelims where Magnus played: 2 wins, 2 losses and 2 draws. I used Cheese 2.2, rated around 2850 on the CCRL Rapid list, as engine1. engine1 is set to analyze at 20s/pos, which is around TC 40/15m on CCRL hardware, single core.

Main engine is SF14 set at 3496 (CCRL 40/15). Magnus and Cheese start at 3496 and update the perf rating move by move.

Code:

              name  games  rating
            Cheese      6    3341
   Carlsen, Magnus      6    3295
Calculate the expected score given the rating diff of 3341-3295 or 46.

Code:

Expected scores:
   name  expscore
 Cheese     0.566
Carlsen     0.434
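The expected scores above are consistent with the standard Elo logistic formula. Assuming that is what was used, the calculation can be sketched like this; the function name is made up, and the rounding of the 12-game projection to the nearest half point is my assumption about how the match scores were obtained.

```python
def expected_score(rating_diff):
    """Standard Elo expected score for the side that is rating_diff points stronger."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))


diff = 3341 - 3295           # Cheese minus Carlsen, from the table above
cheese = expected_score(diff)
carlsen = 1.0 - cheese

games = 12
# Assumption: projected match scores are rounded to the nearest half point.
cheese_points = round(cheese * games * 2) / 2
carlsen_points = games - cheese_points

print(round(cheese, 3), round(carlsen, 3))
print(cheese_points, carlsen_points)
```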

Code:

scores in 12 games:
   name  score
 Cheese    7.0
Carlsen    5.0
Sample log:

Code:

game: 4
pos: r1bq1rk1/pp1n1ppp/2pbpn2/3p4/2PP4/2N1PN2/PPQ1BPPP/R1B2RK1 b - - 6 8
main engine: Stockfish 14, bm: dxc4, score: -25, depth: 26
player: Carlsen, Magnus, bm: b6, score: -51
test engine: Cheese 2.2 64 bits, bm: e5, score: -53
test engine error: 28, player error: 26
Cheese 2.2 64 bits perf: 3484, Carlsen, Magnus perf: 3485

pos: r1bq1rk1/p2n1ppp/1ppbpn2/3p4/2PPP3/2N2N2/PPQ1BPPP/R1B2RK1 b - - 0 9
main engine: Stockfish 14, bm: Nxe4, score: -44, depth: 32
player: Carlsen, Magnus, bm: Nxe4, score: -44
test engine: Cheese 2.2 64 bits, bm: dxe4, score: -54
test engine error: 10, player error: 0
Cheese 2.2 64 bits perf: 3479, Carlsen, Magnus perf: 3485

pos: r1bq1rk1/p2n1ppp/1ppbp3/3p4/2PPN3/5N2/PPQ1BPPP/R1B2RK1 b - - 0 10
main engine: Stockfish 14, bm: dxe4, score: -46, depth: 33
player: Carlsen, Magnus, bm: dxe4, score: -46
test engine: Cheese 2.2 64 bits, bm: dxe4, score: -46
test engine error: 0, player error: 0
Cheese 2.2 64 bits perf: 3479, Carlsen, Magnus perf: 3485
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Ferdy wrote: Sun Aug 22, 2021 1:01 am
Here is one method to estimate the perf of Magnus.
This sounds promising, but the estimated ratings of 3295 and 3341 for 2850 players are ridiculous, which makes the difference suspect as well. What is wrong here, or do I misinterpret something?
Komodo rules!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Ferdy »

lkaufman wrote: Sun Aug 22, 2021 3:21 am
This sounds promising, but the estimated ratings of 3295 and 3341 for 2850 players are ridiculous, which makes the difference suspect as well. What is wrong here, or do I misinterpret something?
The method is meant to measure the strength difference between players (46 in this case). One may use a different start rating, but the rating difference stays the same. Just ignore the resulting 3295 and 3341; those figures will change if the start rating of 3496 is changed.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Ferdy wrote: Sun Aug 22, 2021 5:43 am
The method is meant to measure the strength difference between players (46 in this case). One may use a different start rating, but the rating difference stays the same. Just ignore the resulting 3295 and 3341; those figures will change if the start rating of 3496 is changed.
Yes, I understand this, but the 3496 figure sounds like the rating of full strength Stockfish. Don’t these numbers imply an estimate of 155 elo for the gap between Stockfish and a 2850 engine? If so, that’s absurd and the method is flawed. What is my mistake here?
Komodo rules!
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by Cornfed »

Magnus is a modern Lasker in a way - more often than some other players, when he wants to play for a win, he plays the move most likely to cause his opponent problems, not always the 'objectively best' move.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Carlsen vs. CCRL 2850 engines in Rapid?

Post by lkaufman »

Cornfed wrote: Sun Aug 22, 2021 6:21 pm Magnus is a modern Lasker in a way - more often than some other players, when he wants to play for a win, he plays the move most likely to cause his opponent problems, not always the 'objectively best' move.
I agree with the statement, but how does it affect the predicted result of this topic? Since 2850 engines/levels have various weaknesses, I would expect Carlsen would be the best at figuring out how to modify his play to set problems for those crippled or (relatively) weak engines, just as he does for human opponents.
Komodo rules!