What engine breaks even with GMs in blitz?


Uri Blass
Posts: 10280
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What engine breaks even with GMs in blitz?

Post by Uri Blass »

Laskos wrote: Tue Apr 09, 2019 9:22 am
lkaufman wrote: Tue Apr 09, 2019 5:41 am
Laskos wrote: Mon Apr 08, 2019 9:56 pm
lkaufman wrote: Mon Apr 08, 2019 9:04 pm
Laskos wrote: Mon Apr 08, 2019 8:06 am
lkaufman wrote: Mon Apr 08, 2019 6:43 am I ran a twenty game blitz (3' + 2") knight odds match (two ply book for variety, both b1 and g1 knights) between Lc0 11248 (on my 2080) vs. Giraffe (best version, on my 5 GHz i7 laptop), with Giraffe as a proxy for Magnus Carlsen, a good one since it "knows" to simplify when up a piece while some similarly rated A/B engines may not know or appreciate this. "Magnus" won by 12 to 8 (no draws!). So perhaps it's not yet time to bet against the champ if such a match took place, but we're getting close it seems.
I am not sure about 3' + 2'' blitz; Lc0 11248 on a 2080 is probably close to Carlsen at Knight odds there, but my claim here:
http://www.talkchess.com/forum3/viewtop ... =2&t=69956

was that at the longer 45' + 15'' a top GM a Knight up might not win all 10 games out of 10; more probably he would win 7-9, with 1-3 drawn or even lost by the human. It would be fun to watch such a match, as the top human would be happy having the upper hand most of the time, while suffering some 1-3 setbacks at Knight odds in 10 games! The prize could be proportional to (Wins - Draws - Losses) or even (Wins - Draws - 2*Losses), to give the human an incentive not to lose any points. Lc0 should be left with some temperature (say 0.5) for the first 4-5 moves, for diversity and to avoid playing into prepared openings, or else be provided with a small, prepared book.
Since we already know that this same Lc0 network defeated GM Naroditsky by a wide margin in blitz games averaging around knight odds, and he is probably within a class of Magnus in blitz strength, we already have good reason to think knight odds vs. Carlsen would be close, so this test is mostly just to confirm that we can make meaningful predictions of human results by these simulations. As for 45' + 15", what CCRL 40/40 rating do you think would come closest to matching Magnus in strength at 45' + 15"?
I am on the phone now, but that's not that hard: there were games against top humans in 2003-2004, and we can extrapolate TC and hardware. Take some Fritz 8 or Junior 9 on one core at CCRL 40/40 level. My estimate is that it is at about Kasparov or Magnus level at 45' + 15''.
I believe those were roughly the engines that tied matches with Kasparov at standard time back then, probably running on four threads, though I'm not sure about that. I'm not sure how many times faster the hardware is today; I suppose we have to clarify whether we are talking about playing on the old machines CCRL uses as a standard or on our current machines. Since you are running your matches on your new machine, not on an old AMD like the ones used as reference by CCRL, I suppose we should pick an engine that would be Carlsen level on that hardware, and if so I think you are picking too strong an engine; I imagine that those two engines on your current machine on 1 thread are as good as whatever they had around 2003 on four threads, so it should be 2800+ level even at 40/2 hours, and hence a big favorite at 45' + 15". But you are more knowledgeable about hardware and a much better mathematician than I am, so please correct me if I am wrong.
I got fairly confused this morning, bad sleep probably :).

First baseline, which I remembered correctly:

From Wiki:
X3D Fritz was a version of the Fritz chess program, which in November 2003 played a four-game Human–computer chess match against world number one Grandmaster Garry Kasparov. The match was tied 2–2. Fritz ran on four Intel Pentium 4 Xeon CPUs at 2.8 GHz.

X3D Fritz is something in-between Fritz 8 and Fritz 8 Bilbao, and close in strength to them. Fritz 8 Bilbao itself played some 12 games against top humans (weaker than Kasparov) and won. So, all the data corroborated (similar data on some Junior 8 and 9 matches against top humans), it's reasonable to say that Fritz 8 Bilbao and Junior 9 on "four Intel Pentium 4 Xeon CPUs at 2.8 GHz" are some Kasparov/Kramnik level of 2003-4 at 40/2 hours.

One of my i7 cores is close to 2.5 times faster than one of those Xeon cores. The effective speed-up on 4 cores in those days was not that good, maybe around 2.8-3.0, so basically these Fritz and Junior engines on one of my i7 cores are about level with Kasparov/Kramnik at 40/2 hours.
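A quick check of that arithmetic, as a sketch only: dividing the per-core speed ratio by the old four-CPU effective speed-up gives the speed of one modern core relative to the whole 2003 machine. The function name is illustrative, and the inputs are just the estimates quoted above.

```python
def relative_speed(per_core_ratio: float, old_parallel_speedup: float) -> float:
    """Speed of one modern core relative to the whole old multi-CPU machine."""
    return per_core_ratio / old_parallel_speedup


# Estimates from the post: one i7 core is about 2.5x one Xeon core,
# and four Xeons gave an effective speed-up of about 2.8-3.0.
for speedup in (2.8, 3.0):
    ratio = relative_speed(2.5, speedup)
    print(f"effective speed-up {speedup}: {ratio:.2f}x the 2003 machine")
```

A ratio somewhere between 0.8 and 0.9 is well short of a factor of two in either direction, which is consistent with calling the two setups about level.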

No need to extrapolate up to now.

There are two scaling issues needed for extrapolation:

1/ Scaling of human versus machine with TC
2/ Scaling of Knight odds with TC

And here I got stuck with this Leela thing. I am playing Lc0 11248 as the ODDS GIVER (the engine handicapped by one Knight) and the conventional Arasan 14.3 as the side taking the KNIGHT UP chances. It's quite the opposite of a handicapped, very strong classical AB engine versus a human, which is the setting where we are used to seeing handicaps and scaling behaviors. Here I am basically mimicking a super-human Lc0 being a Knight down against a pure classical engine of Fritz 8 level.

Going to 1/ and 2/ --- what is "machine" in that scaling issue? The direction of the Knight-odds scaling is clear: its value increases with TC. But I am not sure of the magnitude, which again depends on what the "machine" is.

I understand your reasoning that Fritz 8, Junior 9 (both some Arasan 14.3 level) seem too strong at 45' + 15'', but my doubts about the validity of my and your usual reasoning are validated by the crazy result I got:

At 15' + 5'' in 10 games, Lc0 11248 on a 2070 (first 4 moves using a temperature of 0.5) scored 4 Wins and 6 Losses against Arasan 14.3 on one strong i7 core at Knight Odds. I do not know what to make of this result and what it means. By our usual reasoning, that would mean that Lc0 11248 can score several wins against Magnus at Knight Odds in 10 games at 45' + 15'', but let's not say stupid things out of "usual reasoning".

I clearly need to change the opponent of Lc0 11248 from Arasan 14.3 to another Lc0 to mimic a human opponent. By the way, Arasan seems quite dumb in converting the advantage, and I need a good Leela net ID which knows how to convert large advantages. Do you know a net ID that plays well when a Knight up? I will adjust its TC (or nodes) to mimic a 2800 FIDE level human, and I will explain how I did it.

And to add: CCRL ratings don't help me much here, they actually confused me more :). Humans are not obeying them, Leela is not obeying them, all this mess.
I do not see a reason to assume that Lc0, by adjusting the time control, can mimic a 2800 FIDE level human opponent better than A-B engines.
I guess that Lc0 knows too many things in its evaluation that 2800 GMs do not know.

I guess that if A-B engines fail to mimic 2800 humans because of a relatively stupid evaluation, then Lc0 nets fail to mimic 2800 humans because of a relatively stupid search.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: What engine breaks even with GMs in blitz?

Post by jp »

Uri Blass wrote: Tue Apr 09, 2019 11:42 am I do not see a reason to assume that Lc0, by adjusting the time control, can mimic a 2800 FIDE level human opponent better than A-B engines.
I guess that Lc0 knows too many things in its evaluation that 2800 GMs do not know.

I guess that if A-B engines fail to mimic 2800 humans because of a relatively stupid evaluation, then Lc0 nets fail to mimic 2800 humans because of a relatively stupid search.
People have suggested running tests to see whether we can tell the difference between Lc0 & AB engines' games just looking at them. It's not clear you can, at least if you don't look at their worst sequences of moves.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: What engine breaks even with GMs in blitz?

Post by mwyoung »

mwyoung wrote: Tue Apr 09, 2019 8:05 am
lkaufman wrote: Tue Apr 09, 2019 7:21 am
mwyoung wrote: Tue Apr 09, 2019 6:59 am
mwyoung wrote: Tue Apr 09, 2019 6:38 am
lkaufman wrote: Sun Apr 07, 2019 6:32 pm In blitz (let's say 3' + 2" or as close to this as possible), the top engines today are far beyond human level. But how far down the list do we have to go to find engines (and specified hardware) that score evenly against GMs, preferably ones with known identities and ratings? I'm sure there is plenty of data to answer this question as countless games have been played online over the years, but does anyone actually have some data, such as "Engine xyz on one thread scored 50% against GMs averaging 2600 FIDE" for example? The question I'd like to answer is: How much would we have to add to CCRL blitz ratings to estimate the FIDE blitz rating of a human GM who would score 50% against it at 3' + 2"?
Hello Larry,

It looks like you are going to have to go down to the bottom of the list.

Here is a news report from 1994 about how Fritz 2 won against all the world's best in 5-minute blitz games.

https://www.independent.co.uk/arts-ente ... 38085.html

In 1994, at the time of the news report, the best processor was the Pentium.

And the report says: "When Intel sponsored the World Chess Express Challenge in Munich last Friday, they could never have hoped for such a good advertisement for their high-speed Pentium processor. It turned a good computer - Fritz 2 - into a world beater."

March 1994:
Intel introduces and ships faster Pentium chips, based on 0.6 micron BiCMOS manufacturing. The processor now includes clock-doubling of 1.5 or 2 times the external clock rate, allowing processor speeds of up to 100 MHz on a 50-66 MHz system bus. The processor also includes power management capabilities to allow stopping and restarting the processor. Code-name during development was P54C. The 60/90 MHz Pentium 735 processor is rated at 149.8 MIPS, and is priced at US$849 in 1000 unit quantities. The 66/100 MHz Pentium 815 processor is rated at 166.3 MIPS, and is priced at US$995 in 1000 unit quantities. [205.98] [265] [62] [550.29] [551.168,259] [557.134] [584.43] [689.115] [276]
I found 2 games of Fritz 2 from 1992, scoring 1-1 against GM Kasparov. Fritz could have been running on a 386 or 486 processor in 1992; my guess would be the 486.

[pgn] [Event "Koln (5')"] [Site "Koln (5')"] [Date "1992.??.??"] [EventDate "?"] [Round "?"] [Result "0-1"] [White "Fritz (Computer)"] [Black "Garry Kasparov"] [ECO "B30"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "77"] 1.e4 c5 2.Nf3 Nc6 3.Nc3 g6 4.d4 cxd4 5.Nxd4 Bg7 6.Be3 Nf6 7.Nxc6 bxc6 8.e5 Ng8 9.f4 Nh6 10.Qd2 O-O 11.O-O-O d6 12.exd6 exd6 13.Qxd6 Qxd6 14.Rxd6 Nf5 15.Rd3 Ba6 16.Bc5 Bxd3 17.Bxf8 Bxf1 18.Bxg7 Bxg2+ 19.Rg1 Kxg7 20.Rxg2 Rb8 21.Re2 Rh8 22.b3 h5 23.Kb2 h4 24.h3 Rd8 25.Ne4 Ng3 26.Nxg3 hxg3 27.Rg2 Rd4 28.Rxg3 Rxf4 29.Rc3 Rh4 30.Rxc6 Rxh3 31.Ra6 g5 32.Rxa7 g4 33.a4 g3 34.Ra5 Rh6 35.Rg5+ Rg6 36.Rxg3 Rxg3 37.c4 f5 38.b4 f4 39.Kc2 0-1[/pgn]

[pgn][Event "Koln (5')"] [Site "Koln (5')"] [Date "1992.??.??"] [EventDate "?"] [Round "?"] [Result "1-0"] [White "Fritz (Computer)"] [Black "Garry Kasparov"] [ECO "A07"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "87"] 1.e4 c5 2.d3 Nc6 3.g3 g6 4.Bg2 Bg7 5.Nf3 d6 6.O-O e5 7.Bg5 f6 8.Be3 Nge7 9.a3 O-O 10.Nc3 Kh8 11.b4 b6 12.Rb1 Be6 13.b5 Nd4 14.a4 f5 15.Ng5 Bg8 16.exf5 Nexf5 17.Bxa8 Qxa8 18.Bxd4 Nxd4 19.Ne2 Nf5 20.c3 d5 21.Re1 h6 22.Nf3 g5 23.d4 e4 24.Ne5 Qe8 25.Nc6 Nd6 26.Nc1 Qd7 27.Nb3 Qh3 28.Qe2 Be6 29.dxc5 bxc5 30.Nxc5 Bg4 31.Qf1 Qh5 32.Rb3 Nc4 33.Nxe4 dxe4 34.Qxc4 Rxf2 35.h4 gxh4 36.Kxf2 hxg3 37.Ke3 Qg5+ 38.Kxe4 Qf5+ 39.Ke3 Qf3+ 40.Kd2 g2 41.Re8+ Bf8 42.Qd4+ Kh7 43.Rb1 Bg7 44.Re7 1-0 [/pgn]
I didn't remember these two events, but I suppose it makes sense that Fritz 2 would perform at maybe 2750 or so overall at blitz, mostly on a Pentium, because Rexchess performed in the 2500s around 1990 on a 486, and Fritz 2 was later and stronger. Considering the hardware advancement since the Pentium, I suppose that the estimates of raising CCRL blitz ratings by 500 for FIDE blitz rating equivalence should be revised upward quite a bit. I don't know if I even have any engine weak enough to play at the same level on my 5 GHz i7 as Fritz 2 did on a Pentium! Well, there are always the handicapped levels of Komodo; one of them must be suitable. But I'd have to have a weak enough engine to run it against to determine which level that would be! Any suggestions of engines of that level that are easy to download and problem-free?
I have tested all of these in the past, and they have worked. And they are the right vintage...
Put your laptop in power-saving mode and/or use less time.

http://rebel13.nl/windows/rebel's%20with%20uci.html


MGP 1993.jpg
I looked up the ratings for Fritz 2 and Gideon Pro in the 1993 computer chess reports. Gideon Pro was rated about 100 Elo better than Fritz 2, tested on a 486 with 4 MB of HT.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: What engine breaks even with GMs in blitz?

Post by Laskos »

jp wrote: Tue Apr 09, 2019 2:45 pm
Uri Blass wrote: Tue Apr 09, 2019 11:42 am I do not see a reason to assume that Lc0, by adjusting the time control, can mimic a 2800 FIDE level human opponent better than A-B engines.
I guess that Lc0 knows too many things in its evaluation that 2800 GMs do not know.

I guess that if A-B engines fail to mimic 2800 humans because of a relatively stupid evaluation, then Lc0 nets fail to mimic 2800 humans because of a relatively stupid search.
People have suggested running tests to see whether we can tell the difference between Lc0 & AB engines' games just looking at them. It's not clear you can, at least if you don't look at their worst sequences of moves.
In general game-play, away from the borders (the opening and the late endgame), one probably has to be above 2000-2100 FIDE to see the differences clearly. But if you put a specific Lc0 net in a pool of regular engines, playing from the standard opening position to the outcome by chess rules (no adjudications), there are some markers even I can see after several hours of studying the play of that Lc0 net. The initial opening choice and very late endgames are probably among the best markers.

Lc0 and regular engines DO differ A LOT, as seen especially in test suites.
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: What engine breaks even with GMs in blitz?

Post by JJJ »

Larry, don't you have a large enough database of Komodo games played on chess.com against humans?
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: What engine breaks even with GMs in blitz?

Post by lkaufman »

I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw so call it two draws. My 2080 is about 20% faster than your 2070 we determined, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference, 6-1 instead of 6-4.
I agree with using Lc0 on both ends. It is pretty symmetrical in eval, so the same networks that are good at playing a knight down should also be good at playing a knight up. So 11248 would be fine. If you think it's better not to use the same network for both, 11258 is also good when up or down a knight. I found that 11248 CPU only, one thread, was a reasonable opponent for full power 11248 at knight odds at fast levels (it seems to need 2 seconds minimum increment to avoid time forfeits) but the results favored the cpu version. Anyway I'll leave it to you to do the math as to whether this is enough of a speed handicap to simulate Carlsen.
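To put the two scores (6-4 and 6-1 for Arasan) on a common scale, here is a minimal sketch that converts a match score into an Elo difference with the standard logistic Elo formula. The function names are made up for illustration, and the error band is a crude binomial approximation that ignores draws, so it is only a rough guide for samples this small.

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def match_summary(points: float, games: int):
    """Return (estimate, low, high) Elo differences for a match result.

    The band is roughly one standard error of the score fraction,
    treating games as independent win/loss trials (an assumption).
    """
    p = points / games
    sigma = math.sqrt(p * (1.0 - p) / games)
    low = max(p - sigma, 1e-6)
    high = min(p + sigma, 1.0 - 1e-6)
    return elo_diff(p), elo_diff(low), elo_diff(high)


# Arasan's score in the two knight-odds matches discussed above.
for label, points, games in (("15+5", 6, 10), ("45+15", 6, 7)):
    est, low, high = match_summary(points, games)
    print(f"{label}: {est:+.0f} Elo (roughly {low:+.0f} to {high:+.0f})")
```

With only 7 to 10 games per match the bands are very wide, so the apparent jump between the two time controls should be read with caution.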
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: What engine breaks even with GMs in blitz?

Post by lkaufman »

JJJ wrote: Tue Apr 09, 2019 4:12 pm Larry, don't you have a large enough database of Komodo games played on chess.com against humans?
If you are referring to the games anyone can play against the various levels, the problem is that only weak players play against them, giving them very unrealistically low ratings at the higher levels, making the strong players unwilling to play them. A self-destructive circle. In the official matches, we have only had a few games between the top levels and MVL/Nakamura.
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: What engine breaks even with GMs in blitz?

Post by lkaufman »

Uri Blass wrote: Tue Apr 09, 2019 11:42 am
I do not see a reason to assume that Lc0, by adjusting the time control, can mimic a 2800 FIDE level human opponent better than A-B engines.
I guess that Lc0 knows too many things in its evaluation that 2800 GMs do not know.

I guess that if A-B engines fail to mimic 2800 humans because of a relatively stupid evaluation, then Lc0 nets fail to mimic 2800 humans because of a relatively stupid search.
I think the important point here is just that you want the engine mimicking the GM to fully appreciate the importance of simplifying when up a piece. It's easy to tell whether a given engine understands this or not: just set up knight odds on the full board, get the eval after say 30", then repeat with the queens removed and see if the score goes up significantly. Lc0 11248 shows a reasonable difference; Arasan 14.3 shows a significant but not large enough difference; Arasan 14.1 showed essentially no difference. Some engines understand this (to varying degrees), some don't. As long as you pick one that does, it's probably OK. We can't expect any engine to simulate a human in general, but at least it should simulate the human's guiding principle in such games: to trade down when ahead in material.
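For anyone who wants to repeat that check, the two test positions can be generated as FEN strings and pasted into any GUI or engine. This is only a sketch: the helper names are invented for illustration, the b1 knight is an arbitrary choice (g1 works the same way), and getting the evals themselves is left to whatever engine setup you use.

```python
import re

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def _expand(rank: str) -> str:
    """Turn FEN digits into runs of '.', one character per square."""
    return re.sub(r"\d", lambda m: "." * int(m.group()), rank)


def _compress(rank: str) -> str:
    """Turn runs of '.' back into FEN digits."""
    return re.sub(r"\.+", lambda m: str(len(m.group())), rank)


def edit_fen(fen: str, remove_squares=(), remove_pieces: str = "") -> str:
    """Remove the pieces on the given squares and/or all pieces of the given letters."""
    placement, rest = fen.split(" ", 1)
    rows = [list(_expand(r)) for r in placement.split("/")]
    for sq in remove_squares:  # e.g. "b1"
        file_idx = ord(sq[0]) - ord("a")
        row_idx = 8 - int(sq[1])  # FEN lists rank 8 first
        rows[row_idx][file_idx] = "."
    rows = [["." if p in remove_pieces else p for p in row] for row in rows]
    return "/".join(_compress("".join(r)) for r in rows) + " " + rest


knight_odds = edit_fen(START_FEN, remove_squares=["b1"])
knight_odds_no_queens = edit_fen(knight_odds, remove_pieces="Qq")
print(knight_odds)
print(knight_odds_no_queens)
```

An engine that appreciates simplification should score the second position noticeably better for the side with the extra knight than the first.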
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: What engine breaks even with GMs in blitz?

Post by lkaufman »

lkaufman wrote: Tue Apr 09, 2019 6:48 pm
I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw so call it two draws. My 2080 is about 20% faster than your 2070 we determined, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference, 6-1 instead of 6-4.
Much to my surprise, Lc0 won that "drawn" game, making the score 1.5-5.5, not 1-6. Lc0 had a lone queen against bishop, knight, and three pawns, and so I assumed (and the evals indicated) that Lc0 would seek perpetual check. But somehow it picked up all three of the pawns, one by one over many moves, and won the queen vs. two minors endgame (no TBs used).
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: What engine breaks even with GMs in blitz?

Post by Laskos »

lkaufman wrote: Tue Apr 09, 2019 8:28 pm
lkaufman wrote: Tue Apr 09, 2019 6:48 pm
I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw so call it two draws. My 2080 is about 20% faster than your 2070 we determined, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference, 6-1 instead of 6-4.
Much to my surprise, Lc0 won that "drawn" game, making the score 1.5-5.5, not 1-6. Lc0 had a lone queen against bishop, knight, and three pawns, and so I assumed (and the evals indicated) that Lc0 would seek perpetual check. But somehow it picked up all three of the pawns, one by one over many moves, and won the queen vs. two minors endgame (no TBs used).
It might start looking similar to my result, although I expect Arasan to perform worse at 45' + 15'' than at 15' + 5'' (and your result will probably show that). That endgame you describe seems a bit funny.

I am getting quite interesting results with ThothFish, an SF derivative which can be adjusted to like or dislike swapping pieces to a desired degree. I am playing with some parameters at fast TC, and got a "weak" (small number of nodes) ThothFish which very much likes swapping pieces and heavily outperforms the regular "weak" (small number of nodes) SF, both being a Knight up against the "strong" (and handicapped) Lc0 11248. Adjusted in this way, SF can probably model a human somehow too.

So, the Arasan results are surely not the final word.