What engine breaks even with GMs in blitz?

lkaufman · Post by **lkaufman** » Tue Apr 16, 2019 7:47 pm

Laskos wrote: ↑Mon Apr 15, 2019 8:30 pm
lkaufman wrote: ↑Wed Apr 10, 2019 12:04 am
Laskos wrote: ↑Tue Apr 09, 2019 10:48 pm
lkaufman wrote: ↑Tue Apr 09, 2019 8:28 pm
lkaufman wrote: ↑Tue Apr 09, 2019 6:48 pm
I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw so call it two draws. My 2080 is about 20% faster than your 2070 we determined, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference, 6-1 instead of 6-4.

Much to my surprise, Lc0 won that "drawn" game, making the score 1.5-5.5, not 1-6. Lc0 had a lone queen against bishop, knight, and three pawns, and so I assumed (and the evals indicated) that Lc0 would seek perpetual check. But somehow it picked up all three of the pawns, one by one over many moves, and won the queen vs. two minors endgame (no TBs used).
Might start looking similar to my result, although I expect Arasan to perform worse at 45' + 15'' than at 15' + 5'' (and your result will probably show that). That endgame you describe seems a bit funny.

I am getting quite interesting results with ThothFish, a SF derivative which can be adjusted to like or dislike swapping pieces to desired degree. I am playing with some parameters at fast TC, and got a "weak" (small number of nodes) ThothFish which likes very much swapping pieces and overperforms heavily the regular "weak" (small number of nodes) SF, both being Knight up against "strong" (and handicapped) Lc0 11248. Adjusted in this way SF can probably model somehow a human too.

So, Arasan results are sure not the final word.
Well, Lc0 won the last (8th) game vs. Arasan, so the final score was 2.5 to 5.5 for Lc0 giving knight odds at 45' + 15", pretty much what we would expect based on your result at 15' + 5". The ThothFish test sounds interesting; I wonder how strong the incentive to exchange (especially queens) should be for optimum results at knight odds. I suppose though that it's not perfect in that it might still try to trade even if it loses back the piece. For example if it will pay a pawn to trade queens, that might be okay when up a full piece but is certainly not ok when up a piece for two pawns.
I performed this longish test (2 days) using ThothFish. First at fast time control I checked which settings work to convert best the Knight advantage. At some fast disbalanced TC I got for ThothFish with "Exchange" factors = 200 (all of them) compared to SF_dev:
59/100 conversions for ThothFish
41/100 conversions for SF_dev

Now, using those settings (=200) I found to work best, I played real games with TC = 45' + 15'' for Lc0 11248. The human opponent is "ThothFish200" at 1 million nodes per move. I found this 1-2 years ago and repeated my reasoning these days, and I am pretty confident that SF_dev at about 1 million nodes per move is a 2750 - 2850 FIDE Elo opponent for a human at 45' + 15'' time control. That is about 0.15s/move on my 4 core PC, or game in 10 seconds. I will not repeat the reasoning as it is long and not illuminating, but I am quite confident of it.

Real games:
Lc0 11248 at 45' + 15'' at Knight odds versus SF_dev at 1 million nodes/move:
+17 =3 -0
for SF_dev (mimicking human). So it seems that Lc0 could draw 3/20 games against a top-GM (2800 +/- 50 FIDE) at Knight odds.

But here comes ThothFish, which knows better to convert Knight advantage (but is no stronger than SF_dev in normal play, actually a bit weaker), and the result is different:
Lc0 11248 at 45' + 15'' at Knight odds versus ThothFish at 1 million nodes/move:
+20 =0 -0
for ThothFish (mimicking human). It seems Lc0 cannot draw or win any game out of 20 against a top-GM at Knight odds.
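
As a rough aid to reading the two scores above, here is a minimal sketch that converts a match result into an Elo difference using the standard logistic Elo expectation formula. The function name is hypothetical, and treating knight-odds results as an ordinary Elo gap is an assumption made only for illustration; with 20 games the estimate is also very noisy.

```python
import math


def elo_diff_from_score(wins, draws, losses):
    """Elo difference implied by a match score under the logistic Elo model.

    Hypothetical helper for illustration only: it ignores the handicap and
    the small sample size, and simply inverts the expected-score formula.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0:
        return -math.inf  # a 0% score has no finite Elo estimate
    if score >= 1.0:
        return math.inf   # a 100% score has no finite Elo estimate
    return -400.0 * math.log10(1.0 / score - 1.0)


# Results quoted above, from the point of view of the "simulated human".
sf_dev_gap = elo_diff_from_score(17, 3, 0)      # SF_dev: +17 =3 -0
thothfish_gap = elo_diff_from_score(20, 0, 0)   # ThothFish: +20 =0 -0
print(round(sf_dev_gap), thothfish_gap)
```

Under this model +17 =3 -0 corresponds to a gap of roughly 436 Elo, while a 20-0 sweep has no finite estimate, so a clean sweep by itself says little about the exact size of the gap.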

I am not sure how well ThothFish at 1 million nodes/move mimics a human, it probably blunders tactically less than a human. Probably we got pretty good results at Knight odds against Arasan mimicking human because Arasan knows worse than SF how to convert a Knight advantage, and MUCH worse than ThothFish. So, it seems Knight odds against a very top GM at 45' + 15'' are possible only if human top GM blunders pretty badly. That's not exactly what I had hoped for. Maybe against Lc0 playing very aggressively and forcing the human to blunder it could be fun, but I guess the playing top GM would be annoyed by Lc0 play.

This is quite surprising to me, not that the simulated GM won, but a 20 to zero score seems unbelievable. I feel something must be wrong with your estimated elo for SF 1 million nodes. You are saying that current SF on one thread on a fast pc with about half a sec per move, roughly like game in half a minute, is even with a 2800 at roughly game in one hour. So 120 to 1 time odds. I don't believe this is nearly enough, assuming a good opening book. Maybe your estimate assumed no opening book, but since GMs have one in their heads, that wouldn't be fair. Komodo one thread achieved about a 2900 perf. rating in four games vs. GM Erenburg giving 30 to 1 time ratio odds, plus playing Black every game, no opening book past move 3, no ponder, no TBs, and that was a couple years ago.
There is an easy way to test your simulation. Presumably if 1 million nodes simulates 2800 at 45' + 15", then 66,667 nodes would do the same for 3' + 1". We have the 16 game series of 11248 vs Naroditsky to compare. The time limits varied, and the handicaps ranged from 2 pawns and move up to bishop and move, but I think it's roughly fair to treat it as a knight odds match at 3' + 1", which Naroditsky lost badly (11.5 to 4.5 I think, perhaps that was after adjusting a time loss presumably due to lag to the proper draw result). Naroditsky is not an Elite level GM, but at blitz he is rated up there with the top guys on chess.com, so I think we can call him a 2750 blitz player by FIDE standard rating levels. So I propose that you rerun your ThothFish match with nodes at 66,667 and time limit for Lc0 3' + 1". If it is a good simulation then Lc0 should win the match, maybe 10 to 6 or so. But I predict that ThothFish will win the match fairly easily, though there should be several draws I guess.
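
The arithmetic behind the 66,667-node figure and the "120 to 1" time odds can be sketched as follows. The function names are hypothetical, and the linear scaling of nodes with time control is only the premise of the proposed test, not an established equivalence between human and engine strength.

```python
import math


def scale_nodes(nodes_per_move, tc_from, tc_to):
    """Scale a per-move node budget linearly between two time controls.

    Each time control is (base_minutes, increment_seconds). Assumes both
    parts shrink by the same factor, as they do from 45' + 15" to 3' + 1".
    """
    base_ratio = tc_from[0] / tc_to[0]
    inc_ratio = tc_from[1] / tc_to[1]
    if not math.isclose(base_ratio, inc_ratio):
        raise ValueError("time controls are not proportional")
    return round(nodes_per_move / base_ratio)


def time_odds(slow_game_seconds, fast_game_seconds):
    """Ratio of total thinking time between the slow and the fast side."""
    return slow_game_seconds / fast_game_seconds


# 1 million nodes at 45' + 15" scaled down to 3' + 1" (a factor of 15).
blitz_nodes = scale_nodes(1_000_000, (45, 15), (3, 1))

# "Game in one hour" for the human versus "game in half a minute" for SF.
odds = time_odds(60 * 60, 30)

print(blitz_nodes, odds)
```

The factor of 15 applies to both the base time and the increment, which is why a single ratio is enough here; for time controls that are not proportional the sketch refuses to guess.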

Laskos · Post by **Laskos** » Wed Apr 17, 2019 12:16 am

lkaufman wrote: ↑Tue Apr 16, 2019 7:47 pm
This is quite surprising to me, not that the simulated GM won, but a 20 to zero score seems unbelievable. I feel something must be wrong with your estimated elo for SF 1 million nodes. You are saying that current SF on one thread on a fast pc with about half a sec per move, roughly like game in half a minute, is even with a 2800 at roughly game in one hour. So 120 to 1 time odds. I don't believe this is nearly enough, assuming a good opening book. Maybe your estimate assumed no opening book, but since GMs have one in their heads, that wouldn't be fair. Komodo one thread achieved about a 2900 perf. rating in four games vs. GM Erenburg giving 30 to 1 time ratio odds, plus playing Black every game, no opening book past move 3, no ponder, no TBs, and that was a couple years ago.
There is an easy way to test your simulation. Presumably if 1 million nodes simulates 2800 at 45' + 15", then 66,667 nodes would do the same for 3' + 1". We have the 16 game series of 11248 vs Naroditsky to compare. The time limits varied, and the handicaps ranged from 2 pawns and move up to bishop and move, but I think it's roughly fair to treat it as a knight odds match at 3' + 1", which Naroditsky lost badly (11.5 to 4.5 I think, perhaps that was after adjusting a time loss presumably due to lag to the proper draw result). Naroditsky is not an Elite level GM, but at blitz he is rated up there with the top guys on chess.com, so I think we can call him a 2750 blitz player by FIDE standard rating levels. So I propose that you rerun your ThothFish match with nodes at 66,667 and time limit for Lc0 3' + 1". If it is a good simulation then Lc0 should win the match, maybe 10 to 6 or so. But I predict that ThothFish will win the match fairly easily, though there should be several draws I guess.

I don't think it's that easy. Humans and engines scale differently from 45' + 15'' to 3' + 1''. I guess an equal score at first time control would mean 200-300 Elo points advantage (or more) for engine at second time control. Also, we are not clear about the scaling of Lc0 with tc, although it seems to resemble a human scaling (weaker than traditional engines at fast tc). Modeling a human top GM at 3' + 1'' at some nodes of SF is probably beyond my abilities, as I have no data on that. If we could just divide (or multiply) the time control (or nodes per move) by 30 to derive relative human-engine strength, things would be very easy, for example, extrapolating from short tc to long tc and vice versa. It's not the case.

Also, that's ThothFish with 20-0 result, SF had 17 - 3 draws, so in serious games knowing to convert seems to matter quite a bit, and human GM do know what to do. Also, your 30:1 tc odds match is surely well within error margins of my 1:120 time odds (one core). The difference in our opinions is no more than 100 Elo points, not much at all, and wouldn't explain much a 20-0 score. And I would by now still stand by my opinion that SF_dev at 1 million nodes per move (no book or a general short book for variety) is about 2800 +/- 50 FIDE Elo opponent to a top human GM at 45' + 15''. Might be that the nodes are closer to 0.5 million, but that just means that 2850 FIDE for 1 million nodes per move is a closer estimate. Isn't it plausible that 1:240 time odds (SF on one thread, no book or a general short book for variety) are in 2750-2800 FIDE range at 45' + 15''?

lkaufman · Post by **lkaufman** » Wed Apr 17, 2019 12:45 am

Laskos wrote: ↑Wed Apr 17, 2019 12:16 am
lkaufman wrote: ↑Tue Apr 16, 2019 7:47 pm
This is quite surprising to me, not that the simulated GM won, but a 20 to zero score seems unbelievable. I feel something must be wrong with your estimated elo for SF 1 million nodes. You are saying that current SF on one thread on a fast pc with about half a sec per move, roughly like game in half a minute, is even with a 2800 at roughly game in one hour. So 120 to 1 time odds. I don't believe this is nearly enough, assuming a good opening book. Maybe your estimate assumed no opening book, but since GMs have one in their heads, that wouldn't be fair. Komodo one thread achieved about a 2900 perf. rating in four games vs. GM Erenburg giving 30 to 1 time ratio odds, plus playing Black every game, no opening book past move 3, no ponder, no TBs, and that was a couple years ago.
There is an easy way to test your simulation. Presumably if 1 million nodes simulates 2800 at 45' + 15", then 66,667 nodes would do the same for 3' + 1". We have the 16 game series of 11248 vs Naroditsky to compare. The time limits varied, and the handicaps ranged from 2 pawns and move up to bishop and move, but I think it's roughly fair to treat it as a knight odds match at 3' + 1", which Naroditsky lost badly (11.5 to 4.5 I think, perhaps that was after adjusting a time loss presumably due to lag to the proper draw result). Naroditsky is not an Elite level GM, but at blitz he is rated up there with the top guys on chess.com, so I think we can call him a 2750 blitz player by FIDE standard rating levels. So I propose that you rerun your ThothFish match with nodes at 66,667 and time limit for Lc0 3' + 1". If it is a good simulation then Lc0 should win the match, maybe 10 to 6 or so. But I predict that ThothFish will win the match fairly easily, though there should be several draws I guess.
I don't think it's that easy. Humans and engines scale differently from 45' + 15'' to 3' + 1''. I bet an equal score at first time control would mean 200-300 Elo points advantage for engine at second time control. Also, we are not clear about the scaling of Lc0 with tc, although it seems to resemble a human scaling (weaker than traditional engines at fast tc). Modeling a human top GM at 3' + 1'' at some nodes of SF is probably beyond my abilities, as I have no data on that. If we could just divide (or multiply) the time control (or nodes per move) by 30 to derive relative human-engine strength, things would be very easy, for example, extrapolating from short TC to long TC and vice versa. It's not the case.

Also, that's ThothFish with 20-0 result, SF had 17 - 3 draws, so in serious games knowing to convert seems to matter quite a bit, and human GM do know what to do. Also, your 30:1 tc odds match is surely well within error margins of my 1:120 time odds (one core). The difference in our opinions is about 50-100 Elo points, not much at all, and I would by now still stand by my opinion (quite an educated guess) that SF_dev at 1 million nodes per move (no book) is about 2800 +/- 50 FIDE Elo opponent to a top human GM at 45' + 15''. Might be that the nodes are closer to 0.5 million, but that just means that 2850 FIDE for 1 million nodes per move is a closer estimate. Isn't it plausible that 1:240 time odds (one thread, no book) are in 2750-2800 FIDE range at 45' + 15''?

Yes, I realized shortly after posting that it wasn't valid to assume that human and engine scaling were equal. It would still be an interesting test, but it would only give us a maximum error of your estimate, not a realistic one. I agree that with no book half a million nodes might be enough to put SF in the stated range at the stated time control, I just question whether "no book" is the fair comparison. With knight odds there is essentially no theory beyond move 1 or 2, but with normal chess GMs have quite a deep and high quality book in memory. Either SF (or ThothFish) should also have maybe a ten move deep book, or else games should start with rare openings (like 1.a3 a6 for example) that negate the value of memorized theory for both sides. Under those conditions I'd bet on SF even against Magnus with 240 to 1 time odds.
I could probably run some 45' + 15" games between Lc0 11248 on my RTX2080 against FMs at knight odds offering just nominal prize money. They aren't close to GM level, but at least the performance ratings they earn should be more or less valid against much stronger opposition. Perhaps I'd give Lc0 somewhat less time to increase the number of games that can be played in a day, assuming it would have only a slight adverse effect on results at knight odds. It would be nice to know whether your original conjecture or your simulation is more correct.
