This is quite surprising to me: not that the simulated GM won, but a 20-0 score seems unbelievable. I feel something must be wrong with your estimated Elo for SF at 1 million nodes. You are saying that current SF on one thread on a fast PC, at about half a second per move (roughly game in half a minute), is even with a 2800 player at roughly game in one hour. That is 120 to 1 time odds. I don't believe this is nearly enough, assuming a good opening book. Maybe your estimate assumed no opening book, but since GMs have one in their heads, that wouldn't be fair. Komodo on one thread achieved about a 2900 performance rating in four games vs. GM Erenburg while giving 30 to 1 time odds, plus playing Black every game, with no opening book past move 3, no ponder, and no TBs, and that was a couple of years ago.
Laskos wrote: ↑Mon Apr 15, 2019 8:30 pm
I performed this longish test (2 days) using ThothFish. First, at a fast time control, I checked which settings convert the Knight advantage best. At some fast unbalanced TC I got, for ThothFish with "Exchange" factors = 200 (all of them) compared to SF_dev:
lkaufman wrote: ↑Wed Apr 10, 2019 12:04 am
Well, Lc0 won the last (8th) game vs. Arasan, so the final score was 2.5 to 5.5 for Lc0 giving knight odds at 45' + 15", pretty much what we would expect based on your result at 15' + 5". The ThothFish test sounds interesting; I wonder how strong the incentive to exchange (especially queens) should be for optimum results at knight odds. I suppose, though, that it's not perfect in that it might still try to trade even if it loses back the piece. For example, if it will pay a pawn to trade queens, that might be okay when up a full piece but is certainly not OK when up a piece for two pawns.
Laskos wrote: ↑Tue Apr 09, 2019 10:48 pm
Might start looking similar to my result, although I expect Arasan to perform worse at 45' + 15'' than at 15' + 5'' (and your result will probably show that).
That endgame you describe seems a bit funny.
lkaufman wrote: ↑Tue Apr 09, 2019 8:28 pm
Much to my surprise, Lc0 won that "drawn" game, making the score 1.5-5.5, not 1-6. Lc0 had a lone queen against bishop, knight, and three pawns, and so I assumed (and the evals indicated) that Lc0 would seek perpetual check. But somehow it picked up all three of the pawns, one by one over many moves, and won the queen vs. two minors endgame (no TBs used).
lkaufman wrote: ↑Tue Apr 09, 2019 6:48 pm
I ran the same test as you did overnight, except that I ran at the actual 45' + 15" level under discussion instead of your 15' + 5" level. So far I have five wins for Arasan 14.3, no wins for Lc0 11248, one draw, and the current game pretty clearly a draw, so call it two draws. We determined that my 2080 is about 20% faster than your 2070, but perhaps my 4.9 GHz i7 is also a bit faster than yours? Anyway, it seems that tripling the time limit made a big difference: 6-1 instead of 6-4.
I am getting quite interesting results with ThothFish, an SF derivative which can be adjusted to like or dislike swapping pieces to a desired degree. I am playing with some parameters at fast TC, and got a "weak" (small number of nodes) ThothFish which very much likes swapping pieces and heavily outperforms the regular "weak" (small number of nodes) SF, both being a Knight up against "strong" (and handicapped) Lc0 11248. Adjusted in this way, SF can probably model a human to some extent too.
So, the Arasan results are surely not the final word.
59/100 conversions for ThothFish
41/100 conversions for SF_dev
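As a rough check on how much the 59 vs. 41 conversion counts tell us, here is a minimal Python sketch of a pooled two-proportion z-test. It assumes the two figures come from two independent runs of 100 games each against the same opponent, which the post does not state explicitly; the function names are made up for illustration.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z statistic for a difference in conversion rates."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def two_sided_p(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Assumption: 59/100 (ThothFish) and 41/100 (SF_dev) are independent samples.
z = two_proportion_z(59, 100, 41, 100)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

Under that assumption the gap is about 2.5 standard errors, so it is unlikely to be noise, though 100 games per side is still a small sample.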
Now, using the settings (=200) that I found to work best, I played real games with TC = 45' + 15'' for Lc0 11248. The human opponent is "ThothFish200" at 1 million nodes per move. I worked this out 1-2 years ago and repeated the reasoning recently, and I am pretty confident that SF_dev at about 1 million nodes per move is a 2750 - 2850 FIDE Elo opponent for a human at 45' + 15'' time control. That is about 0.15s/move on my 4 core PC, or game in 10 seconds. I will not repeat the reasoning as it is long and not illuminating, but I am quite confident of it.
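The time-odds figures in this exchange ("game in 10 seconds", "game in half a minute", "120 to 1") can be reproduced with a few lines of Python. This is only a sketch: the 60-move game length is an assumption chosen because it makes 45' + 15'' come out at one hour, and the names are illustrative.

```python
MOVES = 60  # assumption: a typical game length, not stated in the thread

def game_seconds(base_min, inc_sec, moves=MOVES):
    """Total clock time for one side at a base + increment time control."""
    return base_min * 60 + inc_sec * moves

human = game_seconds(45, 15)   # 45' + 15'' over 60 moves -> 3600 s (one hour)
engine_1_thread = 0.5 * MOVES  # ~0.5 s/move on one thread -> ~30 s per game
engine_4_cores = 0.15 * MOVES  # ~0.15 s/move on 4 cores   -> ~9 s per game

print(f"time odds vs 1 thread: {human / engine_1_thread:.0f} to 1")
print(f"time odds vs 4 cores:  {human / engine_4_cores:.0f} to 1")
```

With these assumptions, the one-thread figure matches the 120 to 1 ratio quoted in the reply, and the 4-core figure corresponds to roughly "game in 10 seconds".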
Real games:
Lc0 11248 at 45' + 15'' at Knight odds versus SF_dev at 1 million nodes/move:
+17 =3 -0
for SF_dev (mimicking human). So it seems that Lc0 could draw 3/20 games against a top GM (2800 +/- 50 FIDE) at Knight odds.
But here comes ThothFish, which knows better how to convert the Knight advantage (but is no stronger than SF_dev in normal play, actually a bit weaker), and the result is different:
Lc0 11248 at 45' + 15'' at Knight odds versus ThothFish at 1 million nodes/move:
+20 =0 -0
for ThothFish (mimicking human). It seems Lc0 cannot draw or win any game out of 20 against a top GM at Knight odds.
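To put the two match results on a common scale, one can convert each score into an implied Elo gap with the standard logistic formula. The Python sketch below is not part of the original test, and `elo_diff` is an illustrative name. Note that a 20-0 sweep has no finite estimate, so it only says the gap is larger than 20 games can measure.

```python
import math

def elo_diff(wins, draws, losses):
    """Implied Elo gap from a match score, via the logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score == 1.0:
        return math.inf   # a clean sweep gives no finite estimate
    if score == 0.0:
        return -math.inf
    return 400 * math.log10(score / (1 - score))

# Results from the point of view of the engine that is a Knight up.
print(f"SF_dev    +17 =3 -0: {elo_diff(17, 3, 0):.0f} Elo")
print(f"ThothFish +20 =0 -0: {elo_diff(20, 0, 0)} Elo")
```

With only 20 games per match the error bars on these numbers are wide, so they are best read as orders of magnitude.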
I am not sure how well ThothFish at 1 million nodes/move mimics a human; it probably blunders tactically less than a human. We probably got pretty good results at Knight odds against Arasan mimicking a human because Arasan is worse than SF at converting a Knight advantage, and MUCH worse than ThothFish. So it seems Knight odds against a very top GM at 45' + 15'' are possible only if the human top GM blunders pretty badly. That's not exactly what I had hoped for. Maybe with Lc0 playing very aggressively and forcing the human to blunder it could be fun, but I guess the top GM playing would be annoyed by Lc0's play.
There is an easy way to test your simulation. Presumably if 1 million nodes simulates 2800 at 45' + 15", then 66,667 nodes would do the same for 3' + 1". We have the 16-game series of 11248 vs. Naroditsky to compare. The time limits varied, and the handicaps ranged from 2 pawns and move up to bishop and move, but I think it's roughly fair to treat it as a knight odds match at 3' + 1", which Naroditsky lost badly (11.5 to 4.5 I think; perhaps that was after adjusting a time loss, presumably due to lag, to the proper draw result). Naroditsky is not an elite-level GM, but at blitz he is rated up there with the top guys on chess.com, so I think we can call him a 2750 blitz player by FIDE standard rating levels. So I propose that you rerun your ThothFish match with nodes at 66,667 and the time limit for Lc0 at 3' + 1". If it is a good simulation then Lc0 should win the match, maybe 10 to 6 or so. But I predict that ThothFish will win the match fairly easily, though there should be several draws, I guess.
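For reference, the 66,667 figure follows from scaling the node budget in proportion to the human's clock time. A minimal Python sketch of that scaling is below. The linear nodes-to-time relation is the assumption behind the proposal, the function name is illustrative, and the 60-move default is arbitrary (the ratio is 15 for any game length here, since both base time and increment shrink by 15x).

```python
def scaled_nodes(nodes, base_tc, target_tc, moves=60):
    """Scale a per-move node budget linearly with total clock time.

    Time controls are (base_minutes, increment_seconds) tuples.
    """
    def total_seconds(tc):
        base_min, inc_sec = tc
        return base_min * 60 + inc_sec * moves

    ratio = total_seconds(base_tc) / total_seconds(target_tc)
    return round(nodes / ratio)

# 1,000,000 nodes at 45' + 15" scaled down to 3' + 1"
print(scaled_nodes(1_000_000, (45, 15), (3, 1)))
```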