Stockfish 11 at 120k nodes per move

Alayan · Post by **Alayan** » Fri Jan 17, 2020 5:07 pm

Following up on the discussion in the 2xEpyc 7742 thread.

bob wrote: ↑Wed Jan 15, 2020 7:17 pm That 5K nodes per second is a REAL restriction. Many were doing 5K nodes per second in the 70's and 80's. And far beyond. Without a GM being produced.

I am certain that the 2700 Elo at 2K nodes per second is a wild exaggeration of reality. Maybe 500K nodes per second, possible. Certainly not 2K.

MikeB wrote: ↑Thu Jan 16, 2020 12:39 am All my tests show that SF at 50K/sec - might be somewhere at the GM level - whether it is a Super GM or a more ordinary GM, who knows. I would be interested in hearing from Larry Kaufman on this.

bob wrote: ↑Thu Jan 16, 2020 7:21 pm How far it is above Crafty really doesn't affect my comments. 2K nodes per second is not going to beat a GM. Might not beat a master, since most of the speculative pruning stuff really needs exceptional search depth to bypass the holes it causes.

1 second per move vs 180 seconds per move? Maybe. But certainly not 360K nodes total vs 180 seconds for a GM.

CCRL 40/15 ratings are roughly anchored at how a 2700 elo human would perform at 40 moves in 40 minutes. This is short of classical TC, but not that much.

Gaviota 1.0 on 1 core is rated 2861 at CCRL 40/15. So, it is at the very least at Super-GM strength at this TC. (I wanted to test Crafty but I got "Engine Crafty 25.2(1) did not start the chess protocol in time" errors).

Actually running games at 40 moves in 20 minutes (roughly the CCRL standard, if anything longer) against SF @ fixed nodes would take very long on my computer, so I divided the budget for both sides by 3: Gaviota with 40 moves / 400s, vs SF with 120K nodes per move. This is around 1:150 time odds, with the added handicap of fixed nodes per move for SF (no time management, and no way to properly complete depths without wasting a big part of the node budget).
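As a rough sanity check on the 1:150 figure, the arithmetic can be sketched as below. The post does not state Stockfish's speed on the test machine, so the sketch only back-computes the speed implied by the quoted odds; the function and variable names are illustrative, not taken from any tool.

```python
# Back-of-the-envelope check of the time odds quoted in the post.
# Assumption: "1:150" compares Gaviota's average time per move with the
# time Stockfish needs to search its fixed node budget.

GAVIOTA_MOVES = 40          # moves per time control period
GAVIOTA_SECONDS = 400       # seconds per time control period
SF_NODES_PER_MOVE = 120_000
QUOTED_ODDS = 150           # "around 1:150", as stated in the post


def seconds_per_move(moves: int, seconds: float) -> float:
    """Average thinking time per move for a moves/seconds time control."""
    return seconds / moves


def implied_nps(nodes_per_move: int, opponent_spm: float, odds: float) -> float:
    """Engine speed (nodes per second) implied by the quoted time odds."""
    sf_seconds_per_move = opponent_spm / odds
    return nodes_per_move / sf_seconds_per_move


gaviota_spm = seconds_per_move(GAVIOTA_MOVES, GAVIOTA_SECONDS)
sf_nps = implied_nps(SF_NODES_PER_MOVE, gaviota_spm, QUOTED_ODDS)

print(f"Gaviota: {gaviota_spm:.1f} s per move")
print(f"Implied Stockfish speed: {sf_nps / 1e6:.2f} Mnps")
```

In other words, the quoted odds correspond to Stockfish running at a bit under 2 million nodes per second on one core, which is in the same range as the single-core Komodo speed mentioned later in the thread.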

For diversity, I used random 8moves_v3 openings, with repeat, so that both engines played each opening once from each side.

Here are the results :

Score of Stockfish 11 at 120knpm vs Gaviota 1.0: 72 - 54 - 74  [0.545] 200
Elo difference: 31.35 +/- 38.39
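For readers who want to check the numbers, here is a minimal sketch of how an Elo difference and an approximate 95% margin can be derived from a win/loss/draw record. It uses the standard logistic Elo formula and a normal approximation for the score; the exact error-margin method of the tool that produced the output above is an assumption here, so the margin may differ slightly in the last digits.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, margin) from a W/L/D record.

    The margin is half the width of the Elo interval obtained by moving the
    mean score up and down by z standard errors (normal approximation).
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    sem = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(mu - z * sem)
    high = elo_from_score(mu + z * sem)
    return elo_from_score(mu), (high - low) / 2.0


# Record from the match above: 72 wins, 54 losses, 74 draws for Stockfish.
elo, margin = elo_with_margin(72, 54, 74)
print(f"Elo difference: {elo:+.2f} +/- {margin:.2f}")
```

The point estimate reproduces the reported +31.35. Since the margin is larger than the estimate, the result is compatible with the two configurations being roughly equal in strength.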

Full event PGN :

Now, keep in mind that dividing time/nodes by 3 hurts Stockfish much more than Gaviota. 120K nodes is right in the zone where a doubling is worth ~200 elo for Stockfish, so dividing by 3 costs it ~300 elo, while the loss is unlikely to be more than 150-200 elo for Gaviota.

The obvious extrapolation from all this is that Stockfish @ 360K nodes per move would crush the average GM playing at classical TC. And just in case there is any doubt, crank contempt up.

lkaufman · Post by **lkaufman** » Fri Jan 17, 2020 5:36 pm

We actually have some real data for Komodo. About 2 or 3 years ago Komodo played a four game match with GM Sergei Erenburg, with Komodo running on 1 laptop core, no ponder, giving 30 to 1 time odds (3' + 1" vs 90' + 30"). Komodo also gave him White every game, only 3 move opening book, and no TBs. His FIDE rating was a bit under 2600. Komodo on one core probably ran a bit under 2 million NPS, but dividing by the 30 to 1 odds gives maybe about 60 KNPS. Result was 3.5 to 0.5 in favor of Komodo. Given the other handicaps and the result, I think it's reasonable to say that Komodo then at about 10 knps was around FIDE 2600 level in standard chess. Presumably Stockfish 11 now would reach that level at perhaps 5 knps. I believe that the CCRL ratings in question are such that 2700 should be at least equal to FIDE 2700 at standard chess, not at rapid. There were enough matches and tournaments at standard time controls around the turn of the century to calibrate this rather well.

Raphexon · Post by **Raphexon** » Fri Jan 17, 2020 5:44 pm

So this is basically some empirical evidence that my claim of SF at 100 n/s = GM level was right.
With +-2400 being at the low end of GM level.

People underestimate engines.

bob · Post by **bob** » Fri Jan 17, 2020 6:17 pm

OK, first, I assume that Komodo was not pondering? If that is wrong, then this brings the entire match into question since ponder hits would search FAR more nodes than when having to think on own time at a 30:1 handicap. So the obvious first question = ponder off?

Next, the published ratings everywhere have always been inflated due to the programs at the top whacking the programs lower down, and quickly climbing into the stratosphere. When you said CCRL == FIDE, are there many human games included in CCRL to normalize them? Taking one or two programs and factoring in a FIDE rating from a human event doesn't sound very accurate. For years, SSDF had to occasionally deflate all ratings to get them back to "sanity".

Will be interesting to learn the info for my first question...

The approach I was describing was for SF to search for 300K, then sit and wait while the GM took 3 minutes. Then search for 300K, then sit. No pondering which would completely wreck the 300K limit.

Guenther · Post by **Guenther** » Fri Jan 17, 2020 6:23 pm

n.t.
bob wrote: ↑Fri Jan 17, 2020 6:17 pm OK, first, I assume that Komodo was not pondering? If that is wrong, then this brings the entire match into question since ponder hits would search FAR more nodes than when having to think on own time at a 30:1 handicap. So the obvious first question = ponder off?

Next, the published ratings everywhere have always been inflated due to the programs at the top whacking the programs lower down, and quickly climbing into the stratosphere. When you said CCRL == FIDE, are there many human games included in CCRL to normalize them? Taking one or two programs and factoring in a FIDE rating from a human event doesn't sound very accurate. For years, SSDF had to occasionally deflate all ratings to get them back to "sanity".

Will be interesting to learn the info for my first question...

The approach I was describing was for SF to search for 300K, then sit and wait while the GM took 3 minutes. Then search for 300K, then sit. No pondering which would completely wreck the 300K limit.

lkaufman wrote: ↑Fri Jan 17, 2020 5:36 pm We actually have some real data for Komodo. About 2 or 3 years ago Komodo played a four game match with GM Sergei Erenburg, with Komodo running on 1 laptop core, no ponder, giving 30 to 1 time odds (3' + 1" vs 90' + 30")

lkaufman · Post by **lkaufman** » Fri Jan 17, 2020 7:07 pm

Raphexon wrote: ↑Fri Jan 17, 2020 5:44 pm So this is basically some empirical evidence that my claim of SF at 100 n/s = GM level was right.
With +-2400 being at the low end of GM level.

People underestimate engines.

Did you omit a zero? A thousand n/s is at least in the ballpark, a hundred is ridiculous. I have the Revelation chessboard with Komodo, which allows for reducing its NPS drastically. Depending on the time I take, a setting of around 2k n/s is a good opponent for me in rapid. So maybe 1k for SF 11. But although I am a GM my playing strength at age 72 is not GM level. Since earning the GM title requires a 2500 FIDE rating (well, not in my case!), I think that 2500 should be the definition of GM strength. You don't lose your title if you drop below it, but you aren't necessarily still of GM strength.
Regarding CCRL ratings in another post, if you look up the old engines that actually played serious matches and tournaments twenty years ago or so and got performance ratings in the 2700 ballpark (with some interpolation/extrapolation necessary when versions don't match exactly), you can see that in general those engines are rated by CCRL somewhat below their performance ratings, despite CCRL hardware being vastly superior (even the hardware used before recent change from 40/40 to 40/15). So this means that if an engine is rated 2700 or 2800 on CCRL 40/15 list, it is a safe bet that it would be favored to win a standard match against a similarly rated human today if it is running on the reference i7 hardware and has a good quality opening book. Probably by 200 elo or more. This does not mean that a 3600 rated engine could earn a 3600 performance rating vs. humans, that's a different question.

Raphexon · Post by **Raphexon** » Fri Jan 17, 2020 7:42 pm

lkaufman wrote: ↑Fri Jan 17, 2020 7:07 pm
Raphexon wrote: ↑Fri Jan 17, 2020 5:44 pm So this is basically some empirical evidence that my claim of SF at 100 n/s = GM level was right.
With +-2400 being at the low end of GM level.

People underestimate engines.
Did you omit a zero? A thousand n/s is at least in the ballpark, a hundred is ridiculous. I have the Revelation chessboard with Komodo, which allows for reducing its NPS drastically. Depending on the time I take, a setting of around 2k n/s is a good opponent for me in rapid. So maybe 1k for SF 11. But although I am a GM my playing strength at age 72 is not GM level. Since earning the GM title requires a 2500 FIDE rating (well, not in my case!), I think that 2500 should be the definition of GM strength. You don't lose your title if you drop below it, but you aren't necessarily still of GM strength.
Regarding CCRL ratings in another post, if you look up the old engines that actually played serious matches and tournaments twenty years ago or so and got performance ratings in the 2700 ballpark (with some interpolation/extrapolation necessary when versions don't match exactly), you can see that in general those engines are rated by CCRL somewhat below their performance ratings, despite CCRL hardware being vastly superior (even the hardware used before recent change from 40/40 to 40/15). So this means that if an engine is rated 2700 or 2800 on CCRL 40/15 list, it is a safe bet that it would be favored to win a standard match against a similarly rated human today if it is running on the reference i7 hardware and has a good quality opening book. Probably by 200 elo or more. This does not mean that a 3600 rated engine could earn a 3600 performance rating vs. humans, that's a different question.

What version of Komodo is the Revelation chessboard shipped with, or can you install new versions on it?
One thing to remember is that SF is slower than Komodo on the same hardware. (Not sure if that's also true for the most recent versions of K though)
So artificially reducing n/s to a specific amount would benefit Stockfish relative to Komodo.

Also I know Komodo scales very well with time, so I wonder what kind of behaviour it shows at super low node counts.

Fritz Bilbao is rated an exact 2700 on the 40/40 list, and we know how well it performed on a Centrino notebook, so yea it's not hard to believe it's 200 real ELO stronger on a more modern i7.

bob · Post by **bob** » Fri Jan 17, 2020 7:47 pm

Except that processor is irrelevant in a fixed node search. Or did I misunderstand your comment?

Alayan · Post by **Alayan** » Fri Jan 17, 2020 10:35 pm

Thank you Larry for your interesting input.

To complete my initial post, I ran SF11 @ 120knpm vs SF11 @ 360knpm. It appears that by eyeballing the doubling at 200 elo in that npm range I overestimated it somewhat. My data for SF from May is 180 elo from 130k to 260k and 130 elo from 260k to 520k.

Score of Stockfish 11 at 120knpm vs Stockfish 11 at 360knpm: 37 - 643 - 320  [0.197] 1000
Elo difference: -244.10 +/- 19.05
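To relate this 3x result to the "elo per doubling" figures above, one can divide the measured gap by the number of doublings between the two node budgets. This is only a sketch: it assumes the gain per doubling is constant across the 120K to 360K range, which the May numbers (180, then 130) suggest is only roughly true. The helper name is illustrative.

```python
import math


def elo_per_doubling(elo_gap: float, node_ratio: float) -> float:
    """Average Elo gained per doubling of nodes, assuming the gain is
    spread evenly over log2(node_ratio) doublings."""
    return elo_gap / math.log2(node_ratio)


# Measured above: 360knpm beats 120knpm by about 244.10 Elo.
doublings = math.log2(360_000 / 120_000)
average_gain = elo_per_doubling(244.10, 360_000 / 120_000)

print(f"{doublings:.3f} doublings, about {average_gain:.0f} Elo per doubling")
```

Under that assumption the match corresponds to roughly 154 elo per doubling, which sits between the two May figures and below the ~200 elo initially eyeballed.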

My main point stands, though.

Raphexon wrote: ↑Fri Jan 17, 2020 5:44 pm So this is basically some empirical evidence that my claim of SF at 100 n/s = GM level was right.
With +-2400 being at the low end of GM level.

No, that's wrong.

The strength loss is dramatic when reducing node count that much. Just see my graph at the beginning, at such low node counts a doubling makes a massive difference. SF at 15K nodes per move is in the ballpark of 800 elo weaker than SF at 300K nodes per move.

Stockfish's development is focused on improving strength in the 200K+ nodes per move range, and advanced search tricks will never be able to help when you have only a few thousand nodes per move in total. So I can predict that SF will never ever reach the GM level at 100 n/s.

lkaufman · Post by **lkaufman** » Fri Jan 17, 2020 11:04 pm

Raphexon wrote: ↑Fri Jan 17, 2020 7:42 pm
lkaufman wrote: ↑Fri Jan 17, 2020 7:07 pm
Raphexon wrote: ↑Fri Jan 17, 2020 5:44 pm So this is basically some empirical evidence that my claim of SF at 100 n/s = GM level was right.
With +-2400 being at the low end of GM level.

People underestimate engines.
Did you omit a zero? A thousand n/s is at least in the ballpark, a hundred is ridiculous. I have the Revelation chessboard with Komodo, which allows for reducing its NPS drastically. Depending on the time I take, a setting of around 2k n/s is a good opponent for me in rapid. So maybe 1k for SF 11. But although I am a GM my playing strength at age 72 is not GM level. Since earning the GM title requires a 2500 FIDE rating (well, not in my case!), I think that 2500 should be the definition of GM strength. You don't lose your title if you drop below it, but you aren't necessarily still of GM strength.
Regarding CCRL ratings in another post, if you look up the old engines that actually played serious matches and tournaments twenty years ago or so and got performance ratings in the 2700 ballpark (with some interpolation/extrapolation necessary when versions don't match exactly), you can see that in general those engines are rated by CCRL somewhat below their performance ratings, despite CCRL hardware being vastly superior (even the hardware used before recent change from 40/40 to 40/15). So this means that if an engine is rated 2700 or 2800 on CCRL 40/15 list, it is a safe bet that it would be favored to win a standard match against a similarly rated human today if it is running on the reference i7 hardware and has a good quality opening book. Probably by 200 elo or more. This does not mean that a 3600 rated engine could earn a 3600 performance rating vs. humans, that's a different question.
What version of Komodo is the Revelation chessboard shipped with, or can you install new versions on it?
One thing to remember is that SF is slower than Komodo on the same hardware. (Not sure if that's also true for the most recent versions of K though)
So artificially reducing n/s to a specific amount would benefit Stockfish relative to Komodo.

Also I know Komodo scales very well with time, so I wonder what kind of behaviour it shows at super low node counts.

Fritz Bilbao is rated an exact 2700 on the 40/40 list, and we know how well it performed on a Centrino notebook, so yea it's not hard to believe it's 200 real ELO stronger on a more modern i7.

Revelation Komodo was around Komodo 12 (maybe near the end of the 11 series, I forget exactly). Most of the elo gain since then has been in the MCTS version, so probably that's not a major issue in this context. No way to update, but no need for this purpose. NPS is now close enough between Komodo and Stockfish to disregard for purposes such as this. However at low node counts Komodo is much stronger than Stockfish due to Stockfish doing WAY more PV pruning than Komodo, which makes it very weak at depths like 5 ply or so. But with NPS in the thousands that probably isn't a significant issue.
Stockfish at 100 nps is very weak. But I wouldn't say it will never be GM strength, just not without major change to Stockfish. Lc0 may already be close to GM strength (at least GM rapid or blitz strength) at 100 NPS, so Stockfish just needs to clone Lc0 to do that! Of course that won't happen, but the point is that eventually it will be normal for engines to play GM level with very few nodes, somehow.
