Someone lit a fire under the Stockfish team

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

leavenfish
Posts: 282
Joined: Mon Sep 02, 2013 8:23 am

Re: Someone lit a fire under the Stockfish team

Post by leavenfish »

Dann Corbit wrote:
Jouni wrote:NCM testing still shows +3 after SF9 - nothing spectacular so far :lol: .
Pohl already shows +8 from the February 9th version, and I expect more when he tests the latest version.

I am also not nearly so interested in an Elo gain as I am in a long term analysis gain. My gedankenexperiment about the current changes tells me that they are very good.
+9 = fire?? Maybe just a lot of smoke....
Where have you heard about any 'long term analysis gain'?
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Someone lit a fire under the Stockfish team

Post by tpoppins »

leavenfish wrote:
Dann Corbit wrote:Pohl already shows +8 from the February 9th version, and I expect more when he tests the latest version.

I am also not nearly so interested in an Elo gain as I am in a long term analysis gain. My gedankenexperiment about the current changes tells me that they are very good.
+9 = fire?? Maybe just a lot of smoke....
Or hot air? ;D
leavenfish wrote:Where have you heard about any 'long term analysis gain'?
Indeed. Of the three top engines Stockfish is the least suitable analysis tool. However, at least three of its derivatives -- Matefinder, McBrain and Sting -- are geared towards analysis rather than Elo chase, so I suppose there's hope that any gains the master branch makes will eventually filter downstream (at least in the case of the first two).
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Someone lit a fire under the Stockfish team

Post by Dann Corbit »

tpoppins wrote:
leavenfish wrote:
Dann Corbit wrote:Pohl already shows +8 from the February 9th version, and I expect more when he tests the latest version.

I am also not nearly so interested in an Elo gain as I am in a long term analysis gain. My gedankenexperiment about the current changes tells me that they are very good.
+9 = fire?? Maybe just a lot of smoke....
Or hot air? ;D
leavenfish wrote:Where have you heard about any 'long term analysis gain'?
Indeed. Of the three top engines Stockfish is the least suitable analysis tool. However, at least three of its derivatives -- Matefinder, McBrain and Sting -- are geared towards analysis rather than Elo chase, so I suppose there's hope that any gains the master branch makes will eventually filter downstream (at least in the case of the first two).
I guess that the SF from the 28th will be at least +20 Elo over SF 9 when tested, which, at this level, is an incredible gain. I don't just look at what the patches accomplish in terms of Elo; I look at what the code change does.

As for SF being useless for analysis, that is simply wrong. It is the best engine for analysis of quiet positions, which is what most chess positions in the real world are.

The tactical shots that other engines are great at deciphering arise from the accumulation of little micro-mistakes by your opponent, because you out-thought him a sliver at a time.

On the other hand, I use the big three along with the engines mentioned in this post and several others in my analysis (especially Shredder). We are, after all, seeking the truth.

I think that TCEC clearly shows that the SF approach is a very good one.

I also think that we should all marvel at Fire. (Speaking of fire).

The top engine looks to be SF, which has a titanic team and enormous resources.

K and H are right on its heels. Not surprising, since these are professional engines, which allows for enormous development effort by the talented programmers who lead those projects.

Now consider Fire. One guy. To be 4th is really very surprising (well, probably 5th after Shredder, but that is neither here nor there).

IMO-YMMV
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Someone lit a fire under the Stockfish team

Post by Ovyron »

Dann Corbit wrote:On the other hand, I use the big three along with the mentioned engines in this post and several others in my analysis (especially Shredder).
I have halted use of Stockfish 9 in my analysis; McBrain 9 is clearly better, and the redundancy I get is not worth using both (especially when I already have the redundancy of Learning Stockfish in there.)

What is your experience on this? Is Stockfish 9 better than its derivatives or have you found that using them together is better?
Your beliefs create your reality, so be careful what you wish for.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Someone lit a fire under the Stockfish team

Post by Dann Corbit »

Ovyron wrote:
Dann Corbit wrote:On the other hand, I use the big three along with the mentioned engines in this post and several others in my analysis (especially Shredder).
I have halted use of Stockfish 9 in my analysis; McBrain 9 is clearly better, and the redundancy I get is not worth using both (especially when I already have the redundancy of Learning Stockfish in there.)

What is your experience on this? Is Stockfish 9 better than its derivatives or have you found that using them together is better?
I have collected a large body of statistics from many millions of games, so I usually know what the right move is. I use engines to confirm (or to overturn, if everyone on the planet has been overlooking something).

In general, SF gets the right answer faster than the derivatives.
But for tough tactical problems, the derivatives are faster.

So, for instance, if I am analyzing some new opening book, I will use Stockfish (and a few other engines but not the SF derivatives unless something indicates I need to try one on some special position).

If, on the other hand, I am analyzing some new problem set, I won't even bother with SF but go right to the offshoots (unless I am simply curious to see how SF will perform).
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Someone lit a fire under the Stockfish team

Post by Ovyron »

Okay, thanks.

I use them for analysis of correspondence games, and they help me decide what move to make in a position. I have found out that there's no such thing as a "best move" in a quiet position; what you need is a plan (or to find the opponent's plan and refute it.)

The most dangerous positions, the ones that have gotten me into big trouble, are tactical positions that look quiet, where Stockfish 9 might wrongly show a score close to 0 (or even claim I have a small edge) while there's some deep tactic, which the derivatives find faster, that may jump out to 0.40ish scores against me. If this happens at a distance I might avoid the variation, but it has happened at the root, so I get 0.00ish scores one move, 0.40ish scores the next, and then some interminable fail low that starts at 0.60 and keeps falling to the 0.90s with no end in sight...

I just can't allow that to happen, so hoping a derivative sees it in time is the best I can do. Maybe different engines are better depending on one's needs, and maybe I should be calibrating my opponents: if they seem stronger than me, use negative contempt and try to draw the game at any price. These dangerous positions only happen because I aim for the least drawish variations, which is bad if I'm losing...
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Someone lit a fire under the Stockfish team

Post by tpoppins »

Dann Corbit wrote:As far as SF being useless for analysis, that is simply wrong.
I don't think anyone, even Lyudmil, would argue with you about that. I also wonder why you bring it up, as I don't see anyone here stating that SF is useless for analysis. Is it a straw man argument or simply a failure to read carefully?
Dann Corbit wrote:It is the best engine for analysis of quiet positions, which is what most chess positions in the real world are.
I believe it's been less than two months since I posted an asmFish-Houdini game where, in a relatively quiet position, White withdrew his dark-squared bishop to h2, pushed g3, and sat nursing a drawish eval for the next dozen or so moves while effectively a piece down, as Black proceeded to quietly tighten the noose around White's neck.

In the past two years there have been numerous posts about an alarming number of positions SF evaluates as 0.00. Someone even coined the term "Drawfish". I personally have seen hundreds of such positions on Let's Check, ranging from quiet to turbo-charged. There is a current thread discussing another such case and SF's "tunnel vision".

Then you have the AlphaZero match, a rather convincing demonstration that SF doesn't understand the first thing about quiet positions.

You seem strangely oblivious to all that. As recently as last summer you stated here
Dann Corbit wrote:If a really strong engine failed to solve a problem, it probably means we did not give it enough time. Even bad pruning decisions will eventually be overcome by sufficient depth, because there is no algorithm that prunes down to zero (unless it is a pure loss or unless the program has a serious bug).
which is a belief I myself had been an adherent of for years (didn't I contribute hundreds of CPU-hours to your STS tests a while ago?), but which now sounds to me as plausible as the belief that if you tried to get to the Moon in a hot-air balloon and failed, that just means that you need a lot more time and balloons.

I largely agree with the rest of your post, but like you said, that's neither here nor there.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Someone lit a fire under the Stockfish team

Post by zullil »

tpoppins wrote:
In the past two years there have been numerous posts about an alarming number of positions SF evaluates as 0.00. Someone even coined the term "Drawfish". I personally have seen hundreds of such positions on Let's Check, ranging from quiet to turbo-charged. There is a current thread discussing another such case and SF's "tunnel vision".
I may regret this interjection, but an alarming number of positions deserve 0.00. One could argue that every position in any decently played game is a 0.00 position, at least in a theoretical sense.

Of course, the purpose of evaluation in an engine is not to establish the theoretical value of the position, but to help the engine choose a move to play. So the real questions are, how often does Stockfish choose a move that converts a draw into a loss---or a win into a draw---and why does Stockfish make such choices? It is my understanding, perhaps wrong, that most of Stockfish's 0.00 evaluations ultimately result from second repetitions of positions, and by the engine's inability to find anything in the search tree with a better score. Perhaps a more refined evaluation of "quiet positions" would provide such non-zero scores. Or perhaps such an evaluation would simply slow the engine down and cost it more Elo than it gains.
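The repetition mechanism described above can be sketched as a toy negamax search. This is an illustrative model only, not Stockfish's actual code; the little position graph and the score values below are invented for the example:

```python
def search(position, path, evaluate, moves_fn, depth):
    """Toy negamax in which a second occurrence of a position on the
    current line is scored as a draw (0) -- one way repetition-based
    0.00 scores can come to dominate the root evaluation."""
    if position in path:               # position repeats on this line
        return 0                       # scored as a draw: "0.00"
    if depth == 0:
        return evaluate(position)      # static eval, side-to-move view
    best = None
    for _move, child in moves_fn(position):
        score = -search(child, path + [position], evaluate, moves_fn, depth - 1)
        best = score if best is None or score > best else best
    return best if best is not None else evaluate(position)

# Invented mini position graph: from A, one move just shuffles back and
# forth (A -> B -> A, a repetition); the other commits to a line that
# ends badly for the side to move at A.
moves = {"A": [("shuffle", "B"), ("commit", "C")],
         "B": [("back", "A")],
         "C": [("only", "D")],
         "D": []}
evals = {"D": -80}   # bad for the side that was to move at A

score = search("A", [], lambda p: evals.get(p, 0), lambda p: moves[p], 2)
# The shuffling line scores 0 and beats the losing alternative,
# so the root comes out as 0.00.
```

If the committal line were winning instead of losing, the search would prefer it over the repetition; the 0.00 appears exactly when nothing in the tree beats the draw score.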

About "tunnel vision", I think I largely agree with Dann. Given enough time/depth, Stockfish will almost always find a winning move if one exists. Though sometimes the wait can be excruciating. :wink:
[D] rk6/p1r3p1/P3B1Kp/1p2B3/8/8/8/8 w - - 0 1

The very latest SF-dev (1 thread):

Code: Select all

info depth 70 seldepth 80 multipv 1 score cp -29 nodes 269400017121 nps 5896122 hashfull 999 tbhits 0 time 45691046 pv e6d5 b8c8 d5a8 c7c4 e5g7 b5b4 g6h6 c8c7 a8d5 c4c5 d5e4 c5a5 g7d4 a5a6 h6g5 a6a3 g5f4 c7d6 e4f3 a7a5 f4e4 a5a4 f3d1 b4b3 e4d3 b3b2 d3c2 b2b1q c2b1 a3d3 d1a4 d3d4 a4b3 d4d2 b1c1 d2f2 b3c2 d6c5 c1d1 c5d4 c2b1 f2g2 b1f5 d4c3 d1e1 g2b2 f5g4 c3d4 g4f5 d4e3 e1d1 b2e2 f5b1 e2d2 d1c1 d2h2 c1d1 e3d4 b1g6 d4c3 g6f5 c3b2 f5d3 b2b3 d3f5 b3c3 f5g6 c3d4 g6e8 d4e3 e8g6 h2f2 g6b1 f2g2 b1f5 g2d2 d1e1 d2a2 e1d1 a2f2
But I'm confident that Bd7 will emerge!

Thanks to Bernhard Bauer for sharing this position.
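As an aside, output lines like the one above follow the UCI protocol's `info` format and can be parsed mechanically. A minimal sketch, covering only the fields that appear in this dump (it does not handle every UCI extension, e.g. `lowerbound`/`upperbound`):

```python
def parse_uci_info(line):
    """Parse selected fields of a UCI 'info' line into a dict.

    Integer fields are stored under their UCI names; 'score cp N' is
    stored as 'cp' and 'score mate N' as 'mate'; 'pv' consumes the
    rest of the line as a move list.
    """
    tokens = line.split()
    out = {}
    i = 0
    while i < len(tokens):
        t = tokens[i]
        if t in ("depth", "seldepth", "multipv", "nodes",
                 "nps", "hashfull", "tbhits", "time"):
            out[t] = int(tokens[i + 1])
            i += 2
        elif t == "score":
            kind, val = tokens[i + 1], int(tokens[i + 2])
            out["mate" if kind == "mate" else "cp"] = val
            i += 3
        elif t == "pv":
            out["pv"] = tokens[i + 1:]   # remainder is the PV
            break
        else:
            i += 1                       # skip tokens we don't handle
    return out

example = parse_uci_info(
    "info depth 70 seldepth 80 multipv 1 score cp -29 "
    "nodes 269400017121 nps 5896122 hashfull 999 tbhits 0 "
    "time 45691046 pv e6d5 b8c8 d5a8")
# example["depth"] -> 70, example["cp"] -> -29, example["pv"][0] -> "e6d5"
```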
Last edited by zullil on Sun Mar 04, 2018 12:38 pm, edited 3 times in total.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Someone lit a fire under the Stockfish team

Post by Laskos »

tpoppins wrote:
You seem strangely oblivious to all that. As recently as last summer you stated here
Dann Corbit wrote:If a really strong engine failed to solve a problem, it probably means we did not give it enough time. Even bad pruning decisions will eventually be overcome by sufficient depth, because there is no algorithm that prunes down to zero (unless it is a pure loss or unless the program has a serious bug).
which is a belief I myself had been an adherent of for years (didn't I contribute hundreds of CPU-hours to your STS tests a while ago?), but which now sounds to me as plausible as the belief that if you tried to get to the Moon in a hot-air balloon and failed, that just means that you need a lot more time and balloons.
I agree, and a nice parallel. I was against over-analyzing openings with engines in my amateurish opening test suite, so I just left it as it is, with some 15-20% of positions completely unsolved by engines but favored by several databases. Maybe 10% of the solutions are wrong, but over-analyzing might give, say, 20% "engine" solutions, which are wrong for "engine" reasons (and the top engines have unfortunately become pretty similar over the last 5-10 years).
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Someone lit a fire under the Stockfish team

Post by tpoppins »

zullil wrote:I may regret this interjection, but an alarming number of positions deserve 0.00.
Perhaps they do, but not for the reasons SF "thinks" they are 0.00. See your own quote below:
zullil wrote:It is my understanding, perhaps wrong, that most of Stockfish's 0.00 evaluations ultimately result from second repetitions of positions, and by the engine's inability to find anything in the search tree with a better score.
zullil wrote:One could argue that every position in any decently played game is a 0.00 position, at least in a theoretical sense.
Substitute "perfectly" for "decently" and we are in perfect agreement. But where are they, those perfectly played games? And who shall be the judge of perfection?
zullil wrote:About "tunnel vision", I think I largely agree with Dann. Given enough time/depth, Stockfish will almost always find a winning move if one exists. Though sometimes the wait can be excruciating. :wink:
[D] rk6/p1r3p1/P3B1Kp/1p2B3/8/8/8/8 w - - 0 1

The very latest SF-dev (1 thread):

Code: Select all

info depth 70 seldepth 80 multipv 1 score cp -29 nodes 269400017121 nps 5896122 hashfull 999 tbhits 0 time 45691046 pv e6d5 b8c8 d5a8 c7c4 e5g7 b5b4 g6h6 c8c7 a8d5 c4c5 d5e4 c5a5 g7d4 a5a6 h6g5 a6a3 g5f4 c7d6 e4f3 a7a5 f4e4 a5a4 f3d1 b4b3 e4d3 b3b2 d3c2 b2b1q c2b1 a3d3 d1a4 d3d4 a4b3 d4d2 b1c1 d2f2 b3c2 d6c5 c1d1 c5d4 c2b1 f2g2 b1f5 d4c3 d1e1 g2b2 f5g4 c3d4 g4f5 d4e3 e1d1 b2e2 f5b1 e2d2 d1c1 d2h2 c1d1 e3d4 b1g6 d4c3 g6f5 c3b2 f5d3 b2b3 d3f5 b3c3 f5g6 c3d4 g6e8 d4e3 e8g6 h2f2 g6b1 f2g2 b1f5 g2d2 d1e1 d2a2 e1d1 a2f2
But I'm confident that Bd7 will emerge!
Ah, happy thoughts! Remember this, though?
zullil wrote:
peter wrote:Hi Louis!

Very nice position indeed!

Made a .pgn about it:

Code: Select all

[Event "CCC"]
[Site "?"]
[Date "2015.04.25"]
[Round "?"]
[White "Corbit, Dann"]
[Black "Zulli, Louis"]
[Result "1-0"]
[SetUp "1"]
[FEN "4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - -"]

1. Qxe5 $1 (1. Qa3 $2 Rxe6 2. Qxa7 Qe7 3. Qa8+ Qe8 4. Qb7 
Qe7 5. Qa8+ Qe8) 1... fxe5 2. Rf1 Rc8 (2... a6 3. Bd1 b5 
(3... Rc7 4. Bb3 Re7 5. Bd5 b5 6. Kg2 a5 7. a3 b4 8. axb4 
Qb8 9. b5 Qe8 10. b6 a4 11. b7 a3 12. Rf7 axb2 13. Rxe7 Qb5 
14. Re8+ Qxe8 15. e7+ Qf7 16. e8=Q#) 4. Bb3 Rc4 5. a4 Qe7 
6. Kg2) 3. Bd1 b5 4. Bb3 Rc4 5. a4 (5. Rf3) (5. a3 a6 6. a4 
Qe7 7. Kg2 Qe8 8. Rf3 (8. h3) 8... Qd8) 5... a6 6. Ba2 Qe7 
7. Kg2 Qg7 8. Bxc4 bxc4 9. Bxg7 Kxg7 10. Rf7+ 1-0
This position is from the game below, or available at http://www.chessgames.com/perl/chessgame?gid=1472988

Here is the position after Black's 23rd move, with SF's current PV:
[D]4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 1 24

Code: Select all

info depth 50 seldepth 81 multipv 1 score cp 397 nodes 940558094988 nps 27958134 hashfull 999 tbhits 0 time 33641661 pv d6e5 f6e5 a1f1 a7a6 e2d1 b6b5 d1b3 c6c4 a2a4 e8e7 g1g2 e7e8 f1f3 e8e7 b3a2 e7g7 h6g7 g8g7 a2c4 b5c4 f3f7 g7h6 e6e7 h8e8 g2g3 g6g5 f7f6 h6g7 f6a6 e8e7 g3g4 e7b7 g4g5 g7f7 g5f5 b7b2 a6a7 f7e8 a4a5 e8d8 a5a6 b2a2 f5e5 d8c8 e5d5 c8b8 a7h7 a2a6 h2h4 b8c8 h4h5 c8d8 d5c4 d8e8 c4d5 a6a5 d5d4 e8f8 e4e5 a5a6 h5h6 f8g8 h7g7 g8h8
This PV was obtained by having SF search the position after 24. Qxe5 fxe5 (with 16 threads, 16 GB hash and null move pruning disabled) until it saw a promising continuation for White. The search was then stopped but the hash table was left intact, and SF was given the position above. In short, "cheating" was involved. :wink: I don't know how long SF would take to find the key move Qxe5 with an empty hash table. But, in any case, we now have strong evidence that the sacrifice is sound, and wins with correct play.
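For reference, the hash-priming procedure described above corresponds roughly to this UCI command sequence. This is a sketch: the option names are standard UCI ones and the sizes come from the post, but disabling null-move pruning was a source-level change, not a UCI option, so it is not shown here.

```python
# Rough sketch (not a transcript) of the hash-priming trick, expressed as
# the UCI commands one would feed the engine by hand. The FEN and the
# moves d6e5 f6e5 (24.Qxe5 fxe5) are taken from the post above.

ROOT_FEN = "4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 1 24"

priming_session = [
    "uci",
    "setoption name Threads value 16",
    "setoption name Hash value 16384",           # 16 GB, specified in MB
    "ucinewgame",
    "isready",
    # Step 1: search the position AFTER the sacrifice, filling the hash.
    f"position fen {ROOT_FEN} moves d6e5 f6e5",
    "go infinite",
    # ...wait until a promising continuation for White appears, then:
    "stop",
    # Step 2: re-search the root. No "ucinewgame" here, so the hash
    # table from step 1 is left intact and steers the new search
    # toward the sacrifice.
    f"position fen {ROOT_FEN}",
    "go infinite",
]

for cmd in priming_session:
    print(cmd)
```

The essential point is the absence of a second `ucinewgame`: clearing the hash between the two searches would discard exactly the information being smuggled in.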

[pgn]
[Event "Molniya Sporting Society"]
[Site "Chelyabinsk RUS"]
[Date "1946.??.??"]
[EventDate "?"]
[Round "5"]
[Result "1-0"]
[White "Yuri S Gusev"]
[Black "E Auerbach"]
[ECO "B70"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "73"]

1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 g6 6.Be2 Nc6 7.Nb3
Bg7 8.O-O Be6 9.f4 Rc8 10.f5 Bd7 11.g4 Ne5 12.g5 Ng8 13.Nd5 f6
14.Be3 b6 15.Nd4 Kf7 16.c3 Qe8 17.Ne6 Bxe6 18.fxe6+ Kf8
19.Nxf6 Nxf6 20.gxf6 Bxf6 21.Bh6+ Kg8 22.Rxf6 exf6 23.Qxd6 Rc6
24.Qxe5 fxe5 25.Rf1 Rc8 26.Bd1 Rc4 27.Bb3 b5 28.Bxc4 bxc4
29.b3 a5 30.bxc4 Qe7 31.Kg2 Qa3 32.Rf2 Qe7 33.Rf1 g5 34.Rf5 g4
35.c5 Qd8 36.c6 Qe7 37.c7 1-0
[/pgn]
It's been nigh on three years. A somewhat more excruciating wait than the 12 hours quoted in your previous post. How do I remember? Back in my "depth is king" days I used to follow posts like these religiously; yours especially, as you had the most depth, insane (for that time) depths. I'd sit at my DualCore from 2008 and pore over the lines and centipawn evals for hours, sigh, and wish I had hardware like Louis'. Some time later (oh, what a long wait it seemed at the time) I got the hardware, and after spending hundreds of hours on analysis with SF I realized I'd been chasing a mirage. Ah, the sweet innocence; sometimes I wish I could have it all back...

Back to the present. Has Stockfish emerged with 24.Qxe5 yet? Clean hash, no source mods, no MultiPV? I bet it's still 24.Qa3 0.00, three years and +200 Elo later. How many more years are you prepared to wait, Louis? :)