Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 12:23 pm
by hgm
George Tsavdaris wrote: Thu Aug 23, 2018 9:37 am Well, this is not really the target, since engines have to keep the "losing" score for some time before they switch to the opposite evaluation.
In my position it keeps it for 1.6 billion nodes; in yours it finds it in a fraction of a second and 26000 nodes. :D
Try this one, then:

[d]nnnnknnn/pppppppp/8/8/8/8/PPPPPPPP/1Q1QK1Q1 w
Here engines usually think white is way ahead for a very long time, although the score drops gradually rather than in a single jump.
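
As a rough illustration of why a static count favours white here, a minimal sketch that adds up material from the board field of the FEN. The piece values (1, 3, 3, 5, 9) are the conventional textbook ones and are an assumption; real engines use tuned values plus many other evaluation terms, and `material` is just a hypothetical helper name.

```python
# Conventional piece values in pawns (an assumption, not any engine's real values).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material(fen):
    """Return (white, black) material totals from the board field of a FEN."""
    board = fen.split()[0]  # first field: piece placement
    white = black = 0
    for ch in board:
        if ch.isalpha():  # digits are empty squares, '/' separates ranks
            value = PIECE_VALUES[ch.lower()]
            if ch.isupper():
                white += value  # uppercase letters are white pieces
            else:
                black += value  # lowercase letters are black pieces
    return white, black


fen = "nnnnknnn/pppppppp/8/8/8/8/PPPPPPPP/1Q1QK1Q1 w"
white, black = material(fen)
print(white, black, white - black)
```

With these values the three queens plus pawns count 35 against 29 for the seven knights plus pawns, so a plain material count sees white about two minor pieces up.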

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 1:44 pm
by chrisw
zullil wrote: Thu Aug 23, 2018 12:57 am
George Tsavdaris wrote: Thu Aug 23, 2018 12:40 am In this interesting Lyudmil position, Stockfish has a huge fail high and went from +10.33 (a win for white) to -mate in 12 (black mates in 12).
It had a >+8.00 eval until it found the winning move, for about 1.5 billion nodes and depth 32 (not too easy to get great depth in this).

How do other engines do on this?
Do you have other examples of huge fail lows/highs where engines change their evaluation so drastically?

[d]3r3k/1pNbb2p/1Pp2n2/P1Pp4/3Pp3/1QB1PpPr/2N2P2/2R2RK1 b - - 0 1
Not sure why this is surprising. After all, White is about a Queen ahead in material. So, of course, until the search discovers the mate for Black, the evaluation will be very much in favor of White.
What is "surprising" is how deep SF has to go before seeing black wins. Firstly, it is completely obvious to a human what black should do, namely double on the h-file and deliver mate, and then check out the white countermeasures. Nc7 can't do anything that isn't recaptured, leaving the same problem. Bc3 can't do anything, moving either rook does nothing, and trying to run with the king via f1 fails to Rh1 mate. Which leaves Ne1. Ne1-g2-h4 gets hammered and captured on h4, with undefendable mate. Ne1 Qd1 with intents on f6, then Bg4.

LC0 finds Rg8 almost immediately, presumably because it recognises in general the same obvious theme that humans see.

Stockfish takes absolutely ages because there is massive pruning, basically. And although Stockfish has several internal "tests" to pass forward threats and so on, these are not finding (because not coded for, last time I looked) various check threats (or checks left on for the opponent). The problem of coding and detecting more king threat situations is potential explosive imbalance in the search tree. So, not easy. Will have to be done though, in order to counter the NNs.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 2:34 pm
by zullil
chrisw wrote: Thu Aug 23, 2018 1:44 pm
zullil wrote: Thu Aug 23, 2018 12:57 am
George Tsavdaris wrote: Thu Aug 23, 2018 12:40 am In this interesting Lyudmil position, Stockfish has a huge fail high and went from +10.33 (a win for white) to -mate in 12 (black mates in 12).
It had a >+8.00 eval until it found the winning move, for about 1.5 billion nodes and depth 32 (not too easy to get great depth in this).

How do other engines do on this?
Do you have other examples of huge fail lows/highs where engines change their evaluation so drastically?

[d]3r3k/1pNbb2p/1Pp2n2/P1Pp4/3Pp3/1QB1PpPr/2N2P2/2R2RK1 b - - 0 1
Not sure why this is surprising. After all, White is about a Queen ahead in material. So, of course, until the search discovers the mate for Black, the evaluation will be very much in favor of White.
What is "surprising" is how deep SF has to go before seeing black wins.
Indeed. Sf-dev's deterministic search (1 thread) examines more than 2 billion nodes before deciding on Rg8. And that's simply to find that the move is the least bad of its available options, not that the move is in any way good! :D

info depth 37 seldepth 60 multipv 1 score cp -1116 upperbound nodes 2217290410 nps 2475287 hashfull 999 tbhits 0 time 895771 pv h7h5 a5a6
info depth 37 currmove h7h5 currmovenumber 1
info depth 37 currmove h3h6 currmovenumber 2
info depth 37 currmove h3h1 currmovenumber 3
info depth 37 currmove d7g4 currmovenumber 4
info depth 37 currmove h8g8 currmovenumber 5
info depth 37 currmove h3h2 currmovenumber 6
info depth 37 currmove d8g8 currmovenumber 7
info depth 37 seldepth 60 multipv 1 score cp -1101 lowerbound nodes 2458504285 nps 2481317 hashfull 999 tbhits 0 time 990806 pv d8g8
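
For anyone reading the output above: a minimal sketch of pulling the score and its bound type out of UCI `info` lines. In UCI, `upperbound` marks a fail low (the true score is at most the value shown) and `lowerbound` a fail high (at least the value shown), and the score is from the engine's side-to-move point of view. The function name `parse_info` and the returned dictionary layout are hypothetical, and this only handles the fields seen in the lines above.

```python
INT_FIELDS = ("depth", "seldepth", "multipv", "nodes", "nps",
              "hashfull", "tbhits", "time")


def parse_info(line):
    """Parse a UCI 'info' line into a dict; return None for other lines."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    out = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in INT_FIELDS and i + 1 < len(tokens):
            out[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens):
            out["score_kind"] = tokens[i + 1]   # "cp" or "mate"
            out["score"] = int(tokens[i + 2])
            i += 3
            # An optional bound marker may follow the score value.
            if i < len(tokens) and tokens[i] in ("lowerbound", "upperbound"):
                out["bound"] = tokens[i]
                i += 1
            else:
                out["bound"] = "exact"
        elif tok == "pv":
            out["pv"] = tokens[i + 1:]          # rest of the line is the PV
            break
        else:
            i += 1                              # skip fields not handled here
    return out


fail_low = parse_info(
    "info depth 37 seldepth 60 multipv 1 score cp -1116 upperbound "
    "nodes 2217290410 nps 2475287 hashfull 999 tbhits 0 time 895771 pv h7h5 a5a6")
fail_high = parse_info(
    "info depth 37 seldepth 60 multipv 1 score cp -1101 lowerbound "
    "nodes 2458504285 nps 2481317 hashfull 999 tbhits 0 time 990806 pv d8g8")

for info in (fail_low, fail_high):
    print(info["pv"][0], info["score"] / 100, info["bound"], info["nodes"])
```

Read this way, the first line says h7h5 is worth at most -11.16 for black and the last says d8g8 is worth at least -11.01, which matches the "least bad" remark.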

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 3:05 pm
by Stan Arts
I've always liked this one. Not sure of the source.

[d] n1QBq1k1/5p1p/5KP1/p7/8/8/8/8 w - - 0 1

White to move: Bc7 is mate in 11 (?). If you force Bc7, engines will give a huge queen-up score for black until they see the mate, which is not easy.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 3:48 pm
by Uri
Stockfish is a weak engine.

Stockfish is not as strong as Komodo 12.1.1, Alpha Zero and Leela-zero.

No wonder Stockfish doesn't cost money.

Any engine that is free will not be as good as the commercial ones.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 9:18 pm
by Guenther
Uri wrote: Thu Aug 23, 2018 3:48 pm Stockfish is a weak engine.

Stockfish is not as strong as Komodo 12.1.1, Alpha Zero and Leela-zero.

No wonder Stockfish doesn't cost money.

Any engine that is free will not be as good as the commercial ones.
Thanks for coming out of the woods after a while.
Now my ignore list finally seems complete.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 9:58 pm
by Eelco de Groot
Poor Uri on an ignore list. It's an attempt at humor, right? He was merely paraphrasing Chris. (Ignore lists on this forum are a bad idea, I think. Soon nobody will be left..)

Oh, I thought it was Uri Blass. But it is not apparently? This is getting a bit confusing. Too many Uri's. Uri Blass, Uri Averny, Uri Geller.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 10:02 pm
by flither
CiChess on my phone finds mate in 12 after 4 minutes and ~1 billion nodes, gradually going down from +10 to -mate.
Eelco - truth. I never put anyone on an ignore list, whatever they write; it's cheap.

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 10:15 pm
by Eelco de Groot
Your phone seems faster than my desktop :) Does anyone find the mate in 11, though?

Re: Testposition and HUGE fail high from Stockfish!!

Posted: Thu Aug 23, 2018 10:31 pm
by flither
My phone is also faster than my 5yo average desktop, three times faster :lol: I still can't understand why Android is not supported by most engine devs... :roll: