Testposition and HUGE fail high from Stockfish!!

hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Testposition and HUGE fail high from Stockfish!!

Post by hgm »

George Tsavdaris wrote: Thu Aug 23, 2018 9:37 am
Well, this is not really the target, since engines have to keep the "losing" score for some time before they switch to the opposite evaluation.
In my position it keeps it for 1.6 billion nodes; in yours it finds it in a fraction of a second and 26000 nodes. :D

Try this one, then:

[d]nnnnknnn/pppppppp/8/8/8/8/PPPPPPPP/1Q1QK1Q1 w
Here engines usually think white is way ahead for a very long time, although the score drops gradually rather than in a single jump.
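
A rough way to see why a static evaluation starts out optimistic for white here is to count material straight from the FEN above. The sketch below assumes the conventional 1/3/3/5/9 piece values, which are not the tuned values any particular engine uses, and `material_balance` is a hypothetical helper name chosen for illustration.

```python
# Conventional piece values in pawns (an assumption; real engines use tuned values).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen: str) -> tuple[int, int]:
    """Return (white_total, black_total) from the piece-placement field of a FEN."""
    placement = fen.split()[0]  # first field holds the board; the rest is ignored
    white = black = 0
    for ch in placement:
        if ch.isalpha():  # digits are empty squares, '/' separates ranks
            value = PIECE_VALUES[ch.lower()]
            if ch.isupper():
                white += value
            else:
                black += value
    return white, black


white, black = material_balance("nnnnknnn/pppppppp/8/8/8/8/PPPPPPPP/1Q1QK1Q1 w")
print(white, black, white - black)  # 35 29 6
```

Under these assumed values, three queens against seven knights is a nominal +6 for white, which is the sort of lead a material-driven evaluation would report until the search sees further.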
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Testposition and HUGE fail high from Stockfish!!

Post by chrisw »

zullil wrote: Thu Aug 23, 2018 12:57 am
George Tsavdaris wrote: Thu Aug 23, 2018 12:40 am
In this interesting Lyudmil position, Stockfish has a huge fail high and went from +10.33 (a win for white) to -mate in 12 (black mates in 12).
It kept a >+8.00 eval for about 1.5 billion nodes and up to depth 32 (it is not too easy to get great depth in this) before it found the winning move.

How do other engines do on this?
Do you have other examples of huge fail lows/highs where engines change their evaluation so drastically?

[d]3r3k/1pNbb2p/1Pp2n2/P1Pp4/3Pp3/1QB1PpPr/2N2P2/2R2RK1 b - - 0 1
Not sure why this is surprising. After all, White is about a Queen ahead in material. So, of course, until the search discovers the mate for Black, the evaluation will be very much in favor of White.
What is "surprising" is how deep SF has to go before seeing that black wins. Firstly, it is completely obvious to a human what black should do, namely double on the h-file and deliver mate, and then check out the white countermeasures. Nc7 can't do anything that isn't recaptured, leaving the same problem. Bc3 can't do anything, moving either rook achieves nothing, and trying to run with the king via f1 fails to Rh1 mate. Which leaves Ne1. Ne1-g2-h4 gets hammered and captured on h4, with undefendable mate. And on Ne1 followed by Qd1, with intentions on f6, then Bg4.

LC0 finds Rg8 almost immediately, presumably because it recognises in general the same obvious theme that humans see.

Stockfish takes absolutely ages, basically because of its massive pruning. And although Stockfish has several internal "tests" to pass forward threats and so on, these do not find (because they were not coded for, last time I looked) various check threats (or checks left on for the opponent). The problem with coding and detecting more king-threat situations is a potentially explosive imbalance in the search tree. So, not easy. It will have to be done though, in order to counter the NNs.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Testposition and HUGE fail high from Stockfish!!

Post by zullil »

chrisw wrote: Thu Aug 23, 2018 1:44 pm
What is "surprising" is how deep SF has to go before seeing black wins.
Indeed. Sf-dev's deterministic search (1 thread) examines more than 2 billion nodes before deciding on Rg8. And that's simply to find that the move is the least bad of its available options, not that the move is in any way good! :D

info depth 37 seldepth 60 multipv 1 score cp -1116 upperbound nodes 2217290410 nps 2475287 hashfull 999 tbhits 0 time 895771 pv h7h5 a5a6
info depth 37 currmove h7h5 currmovenumber 1
info depth 37 currmove h3h6 currmovenumber 2
info depth 37 currmove h3h1 currmovenumber 3
info depth 37 currmove d7g4 currmovenumber 4
info depth 37 currmove h8g8 currmovenumber 5
info depth 37 currmove h3h2 currmovenumber 6
info depth 37 currmove d8g8 currmovenumber 7
info depth 37 seldepth 60 multipv 1 score cp -1101 lowerbound nodes 2458504285 nps 2481317 hashfull 999 tbhits 0 time 990806 pv d8g8
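
The two full `info` lines above can be compared mechanically. The following is a minimal sketch of a parser for the UCI `info` tokens that appear in this output; `parse_info` is a hypothetical helper, not part of Stockfish, and it only handles the tokens shown here.

```python
INT_FIELDS = ("depth", "seldepth", "multipv", "nodes", "nps",
              "hashfull", "tbhits", "time")


def parse_info(line: str) -> dict:
    """Parse the subset of UCI 'info' tokens used in the output above."""
    tokens = line.split()
    out = {}
    i = 1  # skip the leading "info"
    while i < len(tokens):
        tok = tokens[i]
        if tok in INT_FIELDS:
            out[tok] = int(tokens[i + 1])
            i += 2
        elif tok == "score":
            out["score_type"] = tokens[i + 1]  # "cp" (centipawns) or "mate"
            out["score"] = int(tokens[i + 2])
            i += 3
        elif tok in ("upperbound", "lowerbound"):
            out["bound"] = tok  # fail low / fail high marker
            i += 1
        elif tok == "pv":
            out["pv"] = tokens[i + 1:]  # the rest of the line is the move list
            break
        else:
            i += 1  # ignore tokens this sketch does not know (e.g. currmove)
    return out


fail_low = parse_info(
    "info depth 37 seldepth 60 multipv 1 score cp -1116 upperbound "
    "nodes 2217290410 nps 2475287 hashfull 999 tbhits 0 time 895771 pv h7h5 a5a6"
)
fail_high = parse_info(
    "info depth 37 seldepth 60 multipv 1 score cp -1101 lowerbound "
    "nodes 2458504285 nps 2481317 hashfull 999 tbhits 0 time 990806 pv d8g8"
)
print(fail_low["bound"], fail_low["score"], fail_low["pv"][0])     # upperbound -1116 h7h5
print(fail_high["bound"], fail_high["score"], fail_high["pv"][0])  # lowerbound -1101 d8g8
```

Read this way, the first line is a fail low on h7h5 (score at most -11.16) and the last is a fail high on d8g8 (score at least -11.01), which fits the remark that Rg8 is only the least bad option at this depth.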
Stan Arts
Posts: 179
Joined: Fri Feb 14, 2014 10:53 pm
Location: the Netherlands

Re: Testposition and HUGE fail high from Stockfish!!

Post by Stan Arts »

I've always liked this one. Not sure of the source.

[d] n1QBq1k1/5p1p/5KP1/p7/8/8/8/8 w - - 0 1

White to move: Bc7 is mate in 11 (?). If you force Bc7, engines will give a huge queen-up score for black until they see the mate, which is not easy.
Uri
Posts: 473
Joined: Thu Dec 27, 2007 9:34 pm

Re: Testposition and HUGE fail high from Stockfish!!

Post by Uri »

Stockfish is a weak engine.

Stockfish is not as strong as Komodo 12.1.1, Alpha Zero and Leela-zero.

No wonder Stockfish doesn't cost money.

Any engine that is free will not be as good as the commercial ones.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Testposition and HUGE fail high from Stockfish!!

Post by Guenther »

Uri wrote: Thu Aug 23, 2018 3:48 pm Stockfish is a weak engine.

Stockfish is not as strong as Komodo 12.1.1, Alpha Zero and Leela-zero.

No wonder Stockfish doesn't cost money.

Any engine that is free will not be as good as the commercial ones.
Thanks for coming out of the woods after a while.
Now my ignore list finally seems complete.
https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Testposition and HUGE fail high from Stockfish!!

Post by Eelco de Groot »

Poor Uri on an ignore list. It's an attempt at humor, right? He was merely paraphrasing Chris. (Ignore lists on this forum are a bad idea, I think. Soon nobody will be left...)

Oh, I thought it was Uri Blass. But apparently it is not? This is getting a bit confusing. Too many Uris: Uri Blass, Uri Averny, Uri Geller.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
flither
Posts: 19
Joined: Thu Aug 02, 2018 11:16 pm
Full name: Raf Levsky

Re: Testposition and HUGE fail high from Stockfish!!

Post by flither »

CiChess on my phone finds mate in 12 after 4 minutes and ~1 billion nodes, gradually going down from +10 to -mate.
Eelco, true. I never put anyone on an ignore list, whatever they write. It's cheap.
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Testposition and HUGE fail high from Stockfish!!

Post by Eelco de Groot »

Your phone seems faster than my desktop :) Does anyone find the mate in 11, though?
flither
Posts: 19
Joined: Thu Aug 02, 2018 11:16 pm
Full name: Raf Levsky

Re: Testposition and HUGE fail high from Stockfish!!

Post by flither »

My phone is also faster than my 5yo average desktop, three times faster :lol: I still can't understand why Android is not supported by most engine devs... :roll: