Stockfish zero evals

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat May 03, 2014 10:08 am

Ferdy wrote:
Lyudmil wrote: SF expects a draw by repetition after the first repetition that occurs in the position and returns 0.0 instead of the objective best move and score.
There is nothing wrong with returning 0.0 here as long as the engine has tried very hard exploring other moves and proved that this is the only best reply, rather than returning 0.0 just because there is a repetition. One way to see how the engine reacts is to let it search the position excluding the move that repeats: will it find a better alternative move? During a game it is different, since there is a time constraint.

This is the best move, but the score after it is wrong, as the other side has better continuations than the draw line.

Maybe Michel is right that SF would lose some elo if it went on calculating other possible moves instead of instantly playing the repetition move in a position where the opponent has the edge.

Ugly, but gaining elo. If it gains elo, I am for it, sorry Carl.

But of course, the best thing would be to fix the issue without losing elo in the process, if possible.

Ferdy · Post by **Ferdy** » Sat May 03, 2014 1:33 pm

Lyudmil Tsvetkov wrote:
But of course, the best thing would be to fix the issue without losing elo in the process, if possible.

[d]r2q4/3b2rk/3p3p/2pP1p2/P1B1p3/1PQ1P1PP/5R1K/5R2 b - - 3 52
In the game it played h7g8 with a score of 0.00.

Allowing it to search without knowing the move history, it played d7c8 with a score of -77 cp.

position fen r2q4/3b2rk/3p3p/2pP1p2/P1B1p3/1PQ1P1PP/5R1K/5R2 b - - 3 52
go movetime 20000

info depth 22 seldepth 30 score cp -77 nodes 20730953 nps 1036392 time 20003 multipv 1 pv d7c8 f1g1 d8g5 c4b5 c8b7 b5c6 a8g8 c6b7 g7b7 a4a5 g8a8
info nodes 20730953 time 20003
bestmove d7c8 ponder f1g1

Forcing it to search only h7g8 shows that it does not like it.

position fen r2q4/3b2rk/3p3p/2pP1p2/P1B1p3/1PQ1P1PP/5R1K/5R2 b - - 3 52
go movetime 20000 searchmoves h7g8

info depth 25 seldepth 36 score cp -89 nodes 21263237 nps 1063108 time 20001 multipv 1 pv h7g8 f2a2 a8a5 a2g2 g8h7 f1f4 a5a7 c4b5 d7b5 a4b5 d8d7 c3f6 a7b7 g3g4 b7b5 f4f5 b5b3 g4g5
info nodes 21263237 time 20002
bestmove h7g8 ponder f2a2

Did the engine cheat itself?
I will investigate this behaviour in my engine.
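The two runs above can be scripted. Below is a minimal Python sketch, assuming the engine is driven over plain UCI text: one helper builds the `position`/`go` lines (with optional `searchmoves` to restrict the root moves, which is how the second run forced h7g8), and another pulls depth, score and PV out of an `info` line. The helper names are hypothetical, and actually piping the lines to an engine process is left out. Because the `position fen` line carries no `moves ...` list, the engine sees no game history and therefore no repetition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def build_uci_commands(fen: str, movetime_ms: int,
                       searchmoves: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: the UCI lines for one test search.

    No 'moves ...' suffix is added, so the engine gets no game history.
    """
    go = f"go movetime {movetime_ms}"
    if searchmoves:
        # Restrict the root search to the listed moves only.
        go += " searchmoves " + " ".join(searchmoves)
    return [f"position fen {fen}", go]


@dataclass
class InfoLine:
    depth: Optional[int] = None
    score_cp: Optional[int] = None    # from the side to move's point of view
    score_mate: Optional[int] = None
    nodes: Optional[int] = None
    pv: List[str] = field(default_factory=list)


def parse_info(line: str) -> InfoLine:
    """Extract a few fields from a UCI 'info' line; other fields are skipped."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not an info line")
    info = InfoLine()
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "depth":
            info.depth = int(tokens[i + 1])
            i += 2
        elif tok == "nodes":
            info.nodes = int(tokens[i + 1])
            i += 2
        elif tok == "score":
            kind, value = tokens[i + 1], int(tokens[i + 2])
            if kind == "cp":
                info.score_cp = value
            elif kind == "mate":
                info.score_mate = value
            i += 3
            # A bound marker may follow the score value.
            if i < len(tokens) and tokens[i] in ("lowerbound", "upperbound"):
                i += 1
        elif tok == "pv":
            info.pv = tokens[i + 1:]  # the PV runs to the end of the line
            break
        else:
            i += 1  # seldepth, nps, time, multipv, ... and their values
    return info
```

Sending the second command list and feeding the last `info` line to `parse_info` would give the depth 25, -89 cp result for h7g8 shown above.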

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat May 03, 2014 2:28 pm

Ferdy wrote:
Did the engine cheat itself?
I will investigate this behaviour in my engine.

Obviously, there is a rule in SF that says: when it is possible to repeat a position once, based on the history, and you are worse, just repeat without searching. This saves SF search time for other moves, but the score returned is incorrect. I wonder whether it is possible not to lose any elo by just returning the score calculated for the previous move (or do you call it iteration?) instead of 0.0, again without searching.
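For readers unfamiliar with the mechanism, here is a generic Python sketch of "first repetition counts as a draw", plus the reporting tweak proposed above. This is not Stockfish's code; the list of position hashes (`keys`), the `rule50` counter and both function names are assumptions made for illustration. In a search, a node for which `is_repetition` is true would simply return `DRAW_SCORE` without being searched further, which is what produces the 0.00.

```python
from typing import List, Optional

DRAW_SCORE = 0  # draw score in centipawns


def is_repetition(keys: List[int], rule50: int) -> bool:
    """True if the current position (keys[-1]) already occurred earlier.

    keys: hypothetical hashes of game history plus the current search path.
    rule50: plies since the last capture or pawn move; a repetition cannot
    reach further back than that.
    """
    current = keys[-1]
    limit = min(rule50, len(keys) - 1)
    # Same side to move means an even distance. A position cannot recur
    # after only 2 plies, so the first candidate is 4 plies back.
    for back in range(4, limit + 1, 2):
        if keys[-1 - back] == current:
            return True
    return False


def reported_score(search_score: int, root_move_repeats: bool,
                   previous_score: Optional[int]) -> int:
    """Hypothetical display rule from the post: if the chosen root move is a
    repetition scored as a draw, show the previous iteration's score instead
    of 0. Move choice is unaffected; only the printed score changes."""
    if root_move_repeats and search_score == DRAW_SCORE and previous_score is not None:
        return previous_score
    return search_score
```

Since only the displayed number changes, this variant should not cost any search time, but it also hides the fact that the engine intends to repeat.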

Ferdy · Post by **Ferdy** » Sat May 03, 2014 4:10 pm

Lyudmil Tsvetkov wrote:
Obviously, there is a rule in SF that says: when it is possible to repeat a position once, based on the history, and you are worse, just repeat without searching. This saves SF search time for other moves, but the score returned is incorrect.

In game play it is the move that counts; whether it returns 0.0 or not does not matter much.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sat May 03, 2014 4:26 pm

Ferdy wrote:
In game play it is the move that counts; whether it returns 0.0 or not does not matter much.

Aesthetically it counts, because this way the evaluation jumps around too much. Think of a human player analysing a position with the help of SF: the position repeats once and SF says it is a draw. A move before, it said one of the sides was winning by a large margin, and a move later it says so again, but right now, in between, it says 0.0, a perfect draw. So aesthetically it counts.
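The jump described here is easy to spot mechanically in a saved analysis. A small Python sketch, assuming the evaluations are integers in centipawns, one per move and all from White's point of view (raw UCI scores are from the side to move and would first need their signs flipped on Black's turns); the 150 cp margin is an arbitrary, hypothetical threshold:

```python
from typing import List


def isolated_zero_evals(evals_cp: List[int], margin_cp: int = 150) -> List[int]:
    """Indices where the eval is exactly 0 while both neighbours show the
    same side better by at least margin_cp."""
    hits = []
    for i in range(1, len(evals_cp) - 1):
        before, now, after = evals_cp[i - 1], evals_cp[i], evals_cp[i + 1]
        same_side = (before > 0) == (after > 0)
        big = abs(before) >= margin_cp and abs(after) >= margin_cp
        if now == 0 and big and same_side:
            hits.append(i)
    return hits


print(isolated_zero_evals([250, 0, 260, 270]))  # [1]
```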

lkaufman · Post by **lkaufman** » Sat May 03, 2014 4:43 pm

Isaac wrote:
Joerg Oster wrote: I have seen this also at shallower depths, so I don't think that's a good explanation.
I can confirm this. Someone on the TCEC analyzed about 30k CCRL games between engines rated over 3000 with no more than 50 elo difference. He let Stockfish analyze up to a fixed, shallow depth (depth 14 if I remember well, but I may be wrong) at a particular move number in every single game.
He then plotted the evaluation on the x-axis and the frequency on the y-axis.
As expected, "0.00" was the most common eval. If I remember well, it was around 11% of all eval values returned. The rest of the shape looked like a wide Gaussian distribution, not centered in 0 (drifted toward positive values).

Thanks. Do you know either what version was tested or about how long ago this was done? Also, was it done for any other engine? The 11% value does sound high, but we should really compare it to some other engine. Regarding the centering around a positive value, is this positive for White or positive for the engine? Positive for White would be expected of course, but positive for the engine might suggest that the side to move bonus was too high. I wonder if a high side to move bonus (or stand pat bonus) might cause more draw scores somehow?
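The experiment Isaac describes boils down to a frequency count. A minimal Python sketch, assuming the evaluations have already been collected as integer centipawns; the 10 cp bin width is a hypothetical choice. An exact 0.00 shares its bin with +0.01 to +0.09, so the exact-zero share is counted separately to make a spike visible:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def eval_histogram(evals_cp: Iterable[int],
                   bin_cp: int = 10) -> Tuple[float, Dict[int, int]]:
    """Return (share of exact 0.00 evals, counts per bin of width bin_cp).

    Bins are labelled by their lower edge using floor division, so -3 falls
    in bin -10 and 0..9 fall in bin 0.
    """
    evals = list(evals_cp)
    if not evals:
        return 0.0, {}
    zero_share = sum(1 for e in evals if e == 0) / len(evals)
    bins = Counter((e // bin_cp) * bin_cp for e in evals)
    return zero_share, dict(sorted(bins.items()))


print(eval_histogram([0, 0, 5, 12, -3]))  # (0.4, {-10: 1, 0: 3, 10: 1})
```

As Larry notes, the zero share only means something next to the same figure for another engine on the same positions.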

syzygy · Post by **syzygy** » Sat May 03, 2014 6:01 pm

I just noticed that in the FishCooking forum someone reported a zero score on the Behting study:

[D]8/8/7p/3KNN1k/2p4p/8/3P2p1/8 w - - 0 1

I tested it myself and indeed:

info depth 68 seldepth 88 score cp 0 nodes 4685320414 nps 14476145 tbhits 65063167 time 323658 multipv 1 pv d5c6 g2g1q f5h4 g1a1 h4f3 a1a5 c6d6 a5b5 d6e6 b5c5 e6f5 c5d5 f5f4 d5d8 f4e3 d8b8 e3d4 b8b5 d4e4 b5b3 e4d4 b3b4 d4d5 b4a4 d5c5 a4a2 c5b5 a2b3 b5c5 b3a4 c5d4 a4a6 d4d5 a6c8 d5d4 c8c7 d4d5 c7d8 d5c4 d8b6 d2d4 b6b7 d4d5 b7c7 c4b5 c7b7 b5c4

http://chessprogramming.wikispaces.com/Behting+Study
http://en.chessbase.com/post/john-nunn- ... y-is-sound

John Nunn wrote: Thus Behting’s 1908 study has, after just over a century, been proved correct and ChessBase can continue to use it to humiliate computers until the day dawns when they can finally solve it.

That day has come.

I note that Nunn's alternative line ending in
[D]8/8/8/5N2/2p1K1k1/5N2/3P4/7q w - - 2 9
indeed can be seen to lose quickly using 6-piece TBs.
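Whether a position can be probed directly in the 6-piece tablebases is just a piece count (kings and pawns included). A tiny Python sketch over the FEN placement field; the function names are hypothetical:

```python
TB_MAX_PIECES = 6  # the 6-piece tablebases mentioned above


def count_pieces(fen: str) -> int:
    """Count all pieces in the placement field (first field) of a FEN."""
    placement = fen.split()[0]
    return sum(1 for ch in placement if ch.isalpha())


def in_tablebase_range(fen: str) -> bool:
    return count_pieces(fen) <= TB_MAX_PIECES


print(count_pieces("8/8/7p/3KNN1k/2p4p/8/3P2p1/8 w - - 0 1"))  # 9
print(count_pieces("8/8/8/5N2/2p1K1k1/5N2/3P4/7q w - - 2 9"))  # 7
```

Nunn's end position has 7 pieces, so it is not itself in the tables; the TBs come into play once a single capture occurs in the search.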

Isaac · Post by **Isaac** » Sat May 03, 2014 7:16 pm

lkaufman wrote: Thanks. Do you know either what version was tested or about how long ago this was done?

SF DD (about 1 month ago).

Larry wrote: Also, was it done for any other engine? The 11% value does sound high, but we should really compare it to some other engine.

You are right.
I have just chatted with him on the TCEC chat, and he said he would post here when he is done with some other analysis. He apparently also tested Houdini 4, and at move 15 it returned a greater percentage of 0.00 evaluations than Stockfish.

Larry wrote: Regarding the centering around a positive value, is this positive for White or positive for the engine? Positive for White would be expected of course, but positive for the engine might suggest that the side to move bonus was too high. I wonder if a high side to move bonus (or stand pat bonus) might cause more draw scores somehow?

He is telling me "the side to move was always white".

carldaman · Post by **carldaman** » Sat May 03, 2014 8:08 pm

Lyudmil Tsvetkov wrote:
Ugly, but gaining elo. If it gains elo, I am for it, sorry Carl.

But of course, the best thing would be to fix the issue without losing elo in the process, if possible.

Fixing it without Elo loss would be ideal, but until then, losing 1 measly rating point to improve the aesthetics sounds very practical to me. I just like well rounded engines, so the less odd behavior the better the overall effect, in my 'book', anyway.

Isaac · Post by **Isaac** » Sat May 03, 2014 9:22 pm

Here's a comparison between Houdini 4 and Stockfish DD of how often each returns a "0.00" evaluation at several move numbers. The graph should be clear enough by itself.
It's the analysis from "Joseph" on the TCEC.
https://public.bn1301.livefilestore.com ... png?psid=1
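The numbers behind that graph are not reproduced here, but the tabulation itself is simple. A Python sketch, assuming the raw data is a list of (engine, move number, eval in centipawns) triples; the engine labels used below are placeholders:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def zero_share_by_move(
        samples: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[int, float]]:
    """Per engine, the share of exact 0.00 evals at each move number."""
    totals = defaultdict(lambda: defaultdict(int))
    zeros = defaultdict(lambda: defaultdict(int))
    for engine, move, cp in samples:
        totals[engine][move] += 1
        if cp == 0:
            zeros[engine][move] += 1
    return {
        engine: {move: zeros[engine][move] / n
                 for move, n in sorted(per_move.items())}
        for engine, per_move in totals.items()
    }


data = [("SF", 15, 0), ("SF", 15, 30), ("H4", 15, 0), ("H4", 15, 0), ("SF", 20, 0)]
print(zero_share_by_move(data))  # {'SF': {15: 0.5, 20: 1.0}, 'H4': {15: 1.0}}
```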
