Stockfish bug

lucasart · Post by **lucasart** » Wed Jun 12, 2013 3:23 am

syzygy wrote:
tpetzke wrote:Hi Sven,

it does not hurt much because in an engine - engine match this position will likely not occur.
I would not be surprised if "fixing" it would somewhat hurt engine strength. Scoring first repetitions of positions in the game history immediately as a draw and not searching those branches further likely makes the tree a bit smaller.

Fixing it would also annoy some (other) users, because the engine will now sometimes choose a line that it can see to lead to a (first) repetition but that (with the fix) still seems better than other lines.

I still consider Sven's approach to be "more correct" and I have implemented that in my engine, but there seem to be good reasons for the approach of Stockfish.

Also the transposition table is blind to repetition problems and we could solve that by storing path information. But it will not make the program stronger so I guess nobody does that.
Indeed. Users can also complain about such problems, but considering them "bugs" that need to be fixed is simply not an option. This also reminds me of the complaints about smp variability of some time ago.

I just want to clarify one thing in this discussion. It is not proven beyond reasonable doubt that fixing the repetition bug is actually a regression in Stockfish. The problem is that they used an asymmetric SPRT to decide, with the following parameters

Code: Select all

H0&#58; elo0=-1.5 &#40;expressed in unscaled bayes elo&#41;
H1&#58; elo1=+4.5 &#40;expressed in unscaled bayes elo&#41;
alpha=5% &#40;max type I error probability of the test&#41;
beta=5% &#40;max type II error probability of the test, in the zone elo >= elo1&#41;

The test accepted H0, which does not prove (at 5% level of confidence) that the patch was a regression. This is a typical case of misusing SPRT. If they wanted to prove that the patch is a regression, they should have used elo0=-6 and elo1=0 (and elo0=0 elo1=6 to demonstrate that the patch is an improvement).

When I fixed this same bug in my engine, it was actualy a small gain. I had used a simple fixed game test and got the following result

Code: Select all

9000 games in 10"+0.1"
2374-2248-4378 &#91;50.7%&#93; p-value=96.8%

Perhaps it's a zero elo patch, and I just had a lucky run. In any case, fixing this bug correctly should not be a regression.

jundery · Post by **jundery** » Wed Jun 12, 2013 8:43 am

lucasart wrote: I just want to clarify one thing in this discussion. It is not proven beyond reasonable doubt that fixing the repetition bug is actually a regression in Stockfish. The problem is that they used an asymmetric SPRT to decide, with the following parameters
Code: Select all
H0&#58; elo0=-1.5 &#40;expressed in unscaled bayes elo&#41;
H1&#58; elo1=+4.5 &#40;expressed in unscaled bayes elo&#41;
alpha=5% &#40;max type I error probability of the test&#41;
beta=5% &#40;max type II error probability of the test, in the zone elo >= elo1&#41;
The test accepted H0, which does not prove (at 5% level of confidence) that the patch was a regression. This is a typical case of misusing SPRT. If they wanted to prove that the patch is a regression, they should have used elo0=-6 and elo1=0 (and elo0=0 elo1=6 to demonstrate that the patch is an improvement).

When I fixed this same bug in my engine, it was actualy a small gain. I had used a simple fixed game test and got the following result
Code: Select all
9000 games in 10"+0.1"
2374-2248-4378 &#91;50.7%&#93; p-value=96.8%
Perhaps it's a zero elo patch, and I just had a lucky run. In any case, fixing this bug correctly should not be a regression.

As someone who submitted a Stockfish patch for this to the testing framework, I 100% agree with Lucas this shouldn't be a regression, however I'd also note that in normal play, Stockfish shouldn't make the "blunders" that lead to these positions. In personal testing I saw some very short TC benefits but that became near zero at 60 second games. Overall, my personal feeling is that if Stockfish had an "Analysis mode" this could be added to that, but the increase to the search space for positions that the engine shouldn't have got into is the reason this won't be fixed in the main branch.

syzygy · Post by **syzygy** » Wed Jun 12, 2013 9:03 pm

syzygy wrote:
hgm wrote:Yes, that would be a nice further optimization. The underlying idea would also work for the conventional linear search through a stack of keys, btw.
I know, because that's what I am doing now

In addition to the stack of keys I have small hashtable with byte-size entries that I increase and decrease when entering/leaving a node. If the value is 0 on entering, I don't need to search through the history stack.

When I have some time I will implement your idea to see what is faster (if at all measurable).

Unfortunately your approach (including the further optimisation I proposed) turns out to be a bit slower in my engine than my current approach.

Uri Blass · Post by **Uri Blass** » Thu Jun 13, 2013 10:18 am

tpetzke wrote:
Why are you reducing the problem to the level of "assumptions"?

Because certain "assumptions" are responsible for the design of our algorithms.

E.g. a fundamental assumption in all chess programs is "we assume" our opponent plays his best move". We can't be sure, so it is an assumption.

The assumption behind "awarding a draw score for the 1st repetition is good enough" reads "an engine that plays a bad move once will most likely play it twice."

This assumption will only hurt in cases where an engine misses a draw by repetition when it first sees a position but spots it when it encounters it again. In a game of human players that might be common but in an engine - engine match this should be rare. So rare that fixing it will not produce an ELO gain.

Thomas...

Not exactly.
It can hurt also in cases that the engine does not miss a draw by repetition but simply change the evaluation after a deeper search.

Imagine that the main line is
1.Bd2 Be7 2.Rc1 Rc8 with a score of +1 for white at depth 20.

1.Rb1 is only +0.9 pawns for white at depth 20 so the engine prefers 1.Bd2

After 1.Bd2 Be7 the engine can search deeper and find that 2.Rc1 is only +0.1 for white.
It has also the option to force repetition of previous position by 2.Bc3 Bf6(assume black has no good choice)

If the engine evaluates every repetition as a draw then it means that it is going to prefer 2.Rc1 that is +0.1 for white and not 2.Bc3 that it evaluates as 0.00.

If the engine does not evaluate every repetition as a draw then it may choose 2.Bc3 because the score after 2.Bc3 Bf6 3.Rb1 is +0.9 pawns for white that is clearly better than +0.1 pawns for white.

Note that I do not claim that you earn elo by not evaluating repetition of previous game positions as a draw but only that nothing is obvious here.

There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).

tpetzke · Post by **tpetzke** » Thu Jun 13, 2013 10:41 am

There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).

This is what I did in the past. It introduces a bit additional search instability. Then I learned in the forum that the general approach is to score the first repetition as draw.

The reason given was that in most cases if the weaker opponent can force the stronger one into a repetition he can probably do so again. This somehow made sense to me and I tried it. It tested stronger in my engine so I kept it.

Thomas...

Uri Blass · Post by **Uri Blass** » Thu Jun 13, 2013 11:20 am

tpetzke wrote:
There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).
This is what I did in the past. It introduces a bit additional search instability. Then I learned in the forum that the general approach is to score the first repetition as draw.

The reason given was that in most cases if the weaker opponent can force the stronger one into a repetition he can probably do so again. This somehow made sense to me and I tried it. It tested stronger in my engine so I kept it.

Thomas...

I do not see a reason for search instability

Why returning previous_search_result/2 cause search instability when returning 0.00 score does not cause search instability?

In both cases you do not search after repetition.

tpetzke · Post by **tpetzke** » Thu Jun 13, 2013 11:43 am

I do not see a reason for search instability

Because in the level above you get different scores for the same move. But you are right. This issue is not resolved by returning 0 instead of the more drawish score.

I changed it because it tested stronger not because it solved the search instability issue (because it doesn't).

Thomas...

syzygy · Post by **syzygy** » Thu Jun 13, 2013 9:44 pm

Uri Blass wrote:There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).

What do you propose to return exactly? A previous evaluation (divided by 2) of what position? Or the result (divided) by 2 of which search and of what position?

Uri Blass · Post by **Uri Blass** » Thu Jun 13, 2013 11:02 pm

syzygy wrote:
Uri Blass wrote:There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).
What do you propose to return exactly? A previous evaluation (divided by 2) of what position? Or the result (divided) by 2 of which search and of what position?

We talk only about repetition of game position in the search(suppose for the discussion that the program play with the white pieces).
There are 2 cases about the relevant position:

case 1:it happened before white played move n
The program already searched the relevant position and calculated a score for it(call this score from white point of view s(n)).

case 2:it happened after white played move n when black is to move.
again the relevant score that the program can know is the same s(n)
or s(n+1) if s(n+1)<s(n)(if s(n+1)>s(n) then maybe it is because black did not play the best move so I prefer to trust s(n))

In the first case I suggest to return s(n)/2 and in the second case
s(n)/2 or s(n+1)/2 if the program already searched move n+1 and s(n+1)<s(n)

syzygy · Post by **syzygy** » Fri Jun 14, 2013 1:14 am

Uri Blass wrote:
syzygy wrote:
Uri Blass wrote:There is also another option that is not to evaluate first repetition as a draw but evaluate it as something closer to draw(for example instead of returning 0.00 without a search return previous evaluation that you remember from previous search divided by 2 without a search).
What do you propose to return exactly? A previous evaluation (divided by 2) of what position? Or the result (divided) by 2 of which search and of what position?
We talk only about repetition of game position in the search(suppose for the discussion that the program play with the white pieces).

Of course, and that's what I overlooked.

Your proposal certainly makes sense, but it would also prevent the engine from realising that the repeat position might really lead to a draw, where it might have seen the draw if it searched deeper. (But you can answer that if the first time around the engine could not search deep enough to see the first repeat, the second time around it would not search deep enough to see the third repeat, either.)

Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug

Re: Stockfish bug