stockfish fail high fail low

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

QED
Posts: 60
Joined: Thu Nov 05, 2009 9:53 pm

Re: stockfish fail high fail low

Post by QED »

My last report on Stockfish 1.7.1, because 1.8 is out, with a search.cpp nicer than ever. The majority of tests I tried were inconclusive, and I needed more computer time for my correspondence chess, but there are two ideas about null move search.

The first idea is about what to return when we have nullValue >= beta. When testing 4 different patches to improve stability (a round robin of 1000-game matches, repeating colors over the first 500 Gaviota starter positions), the best, at +5 (+-7) Elo over original Stockfish (LOS 81:18), was this patch:

Code: Select all

diff -dur src-Ch/search.cpp src-Pv6Ch/search.cpp
--- src-Ch/search.cpp   2010-04-20 00:45:49.000000000 +0200
+++ src-Pv6Ch/search.cpp        2010-06-19 19:15:20.000000000 +0200
@@ -1334,7 +1334,8 @@
         if (v < rbeta)
             // Logically we should return (v + razor_margin(depth)), but
             // surprisingly this did slightly weaker in tests.
-            return v;
+            //return v;
+            return v + razor_margin(depth);
     }
 
     // Step 7. Static null move pruning
@@ -1387,6 +1388,18 @@
                 && !AbortSearch
                 && !TM.thread_should_stop(threadID))
             {
+                // Reduce nullValue to not contradict null move conditions when used as lower bound.
+                if (beta < refinedValue - PawnValueMidgame)
+                {
+                    if (nullValue >= refinedValue - PawnValueMidgame)
+                        nullValue = Max(beta, refinedValue - PawnValueMidgame - 4);
+                }
+                else
+                {
+                    nullValue = Min(nullValue, refinedValue + (depth >= 4 * OnePly ? NullMoveMargin : 0));
+                    assert(nullValue >= beta);
+                }
+
                 assert(value_to_tt(nullValue, ply) == nullValue);
 
                 TT.store(posKey, nullValue, VALUE_TYPE_NS_LO, depth, MOVE_NONE);
The guiding idea is that when search() returns a value, it should return the same value when beta is moved closer to it; otherwise a bound stored in the transposition table can claim cutoffs that a direct search would never reproduce. Razoring was well commented, but null move search was not.
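To make this concrete, here is a standalone sketch with made-up numbers (my own illustration, not engine code; I am assuming PawnValueMidgame = 256, roughly the 1.7.x value, and that the null move search bumps R by one when refinedValue - beta > PawnValueMidgame, which seems to be the condition the patch mirrors):

Code: Select all

#include <algorithm>
#include <cassert>

const int PawnValueMidgame = 256; // assumed value, check value.h

// Suppose refinedValue = 500 and beta = 200, so the null move search ran
// with the extra reduction (refinedValue - beta > PawnValueMidgame), and
// suppose it returned nullValue = 600. Without the clamp we store
// "score >= 600" in the TT; a later visit of the same position with
// beta' = 580 takes that cutoff, although its own null move search would
// have used a smaller reduction and might not fail high at all.
int clamped_null_value(int nullValue, int beta, int refinedValue)
{
    if (beta < refinedValue - PawnValueMidgame
        && nullValue >= refinedValue - PawnValueMidgame)
        // Keep the stored bound below the region where the extra
        // reduction applied
        return std::max(beta, refinedValue - PawnValueMidgame - 4);

    return nullValue;
}

int main()
{
    // The clamp stores max(200, 500 - 256 - 4) = 240 instead of 600, so
    // the later probe with beta' = 580 gets no spurious cutoff.
    assert(clamped_null_value(600, 200, 500) == 240);
    return 0;
}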

When I saw that I was not getting stability, just Elo, I focused more on the null move search. So I asked myself: why do we do the verification search with a reduction of 5*OnePly when the null move search was reduced by R*OnePly? Patch:

Code: Select all

diff -dur src-Ch/search.cpp src-Nmv0Ch/search.cpp
--- src-Ch/search.cpp   2010-04-20 00:45:49.000000000 +0200
+++ src-Nmv0Ch/search.cpp       2010-07-01 22:44:01.000000000 +0200
@@ -1382,8 +1382,8 @@
 
             // Do zugzwang verification search for high depths, don't store in TT
             // if search was stopped.
-            if (   (   depth < 6 * OnePly
-                    || search(pos, ss, beta, depth-5*OnePly, ply, false, threadID) >= beta)
+            if (   (   depth-(R+2)*OnePly < OnePly
+                    || search(pos, ss, beta, depth-(R+2)*OnePly, ply, false, threadID) >= beta)
                 && !AbortSearch
                 && !TM.thread_should_stop(threadID))
             {
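For context, as far as I remember the 1.7.1 sources (approximate, check search.cpp yourself), R is adaptive:

Code: Select all

// My recollection of the 1.7.1 reduction, wrapped as a function
// (approximate; the real code computes R inline):
int null_move_R(int depth, int refinedValue, int beta,
                int OnePly, int PawnValueMidgame)
{
    // Null move dynamic reduction based on depth
    int R = 3 + (depth >= 5 * OnePly ? depth / 8 : 0);

    // Null move dynamic reduction based on value
    if (refinedValue - beta > PawnValueMidgame)
        R++;

    return R;
}

Since R >= 3, verifying at depth-(R+2)*OnePly is never shallower than the old fixed depth-5*OnePly; it just reduces the verification more at exactly the depths and values where the null move search itself was reduced more.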
And to keep the notebook busy over the weekend, I also combined the R+2 verification with the nullValue bound into a third patch:

Code: Select all

diff -dur src-Ch/search.cpp src-Nmv1Ch/search.cpp
--- src-Ch/search.cpp   2010-04-20 00:45:49.000000000 +0200
+++ src-Nmv1Ch/search.cpp       2010-07-01 22:43:59.000000000 +0200
@@ -1376,14 +1376,24 @@
 
         if (nullValue >= beta)
         {
+            // Reduce nullValue to not contradict null move conditions when used as lower bound.
+            if (beta < refinedValue - PawnValueMidgame)
+            {
+                if (nullValue >= refinedValue - PawnValueMidgame)
+                    nullValue = Max(beta, refinedValue - PawnValueMidgame - 8);
+            }
+            else
+            {
+                nullValue = Min(nullValue, refinedValue + (depth >= 4 * OnePly ? NullMoveMargin : 0));
+                assert(nullValue >= beta);
+            }
             // Do not return unproven mate scores
-            if (nullValue >= value_mate_in(PLY_MAX))
-                nullValue = beta;
+            assert(nullValue < value_mate_in(PLY_MAX));
 
             // Do zugzwang verification search for high depths, don't store in TT
             // if search was stopped.
-            if (   (   depth < 6 * OnePly
-                    || search(pos, ss, beta, depth-5*OnePly, ply, false, threadID) >= beta)
+            if (   (   depth-(R+2)*OnePly < OnePly
+                    || search(pos, ss, beta, depth-(R+2)*OnePly, ply, false, threadID) >= beta)
                 && !AbortSearch
                 && !TM.thread_should_stop(threadID))
             {
The LOS of the second patch against the original was 96:3; the third patch against the original had LOS 75:24, but in direct confrontation the third one was better, with LOS 58:41. When I joined the 3 pgn files together, bayeselo changed these 3 ratios to 94:5, 91:8 and 40:59 respectively.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish fail high fail low

Post by mcostalba »

Thanks for the ideas and especially for having tested them ;-)

BTW, what is the time control used by your tests?

The ideas are interesting, so I will probably retest them for verification, but I will test them one by one as I am used to.

First the null search one, then the zugzwang detection, and possibly the razoring tweak; but there Joona already ran that test and it was unsuccessful (see the razoring comment), so I am not sure I will retest it.

....waiting for some interesting tweak on the 1.8 sources now :-)
QED
Posts: 60
Joined: Thu Nov 05, 2009 9:53 pm

Re: stockfish fail high fail low

Post by QED »

Now I have a signature.

Considering razoring, I think that the optimal solution should not introduce instability. But I am afraid that the optimal solution would compute razor margins in the evaluation (as well as other search parameters such as futility margins etc.). And I have no idea how to do that (yet).
Testing conditions:
tc=/0:40+.1 option.Threads=1 option.Hash=32 option.Ponder=false -pgnin gaviota-starters.pgn -concurrency 1 -repeat -games 1000
hash clear between games
make build ARCH=x86-64 COMP=gcc
around 680kps on 1 thread at startposition.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish fail high fail low

Post by mcostalba »

QED wrote:But I am afraid that the optimal solution would compute razor margins in the evaluation (as well as other search parameters such as futility margins etc.).
I think this is a bad idea. The evaluation score is stored in the TT, where it is accessed during the search by nodes at different depths, and you want the evaluation score to be independent of depth, move count, etc.

You really want to decouple the static evaluation of a position from the pruning parameters used during the search.

So I think the evaluation should remain absolutely independent of any search parameter.
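A minimal sketch of the failure mode (entirely hypothetical names, not the engine's TT code): a cache keyed only by the position returns a number that silently embeds the margin of whatever depth stored it first.

Code: Select all

#include <cassert>
#include <unordered_map>

struct Entry { int eval; };
std::unordered_map<unsigned long long, Entry> table; // "TT" keyed by position hash

int razor_margin(int depth) { return 200 + 16 * depth; } // made-up formula

// BAD: the cached score folds in a depth-based margin, so it is only
// valid for the depth of the node that stored it.
int evaluate_bad(unsigned long long key, int material, int depth)
{
    auto it = table.find(key);
    if (it != table.end())
        return it->second.eval; // may embed a different depth's margin!

    int v = material + razor_margin(depth);
    table[key] = { v };
    return v;
}

int main()
{
    int atDepth10 = evaluate_bad(0xABCDULL, 100, 10); // stores 100 + 360
    int atDepth2  = evaluate_bad(0xABCDULL, 100, 2);  // wants 100 + 232...
    assert(atDepth10 == atDepth2);                    // ...but gets the depth-10 value
    return 0;
}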

This is just my idea of course :-)
QED
Posts: 60
Joined: Thu Nov 05, 2009 9:53 pm

Re: stockfish fail high fail low

Post by QED »

Marco Costalba wrote:I think this is a bad idea. The evaluation score is stored in the TT, where it is accessed during the search by nodes at different depths, and you want the evaluation score to be independent of depth, move count, etc.

You really want to decouple the static evaluation of a position from the pruning parameters used during the search.
I think evaluation does not need to be only about the score.

For me, it is clear that when the position is quiet, balanced and boring, we can prune heavily. But when there are pawns racing or kings in danger, we should be careful not to miss something. These factors are computed during evaluation (for the score), but they can also be used for search decisions (via some other formula).

So besides the score, the evaluation could also compute various search thresholds (arrays, indexed by depth) to make the search smarter. Maybe there is already a word for static evaluation of search 'constants'.
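A rough sketch of what that interface could look like (hypothetical names and made-up formulas, only to make the proposal concrete):

Code: Select all

const int PLY_MAX = 100; // assumed ply limit

// Evaluation returns, besides the score, per-depth thresholds that the
// search could consult instead of global constants.
struct EvalResult {
    int score;
    int razorMargin[PLY_MAX];    // indexed by depth
    int futilityMargin[PLY_MAX];
};

EvalResult evaluate_with_thresholds(int staticScore, int kingDanger, int pawnRaceThreat)
{
    EvalResult r;
    r.score = staticScore; // stand-in for the real static score

    // Quiet, balanced positions get small margins (prune heavily);
    // racing pawns or exposed kings get wide margins (prune carefully).
    int danger = kingDanger + pawnRaceThreat;
    for (int d = 0; d < PLY_MAX; ++d)
    {
        r.razorMargin[d]    = 150 + 25 * d + danger;
        r.futilityMargin[d] = 100 + 50 * d + danger / 2;
    }
    return r;
}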
Testing conditions:
tc=/0:40+.1 option.Threads=1 option.Hash=32 option.Ponder=false -pgnin gaviota-starters.pgn -concurrency 1 -repeat -games 1000
hash clear between games
make build ARCH=x86-64 COMP=gcc
around 680kps on 1 thread at startposition.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish fail high fail low

Post by mcostalba »

QED wrote: I think evaluation does not need to be only about the score.

For me, it is clear that when the position is quiet, balanced and boring, we can prune heavily. But when there are pawns racing or kings in danger, we should be careful not to miss something. These factors are computed during evaluation (for the score), but they can also be used for search decisions (via some other formula).

So besides the score, the evaluation could also compute various search thresholds (arrays, indexed by depth) to make the search smarter. Maybe there is already a word for static evaluation of search 'constants'.
Now what you suggest is clearer to me. You are suggesting to let the search access some side values calculated during evaluation, apart from the position score.

Well, one of these is already used: kingDanger is calculated while evaluating the position, but at the end it is not discarded; it is kept so that the search can access it and use it in the futility pruning formula.
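For reference, a minimal sketch of that pattern (hypothetical names and formulas, not the real code):

Code: Select all

struct SideInfo { int kingDanger; };

// Evaluation reports a danger term alongside the score...
int evaluate_position(SideInfo& info)
{
    int score = 0;           // stand-in for the real static evaluation
    info.kingDanger = 120;   // computed anyway for the king-safety term
    return score;
}

// ...and the search folds it into its futility margin. Widening the
// margin with kingDanger makes the condition harder to meet, so fewer
// moves are pruned in dangerous positions.
bool futility_prunable(int staticEval, int beta, int depthMargin, const SideInfo& info)
{
    return staticEval + depthMargin + info.kingDanger < beta;
}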