Verification of pruning techniques

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Verification of pruning techniques

Post by zd3nik »

I've had a lot of trouble getting much value out of razoring and delta pruning, two techniques I classify as "alpha pruning" or "fail low pruning". The general idea behind both is that if you're in a situation that looks likely to "fail low" in an alpha/beta search framework, just return alpha. I'm over-simplifying of course, but that's the gist of it.

The main difference between razoring and delta pruning is that razoring occurs in the primary search routine when remaining search depth is low, while delta pruning occurs in quiescence search. I'm leaving out other details of course, but hopefully that sums it up more or less accurately.
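In rough terms, both tests share the same shape. Here is a minimal sketch of that shared test; the names (`standPat`, `gain`, `margin`) are mine, not from any particular engine:

```cpp
// Minimal sketch of the shared "fail low" test behind razoring and delta
// pruning (hypothetical names, not from any particular engine).
//   standPat: static eval of the side to move
//   gain:     optimistic gain of the move (e.g. value of the captured piece)
//   margin:   safety cushion for positional terms the estimate ignores
// If even the most optimistic outcome cannot reach alpha, the move (or the
// whole node, in the razoring case) is assumed to fail low and is skipped.
inline bool failsLow(int standPat, int gain, int margin, int alpha) {
    return standPat + gain + margin < alpha;
}
```

Delta pruning applies this per move in quiescence with `gain` set to the captured piece's value; razoring applies it to the node as a whole, typically with `gain` = 0 and a larger margin.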

As I've tinkered with chess engine development over the years I have never seen much in the way of good results from these two techniques, no matter how much I've tweaked deltas or prerequisites. So either I'm doing something wrong or there are certain caveats involved in using these techniques that I'm unaware of.

I was wondering if anyone familiar with these techniques who has actually measured real success in using them would care to scrutinize the following code and help me figure out where I'm going awry.

This is the delta pruning code inside the move loop of quiescence search:
Note: 'depth' in this routine will be <= 0 and the 'child' variable is the destination position of the move being executed.

Code: Select all

      Exec<color>(*move, *child);
      if (!(checks | child->checks | (move->Promo() == (color|Queen)) | (abs(alpha) >= WinningScore)) &
          ((standPat + ValueOf(move->Cap()) + _qsearchDelta[-depth]) < alpha))
      {
        _stats.deltaCount++;
        Undo<color>(*move);
        continue;
      }
      move->Score() = -child->QSearch<!color>(-beta, -alpha, (depth - 1));
      Undo<color>(*move);
And this is the razoring routine inside the primary search function, which executes after transposition table heuristics and before any moves are generated/searched. This is basically a direct copy of what's being done in Stockfish. I make it a rule never to copy other engines directly like this, but I resorted to copying Stockfish's implementation of this routine because none of my own versions of this heuristic seemed to work well. But this hasn't worked out any better for me - it may be even worse.
Note: 'parent' is the position prior to this one.

Code: Select all

    // forward pruning prerequisites
    if (pvNode || !(pruneOK & !checks & (depthChange <= 0))) {
      goto search_main;  // yeah that's a goto, horrible I know
    }

    // razoring (fail low pruning)
    if ((depth < 4) & (abs(alpha) < WinningScore) &
        !parent->checks & ((eval + _razorDelta[depth]) <= alpha))
    {
      _stats.rzrCount++;
      if ((depth <= 1) && ((eval + _razorDelta[3 * depth]) <= alpha)) {
        _stats.rzrEarlyOut++;
        return QSearch<color>(alpha, beta, 0);
      }
      const int ralpha = (alpha - _razorDelta[depth]);
      const int val = QSearch<color>(ralpha, (ralpha + 1), 0);
      if (_stop) {
        return beta;
      }
      if (val <= ralpha) {
        _stats.rzrCutoffs++;
        return val;
      }
    }
The _qsearchDelta and _razorDelta arrays are initialized with the following values:

Code: Select all

_qsearchDelta[] = { 1550, 500, 305, 237, 206, 188, 178, 171, 167, 164, 161, 159, 158, 157, 156, 155, 154, 154, 153, 153, ... }
_razorDelta[] = { NOT_USED, 544, 640, 800, 1024 }
I've done some small bullet and 5-minute blitz round robins, and the engines with these pruning techniques enabled consistently do worse even than the engine with NO pruning techniques enabled at all.

I'm running a larger round-robin now. This one will take days to complete, but so far the engines with these techniques enabled are solidly in last place - even below the instance of my engine with "NO" forward pruning of any kind.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Verification of pruning techniques

Post by Ferdy »

zd3nik wrote:

Code: Select all

    // razoring (fail low pruning)
    if ((depth < 4) & (abs(alpha) < WinningScore) &
        !parent->checks & ((eval + _razorDelta[depth]) <= alpha))
Perhaps try lowering the depth threshold further, say (depth < 2); it is possible the eval is not able to approximate the correct evaluation of the position even when using a margin,
or maybe the average depth reached by the engine is not enough for it to successfully prune the position.
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Verification of pruning techniques

Post by zd3nik »

Ferdy wrote:
zd3nik wrote:

Code: Select all

    // razoring (fail low pruning)
    if ((depth < 4) & (abs(alpha) < WinningScore) &
        !parent->checks & ((eval + _razorDelta[depth]) <= alpha))
Perhaps try lowering the depth threshold further, say (depth < 2); it is possible the eval is not able to approximate the correct evaluation of the position even when using a margin
That would limit razoring to nodes where depth == 1 since in the primary search function depth is always > 0. So it would always go into this branch:

Code: Select all

      if ((depth <= 1) && ((eval + _razorDelta[3 * depth]) <= alpha)) {
        _stats.rzrEarlyOut++;
        return QSearch<color>(alpha, beta, 0);
      }
I'll give it a try, plus I will try different delta values. That's not too different from some of the experimentation I've already done. But perhaps by restricting each test to one specific depth + delta combination at a time I can better determine which combinations are good or bad.

The bit about positional eval not being able to approximate the correct evaluation of the position is interesting. I tend to use very simple evaluation functions in my engines, but it's the same basic stuff a lot of engines use: pawn structure, piece/square tables, piece mobility, king pawn shield/storm eval, king tropism, draw recognition, etc. I'm pretty confident this is not a problem pertaining to the engine's positional evaluation. Plus I'm using larger than average margins to compensate for that possibility. Even at depth 1 the margin is over the value of a rook. So the side to move would have to be a rook down, with qsearch unable to find even the win of a pawn, in order for razoring to kick in.
Ferdy wrote:or maybe the average depth reached by the engine is not enough for it to successfully prune the position.
I'm not sure I understand what you mean here. The average depth obtained by the engine is more than sufficient to allow for razoring. For example, here are the delta pruning and razoring stats after a 9-ply search from the standard start position (early out is when razoring depth == 1 and it drops directly into qsearch due to being extremely far below alpha):

Code: Select all

529382 delta pruned (7.99542%)
17177 razor attempts, 10685 early out (62.2053%), 3607 cutoffs (20.999%)
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Verification of pruning techniques

Post by Ferdy »

zd3nik wrote:
Ferdy wrote:or maybe the average depth reached by the engine is not enough for it to successfully prune the position.
I'm not sure I understand what you mean here. The average depth obtained by the engine is more than sufficient to allow for razoring. For example, here are the delta pruning and razoring stats after a 9-ply search from the standard start position (early out is when razoring depth == 1 and it drops directly into qsearch due to being extremely far below alpha):

Code: Select all

529382 delta pruned (7.99542%)
17177 razor attempts, 10685 early out (62.2053%), 3607 cutoffs (20.999%)
Sorry, my statement was not clear.
What I mean is that there is a certain average depth for the engine (in combination with eval quality - a low-quality eval needs more average depth)
below which pruning at every depth < 4 (I propose lowering it to depth < 2, for trial) may not be effective.
Something like,

Code: Select all

ave depth 8, use depth < 2
ave depth 9, use depth < 2
ave depth 10, use depth < 2
ave depth 11, use depth < 3
ave depth 12, use depth < 3
...
ave depth 18, use depth < 4
Pruning at depth 2 or 3 could degrade the search score quality if the average depth reached by the engine is somewhat low. A quality eval would probably compensate for this in some positions.
zd3nik
Posts: 193
Joined: Wed Mar 11, 2015 3:34 am
Location: United States

Re: Verification of pruning techniques

Post by zd3nik »

Ferdy wrote: Sorry, my statement was not clear.
What I mean is that there is a certain average depth for the engine (in combination with eval quality - a low-quality eval needs more average depth)
below which pruning at every depth < 4 (I propose lowering it to depth < 2, for trial) may not be effective.
Something like,

Code: Select all

ave depth 8, use depth < 2
ave depth 9, use depth < 2
ave depth 10, use depth < 2
ave depth 11, use depth < 3
ave depth 12, use depth < 3
...
ave depth 18, use depth < 4
Pruning at depth 2 or 3 could degrade the search score quality if the average depth reached by the engine is somewhat low. A quality eval would probably compensate for this in some positions.
Ahh. Thanks for the clarification.

In this early stage of the engine's development I've been testing mostly using bullet time control, so the average depth has been around 10.

I understand how simpler position eval could have an impact on razoring. But I don't understand how average depth has anything to do with it. I'll just take your word for it and see what happens in my next round of tests with the razoring depth threshold set at different levels.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Verification of pruning techniques

Post by abulmo »

Why do people use a leading underscore to name variables in their programs?
zd3nik wrote:

Code: Select all

        _qsearchDelta[-depth]
        _stats.deltaCount
        _razorDelta[depth]
According to both the C and C++ standards, such names are reserved for the implementation. For example the C++ standard (latest draft) says:
§2.10 (3.2) Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Richard
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Verification of pruning techniques

Post by hgm »

zd3nik wrote:

Code: Select all

      Exec<color>(*move, *child);
      if (!(checks | child->checks | (move->Promo() == (color|Queen)) | (abs(alpha) >= WinningScore)) &
          ((standPat + ValueOf(move->Cap()) + _qsearchDelta[-depth]) < alpha))
      {
        _stats.deltaCount++;
        Undo<color>(*move);
        continue;
      }
      move->Score() = -child->QSearch<!color>(-beta, -alpha, (depth - 1));
      Undo<color>(*move);
I suppose 'checks' here means that you are in check in the current node, so that this move really is an evasion, and that child->checks means the move delivers check?

The condition looks correct. The test for whether alpha exceeds WinningScore seems redundant; I presume you could never exceed WinningScore with a score derived from the static eval, so the part after & would already take care of that. As you use & instead of && the latter test would be done anyway, and even if not, the abs(alpha) test doesn't seem any cheaper.

I am also not sure why you would exempt check evasions. The idea of futility pruning is that the move is guaranteed to fail low to a stand-pat reply. That is just as true for check evasions.

I also don't see the justification for making the qsearchDelta depth dependent. No matter what depth you are already on, you only try to estimate the score after this single move. How much that score can change due to non-material terms not taken into account should not depend on how deep in QS you are.

The point could be that it is not clear what this delta pruning actually saves you. You still make the pruned moves, and you still have to unmake them. Not doing the pruning at all would save you the effort of the test (which is also done for all non-pruned moves!), and the moves eligible to pruning would run into an immediate stand pat, possibly based on a lazy evaluation because of the margin.

This pruning would start to make sense if it saved you the work of the make/unmake. Or better yet, when you search the captures in MVV order, as soon as you reach the first futile victim, immediately return from the node with a fail-low score rather than keep running through the remaining captures to see if they are also futile (which they certainly will be), and then making and unmaking them nevertheless, just for fun.

Note that this really helps in d=1 nodes, where also the non-captures would be futile. And there usually are a lot of non-captures.

If you want to exempt moves that deliver check from futility pruning, it would be best to have a selective generator of checks, rather than to keep looping through all moves to test whether they deliver check. Then, at the point where you reach futile victims, you could continue with any HxL captures that were postponed (and thus not futile) based on their SEE, although one could argue that currEval + SEE + margin < alpha is also not worth trying, and then continue with a checks-only generator to finish the node.
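The bulk-pruning idea described above can be sketched as follows; the function and variable names are hypothetical, and the make/recurse/unmake step is elided:

```cpp
#include <vector>

// Sketch of bulk futility pruning in quiescence (hypothetical names).
// With captures sorted most-valuable-victim first, the first futile victim
// proves every later capture is futile too, so the loop can stop there
// instead of making and unmaking each remaining move just to prune it.
// Returns the number of captures that would actually be searched.
int capturesWorthSearching(const std::vector<int>& victimValues,  // MVV order
                           int standPat, int margin, int alpha) {
    int searched = 0;
    for (int victim : victimValues) {
        if (standPat + victim + margin < alpha)
            break;  // first futile victim: the rest of the node fails low
        // ... make the capture, recurse, unmake (omitted in this sketch) ...
        ++searched;
    }
    return searched;
}
```

With stand pat at -400, a margin of 150, and alpha at 200, only the queen and rook captures of a victim list {900, 500, 300, 100} would be searched; the loop bails out at the first minor piece.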
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Verification of pruning techniques

Post by hgm »

zd3nik wrote:

Code: Select all

    // forward pruning prerequisites
    if (pvNode || !(pruneOK & !checks & (depthChange <= 0))) {
      goto search_main;  // yeah that's a goto, horrible I know
    }

    // razoring (fail low pruning)
    if ((depth < 4) & (abs(alpha) < WinningScore) &
        !parent->checks & ((eval + _razorDelta[depth]) <= alpha))
    {
      _stats.rzrCount++;
      if ((depth <= 1) && ((eval + _razorDelta[3 * depth]) <= alpha)) {
        _stats.rzrEarlyOut++;
        return QSearch<color>(alpha, beta, 0);
      }
      const int ralpha = (alpha - _razorDelta[depth]);
      const int val = QSearch<color>(ralpha, (ralpha + 1), 0);
      if (_stop) {
        return beta;
      }
      if (val <= ralpha) {
        _stats.rzrCutoffs++;
        return val;
      }
    }
Btw, this code makes very little sense to me. I know razoring in the definition of Heinz, where it is just futility pruning at d=3 with a larger margin. I.e. it would be applied to the individual moves, and not to the node as a whole, unless you could prove there were no moves that could possibly beat the margin.

I don't understand the depth <= 1 part: if depth can be 0, then 3*depth = 0 and _razorDelta[0] is NOT_USED. And if depth can only be 1, why not use _razorDelta[3] or a hard-coded 800 here? The depth <= 1 part seems like regular futility pruning, not razoring, but implemented in an inefficient way, by delegating it to QS. In the first place this would re-calculate eval, which was already established to be so far below alpha that there is no hope for a stand-pat at all. (But perhaps it relies on an eval cache to ameliorate this waste of time.) In the second place it would still test all the individual captures (after making them) to see if they get close to alpha, despite the fact that they start from a position hugely below alpha. Basically only the capture of a Queen or a promotion could beat the margin of 800. And there might not even be a Queen on the board, so that the entire exercise is a pure waste of time.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Verification of pruning techniques

Post by Michel »

The depth <= 1 part seems regular futility pruning, not razoring, but implemented in an inefficient way, by delegating it to QS.
This is copied from Stockfish. The idea is that all quiet moves will be futility pruned, so you may just as well jump directly into quiescence search.

This was tested to contribute Elo. Moreover, Lucas has already tried twice to remove it with a no-regression test, and both times it failed.
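The shortcut can be sketched like this (the function name and margin value are illustrative, not Stockfish's actual code):

```cpp
// Sketch of the "drop straight into QS" razoring shortcut (illustrative
// names and margin, not actual Stockfish code). At depth 1 every quiet
// move would be futility-pruned anyway, so when the static eval is far
// enough below alpha the depth-1 search is skipped entirely and the node
// is resolved by quiescence search alone.
bool dropIntoQSearch(int depth, int eval, int alpha, int razorMargin) {
    return depth <= 1 && eval + razorMargin <= alpha;
}
```

When this returns true, the engine would return the result of a quiescence search with the unchanged (alpha, beta) window instead of generating and searching quiet moves.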
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Verification of pruning techniques

Post by hgm »

It is well-known that d=1 or even d=2 nodes and QS nodes are often the same. It seems very strange, though, to subject that notion to a ridiculously large margin of 800. How much do you have to be below alpha to be sure that non-captures are not going to cut it?

If you want to implement this by calling QS, it would be better to do it before wasting time on calculating eval in this node, and to base it on a much smaller margin.

Whether handling the futility pruning directly in the d=1 node itself would work better is of course dependent on the quality of the implementation. If that did not 'bulk-prune' all non-captures, but subjected them to a futility test individually, you would of course have jumped from the frying pan into the fire. Normally you would bulk-prune as soon as you reach the first futile victim.