Futility Pruning Issues

Cheney · Post by **Cheney** » Tue Jul 07, 2020 2:14 pm

Hi,

I have been trying to implement Futility pruning (FP) for some time now and have not had much success. I have read almost every post linked on CPW for this subject. I have tried all kinds of variations but when my base engine plays the one with FP, the FP version loses. However, when I compare test searches, the FP version seems better.

A simple background: I use negamax with PVS, null move pruning (no mate threat detection but think I need it), transposition table, and a lot of move ordering tuning.

After various observations of "bad games", I have come to this process which seems to have the least negative impact:

Code: Select all

move Search(board, depth, ply, alpha, beta, state)
    (1) … check TT, broken rules, if horizon, perform null move
    
    (2) Flag the node:
    if(!inPVNode && !inCheck && !SearchingNullMove && abs(alpha) < MATE-MIN && MyPieceScore > RookValue && depth <=2)
        s_eval = evaluate(board)
        futileflag = (s_eval + futilemargine[depth] < alpha)
    
    (3) Play out the moves
    while(move=getnextmove)
        if(futileflag && movesmade > 4 && !move.Checking && !move.Capture && !move.SpecialPawn && !move.Hash
          continue; // Skip the move
        
        score = play(move)
        
*NOTE* 
MyPieceScore does not account for pawns, it is the value of all pieces for the side to move. 
Also, evaluate is the full evaluation which I have nothing special:
   PSQ, mobility, basic pawn structure, and king safety.

For testing, I perform searches over several test positions. When I compare the base version to the FP version:
* Around 90% of the different positions get to the same depth but FP gets there faster as nodes played decreases. It looks to be around 20% faster.
* Around 5% of these positions, FP gets to one additional ply.
* The rest of the searches, FP still reaches the same depth but a bit slower.

I have tested adjusting the margins and even flagged more moves as not futile. I am under the assumption if this was implemented correctly then I should reach an additional ply on most positions (which I am not). If I over adjust then I just lose speed and in some cases, I lose a ply.

To debug this, instead of pruning the move, I let it play and if the returned score beat alpha, I logged the position with all the values.

Here's a position reached:

[d]4nrk1/2p2p1p/r1p1b1pQ/3pP3/3R4/P1q5/1P3PPP/5RK1 w - - 0 3

Here's the data:
Depth: 2 PLY: 6
Move: d4h4 - Rh4 Flagged: History Move
S_Eval: -576
Margin: 650
Alpha: 382
Score: 390
MoveNumber: 23 (*This is a late move)

As you can see, this silent move would have been pruned but leads to a big swing in score. If the engine could only reach one more ply then it would find this move at depth 3. But, what I have seen is IF the search can reach one more ply and find a late move like this then the search seems to explode a bit and the branching factor increases.

The only thing I could think of to do here was to validate any move before pruning it. So, I perform a QSearch and if that score does not beat alpha then prune it. With this in play, the engine is slightly faster, still does not reach an additional ply, but when playing the base version in games, the FP engine is seriously winning. Interesting, WAC 300 tests are losing by about 30 positions with this flavor of FP.

Here is my dilemma: What I am doing with this qsearch seems more like a reduction than pruning. I'd like to get the natural pruning to work but am not sure what I am doing wrong. Can anyone shed some light on this?

Thank you for the help

hgm · Post by **hgm** » Tue Jul 07, 2020 2:50 pm

The sample position is not very shocking. A difficult-to-combat mate threat through a quiet move, no simple algorithm can foresee that, so of course you will make the wrong call. The idea behind futility pruning is that such things should be very rare, so that searching on just in case you might find them is a wild goose chase.

I guess a lot depends on the margins. You seem to use a marging 650 here. Isn't that overly cautious? Basically that says that you don't believe a quiet move can forceibly gain a Queen, but you don't want to exclude it can force gain of a Rook plus a Pawn, or a Queen for a minor. Both of that still seems extremely unlikely. Even the chances that a random quiet move will forcibly gain you a minor seem very bleak. Even if the move happen to attack a minor, what are the chances he cannot simply move it away, if it wasn't already protected to begin with?

Cheney · Post by **Cheney** » Tue Jul 07, 2020 3:51 pm

Ty for the input

The interesting thing about that move is it had a high history score. One test I performed without the qsearch validation was to only prune moves not matched by any sorting method. In other words, since history is the last score applied then if the score is zero, look to prune the move. Although this decreases the amount of prunes, the engine still loses against its base version.

To put one measurement into perspective, without the qsearch validation, I may see a position score 100K+ prunes but with the validation, only 5K. There are that many moves which, if searched, beat alpha according to the qsearch.

As for the margin, with the qsearch validation, I am looking at the last three plies and using these margins:
fmargin[4] = { 0, 100, 150, 300 }

Without the qsearch validation, I did try traditional FP, only at the frontier node, just to try to get that to work and I have use margines 100, 200, and 325 when depth=1.

I have also tried setting the flag like this (both with I tried traditional FP and extended FP):
futileflag = (s_eval + board.GetPieceValueByRank(enemy, bestpiece) + fmargin[depth] < alpha)

… but I believe that, too, may be too cautious.

All of these seem to increase speed but the engine loses playing the base version.

I am thinking the next items I may want to test are:

1. Instead of obtaining the static eval to set the flag, maybe I'll perform the qsearch there and use that score.

2. Maybe performing this qsearch there under certain conditions.

3. Maybe my move ordering is too diverse. For silent moves, I sort in this order:
* Killers
* Moves that attack (forks, discoveries, pinned pieces)
* Promotions
* Pawns to 6th or 7th rank or pass pawn pushes.
* King moves (castles)
* Counter Move Heuristic
* History.
... so the thought is to reduce this to just killer, attacks, and history and then use these other sort styles as flags for pruning or not.

4. I can go back to trying to tune just the margins and get more data dumps, looking for positions that were not leading to mate threats. Maybe I fixed a bug along the way (which I did find some) and I need to go back to square one

In the end, when I was not using the qsearch validation, it really looked to me that the engine was not reaching an additional ply more than the base engine, a move that was pruned at depth 1 or 2 was a good move, found late at depth 3, and that changed the PV and increased search time.

hgm · Post by **hgm** » Tue Jul 07, 2020 4:55 pm

I don't understand this qsearch validation. At d<=1 moves are futile when they do not get the eval above alpha, because the opponent will simply stand pat using the eval score to fail high. If you use a large-enough margin at d<=1 to prevent non-captures from rising the eval to above alpha. With a d=2 search there can still be hope to beat alpha from an eval very much below it for any move, because you you get to do a second move as the first ply of QS. But a QS should fail low automatically.

Cheney · Post by **Cheney** » Tue Jul 07, 2020 9:59 pm

After reading your points, I see a definite mistake. I am performing the qsearch for each move I want to prune and yet I am not making that move; thus, I am repeating the same qsearch for every prunable move in that node.

I guess where I was trying to go with the qsearch validation was to avoid pruning moves similar to the one in the position above. I know you say those types of positions should be rare. However, for some reason, the engine with FP is at or lower than 50/50 with the base version if I do not perform a qsearch to obtain the real score. In the end, the eval of a position is not the same as that returned if I perform a qsearch on that position.

I have read comments about pruning is dangerous but also read "prune like there is no tomorrow"

, but when I lower the margins, I do not seem to reach the next ply to compensate and can only expect my chances for errors increases.

At this point, I have reverted back to the standard method and am testing.

Here's another question - I have to expect FP works for both fast games and slow games. Is this accurate or is there a particular way I should test this out? For example, TC=0:01+0.8 or perform a fixed nodes test? Since it is a search enhancement, I am performing the timed games.

Cheney · Post by **Cheney** » Wed Jul 08, 2020 2:09 pm

Problem solved

After going back to the standard implementation of FP and Extended FP, I realized a bug I fixed but never tested under the standard deployment - it was only tested using the qsearch deployment. The bug was to flag any move that generated check - this was performed when generating a move sorting value. This method for identifying if a move delivers or discovers check was nested in an if/then/else and would not apply to any other moves. Needless to say, it does now.

With current margins at {100,150}, the engine is +25 elo after 2000 games. I will look towards tuning these numbers a bit more and maybe trying deep FP.

Futility Pruning Issues

Futility Pruning Issues

Re: Futility Pruning Issues

Re: Futility Pruning Issues

Re: Futility Pruning Issues

Re: Futility Pruning Issues

Re: Futility Pruning Issues