Toubles with pv and reporting best move

KhepriChess · Post by **KhepriChess** » Fri Mar 24, 2023 5:33 pm

Yesterday I came across a bug where I wasn't recording (and subsequently reporting) the best move from previous searches in the iterative deepening loop. The result was if a search terminated before completing, sometimes the best move reported was whatever the best move of that depth currently being search (when the search stopped) instead of the first move of the PV line from the last completed depth.

Basically, if I had a search going that returned:

Code: Select all

info depth 10 score cp 11 nodes 576553 nps 264110 hashfull 28 time 2183 pv e2d1 h7h6 b2b4 a5b6 f2f4 g5h7 b4b5 a6b8 a1c1 d8c8

But then while searching at depth 11, the search terminated, the move returned wasn't always "e2d1" (or whatever that first move was at that last depth).

So first, I tried fixing it by storing whatever move was at the front of the PV (pvArray[0][0]) in a variable at the end of the iterative deepening loop. That resulted in a non-insignificant (double digits) Elo loss. Then I tried changing up the PV-scheme completely, but that had the same outcome.

Is that expected? Would using the "best so far" move really be that much better than using just the best move from last fully searched depth?

JoAnnP38 · Post by **JoAnnP38** » Fri Mar 24, 2023 5:45 pm

I am assuming you are doing iterative deepening. If so, I would recommend that after every depth is completed to just save the completed PV. Only report PVs from completed searches. This means that if a search terminates prematurely due to time or receiving a uci stop, then just discard the half-baked PV.

KhepriChess · Post by **KhepriChess** » Fri Mar 24, 2023 5:53 pm

JoAnnP38 wrote: ↑Fri Mar 24, 2023 5:45 pm I am assuming you are doing iterative deepening. If so, I would recommend that after every depth is completed to just save the completed PV. Only report PVs from completed searches. This means that if a search terminates prematurely due to time or receiving a uci stop, then just discard the half-baked PV.

Right, that's what I wasn't doing, but in fixing my code to actually do that I get double digits loss in rating (in self-play). Is that just something I have to live with? Technically doing it the wrong way (reporting the "best so far") wasn't causing any illegal moves to be played, so if it results in stronger play is it actually bad to do it that way?

JoAnnP38 · Post by **JoAnnP38** » Fri Mar 24, 2023 6:27 pm

KhepriChess wrote: ↑Fri Mar 24, 2023 5:53 pm Technically doing it the wrong way (reporting the "best so far") wasn't causing any illegal moves to be played, so if it results in stronger play is it actually bad to do it that way?

Interesting. When I abort a search, I return 0 so I lose any notion at all of whether the "best so far" scored better than the "best that we know." Maybe you are on to something I hadn't considered. What do you do, if the score for the "best so far" is lower than the score returned from your previous completed searches?

KhepriChess · Post by **KhepriChess** » Fri Mar 24, 2023 6:45 pm

JoAnnP38 wrote: ↑Fri Mar 24, 2023 6:27 pm
KhepriChess wrote: ↑Fri Mar 24, 2023 5:53 pm Technically doing it the wrong way (reporting the "best so far") wasn't causing any illegal moves to be played, so if it results in stronger play is it actually bad to do it that way?
Interesting. When I abort a search, I return 0 so I lose any notion at all of whether the "best so far" scored better than the "best that we know." Maybe you are on to something I hadn't considered. What do you do, if the score for the "best so far" is lower than the score returned from your previous completed searches?

Well, I've tried a few things. Originally, I was returning an "abort" score of 500,000 which is much greater than my infinity score (for reference, I'm using 50,000 as Infinity). But I've also tried 0, minus Infinity, and -500,000. All have the same result.

For example, this is what I see:

Code: Select all

info depth 7 score cp 6 nodes 109921 nps 262968 hashfull 4 time 418 pv d3a6 a5a6
bestmove e2h5

Search was aborting during depth 8 and notice the best move doesn't match the last PV. I do want to reiterate, this doesn't always happen. If anything, considering the number of searches happening, it's actually relatively uncommon.

algerbrex · Post by **algerbrex** » Sat Mar 25, 2023 2:02 pm

You could use a different scheme to collect the PV. The approach I use is probably suboptimal, but in practice, it's worked pretty well for me. At the beginning of the iterative deepening loop, I declare a dynamic array (although it doesn't need to be dynamic, it just needs to be set to a length long enough to accompany the longest possible PV your search could return), and pass it into the first call to the negamax search routine by reference. I've just wrapped the PV in a struct to tack on some convenience functions:

Code: Select all

func (search *Search) RunSearch() uint32 {
    pv := PVLine{}
    bestMove := NullMove
    ...
    for depth := int8(1); depth <= MaxPly && depth <= search.Timer.MaxDepth; depth++ {
        pv.Clear()
        ...
        score := search.negamax(depth, 0, -Infinity, Infinity, &pv, false)
        ...
        if search.Timer.IsStopped() {
            if bestMove == NullMove && depth == 1 {
                bestMove = pv.GetBestMove()
            }
            break
        }
        ...
        bestMove = pv.GetBestMove()

Then each recursive call to the negamax search routine has it's own child PV array created and then reused so that the child node can record and store its PV. Whenever we find a new best move (alpha is raised), I call a method to take the PV array of the current node, clear its first move, and re-build the array by tacking on the new best move found, and then the new PV from the child associated with it at the end:

Code: Select all

func (search *Search) negamax(depth int8, ply uint8, alpha, beta int16, pv *PVLine, doNMP bool) int16 {
    ...
    childPV := PVLine{}
    ...
    for move := moveGen.Next(); !equals(move, NullMove); move = moveGen.Next() {
        ...
        score = -search.negamax(depth-1, ply+1, -beta, -alpha, &childPV, false)
        ...
        if bestScore > alpha {
            alpha = bestScore
            nodeType = PVNode
            pv.Update(move, childPV)
        }

        childPV.Clear()
    }
    ...
}

The important part is at the end of searching a move in the move loop, is for me to clear the childPV, so the child node gets a fresh PV to start building from.

As I said, I recognize this method is not the most efficient, and at some point, I plan on revisiting things to speed them up. But as I said, it's worked well so far and from testing it's not a significant speed penalty. Although if you're curious I can run a short match with two versions of Blunder, one having the PV code one without, and see if there's a statistically significant Elo loss like you mentioned encountering for your orginal solution.

Ras · Post by **Ras** » Sat Mar 25, 2023 2:06 pm

KhepriChess wrote: ↑Fri Mar 24, 2023 5:33 pmWould using the "best so far" move really be that much better than using just the best move from last fully searched depth?

Let's assume you have completed depth 11 and got move X as best move. You would search that first at depth 12. Now if before completing depth 12, you already found another move Y that beats move X, then of course you should return that move. If you were to complete depth 12, you might have found move Z that would be even better than Y, but that doesn't change the fact that Y was already better than X.

KhepriChess · Post by **KhepriChess** » Sat Mar 25, 2023 6:18 pm

algerbrex wrote: ↑Sat Mar 25, 2023 2:02 pm You could use a different scheme to collect the PV. The approach I use is probably suboptimal, but in practice, it's worked pretty well for me. At the beginning of the iterative deepening loop, I declare a dynamic array (although it doesn't need to be dynamic, it just needs to be set to a length long enough to accompany the longest possible PV your search could return), and pass it into the first call to the negamax search routine by reference.

That was actually one of the methods I tried. But it seems like actually gathering the PV isn't the issue. It's basically what Ras said in his comment.

I'm curious why you think that method is suboptimal? Seems to work the same as other methods, in that you end up with the pv (maybe it wouldn't work for multi-pv?).

Ras wrote: ↑Sat Mar 25, 2023 2:06 pm
KhepriChess wrote: ↑Fri Mar 24, 2023 5:33 pmWould using the "best so far" move really be that much better than using just the best move from last fully searched depth?
Let's assume you have completed depth 11 and got move X as best move. You would search that first at depth 12. Now if before completing depth 12, you already found another move Y that beats move X, then of course you should return that move. If you were to complete depth 12, you might have found move Z that would be even better than Y, but that doesn't change the fact that Y was already better than X.

Ya, that's exactly what's happening. I just wasn't (am not?) sure if there are any issues, that I'm unaware of, in just playing "move Y" like that.

The more I've thought about the more I don't see an issue with doing that, especially if my engine is stronger doing so. Sure, there might be the occasional issue where move Y is worse than move X (since it couldn't search move Y to the full depth), but if the overall net is positive why not do it that way?

Ras · Post by **Ras** » Sat Mar 25, 2023 6:53 pm

KhepriChess wrote: ↑Sat Mar 25, 2023 6:18 pmSure, there might be the occasional issue where move Y is worse than move X (since it couldn't search move Y to the full depth)

If you return your "dynamic" alpha when time is over, that won't be an issue. If you are still searching through the replies for move Y, then the next deeper level will return its alpha, which then will appear as beta value at the Y (i.e. root) level. So in case Y has not been searched completely, it won't override move X, but appear as beta cut-off instead. Besides, if you do this time-over return before updating the local line in the recursive function, move Y will never change into the PV line in the first place.

Toubles with pv and reporting best move

Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move

Re: Toubles with pv and reporting best move