MCTS-like algorithm with alpha-beta rollouts

RedBedHed · Post by **RedBedHed** » Mon Jan 23, 2023 6:38 am

Hey all!

I've been working on a new-ish algorithm based on the paper https://www.semanticscholar.org/paper/P ... 82d4740db9 and the engine "Scorpio MCTS." It uses alpha-beta rollouts in an iterative deepening fashion, taking advantage of the classical dynamic move-ordering scheme.

Unlike Scorpio, it uses regular backtracking alpha-beta for IID and searches with null windows, like the scout searches and null move searches. It also uses regular quiescence search. It doesn't reuse the tree, as reuse only seems to slow things down. Instead, it passes the tree from each iteration to a garbage collection thread running in the background.

It can play at about the same strength as the classical algo, but I think it has a unique advantage:

Printing a PV line from any node is as simple as following a chain of pointers through the part of the game tree stored in heap memory. And, for a 16-ply search, this tree can contain fewer than 500,000 nodes. I have yet to try any other method for multi-PV printing... is there a better method out there?

The last big thing I need to add to these algorithms is "draw by repetition" detection. However, when I try to handle this by iterating through the history and detecting the first identical position, the playing strength of my engine drops noticeably. Is there something that I am missing? (This code is a little ugly; please don't judge it too harshly, lol)

Code: Select all

    inline int repeating(Board* b, int ply) {
        int i = ply;
        State* state = b->getState();
        uint64_t key = state->key;
        State* s = state->prevState; 
        while(s && i-- > 0) {
            if(s->key == key)
                return i;
            s = s->prevState;
        }
        return 0;
    }

Ras · Post by **Ras** » Mon Jan 23, 2023 7:11 am

RedBedHed wrote: ↑Mon Jan 23, 2023 6:38 amiterating through the history and detecting the first identical position

You can decrement i by 2 because an identical position means that the side to move must be the same, but you can end prematurely when you encounter a move such as a capture or pawn push. I'm not sure whether your code does that, but you may need to go back further than the root position, i.e. also include the positions of the game history as transmitted by the moves list in the UCI go command.

RedBedHed · Post by **RedBedHed** » Mon Jan 23, 2023 8:27 am

Ras wrote: ↑Mon Jan 23, 2023 7:11 am
RedBedHed wrote: ↑Mon Jan 23, 2023 6:38 amiterating through the history and detecting the first identical position
You can decrement i by 2 because an identical position means that the side to move must be the same, but you can end prematurely when you encounter a move such as a capture or pawn push. I'm not sure whether your code does that, but you may need to go back further than the root position, i.e. also include the positions of the game history as transmitted by the moves list in the UCI go command.

Hmmm that does make sense... I did initially try going back 2 states at a time... Is it normal for these methods to affect playing strength?

RedBedHed · Post by **RedBedHed** » Mon Jan 23, 2023 8:31 am

RedBedHed wrote: ↑Mon Jan 23, 2023 6:38 am Hey all!

I've been working on a new-ish algorithm based on the paper https://www.semanticscholar.org/paper/P ... 82d4740db9 and the engine "Scorpio MCTS." It uses alpha-beta rollouts in an iterative deepening fashion, taking advantage of the classical dynamic move-ordering scheme.

Unlike Scorpio, it uses regular backtracking alpha-beta for IID and searches with null windows, like the scout searches and null move searches. It also uses regular quiescence search. It doesn't reuse the tree, as reuse only seems to slow things down. Instead, it passes the tree from each iteration to a garbage collection thread running in the background.

It can play at about the same strength as the classical algo, but I think it has a unique advantage:

Printing a PV line from any node is as simple as following a chain of pointers through the part of the game tree stored in heap memory. And, for a 16-ply search, this tree can contain fewer than 500,000 nodes. I have yet to try any other method for multi-PV printing... is there a better method out there?

The last big thing I need to add to these algorithms is "draw by repetition" detection. However, when I try to handle this by iterating through the history and detecting the first identical position, the playing strength of my engine drops noticeably. Is there something that I am missing? (This code is a little ugly; please don't judge it too harshly, lol)
Code: Select all
    inline int repeating(Board* b, int ply) {
        int i = ply;
        State* state = b->getState();
        uint64_t key = state->key;
        State* s = state->prevState; 
        while(s && i-- > 0) {
            if(s->key == key)
                return i;
            s = s->prevState;
        }
        return 0;
    }

Edit: looks like I spoke a little too soon. The PVs are incomplete for most children of the root... And that makes sense because of the scout searches.

MCTS-like algorithm with alpha-beta rollouts

MCTS-like algorithm with alpha-beta rollouts

Re: MCTS-like algorithm with alpha-beta rollouts

Re: MCTS-like algorithm with alpha-beta rollouts

Re: MCTS-like algorithm with alpha-beta rollouts