PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

RedBedHed · Post by **RedBedHed** » Thu Nov 03, 2022 3:01 am

Hello!

I am extremely busy this semester, but I am still trying to make progress on my engine.

I spent nearly a month chasing a "bug" that mysteriously caused my engine to give away pieces in the endgame. Yesterday, I discovered that the Robust Child policy for final move selection was the culprit. When cornered, MCTS engines spend a lot of time thinking about their death. Robust Child causes a sort of panic effect, choosing bad moves out of fear. I switched to the Max Child policy, and the odd behavior disappeared. I think that Robust Child-- while great for games like Go-- is very bad for instant-death games like Chess.

I now have two questions that are weighing heavily on my mind...

I am using the Alpha-Zero version of PUCT, assigning priors to each edge based on an initial Alpha-Beta evaluation. The result is a deeper search, but with little-- if any-- increase in ELO. I assign each prior during expansion as v_i/v, where v_i is the reward of the current edge, and v is the sum of rewards over all edges. Does v_i/v make a valid prior? Is there a better way?

I am using 1.0 (unity) as my reward for checkmate. However, my engine has no desire to win once it has collected all of the enemy's pieces. It just moves around the board aimlessly until cutechess draws the game. Is there something wrong with the way I am evaluating checkmate? Is there a better way?

AAce3 · Post by **AAce3** » Thu Nov 03, 2022 3:08 am

There is a strategy for terminal node backup in MCTS, called "mcts-solver." Essentially, a node with terminal children can itself become terminal.

Relevant paper: https://dke.maastrichtuniversity.nl/m.w ... uctloa.pdf

RedBedHed · Post by **RedBedHed** » Thu Nov 03, 2022 3:14 am

AAce3 wrote: ↑Thu Nov 03, 2022 3:08 am There is a strategy for terminal node backup in MCTS, called "mcts-solver." Essentially, a node with terminal children can itself become terminal.

Relevant paper: https://dke.maastrichtuniversity.nl/m.w ... uctloa.pdf

Woah! Thank you so much for this! Very, very interesting.

AAce3 · Post by **AAce3** » Thu Nov 03, 2022 3:19 am

RedBedHed wrote: ↑Thu Nov 03, 2022 3:14 am
AAce3 wrote: ↑Thu Nov 03, 2022 3:08 am There is a strategy for terminal node backup in MCTS, called "mcts-solver." Essentially, a node with terminal children can itself become terminal.

Relevant paper: https://dke.maastrichtuniversity.nl/m.w ... uctloa.pdf
Woah! Thank you so much for this! Very, very interesting.

The paper is about lines of action, which I don't believe has draw states. But the generalization to chess is pretty straightforward, I think.

RedBedHed · Post by **RedBedHed** » Thu Nov 03, 2022 5:21 am

AAce3 wrote: ↑Thu Nov 03, 2022 3:19 am
RedBedHed wrote: ↑Thu Nov 03, 2022 3:14 am
AAce3 wrote: ↑Thu Nov 03, 2022 3:08 am There is a strategy for terminal node backup in MCTS, called "mcts-solver." Essentially, a node with terminal children can itself become terminal.

Relevant paper: https://dke.maastrichtuniversity.nl/m.w ... uctloa.pdf
Woah! Thank you so much for this! Very, very interesting.
The paper is about lines of action, which I don't believe has draw states. But the generalization to chess is pretty straightforward, I think.

I implemented the idea in this paper rather quickly, and I am extremely excited to tell you that it indeed works.

So excited-- in fact-- that I almost jumped up and started shouting in the middle of the library. Lol. What a stunning discovery these researchers made.

hgm · Post by **hgm** » Thu Nov 03, 2022 8:54 am

RedBedHed wrote: ↑Thu Nov 03, 2022 3:01 amHowever, my engine has no desire to win once it has collected all of the enemy's pieces. It just moves around the board aimlessly until cutechess draws the game. Is there something wrong with the way I am evaluating checkmate? Is there a better way?

This is a general problem in all engines, no matter what search algorithm they use. Alpha-beta engines would also move aimlessly once they get the checkmate within the horizon. This is why they evaluate a fast checkmate higher than a slow checkmate.

A similar problem occurs when the checkmate is still beyond the horizon, and the heuristic evaluation does not award progress in any way (e.g. by scoring positions with the bare King closer to the corner higher). In general, when you don't give the engine an incentive to make progress, it won't make progress. By the time the 50-move draw gets within the horizon, the checkmate will still be far beyond it (against an opponent who doesn know what he is doing, and has been able to maximize the DTM all the time without meeting any resistance), and the draw is unavoidable.

So in general this behavior means your evaluation sucks. You cannot expect to solve that through search.

JohnWoe · Post by **JohnWoe** » Thu Nov 03, 2022 2:21 pm

There's no need for bloated solutions like 100GB+ EGTB or NNUE to checkmate naked king. Just force the king into corner and find max 6 ply checkmate. Like 10 lines of code. In NN solutions I have found NN are just shuffling aimlessly when easily winning. I simply use HCE for that. Stockfish can force chekmate w/ its super deep/complex search and NN.

Joerg Oster · Post by **Joerg Oster** » Thu Nov 03, 2022 2:53 pm

AAce3 wrote: ↑Thu Nov 03, 2022 3:08 am There is a strategy for terminal node backup in MCTS, called "mcts-solver." Essentially, a node with terminal children can itself become terminal.

Relevant paper: https://dke.maastrichtuniversity.nl/m.w ... uctloa.pdf

Thank you!
Didn't know this one before.

AAce3 · Post by **AAce3** » Thu Nov 03, 2022 9:16 pm

hgm wrote: ↑Thu Nov 03, 2022 8:54 am
RedBedHed wrote: ↑Thu Nov 03, 2022 3:01 amHowever, my engine has no desire to win once it has collected all of the enemy's pieces. It just moves around the board aimlessly until cutechess draws the game. Is there something wrong with the way I am evaluating checkmate? Is there a better way?
This is a general problem in all engines, no matter what search algorithm they use. Alpha-beta engines would also move aimlessly once they get the checkmate within the horizon. This is why they evaluate a fast checkmate higher than a slow checkmate.

A similar problem occurs when the checkmate is still beyond the horizon, and the heuristic evaluation does not award progress in any way (e.g. by scoring positions with the bare King closer to the corner higher). In general, when you don't give the engine an incentive to make progress, it won't make progress. By the time the 50-move draw gets within the horizon, the checkmate will still be far beyond it (against an opponent who doesn know what he is doing, and has been able to maximize the DTM all the time without meeting any resistance), and the draw is unavoidable.

So in general this behavior means your evaluation sucks. You cannot expect to solve that through search.

Not necessarily. MCTS struggles a lot with mates (as well as heavily losing positions), because its scoring is based on averages. There was an interview with Komodo MCTS team where they said that they would switch to standard A/B komodo in winning endgames, because MCTS search had the tendency to aimlessly move around while not making any progress, as those actions still led to winning states.

hgm · Post by **hgm** » Fri Nov 04, 2022 6:59 am

Well, if there is one infinite score amongst the childs the average would also be infinite.

But I did not say that any kind of search flies. I just said that without proper evaluation even the best search could not solve this problem. Of course it even with perfect evaluation it is always possible to use a search that is so buggy or crappy that it spoils the deal.

Using averages in a game where only the best move matters is of course a major mistake. It is not intrinsic to MCTS; it is perfectly possible to do MCTS with more sensible score propagation. Like it would be possible to do alpha-beta search with a less sensible score propagation.

PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta

Re: PUCT and Mate Evaluation for Monte-Carlo Alpha-Beta