MCTS implementation question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

MOBMAT
Posts: 385
Joined: Sat Feb 04, 2017 11:57 pm
Location: USA

MCTS implementation question

Post by MOBMAT »

I'm working the code for an MCTS implementation.

I'm coding Tic-Tac-Toe as an easy GAME to work out the bugs.

I'm running into a situation where either:

1. The code is working fine and i'm just not sure what to do
2. I haven't implemented something correctly (usually this one! LOL)

The question has to do with "terminal nodes" and what to do with them when one is returned by the selection/traverse step.
here is my condensed main search loop:

Code: Select all

	while (AbortSearch() == false)
		{
		iLeaf = Traverse(0);	// (0) is the root
				
		if (IsNodeTerminal(iLeaf) == false) {
			Reward = Rollout(iLeaf);
			BackPropagate(iLeaf, Reward);
			}
		m_NumSimulations++;	// keep track of how many times we have looped
		}
Abort fails if we run out of nodes (i use a fixed array rather than dynamic allocation), or it runs out of time (it never when debugging i have it search forever)
Traverse (select) appears to be working correctly, and can return a terminal node.
when this occurs, a rollout isn't done (we have end of game) and my thinking was, there isn't anything to back propagate, so i don't call backpropate.

with Tic-Tac-Toe, especially after a few moves have been made, it doesn't take long for all the nodes to be terminal, and the loop just keeps spinning it's wheels. The number of simulations goes up but no new nodes are created.

I haven't seen a good write up of what to do in this situation, assuming my search loop code is correct.

it seems like a waste of time to simply keep looping till time runs out, but i see the same type of behavior with chess engines when they find mate. they just keep on searching forever.
i7-6700K @ 4.00Ghz 32Gb, Win 10 Home, EGTBs on PCI SSD
Benchmark: Stockfish15.1 NNUE x64 bmi2 (nps): 1277K
nionita
Posts: 175
Joined: Fri Oct 22, 2010 9:47 pm
Location: Austria

Re: MCTS implementation question

Post by nionita »

The search aim is to approximate the node values. But once you have terminal nodes, there is no need to approximate anymore, their value is exact. So those nodes should not be visited again. Instead their parents should be aware of such descendants, such that:
- if one descendant is a win, the parent is a win (parent must be treated as a leaf)
- if all descendant are loss, the parent is a loss (parent must be treated as a leaf)
- otherwise expand only the other descendants (normal MCTS node)

This logic should propagate up to the root.
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: MCTS implementation question

Post by dkappe »

Have a look at leela_lite for a ~100 line implementation.

https://github.com/dkappe/leela_lite
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".