MCTS implementation question

MOBMAT · Post by **MOBMAT** » Fri Feb 22, 2019 9:45 pm

I'm working the code for an MCTS implementation.

I'm coding Tic-Tac-Toe as an easy GAME to work out the bugs.

I'm running into a situation where either:

1. The code is working fine and i'm just not sure what to do
2. I haven't implemented something correctly (usually this one! LOL)

The question has to do with "terminal nodes" and what to do with them when one is returned by the selection/traverse step.
here is my condensed main search loop:

Code: Select all

	while (AbortSearch() == false)
		{
		iLeaf = Traverse(0);	// (0) is the root
				
		if (IsNodeTerminal(iLeaf) == false) {
			Reward = Rollout(iLeaf);
			BackPropagate(iLeaf, Reward);
			}
		m_NumSimulations++;	// keep track of how many times we have looped
		}

Abort fails if we run out of nodes (i use a fixed array rather than dynamic allocation), or it runs out of time (it never when debugging i have it search forever)
Traverse (select) appears to be working correctly, and can return a terminal node.
when this occurs, a rollout isn't done (we have end of game) and my thinking was, there isn't anything to back propagate, so i don't call backpropate.

with Tic-Tac-Toe, especially after a few moves have been made, it doesn't take long for all the nodes to be terminal, and the loop just keeps spinning it's wheels. The number of simulations goes up but no new nodes are created.

I haven't seen a good write up of what to do in this situation, assuming my search loop code is correct.

it seems like a waste of time to simply keep looping till time runs out, but i see the same type of behavior with chess engines when they find mate. they just keep on searching forever.

nionita · Post by **nionita** » Sat Feb 23, 2019 8:05 am

The search aim is to approximate the node values. But once you have terminal nodes, there is no need to approximate anymore, their value is exact. So those nodes should not be visited again. Instead their parents should be aware of such descendants, such that:
- if one descendant is a win, the parent is a win (parent must be treated as a leaf)
- if all descendant are loss, the parent is a loss (parent must be treated as a leaf)
- otherwise expand only the other descendants (normal MCTS node)

This logic should propagate up to the root.

dkappe · Post by **dkappe** » Thu Feb 28, 2019 8:53 pm

Have a look at leela_lite for a ~100 line implementation.

https://github.com/dkappe/leela_lite

MCTS implementation question

MCTS implementation question

Re: MCTS implementation question

Re: MCTS implementation question