Alpha-Beta as a rollouts algorithm

Daniel Shawul · Post by **Daniel Shawul** » Thu Jan 25, 2018 11:59 pm

I just implemented the alpha-beta algorithm as a rollouts (recasting it into a monte-carlo tree search algorithm) according to this paper.

The implementation in Scorpio is in this commit.

It does iterative deepening of the alpha-beta rollouts. The node selection is done greedily (no exploration) so the the left-most sibling is finished before the other siblings are explored. This helps to establish a proper alpha-beta bound before exploring other siblings.

I think it has a great potential in so many aspects:

a) Alpha-Beta is now very easy to parallelize since it is recast in MCTS form which itself is easy to parallelize. The parallel version maybe equivalent to ABDADA or the lazy schemes.

b) Now MCTS in chess with alpha-beta rollouts becomes stronger in tactics since it can solve shallow traps

c) I think the move ordering in alpha-beta rollouts is superior to standard alpha-beta implemenation because all previous results at a given node are stored in the tree.

d) We can easily do LMR/extensions here though i haven't tried it

e) Mixing alpha-beta and rollouts. I can easily do a scheme where 50% of the time is used for alpha-beta (tactics), and 50% is used for MCTS (strategy).

Probably so many other things i forgot about. I am very very excited about it to be honest.

Daniel

Aleks Peshkov · Post by **Aleks Peshkov** » Fri Jan 26, 2018 8:27 am

Please can you explain what is rollout algorithm in simple words?

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 26, 2018 12:03 pm

A rollouts algorithm searches one path from root to a single leaf node in a one episode (iteration). That is in contrast to, for example, a standard iterative-deepening alpha-beta framework where all the leaf nodes are examined in each iteration. In pictures it would be:

A rollout

Code: Select all

root
|
|
|
|
|
|
leaf

A standard tree

Code: Select all

                                root
                                   |
----------------------------------------------------------
|           |            |                   |            |            |
                               -------------------------------------
 ....                          |             |             |

In the rollouts-version of alpha-beta you have to save the alpha-beta bounds in the tree structure just like the ones saved in hash-tables for the standard alpha-beta to be used for pruning later.

I just implemented rollouts alpha-beta with LMR (no researches) and a mixed alpha-beta rollouts and MCTS. The result on the start position is this:

Code: Select all

&#91;st = 10000ms, mt = 10000ms , hply = 0&#93;
4 41 1 3490  d2-d4 d7-d5 Ng1-f3 Ng8-f6
5 35 5 21177  Ng1-f3 Ng8-f6 d2-d3 d7-d5 Nb1-c3
6 31 22 69910  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6
7 30 50 134000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4
7 30 88 260837  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4
8 30 90 263000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 29 142 355000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 28 194 470000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 28 244 594000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 27 296 720000  e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
&#91;b&#93;Switching rollout types to MCTS.&#91;/b&#93;
16 27 376 976000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 Ke1-g1 Bf8-c5 Nc3-d5 Ke8-g8 c2-c3 d7-d6 Nd5xf6 Qd8xf6
19 26 438 1171000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 Ke1-g1 Bf8-c5 Nc3-d5 Ke8-g8 c2-c3 d7-d6 b2-b4 Bc5-a7 Bc1-b2 Bc8-e6 Nd5xf6
19 26 494 1326000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-g5 h7-h6 Bg5xf6 Qd8xf6 Nc3-d5 Qf6-d8 Nd5-b6
24 26 559 1482000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Nf6-g4 Qd1-d2 Bc8-e6 Bc4-b5 Be6-d7 h2-h3 Nc6-b4 Bb5xd7 Qd8xd7
21 26 609 1598000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Bc8-g4 a2-a3 Qd8-d7 d3-d4 Bg4xf3 Rf1xf3
12 26 671 1712000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-b5 Bf8-d6 Ke1-g1 a7-a6 Bb5xc6 d7xc6
26 25 744 1846000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Bc8-g4 a2-a3 Qd8-d7 h2-h3 Bg4-h5 g2-g4 Bh5-g6 Bc4-b5 h7-h6 Bb5xc6 Qd7xc6
26 25 799 1941000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Qd7-e8 Nd5xe7 Qe8xe7
25 25 856 2054000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 h2-h3 Bf8-c5 b2-b4 Bc5-e7 Bc1-b2 Ke8-g8 Ke1-g1 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Rf8-e8 Qd1-b3 Nc6-d4 Bb2xd4
27 25 933 2205000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 h2-h3 Bf8-c5 b2-b4 Bc5-e7 Bc1-b2 Ke8-g8 Ke1-g1 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 Qd1-b3 Ra8-d8 Bd3-c2 Rd8-a8 Nd5xe7
27 25 1104 2490000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Be7-d8 Qd1-c2 Rf8-e8 Ra1-d1
27 25 1104 2490000  e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Be7-d8 Qd1-c2 Rf8-e8 Ra1-d1
nodes = 2338816 depth = 0/30 time = 11044ms pps = 225461 visits = 2490000 search calls = 2337981
move e2e4
Bye Bye

The algorithm switched to MCTS midway through (i.e. at depth 8) -- suddenly jumping to depth 16 and then reaching extreme depths. This highlights the strategic nature of MCTS.

regards,
Daniel

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 26, 2018 7:44 pm

Nullmove can be implemented also in the rollouts algorithm. All I did is to insert a null move in the tree stored in memory and make sure the selection algorithm tries it first when conditions are met. In other cases the nullmove is practically turned off.

Unfortunately nullmove doesn't seem to improve performance from the alpha-beta+LMR rollouts version. It is most likely there is a bug in nullmove code though. One thing that is unique in this method is that a nullmove can pop up in the PV. This because the search is plain alpha-beta, not negascout or PVS that may need researches, and there is no node type classification. I could have simply turned off nullmove in PV nodes. Note that it is not possible to do researches in the rollouts within the tree (no researches for failed zero-window search, and for that matter no researchs for reduced LMR).

I am going to try next MTDf to gain something from zero-window searches.

With this new "MCTS" algorithm, I think it is very much possible to get an engine as strong as the standard alpha-beta implementation.

Daniel

Dann Corbit · Post by **Dann Corbit** » Fri Jan 26, 2018 8:00 pm

Maybe it will scale will with very high core counts.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 26, 2018 9:14 pm

It definately scales well with high core counts since it is an MCTS algorithm (see my number 1 points in the OP). This is the best parallel version alpha-beta could ever be IMO, and it doesn't come at the cost of doing a minmax (which the standard MCTS search converges too) instead of alpha-beta.

The selection algorithm has no exploration (which means the left-most sibling is searched fully) before others are searched, so this gives you the YBW algorithm since all processors take the left-most branch. If I let it explore other siblings before the left-most is finished, then it becomes a speculative YBW (and you incur some overhead due to bounds not be as tight as could be). Each rollout is completely independent with no synhronization so it is highly parallel.

Daniel

jorose · Post by **jorose** » Sat Jan 27, 2018 1:28 am

Could you comment on the memory overhead? I can see how a rollout based search could work out fairly well in the case of an engine using a neural net to evaluate positions, but I would imagine saving the tree gets expensive quickly with the fast algorithms that are standard in modern engines.

Daniel Shawul · Post by **Daniel Shawul** » Sat Jan 27, 2018 7:12 pm

jorose wrote:Could you comment on the memory overhead? I can see how a rollout based search could work out fairly well in the case of an engine using a neural net to evaluate positions, but I would imagine saving the tree gets expensive quickly with the fast algorithms that are standard in modern engines.

With long time controls you could run out memory for sure. MCTS was the default algorithm in Go even before the NN era so growing the tree in-memory posed no practical problems. I think some programs pruned the tree when running out of memory for storing nodes; some opt for increasing the visits threshold for node expansion. The latter can not be used in alpha-beta rollouts or alphazero MCTS because each leaf node is visited exactly once. Also note that quiescene search nodes are not stored in memory which often account for 75% of the total nodes. So a simple solution is to use a larger depth search(2) instead of qsearch(0) to evaluate the leaves thereby increasing the part of the tree that is not stored in memory.

Daniel Shawul · Post by **Daniel Shawul** » Sun Jan 28, 2018 3:32 pm

I have added all sorts of pruning heuristics, aspiration window, move sorting etc, and it has become sort of competitive with standard scorpio itself not crappy tscp anymore

Some notes:

a) The move ordering of nodes is done after every iterative deepening stage for all nodes. It gives ranks to all nodes in the tree based on the score in the previous stage. This is a much better move ordering strategy than history,see,mvv heurieustic junk. In this new approach, the nodes can get equal ranks if their scores are the same. You could get for instance, a ranking of 1,2,3,3,3,6,7 for 7 nodes, which I think is better than 1,2,3,4,5,6,7.

The LMR is done based on this ranking computed after every ID step and I think it should be superior than the standard approach.

b) A misconception I had before of how the search proceeds is that the leftmost sibling is fully searched before others. That is not the case. The rollouts search proceeds in a best first manner, i.e. whichever sibling node currently has the best score is allocated more rollouts. So if a good looking move suddenly starts to drop its score, another sibling with a higher score is given the chance. This is better than standard alpha-beta search where the leftmost sibling has to be finished anyway.

In other words, the advantage you get from standard parallel alpha-beta search in the form of super-linear speedup with inferior move ordering, you get it even in the sequential search with the new approach.

c) Before I was averaging scores even for alpha-beta rollouts. That means since the nodes keep the score from one ID iteration to other, the scores from different ID depths were averaged. This is the approach in standard MCTS, but I have now made it so that the nodes keep the latest score from the current iteration and assume all previous visits to the node gave this new score. This has made alpha-beta rollouts to give exactly same as score as the standard alpha-beta implementation -- in which scores swing more rapidly because scores are not averaged.

c) The nullmove implementation had a bug before which i fixed. Now it gives a good performance boost on top of LMR

d) Aspiration windows help a lot. The rollouts are still alpha-beta not scout (PVS) but we don't really need that.

I tell you this thing is a gold mine.

Daniel

Guenther · Post by **Guenther** » Sun Jan 28, 2018 3:42 pm

Daniel Shawul wrote:I have added all sorts of pruning heuristics, aspiration window, move sorting etc, and it has become sort of competitive with standard scorpio itself not crappy tscp anymore :)

Some notes:

a) The move ordering of nodes is done after every iterative deepening stage for all nodes. It gives ranks to all nodes in the tree based on the score in the previous stage. This is a much better move ordering strategy than history,see,mvv heurieustic junk. In this new approach, the nodes can get equal ranks if their scores are the same. You could get for instance, a ranking of 1,2,3,3,3,6,7 for 7 nodes, which I think is better than 1,2,3,4,5,6,7.

The LMR is done based on this ranking computed after every ID step and I think it should be superior than the standard approach.

b) A misconception I had before of how the search proceeds is that the leftmost sibling is fully searched before others. That is not the case. The rollouts search proceeds in a best first manner, i.e. whichever sibling node currently has the best score is allocated more rollouts. So if a good looking move suddenly starts to drop its score, another sibling with a higher score is given the chance. This is better than standard alpha-beta search where the leftmost sibling has to be finished anyway.

In other words, the advantage you get from standard parallel alpha-beta search in the form of super-linear speedup with inferior move ordering, you get it even in the sequential search with the new approach.

c) Before I was averaging scores even for alpha-beta rollouts. That means since the nodes keep the score from one ID iteration to other, the scores from different ID depths were averaged. This is the approach in standard MCTS, but I have now made it so that the nodes keep the latest score from the current iteration and assume all previous visits to the node gave this new score. This has made alpha-beta rollouts to give exactly same as score as the standard alpha-beta implementation -- in which scores swing more rapidly because scores are not averaged.

c) The nullmove implementation had a bug before which i fixed. Now it gives a good performance boost on top of LMR

d) Aspiration windows help a lot. The rollouts are still alpha-beta not scout (PVS) but we don't really need that.

I tell you this thing is a gold mine.

Daniel

Thanks for the further reports on your progress and ideas.
I appreciate them.

Guenther

Alpha-Beta as a rollouts algorithm

Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

alpha-beta + Nullmove + LMR in rollouts

Re: alpha-beta + Nullmove + LMR in rollouts

Re: alpha-beta + Nullmove + LMR in rollouts

Re: alpha-beta + Nullmove + LMR in rollouts

Re: alpha-beta + Nullmove + LMR in rollouts

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm