I just implemented the alpha-beta algorithm as a rollouts (recasting it into a monte-carlo tree search algorithm) according to this paper.
The implementation in Scorpio is in this commit.
It does iterative deepening of the alpha-beta rollouts. The node selection is done greedily (no exploration) so the the left-most sibling is finished before the other siblings are explored. This helps to establish a proper alpha-beta bound before exploring other siblings.
I think it has a great potential in so many aspects:
a) Alpha-Beta is now very easy to parallelize since it is recast in MCTS form which itself is easy to parallelize. The parallel version maybe equivalent to ABDADA or the lazy schemes.
b) Now MCTS in chess with alpha-beta rollouts becomes stronger in tactics since it can solve shallow traps
c) I think the move ordering in alpha-beta rollouts is superior to standard alpha-beta implemenation because all previous results at a given node are stored in the tree.
d) We can easily do LMR/extensions here though i haven't tried it
e) Mixing alpha-beta and rollouts. I can easily do a scheme where 50% of the time is used for alpha-beta (tactics), and 50% is used for MCTS (strategy).
Probably so many other things i forgot about. I am very very excited about it to be honest.
Daniel
Alpha-Beta as a rollouts algorithm
Moderators: hgm, Rebel, chrisw
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
-
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Alpha-Beta as a rollouts algorithm
Please can you explain what is rollout algorithm in simple words?
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Alpha-Beta as a rollouts algorithm
A rollouts algorithm searches one path from root to a single leaf node in a one episode (iteration). That is in contrast to, for example, a standard iterative-deepening alpha-beta framework where all the leaf nodes are examined in each iteration. In pictures it would be:
A rollout
A standard tree
In the rollouts-version of alpha-beta you have to save the alpha-beta bounds in the tree structure just like the ones saved in hash-tables for the standard alpha-beta to be used for pruning later.
I just implemented rollouts alpha-beta with LMR (no researches) and a mixed alpha-beta rollouts and MCTS. The result on the start position is this:
The algorithm switched to MCTS midway through (i.e. at depth 8) -- suddenly jumping to depth 16 and then reaching extreme depths. This highlights the strategic nature of MCTS.
regards,
Daniel
A rollout
Code: Select all
root
|
|
|
|
|
|
leaf
Code: Select all
root
|
----------------------------------------------------------
| | | | | |
-------------------------------------
.... | | |
I just implemented rollouts alpha-beta with LMR (no researches) and a mixed alpha-beta rollouts and MCTS. The result on the start position is this:
Code: Select all
[st = 10000ms, mt = 10000ms , hply = 0]
4 41 1 3490 d2-d4 d7-d5 Ng1-f3 Ng8-f6
5 35 5 21177 Ng1-f3 Ng8-f6 d2-d3 d7-d5 Nb1-c3
6 31 22 69910 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6
7 30 50 134000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4
7 30 88 260837 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4
8 30 90 263000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 29 142 355000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 28 194 470000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 28 244 594000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
8 27 296 720000 e2-e4 e7-e5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 d2-d4 e5xd4
[b]Switching rollout types to MCTS.[/b]
16 27 376 976000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 Ke1-g1 Bf8-c5 Nc3-d5 Ke8-g8 c2-c3 d7-d6 Nd5xf6 Qd8xf6
19 26 438 1171000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 Ke1-g1 Bf8-c5 Nc3-d5 Ke8-g8 c2-c3 d7-d6 b2-b4 Bc5-a7 Bc1-b2 Bc8-e6 Nd5xf6
19 26 494 1326000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-g5 h7-h6 Bg5xf6 Qd8xf6 Nc3-d5 Qf6-d8 Nd5-b6
24 26 559 1482000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Nf6-g4 Qd1-d2 Bc8-e6 Bc4-b5 Be6-d7 h2-h3 Nc6-b4 Bb5xd7 Qd8xd7
21 26 609 1598000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Bc8-g4 a2-a3 Qd8-d7 d3-d4 Bg4xf3 Rf1xf3
12 26 671 1712000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-b5 Bf8-d6 Ke1-g1 a7-a6 Bb5xc6 d7xc6
26 25 744 1846000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-c4 Bf8-c5 d2-d3 Ke8-g8 Ke1-g1 d7-d6 Bc1-e3 Bc5xe3 f2xe3 Bc8-g4 a2-a3 Qd8-d7 h2-h3 Bg4-h5 g2-g4 Bh5-g6 Bc4-b5 h7-h6 Bb5xc6 Qd7xc6
26 25 799 1941000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Qd7-e8 Nd5xe7 Qe8xe7
25 25 856 2054000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 h2-h3 Bf8-c5 b2-b4 Bc5-e7 Bc1-b2 Ke8-g8 Ke1-g1 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Rf8-e8 Qd1-b3 Nc6-d4 Bb2xd4
27 25 933 2205000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 h2-h3 Bf8-c5 b2-b4 Bc5-e7 Bc1-b2 Ke8-g8 Ke1-g1 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 Qd1-b3 Ra8-d8 Bd3-c2 Rd8-a8 Nd5xe7
27 25 1104 2490000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Be7-d8 Qd1-c2 Rf8-e8 Ra1-d1
27 25 1104 2490000 e2-e4 e7-e5 Ng1-f3 Nb8-c6 Nb1-c3 Ng8-f6 Bf1-d3 a7-a6 a2-a3 h7-h6 Ke1-g1 Bf8-e7 b2-b3 Ke8-g8 Bc1-b2 d7-d6 Nc3-d5 Bc8-e6 c2-c4 Qd8-d7 h2-h3 Ra8-c8 Rf1-e1 Be7-d8 Qd1-c2 Rf8-e8 Ra1-d1
nodes = 2338816 depth = 0/30 time = 11044ms pps = 225461 visits = 2490000 search calls = 2337981
move e2e4
Bye Bye
regards,
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
alpha-beta + Nullmove + LMR in rollouts
Nullmove can be implemented also in the rollouts algorithm. All I did is to insert a null move in the tree stored in memory and make sure the selection algorithm tries it first when conditions are met. In other cases the nullmove is practically turned off.
Unfortunately nullmove doesn't seem to improve performance from the alpha-beta+LMR rollouts version. It is most likely there is a bug in nullmove code though. One thing that is unique in this method is that a nullmove can pop up in the PV. This because the search is plain alpha-beta, not negascout or PVS that may need researches, and there is no node type classification. I could have simply turned off nullmove in PV nodes. Note that it is not possible to do researches in the rollouts within the tree (no researches for failed zero-window search, and for that matter no researchs for reduced LMR).
I am going to try next MTDf to gain something from zero-window searches.
With this new "MCTS" algorithm, I think it is very much possible to get an engine as strong as the standard alpha-beta implementation.
Daniel
Unfortunately nullmove doesn't seem to improve performance from the alpha-beta+LMR rollouts version. It is most likely there is a bug in nullmove code though. One thing that is unique in this method is that a nullmove can pop up in the PV. This because the search is plain alpha-beta, not negascout or PVS that may need researches, and there is no node type classification. I could have simply turned off nullmove in PV nodes. Note that it is not possible to do researches in the rollouts within the tree (no researches for failed zero-window search, and for that matter no researchs for reduced LMR).
I am going to try next MTDf to gain something from zero-window searches.
With this new "MCTS" algorithm, I think it is very much possible to get an engine as strong as the standard alpha-beta implementation.
Daniel
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: alpha-beta + Nullmove + LMR in rollouts
Maybe it will scale will with very high core counts.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: alpha-beta + Nullmove + LMR in rollouts
It definately scales well with high core counts since it is an MCTS algorithm (see my number 1 points in the OP). This is the best parallel version alpha-beta could ever be IMO, and it doesn't come at the cost of doing a minmax (which the standard MCTS search converges too) instead of alpha-beta.
The selection algorithm has no exploration (which means the left-most sibling is searched fully) before others are searched, so this gives you the YBW algorithm since all processors take the left-most branch. If I let it explore other siblings before the left-most is finished, then it becomes a speculative YBW (and you incur some overhead due to bounds not be as tight as could be). Each rollout is completely independent with no synhronization so it is highly parallel.
Daniel
The selection algorithm has no exploration (which means the left-most sibling is searched fully) before others are searched, so this gives you the YBW algorithm since all processors take the left-most branch. If I let it explore other siblings before the left-most is finished, then it becomes a speculative YBW (and you incur some overhead due to bounds not be as tight as could be). Each rollout is completely independent with no synhronization so it is highly parallel.
Daniel
-
- Posts: 358
- Joined: Thu Jan 22, 2015 3:21 pm
- Location: Zurich, Switzerland
- Full name: Jonathan Rosenthal
Re: alpha-beta + Nullmove + LMR in rollouts
Could you comment on the memory overhead? I can see how a rollout based search could work out fairly well in the case of an engine using a neural net to evaluate positions, but I would imagine saving the tree gets expensive quickly with the fast algorithms that are standard in modern engines.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: alpha-beta + Nullmove + LMR in rollouts
With long time controls you could run out memory for sure. MCTS was the default algorithm in Go even before the NN era so growing the tree in-memory posed no practical problems. I think some programs pruned the tree when running out of memory for storing nodes; some opt for increasing the visits threshold for node expansion. The latter can not be used in alpha-beta rollouts or alphazero MCTS because each leaf node is visited exactly once. Also note that quiescene search nodes are not stored in memory which often account for 75% of the total nodes. So a simple solution is to use a larger depth search(2) instead of qsearch(0) to evaluate the leaves thereby increasing the part of the tree that is not stored in memory.jorose wrote:Could you comment on the memory overhead? I can see how a rollout based search could work out fairly well in the case of an engine using a neural net to evaluate positions, but I would imagine saving the tree gets expensive quickly with the fast algorithms that are standard in modern engines.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Alpha-Beta as a rollouts algorithm
I have added all sorts of pruning heuristics, aspiration window, move sorting etc, and it has become sort of competitive with standard scorpio itself not crappy tscp anymore
Some notes:
a) The move ordering of nodes is done after every iterative deepening stage for all nodes. It gives ranks to all nodes in the tree based on the score in the previous stage. This is a much better move ordering strategy than history,see,mvv heurieustic junk. In this new approach, the nodes can get equal ranks if their scores are the same. You could get for instance, a ranking of 1,2,3,3,3,6,7 for 7 nodes, which I think is better than 1,2,3,4,5,6,7.
The LMR is done based on this ranking computed after every ID step and I think it should be superior than the standard approach.
b) A misconception I had before of how the search proceeds is that the leftmost sibling is fully searched before others. That is not the case. The rollouts search proceeds in a best first manner, i.e. whichever sibling node currently has the best score is allocated more rollouts. So if a good looking move suddenly starts to drop its score, another sibling with a higher score is given the chance. This is better than standard alpha-beta search where the leftmost sibling has to be finished anyway.
In other words, the advantage you get from standard parallel alpha-beta search in the form of super-linear speedup with inferior move ordering, you get it even in the sequential search with the new approach.
c) Before I was averaging scores even for alpha-beta rollouts. That means since the nodes keep the score from one ID iteration to other, the scores from different ID depths were averaged. This is the approach in standard MCTS, but I have now made it so that the nodes keep the latest score from the current iteration and assume all previous visits to the node gave this new score. This has made alpha-beta rollouts to give exactly same as score as the standard alpha-beta implementation -- in which scores swing more rapidly because scores are not averaged.
c) The nullmove implementation had a bug before which i fixed. Now it gives a good performance boost on top of LMR
d) Aspiration windows help a lot. The rollouts are still alpha-beta not scout (PVS) but we don't really need that.
I tell you this thing is a gold mine.
Daniel
Some notes:
a) The move ordering of nodes is done after every iterative deepening stage for all nodes. It gives ranks to all nodes in the tree based on the score in the previous stage. This is a much better move ordering strategy than history,see,mvv heurieustic junk. In this new approach, the nodes can get equal ranks if their scores are the same. You could get for instance, a ranking of 1,2,3,3,3,6,7 for 7 nodes, which I think is better than 1,2,3,4,5,6,7.
The LMR is done based on this ranking computed after every ID step and I think it should be superior than the standard approach.
b) A misconception I had before of how the search proceeds is that the leftmost sibling is fully searched before others. That is not the case. The rollouts search proceeds in a best first manner, i.e. whichever sibling node currently has the best score is allocated more rollouts. So if a good looking move suddenly starts to drop its score, another sibling with a higher score is given the chance. This is better than standard alpha-beta search where the leftmost sibling has to be finished anyway.
In other words, the advantage you get from standard parallel alpha-beta search in the form of super-linear speedup with inferior move ordering, you get it even in the sequential search with the new approach.
c) Before I was averaging scores even for alpha-beta rollouts. That means since the nodes keep the score from one ID iteration to other, the scores from different ID depths were averaged. This is the approach in standard MCTS, but I have now made it so that the nodes keep the latest score from the current iteration and assume all previous visits to the node gave this new score. This has made alpha-beta rollouts to give exactly same as score as the standard alpha-beta implementation -- in which scores swing more rapidly because scores are not averaged.
c) The nullmove implementation had a bug before which i fixed. Now it gives a good performance boost on top of LMR
d) Aspiration windows help a lot. The rollouts are still alpha-beta not scout (PVS) but we don't really need that.
I tell you this thing is a gold mine.
Daniel
-
- Posts: 4606
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Alpha-Beta as a rollouts algorithm
Thanks for the further reports on your progress and ideas.Daniel Shawul wrote:I have added all sorts of pruning heuristics, aspiration window, move sorting etc, and it has become sort of competitive with standard scorpio itself not crappy tscp anymore :)
Some notes:
a) The move ordering of nodes is done after every iterative deepening stage for all nodes. It gives ranks to all nodes in the tree based on the score in the previous stage. This is a much better move ordering strategy than history,see,mvv heurieustic junk. In this new approach, the nodes can get equal ranks if their scores are the same. You could get for instance, a ranking of 1,2,3,3,3,6,7 for 7 nodes, which I think is better than 1,2,3,4,5,6,7.
The LMR is done based on this ranking computed after every ID step and I think it should be superior than the standard approach.
b) A misconception I had before of how the search proceeds is that the leftmost sibling is fully searched before others. That is not the case. The rollouts search proceeds in a best first manner, i.e. whichever sibling node currently has the best score is allocated more rollouts. So if a good looking move suddenly starts to drop its score, another sibling with a higher score is given the chance. This is better than standard alpha-beta search where the leftmost sibling has to be finished anyway.
In other words, the advantage you get from standard parallel alpha-beta search in the form of super-linear speedup with inferior move ordering, you get it even in the sequential search with the new approach.
c) Before I was averaging scores even for alpha-beta rollouts. That means since the nodes keep the score from one ID iteration to other, the scores from different ID depths were averaged. This is the approach in standard MCTS, but I have now made it so that the nodes keep the latest score from the current iteration and assume all previous visits to the node gave this new score. This has made alpha-beta rollouts to give exactly same as score as the standard alpha-beta implementation -- in which scores swing more rapidly because scores are not averaged.
c) The nullmove implementation had a bug before which i fixed. Now it gives a good performance boost on top of LMR
d) Aspiration windows help a lot. The rollouts are still alpha-beta not scout (PVS) but we don't really need that.
I tell you this thing is a gold mine.
Daniel
I appreciate them.
Guenther