Daniel Shawul wrote: That is correct, but I think I am suffering heavily in tactics with this approach too. The problem is that when you have a forced move (e.g. a recapture) and you sample the other, obviously wrong, moves often enough, the real score gets smeared out. That was the problem in the playout part of the MCTS before. In Checkers, captures are forced and automatically exclude the non-captures, so there is no "smearing problem". So I am gonna have to use the shallow search to bias the sampling of moves heavily, especially when there is a forced line. I need an efficient search that outputs a score and winning probabilities for each move -- just like the NN -- and then I will use it at the leaves AND in the internal nodes.
Try the following. At the leaf node you are expanding, first do a plain eval for all the moves, select a few of the best ones, and then perform a fixed-depth multi-PV search (including QS) on those.
You then convert the score for each move into a probability using a logistic function.
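Roughly something like this in C++ (just a minimal sketch; the centipawn units and the 400-point scale are assumptions that would need tuning):

    #include <cmath>
    #include <vector>

    // Logistic conversion of a centipawn search score into a winning
    // probability.  The 400-point scale is a guess and would need tuning.
    double score_to_prob(int score_cp) {
        return 1.0 / (1.0 + std::exp(-score_cp / 400.0));
    }

    // Normalize over all moves of a node so the priors sum to 1.
    std::vector<double> move_priors(const std::vector<int>& scores_cp) {
        std::vector<double> p;
        double sum = 0.0;
        for (int s : scores_cp) { p.push_back(score_to_prob(s)); sum += p.back(); }
        for (double& x : p) x /= sum;
        return p;
    }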
I already had something for reading tactics (using qsearch) when child nodes are first created. So what I do now is a shallow search (depth=4) instead, and I store that score for computing move probabilities later. I do not want to completely cut inferior moves out of consideration, but rather bias the UCB formula so that a move that wins material is selected with very high probability. A logistic is what I use for converting the eval score to a winning probability, but I think something more aggressive is needed for the problem at hand, namely avoiding excessive exploration in forced lines. Reducing the exploration coefficient helps a little, and so does sorting the moves, it seems.
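One way to fold those shallow-search priors into the selection formula is a PUCT-style bias, roughly like the sketch below (the node fields and names are made up for illustration, not my actual code):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct ChildStats {
        double prior;    // probability from the shallow-search score (logistic)
        double q;        // mean backed-up value, from the parent's point of view
        int    visits;
    };

    // PUCT-style selection: a move that wins material gets a large prior, so
    // with a small exploration coefficient the clearly inferior alternatives
    // are rarely sampled, yet never cut out of consideration entirely.
    int select_child(const std::vector<ChildStats>& ch, double c_explore) {
        int parent_visits = 0;
        for (const ChildStats& c : ch) parent_visits += c.visits;
        int best = 0;
        double best_score = -1e300;
        for (std::size_t i = 0; i < ch.size(); ++i) {
            double u = c_explore * ch[i].prior
                     * std::sqrt((double)parent_visits + 1.0) / (1.0 + ch[i].visits);
            double s = ch[i].q + u;
            if (s > best_score) { best_score = s; best = (int)i; }
        }
        return best;
    }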
With a very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree is very selective and grows rapidly, reaching very high depths and sometimes hitting the MAX_PLY ceiling. I can see how this approach can be very competitive with a highly selective alpha-beta search.
The games are not that good due to Nebiyu's weak eval, so I am going to try and implement it in Scorpio.
Once again very impressive! I look forward to the Scorpio implementation.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
It turned out not to be as great as I hoped. It is definitely much better than TSCP now, but its tactical blindness is still there. Alpha's trick may be that having a bunch of threads searching the MCTS tree with a low exploration coefficient eventually compensates for the tactical blindness. There may be ways to emulate the LMR effect by careful management of the exploration coefficient using heuristics, progressive widening (pruning), etc., but the plain MCTS as in AlphaZero's paper seems to be very weak on one thread, unless I am missing something.
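Progressive widening could look roughly like this (just a sketch; the constants are pure guesses):

    #include <cmath>

    // Progressive widening sketch: only the k best-prior children of a node
    // are eligible for selection, with k growing slowly with the visit count.
    // The "2 + sqrt(visits)" form and the constants are pure guesses.
    int widening_limit(int node_visits) {
        return 2 + (int)std::sqrt((double)node_visits);
    }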
In any case, I had two revelations that I had missed before: not using rollouts, and that the NN does not need to resolve tactics very deeply -- using qsearch() seems to be better than other depths at the moment.
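To make the "no rollouts" point concrete: the leaf value -- here a winning probability that would come from qsearch() fed through the logistic conversion -- is backed up directly instead of the result of a random playout. A rough sketch, with an invented Node layout, not the real engine code:

    struct Node {
        Node*  parent = nullptr;
        double value  = 0.0;   // running mean result, from this node's point of view
        int    visits = 0;
    };

    // Back the leaf value (a winning probability in [0,1]) up to the root,
    // flipping the point of view at every ply.
    void backup(Node* leaf, double leaf_value) {
        double v = leaf_value;
        for (Node* n = leaf; n != nullptr; n = n->parent) {
            n->value  = (n->value * n->visits + v) / (n->visits + 1);
            n->visits += 1;
            v = 1.0 - v;   // opponent's perspective
        }
    }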
I must confess these things are unclear in my mind. What would be the difference between A/B and UCT with low exploration? I guess A/B searches (in principle) all moves in ALL nodes and UCT has no such concept. But at low depth the difference should not be very big. Or am I misunderstanding this?