Nebiyu-MCTS vs TSCP 1-0

hgm · Post by **hgm** » Mon Dec 11, 2017 11:25 am

I guess it depends on whether you consider using the knowledge that a Queen is worth more than a Knight, so that you should try N x protected Q with high probablilty, and Q x protected N only with very low probability, 'cheating'. Or the fact that capturing unprotected pieces is a good deal.

A weakness of MCTS is that it considers every position of the tree in isolation. When it finally figures out it is good to capture PxQ in one position close to the root, it will in no way bias it towards PxQ in other positions deep in the tree. For that a neural network controlling the move selection would be helpful. This could learn very quickly from positions close to the root that grabbing unprotected material is good, and that with protection N x Q usually ends much better then Q x N. And then apply that kowledge to get more realistic playouts.

Milos · Post by **Milos** » Mon Dec 11, 2017 3:27 pm

Daniel Shawul wrote:That is correct, but I think I am suffering in tactics heavily with this approach too. The problem is that when you have a forced move (e.g. a recapture), and you sample the other moves, obviously wrong ones, moves often enough the real score gets smeared out. That was the problem in the playout part of the MCTS before. In Checkers, captures are forced automatically excluding the non-captures so there is no "smearing problem". So I am gonna have to use the shallow search to bias sampling of moves heavily especially when there is a forced line. I need an efficient search that outputs a score and winning probabilities for each move -- just like the NN -- and then i will use it at the leaves AND in the internal nodes

Try the following. At the leaf node you are expanding, do first just eval for all the moves, select a few best ones and then perform mpv search of a fixed depth with those including QS.
You convert scores for each move into probabilities using a logistic function.

Daniel Shawul · Post by **Daniel Shawul** » Mon Dec 11, 2017 6:54 pm

Milos wrote:
Daniel Shawul wrote:That is correct, but I think I am suffering in tactics heavily with this approach too. The problem is that when you have a forced move (e.g. a recapture), and you sample the other moves, obviously wrong ones, moves often enough the real score gets smeared out. That was the problem in the playout part of the MCTS before. In Checkers, captures are forced automatically excluding the non-captures so there is no "smearing problem". So I am gonna have to use the shallow search to bias sampling of moves heavily especially when there is a forced line. I need an efficient search that outputs a score and winning probabilities for each move -- just like the NN -- and then i will use it at the leaves AND in the internal nodes
Try the following. At the leaf node you are expanding, do first just eval for all the moves, select a few best ones and then perform mpv search of a fixed depth with those including QS.
You convert scores for each move into probabilities using a logistic function.

I had already something for reading tactics (using qsearch) when children nodes are first created. So what I do now is to do a shallow depth search (depth=4) instead and store that score for computing move probabilities later. I do not want to completely cut out inferior moves from consideration but bias the UCB formula so that a move that wins material is selected with very high probability. Logistic is what I use for converting the eval score to winning prob, but I think something more aggressive is need for the problem at hand, that is avoiding excessive exploration in forced lines. Reducing the exploration coefficient helps a little, so does sorting the moves it seems.

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Thu Dec 14, 2017 5:40 pm

With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.

The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.
.

Another game against TSCP

[pgn]
[Event "Computer Chess Game"]
[Site "daniel-Satellite-C855"]
[Date "2017.12.14"]
[Round "-"]
[White "NebiyuChess_1.44"]
[Black "tscp"]
[Result "1-0"]
[TimeControl "40/300"]
[Annotator "1. +0.15 2... +0.00"]

1. d4 {+0.15/17} e6 2. Be3 {+0.28/14 7} d5 {+0.00/6 10} 3. h4 {+0.24/14 7}
Nf6 {+0.32/6 10} 4. Nc3 {+0.22/13 7} Bb4 {+0.38/6 9} 5. Qd3 {+0.20/16 7}
Bxc3+ {+0.56/6 9} 6. bxc3 {+0.37/18 7} Nc6 {+0.88/6 9} 7. a4 {+0.35/14 7}
e5 {+1.19/6 8} 8. Rb1 {+0.23/17 8} Qd6 {+1.22/6 8} 9. Rb3 {+0.20/11 7} O-O
{+1.35/6 8} 10. f3 {+0.27/11 7} Na5 {+1.66/6 8} 11. Rb4 {+0.29/12 8} exd4
{+1.90/6 7} 12. Bxd4 {-0.69/39 7} Nc6 {+1.81/5 7} 13. Rb3 {+0.24/20 7} Nh5
{+1.97/4 7} 14. Nh3 {+0.55/15 7} Bxh3 {+1.95/4 7} 15. gxh3 {+0.44/20 7}
Rfb8 {+1.66/5 6} 16. Rg1 {+0.85/9 7} Qf4 {+1.74/5 6} 17. Rg4 {+1.16/28 7}
Qc1+ {+1.58/6 6} 18. Kf2 {+1.00/23 7} Nxd4 {+1.84/5 6} 19. Rxd4
{+0.95/26 7} Nf6 {+1.73/6 6} 20. Rdb4 {+0.89/33 7} b6 {+2.01/6 5} 21. e3
{+0.68/31 7} c5 {+2.90/6 5} 22. Rb1 {+0.00/7 7} Qa3 {+2.58/7 5} 23. R1b3
{+0.00/5 7} cxb4 {+2.44/7 5} 24. Rxa3 {+1.18/34 7} bxa3 {+2.29/7 5} 25. c4
{+1.16/25 7} dxc4 {+1.93/7 5} 26. Qxc4 {+1.14/27 7} Rd8 {+1.93/6 4} 27. Bd3
{+1.26/30 7} Rd5 {+1.89/6 4} 28. Qb3 {+1.19/38 7} Rh5 {+1.83/6 4} 29. Kg3
{+1.18/36 7} Rc8 {+1.40/6 4} 30. Qxa3 {+1.32/23 7} Rcc5 {+1.39/6 4} 31. f4
{+1.46/19 7} Kf8 {+1.45/6 4} 32. e4 {+1.40/34 7} Ke7 {+1.39/6 4} 33. Qb2
{+1.39/36 8} Rh6 {+1.49/5 3} 34. e5 {+1.57/39 8} Nd7 {+1.41/6 3} 35. Qd4
{+1.72/26 8} a6 {+1.42/4 3} 36. Bf5 {+1.58/31 8} Nf8 {+1.33/6 3} 37. Qd3
{+1.58/37 8} a5 {+1.25/6 3} 38. c4 {+1.60/35 6} Rhc6 {+1.38/6 3} 39. Be4
{+1.66/32 9} Rh6 {+1.29/6 2.9} 40. Qd4 {+1.65/44 10} Rc7 {+1.33/6 2.8} 41.
f5 {+1.58/18 7} Nd7 {+0.45/7 13} 42. Bd5 {+1.80/55 7} Nc5 {+0.09/7 12} 43.
e6 {+2.10/31 7} f6 {-0.26/7 12} 44. Qg4 {+3.93/32 7} Kf8 {-1.68/7 11} 45.
Qf4 {+4.79/37 7} Re7 {-3.46/7 11} 46. Qb8+ {+5.27/41 7} Re8 {-12.63/8 11}
47. Qxb6 {+5.45/45 7} Nxe6 {-4.52/7 10} 48. Bxe6 {+6.65/20 7} Rh5
{-5.14/7 10} 49. Qc7 {+7.68/28 7} Re7 {-4.50/8 10} 50. Qd6 {+7.76/26 7} Rh6
{-7.66/8 9} 51. Kg4 {+8.71/11 7} Rg6+ {-9.52/8 9} 52. fxg6 {+8.86/14 7}
hxg6 {-99.90/8 3} 53. Kf4 {+9.00/9 7} g5+ {-99.92/6 0.3} 54. hxg5
{+9.07/9 7} fxg5+ {-99.94/5 0.1} 55. Kf5 {+10.99/7 7} g4 {-99.94/6 0.2} 56.
hxg4 {+13.18/5 7} g5 {-99.96/5 0.1} 57. Kf6 {+16.90/3 7} Ke8 {-99.98/3 0.1}
58. Qxe7# {+20.13/1 7}
{Xboard adjudication: Checkmate} 1-0

[/pgn]

A nice side effect of this approach is you can extract a PV for evey root move

Daniel

trulses · Post by **trulses** » Thu Dec 14, 2017 9:11 pm

Hey Daniel, very interesting stuff.

Do you take advantage of transpositions in any way? For instance, by linking in a node that already has statistics if available.

Daniel Shawul wrote:The MCTS tree in xml format looks like this after e1xd4 which is played with high probability.
Code: Select all
...

Is there a handy tool you to use to view these XML trees graphically?

Michel · Post by **Michel** » Thu Dec 14, 2017 9:38 pm

With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.

The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.

Once again very impressive! I look forward to the Scorpio implementation.

Daniel Shawul · Post by **Daniel Shawul** » Fri Dec 15, 2017 2:58 am

So far I haven't exploited transpositions yet -- my goal is to see if the MCTS is a competitive approach without optimization.

I use web browser to view the xml file.

Daniel

tmokonen · Post by **tmokonen** » Fri Dec 15, 2017 6:01 am

Have you tried MCTS with Nebiyu reversi, and if so how successful was it for you?

Daniel Shawul · Post by **Daniel Shawul** » Sat Dec 16, 2017 3:50 pm

Michel wrote:
With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.

The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.

Once again very impressive! I look forward to the Scorpio implementation.

It turned out not to be as great as I hoped. It is definitely much better than TSCP now but its tactical blindness is there. Alpha's trick maybe that having a bunch of threads searching the MCTS tree with low exploration coefficient may eventually compensate for its tactical blindness. There maybe ways to emulate the LMR effect by careful management of the exploration coefficient using heurieustics, progressive widening (pruning) etc, but the plain MCTS as in alphazero's paper seems to be very week on one thread unless I am missing something.

In any case, I had two revelations that I missed before: not using rollouts, and that the NN does not need to resolve too much tactics -- using qsearch() seems to be better than other depths at the moment.

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Sat Dec 16, 2017 3:52 pm

tmokonen wrote:Have you tried MCTS with Nebiyu reversi, and if so how successful was it for you?

Checkers was very good for sure and I think reversi was too last time I tried it.

Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0

Re: Nebiyu-MCTS vs TSCP 1-0