I guess it depends on whether you consider using the knowledge that a Queen is worth more than a Knight, so that you should try N x protected Q with high probablilty, and Q x protected N only with very low probability, 'cheating'. Or the fact that capturing unprotected pieces is a good deal.
A weakness of MCTS is that it considers every position of the tree in isolation. When it finally figures out it is good to capture PxQ in one position close to the root, it will in no way bias it towards PxQ in other positions deep in the tree. For that a neural network controlling the move selection would be helpful. This could learn very quickly from positions close to the root that grabbing unprotected material is good, and that with protection N x Q usually ends much better then Q x N. And then apply that kowledge to get more realistic playouts.
Nebiyu-MCTS vs TSCP 1-0
Moderators: hgm, Rebel, chrisw
-
- Posts: 27790
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Nebiyu-MCTS vs TSCP 1-0
Try the following. At the leaf node you are expanding, do first just eval for all the moves, select a few best ones and then perform mpv search of a fixed depth with those including QS.Daniel Shawul wrote:That is correct, but I think I am suffering in tactics heavily with this approach too. The problem is that when you have a forced move (e.g. a recapture), and you sample the other moves, obviously wrong ones, moves often enough the real score gets smeared out. That was the problem in the playout part of the MCTS before. In Checkers, captures are forced automatically excluding the non-captures so there is no "smearing problem". So I am gonna have to use the shallow search to bias sampling of moves heavily especially when there is a forced line. I need an efficient search that outputs a score and winning probabilities for each move -- just like the NN -- and then i will use it at the leaves AND in the internal nodes
You convert scores for each move into probabilities using a logistic function.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Nebiyu-MCTS vs TSCP 1-0
I had already something for reading tactics (using qsearch) when children nodes are first created. So what I do now is to do a shallow depth search (depth=4) instead and store that score for computing move probabilities later. I do not want to completely cut out inferior moves from consideration but bias the UCB formula so that a move that wins material is selected with very high probability. Logistic is what I use for converting the eval score to winning prob, but I think something more aggressive is need for the problem at hand, that is avoiding excessive exploration in forced lines. Reducing the exploration coefficient helps a little, so does sorting the moves it seems.Milos wrote:Try the following. At the leaf node you are expanding, do first just eval for all the moves, select a few best ones and then perform mpv search of a fixed depth with those including QS.Daniel Shawul wrote:That is correct, but I think I am suffering in tactics heavily with this approach too. The problem is that when you have a forced move (e.g. a recapture), and you sample the other moves, obviously wrong ones, moves often enough the real score gets smeared out. That was the problem in the playout part of the MCTS before. In Checkers, captures are forced automatically excluding the non-captures so there is no "smearing problem". So I am gonna have to use the shallow search to bias sampling of moves heavily especially when there is a forced line. I need an efficient search that outputs a score and winning probabilities for each move -- just like the NN -- and then i will use it at the leaves AND in the internal nodes
You convert scores for each move into probabilities using a logistic function.
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Nebiyu-MCTS vs TSCP 1-0
With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.
The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.
.
Another game against TSCP
[pgn]
[Event "Computer Chess Game"]
[Site "daniel-Satellite-C855"]
[Date "2017.12.14"]
[Round "-"]
[White "NebiyuChess_1.44"]
[Black "tscp"]
[Result "1-0"]
[TimeControl "40/300"]
[Annotator "1. +0.15 2... +0.00"]
1. d4 {+0.15/17} e6 2. Be3 {+0.28/14 7} d5 {+0.00/6 10} 3. h4 {+0.24/14 7}
Nf6 {+0.32/6 10} 4. Nc3 {+0.22/13 7} Bb4 {+0.38/6 9} 5. Qd3 {+0.20/16 7}
Bxc3+ {+0.56/6 9} 6. bxc3 {+0.37/18 7} Nc6 {+0.88/6 9} 7. a4 {+0.35/14 7}
e5 {+1.19/6 8} 8. Rb1 {+0.23/17 8} Qd6 {+1.22/6 8} 9. Rb3 {+0.20/11 7} O-O
{+1.35/6 8} 10. f3 {+0.27/11 7} Na5 {+1.66/6 8} 11. Rb4 {+0.29/12 8} exd4
{+1.90/6 7} 12. Bxd4 {-0.69/39 7} Nc6 {+1.81/5 7} 13. Rb3 {+0.24/20 7} Nh5
{+1.97/4 7} 14. Nh3 {+0.55/15 7} Bxh3 {+1.95/4 7} 15. gxh3 {+0.44/20 7}
Rfb8 {+1.66/5 6} 16. Rg1 {+0.85/9 7} Qf4 {+1.74/5 6} 17. Rg4 {+1.16/28 7}
Qc1+ {+1.58/6 6} 18. Kf2 {+1.00/23 7} Nxd4 {+1.84/5 6} 19. Rxd4
{+0.95/26 7} Nf6 {+1.73/6 6} 20. Rdb4 {+0.89/33 7} b6 {+2.01/6 5} 21. e3
{+0.68/31 7} c5 {+2.90/6 5} 22. Rb1 {+0.00/7 7} Qa3 {+2.58/7 5} 23. R1b3
{+0.00/5 7} cxb4 {+2.44/7 5} 24. Rxa3 {+1.18/34 7} bxa3 {+2.29/7 5} 25. c4
{+1.16/25 7} dxc4 {+1.93/7 5} 26. Qxc4 {+1.14/27 7} Rd8 {+1.93/6 4} 27. Bd3
{+1.26/30 7} Rd5 {+1.89/6 4} 28. Qb3 {+1.19/38 7} Rh5 {+1.83/6 4} 29. Kg3
{+1.18/36 7} Rc8 {+1.40/6 4} 30. Qxa3 {+1.32/23 7} Rcc5 {+1.39/6 4} 31. f4
{+1.46/19 7} Kf8 {+1.45/6 4} 32. e4 {+1.40/34 7} Ke7 {+1.39/6 4} 33. Qb2
{+1.39/36 8} Rh6 {+1.49/5 3} 34. e5 {+1.57/39 8} Nd7 {+1.41/6 3} 35. Qd4
{+1.72/26 8} a6 {+1.42/4 3} 36. Bf5 {+1.58/31 8} Nf8 {+1.33/6 3} 37. Qd3
{+1.58/37 8} a5 {+1.25/6 3} 38. c4 {+1.60/35 6} Rhc6 {+1.38/6 3} 39. Be4
{+1.66/32 9} Rh6 {+1.29/6 2.9} 40. Qd4 {+1.65/44 10} Rc7 {+1.33/6 2.8} 41.
f5 {+1.58/18 7} Nd7 {+0.45/7 13} 42. Bd5 {+1.80/55 7} Nc5 {+0.09/7 12} 43.
e6 {+2.10/31 7} f6 {-0.26/7 12} 44. Qg4 {+3.93/32 7} Kf8 {-1.68/7 11} 45.
Qf4 {+4.79/37 7} Re7 {-3.46/7 11} 46. Qb8+ {+5.27/41 7} Re8 {-12.63/8 11}
47. Qxb6 {+5.45/45 7} Nxe6 {-4.52/7 10} 48. Bxe6 {+6.65/20 7} Rh5
{-5.14/7 10} 49. Qc7 {+7.68/28 7} Re7 {-4.50/8 10} 50. Qd6 {+7.76/26 7} Rh6
{-7.66/8 9} 51. Kg4 {+8.71/11 7} Rg6+ {-9.52/8 9} 52. fxg6 {+8.86/14 7}
hxg6 {-99.90/8 3} 53. Kf4 {+9.00/9 7} g5+ {-99.92/6 0.3} 54. hxg5
{+9.07/9 7} fxg5+ {-99.94/5 0.1} 55. Kf5 {+10.99/7 7} g4 {-99.94/6 0.2} 56.
hxg4 {+13.18/5 7} g5 {-99.96/5 0.1} 57. Kf6 {+16.90/3 7} Ke8 {-99.98/3 0.1}
58. Qxe7# {+20.13/1 7}
{Xboard adjudication: Checkmate} 1-0
[/pgn]
A nice side effect of this approach is you can extract a PV for evey root move
Daniel
The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.
.
Another game against TSCP
[pgn]
[Event "Computer Chess Game"]
[Site "daniel-Satellite-C855"]
[Date "2017.12.14"]
[Round "-"]
[White "NebiyuChess_1.44"]
[Black "tscp"]
[Result "1-0"]
[TimeControl "40/300"]
[Annotator "1. +0.15 2... +0.00"]
1. d4 {+0.15/17} e6 2. Be3 {+0.28/14 7} d5 {+0.00/6 10} 3. h4 {+0.24/14 7}
Nf6 {+0.32/6 10} 4. Nc3 {+0.22/13 7} Bb4 {+0.38/6 9} 5. Qd3 {+0.20/16 7}
Bxc3+ {+0.56/6 9} 6. bxc3 {+0.37/18 7} Nc6 {+0.88/6 9} 7. a4 {+0.35/14 7}
e5 {+1.19/6 8} 8. Rb1 {+0.23/17 8} Qd6 {+1.22/6 8} 9. Rb3 {+0.20/11 7} O-O
{+1.35/6 8} 10. f3 {+0.27/11 7} Na5 {+1.66/6 8} 11. Rb4 {+0.29/12 8} exd4
{+1.90/6 7} 12. Bxd4 {-0.69/39 7} Nc6 {+1.81/5 7} 13. Rb3 {+0.24/20 7} Nh5
{+1.97/4 7} 14. Nh3 {+0.55/15 7} Bxh3 {+1.95/4 7} 15. gxh3 {+0.44/20 7}
Rfb8 {+1.66/5 6} 16. Rg1 {+0.85/9 7} Qf4 {+1.74/5 6} 17. Rg4 {+1.16/28 7}
Qc1+ {+1.58/6 6} 18. Kf2 {+1.00/23 7} Nxd4 {+1.84/5 6} 19. Rxd4
{+0.95/26 7} Nf6 {+1.73/6 6} 20. Rdb4 {+0.89/33 7} b6 {+2.01/6 5} 21. e3
{+0.68/31 7} c5 {+2.90/6 5} 22. Rb1 {+0.00/7 7} Qa3 {+2.58/7 5} 23. R1b3
{+0.00/5 7} cxb4 {+2.44/7 5} 24. Rxa3 {+1.18/34 7} bxa3 {+2.29/7 5} 25. c4
{+1.16/25 7} dxc4 {+1.93/7 5} 26. Qxc4 {+1.14/27 7} Rd8 {+1.93/6 4} 27. Bd3
{+1.26/30 7} Rd5 {+1.89/6 4} 28. Qb3 {+1.19/38 7} Rh5 {+1.83/6 4} 29. Kg3
{+1.18/36 7} Rc8 {+1.40/6 4} 30. Qxa3 {+1.32/23 7} Rcc5 {+1.39/6 4} 31. f4
{+1.46/19 7} Kf8 {+1.45/6 4} 32. e4 {+1.40/34 7} Ke7 {+1.39/6 4} 33. Qb2
{+1.39/36 8} Rh6 {+1.49/5 3} 34. e5 {+1.57/39 8} Nd7 {+1.41/6 3} 35. Qd4
{+1.72/26 8} a6 {+1.42/4 3} 36. Bf5 {+1.58/31 8} Nf8 {+1.33/6 3} 37. Qd3
{+1.58/37 8} a5 {+1.25/6 3} 38. c4 {+1.60/35 6} Rhc6 {+1.38/6 3} 39. Be4
{+1.66/32 9} Rh6 {+1.29/6 2.9} 40. Qd4 {+1.65/44 10} Rc7 {+1.33/6 2.8} 41.
f5 {+1.58/18 7} Nd7 {+0.45/7 13} 42. Bd5 {+1.80/55 7} Nc5 {+0.09/7 12} 43.
e6 {+2.10/31 7} f6 {-0.26/7 12} 44. Qg4 {+3.93/32 7} Kf8 {-1.68/7 11} 45.
Qf4 {+4.79/37 7} Re7 {-3.46/7 11} 46. Qb8+ {+5.27/41 7} Re8 {-12.63/8 11}
47. Qxb6 {+5.45/45 7} Nxe6 {-4.52/7 10} 48. Bxe6 {+6.65/20 7} Rh5
{-5.14/7 10} 49. Qc7 {+7.68/28 7} Re7 {-4.50/8 10} 50. Qd6 {+7.76/26 7} Rh6
{-7.66/8 9} 51. Kg4 {+8.71/11 7} Rg6+ {-9.52/8 9} 52. fxg6 {+8.86/14 7}
hxg6 {-99.90/8 3} 53. Kf4 {+9.00/9 7} g5+ {-99.92/6 0.3} 54. hxg5
{+9.07/9 7} fxg5+ {-99.94/5 0.1} 55. Kf5 {+10.99/7 7} g4 {-99.94/6 0.2} 56.
hxg4 {+13.18/5 7} g5 {-99.96/5 0.1} 57. Kf6 {+16.90/3 7} Ke8 {-99.98/3 0.1}
58. Qxe7# {+20.13/1 7}
{Xboard adjudication: Checkmate} 1-0
[/pgn]
A nice side effect of this approach is you can extract a PV for evey root move
Daniel
-
- Posts: 39
- Joined: Wed Dec 06, 2017 5:34 pm
Re: Nebiyu-MCTS vs TSCP 1-0
Hey Daniel, very interesting stuff.
Do you take advantage of transpositions in any way? For instance, by linking in a node that already has statistics if available.
Do you take advantage of transpositions in any way? For instance, by linking in a node that already has statistics if available.
Is there a handy tool you to use to view these XML trees graphically?Daniel Shawul wrote:The MCTS tree in xml format looks like this after e1xd4 which is played with high probability.
Code: Select all
...
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Nebiyu-MCTS vs TSCP 1-0
With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.
The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.
Once again very impressive! I look forward to the Scorpio implementation.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Without ideas there is nothing to simplify.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Nebiyu-MCTS vs TSCP 1-0
So far I haven't exploited transpositions yet -- my goal is to see if the MCTS is a competitive approach without optimization.
I use web browser to view the xml file.
Daniel
I use web browser to view the xml file.
Daniel
-
- Posts: 1296
- Joined: Sun Mar 12, 2006 6:46 pm
- Location: Kelowna
- Full name: Tony Mokonen
Re: Nebiyu-MCTS vs TSCP 1-0
Have you tried MCTS with Nebiyu reversi, and if so how successful was it for you?
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Nebiyu-MCTS vs TSCP 1-0
It turned out not to be as great as I hoped. It is definitely much better than TSCP now but its tactical blindness is there. Alpha's trick maybe that having a bunch of threads searching the MCTS tree with low exploration coefficient may eventually compensate for its tactical blindness. There maybe ways to emulate the LMR effect by careful management of the exploration coefficient using heurieustics, progressive widening (pruning) etc, but the plain MCTS as in alphazero's paper seems to be very week on one thread unless I am missing something.Michel wrote:With very low exploration coefficient of 0.025 and just a qsearch() for eval it has become very strong in tactics. The tree very selective and grows rapidly reaching very high depths sometimes heating the MAX_PLY roof. I can see how this approach can be very competitive with a highly selective alpha-beta search.
The games are not that good due to the lack of eval of nebiyu, so I am going to try and implement it in scorpio.
Once again very impressive! I look forward to the Scorpio implementation.
In any case, I had two revelations that I missed before: not using rollouts, and that the NN does not need to resolve too much tactics -- using qsearch() seems to be better than other depths at the moment.
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Nebiyu-MCTS vs TSCP 1-0
Checkers was very good for sure and I think reversi was too last time I tried it.tmokonen wrote:Have you tried MCTS with Nebiyu reversi, and if so how successful was it for you?