Re: Why there is no interest in Computer with odds Vs Humans match?
Posted: Tue Aug 06, 2019 6:07 am
What we use is closer to what AlphaZero and Lc0 use, but we have changed it substantially. I think that they would all behave similarly in terms of "bluffing", but the degree of "bluffing" will vary from one implementation to another. I think that Lc0 will probably "bluff" to a greater degree when losing than Komodo MCTS; Komodo MCTS won't go quite as "crazy" in a lost position. One way to test this bluffing would be to have Komodo and Komodo MCTS each play a series of handicap games against the same top GM or GMs with the same handicaps to see which would do better, but it would be rather expensive to play a lot of serious games with each engine. I suppose I could run the same experiment against some weaker engine (or even against Komodo with a time handicap), but that would only tell us whether the bluffing strategy works against engines.

jp wrote: ↑Tue Aug 06, 2019 5:25 am
When I posted before, I was assuming that the number of visits would work out in a way that would avoid bluffing. Later, I thought different chess engines may be doing MCTS differently, so there might be differences in bluffing.

lkaufman wrote: ↑Fri Aug 02, 2019 3:58 pm
Yes. Of course the devil is in the details, but it basically chooses its move by a weighted average of all plausible replies, so if one is bad for the engine but the other 29 are good for the engine, it might take the chance, depending on how "bad" and "good" they are and on the number of visits. Remember, engines don't evaluate lines as won or drawn, but just as a win probability (or a centipawn score, the two being convertible into each other). So it's not a simple question.
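The centipawn/win-probability conversion mentioned above can be sketched with a common logistic mapping. The scale constant 400 here is an illustrative choice, not what Komodo or Lc0 actually use; each engine tunes its own conversion:

```python
import math

def cp_to_winprob(cp, scale=400.0):
    """Map a centipawn score to a win probability in (0, 1).

    Logistic mapping; the scale constant is a hypothetical choice,
    real engines tune their own conversion.
    """
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))

def winprob_to_cp(p, scale=400.0):
    """Inverse mapping, for p strictly between 0 and 1."""
    return -scale * math.log10(1.0 / p - 1.0)
```

With this mapping a score of 0 cp corresponds to a 50% win probability, and the two scales round-trip exactly, which is the sense in which they are "convertible into each other".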
Does Komodo MCTS use UCT? I'm not sure from what Mark wrote a few months ago.
mjlef wrote: ↑Wed May 08, 2019 2:24 pm
For MCTS you need to keep a tree in memory. In Komodo MCTS, a node in the tree consists of a visit count, a sum of win probabilities, a policy value, a link to the next sibling, and a link to a child node. The key is determining which node to visit. So you start at the root node (the present position), look at all its children, and apply a formula like UCT1 to decide which node to select. Then you look at the selected node's children, again using UCT1, and so on until you hit one of these:
c. 50 move draw
d. rep draw
e. leaf node
A leaf node has no children yet, so if that is selected, you generate its children. Lc0/Leela/AlphaZero all use a NN to generate the win prob at this point, and fill in policy network information in the children. Then you back up the win probability, remembering to flip it as you go up the tree so it is from the point of view of that side. If you use a win prob in the range 0.0..1.0, you just return 1.0 - winprob as you go up the tree to the root. You increase the visit count and add the winprob to the node's winprob sum.
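The loop mjlef describes (select with a UCT-style formula, expand at a leaf, back up the flipped win probability) can be sketched like this. The node fields follow his description, but the expansion and evaluation are stand-in stubs, not a real policy/value network, and this is only an assumed reading of the scheme, not Komodo's actual code:

```python
import math
import random

class Node:
    """One tree node: visit count, sum of win probabilities, a policy
    prior, and links to its children (empty list = leaf)."""
    def __init__(self, policy=1.0):
        self.visits = 0
        self.winprob_sum = 0.0
        self.policy = policy
        self.children = []

def uct1(parent, child, c=1.4):
    """UCT-style selection score; unvisited children are tried first."""
    if child.visits == 0:
        return float("inf")
    exploit = child.winprob_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def simulate(node):
    """One selection/expansion/backup pass. The leaf branch stands in
    for the NN value and policy heads with random stubs."""
    if not node.children:
        # Leaf: generate children, then evaluate this position.
        node.children = [Node() for _ in range(3)]   # stub expansion
        winprob = random.random()                    # stub evaluation
    else:
        best = max(node.children, key=lambda ch: uct1(node, ch))
        # Flip the child's result so it is from this side's point of view.
        winprob = 1.0 - simulate(best)
    node.visits += 1
    node.winprob_sum += winprob
    return winprob
```

Running `simulate` repeatedly from the root grows the tree one node per pass, exactly the select/expand/backup cycle described in the quote; a real engine would replace the stubs with NN policy priors and a value-head evaluation.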
I see in that thread reference to the CP wiki, which says
UCT ... deals with the flaw of Monte-Carlo Tree Search, when a program may favor a losing move with only one or a few forced refutations, but due to the vast majority of other moves provides a better random playout score than other, better moves.
UCT1 = Xj + C * sqrt(ln(n) / nj)

where
Xj is the win ratio of the child,
n is the number of times the parent has been visited,
nj is the number of times the child has been visited,
C is a constant to adjust the amount of exploration and incorporates the sqrt(2) from the UCB1 formula.
The first component of the UCB1 formula above corresponds to exploitation, as it is high for moves with high average win ratio. The second component corresponds to exploration, since it is high for moves with few simulations.
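Plugging those definitions into the formula shows how the exploration term lets a rarely visited child outscore one with a higher average win ratio. The numbers below are made up purely for illustration:

```python
import math

def ucb1(x_j, n, n_j, c=math.sqrt(2)):
    """UCB1 score: exploitation term (average win ratio) plus
    exploration term (large when the child has few visits)."""
    exploitation = x_j
    exploration = c * math.sqrt(math.log(n) / n_j)
    return exploitation + exploration

# Parent visited n=200 times.
# Child A: win ratio 0.60, visited 100 times -> small exploration bonus.
# Child B: win ratio 0.45, visited only 5 times -> big exploration bonus,
# so B is selected next despite its lower win ratio.
```

This is exactly the trade-off the quote describes: the first term rewards moves that have scored well, the second pulls the search back toward under-explored moves.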