MCTS with minmax backup operator?

Daniel Shawul · Post by **Daniel Shawul** » Mon May 07, 2018 7:22 am

Michel wrote: ↑Sun May 06, 2018 3:57 pm
The policy network is a static rule (unlike an actual shallow depth search). If it is mostly good, then it won't give enough visits for trap moves that look bad initially but turn out to be good on exhaustive search. If it is bad, there is no hope.
You keep repeating this but it does not have to be true. Personally I would guess that the task of the policy network is to select _potentially_ good moves.
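To make the disagreement about visits concrete, here is a minimal sketch of AlphaZero-style PUCT selection at a single root node. The constants and conventions are assumptions for illustration, not LCZero's actual code: `c_puct = 1.5`, unvisited children valued at 0, and the parent count taken as 1 + child visits. The names `puct_select` and `run_root` are hypothetical. Each visit returns a fixed "true" value. This is kinder to the trap move than a real search, where its value only appears at depth.

```python
import math

def puct_select(priors, visits, value_sums, c_puct=1.5):
    """Return the index of the child maximizing Q + U (AlphaZero-style PUCT).

    Assumptions for this sketch: unvisited children get Q = 0, and the
    parent visit count is taken as 1 + the sum of child visits.
    """
    parent_visits = 1 + sum(visits)
    best_index, best_score = 0, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visits, value_sums)):
        q = w / n if n > 0 else 0.0                         # mean value so far
        u = c_puct * p * math.sqrt(parent_visits) / (1 + n)  # prior-driven bonus
        if q + u > best_score:
            best_index, best_score = i, q + u
    return best_index

def run_root(priors, true_values, n_sims, c_puct=1.5):
    """Hypothetical one-level search: each visit to child i returns true_values[i]."""
    visits = [0] * len(priors)
    value_sums = [0.0] * len(priors)
    for _ in range(n_sims):
        i = puct_select(priors, visits, value_sums, c_puct)
        visits[i] += 1
        value_sums[i] += true_values[i]
    return visits

# Move 0: "natural" move, prior 0.99, true value 0.2.
# Move 1: "trap" move, prior 0.01, true value 0.5 (better, but the policy dislikes it).
print(run_root([0.99, 0.01], [0.2, 0.5], 50))   # [50, 0]
```

With a 0.99/0.01 prior split, the better move gets no visits in the first 50 simulations. It is first visited only after a few hundred, and then its higher Q takes over. That delay is the cost Daniel describes. Michel's counterpoint is that a good enough policy would not put such a small prior on these moves in the first place.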

I will repeat it until I see any kind of evidence with regard to tactics and NN. It should be very easy to grasp that any kind of static policy is going to fail compared to a dynamic policy that actually searches the current board. If the policy NN were close to perfect, there would be no point in searching. I am pretty sure by now the policy NN of LCzero has been good enough for selecting "potentially good" moves, but what good has it done so far in terms of tactics? Absolutely nothing. Remi also seems to have this gross misunderstanding of the power of the policy NN for chess. It is not even as good as a depth=1 Stockfish search, or even SEE-level tactics. Do you really think that with that kind of policy NN you are going to guess even a level-3 trap?
So far the evidence with regard to tactics + NN is completely absent, and I am supposed to believe in it? The only explanation left for me is the use of a powerful accelerator that gave A0 such a huge advantage (4 TPUs to get 80 kN/s and minimize the tactical blunders).

If the NN were to provide a score interval instead of a single value, then in the spirit of UCT the policy ordering should be according to the upper bound on the score, whereas for the actual evaluation one would use the middle of the interval.

Note this is just a theory. I have not verified if it corresponds in any way to reality.
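Michel's idea can be sketched in a few lines, assuming a hypothetical network that outputs a `[low, high]` interval in win-probability units. The `ScoreInterval` type, the helper names, and the example moves and numbers are illustrative assumptions, not an existing engine's API. Moves are ordered optimistically by the upper bound, UCB-style, while the evaluation uses the midpoint.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ScoreInterval:
    """Hypothetical NN output: a score range [low, high] in win-probability units."""
    low: float
    high: float

    @property
    def mid(self) -> float:
        # Point estimate used for the actual evaluation.
        return 0.5 * (self.low + self.high)

def order_moves(intervals: Dict[str, ScoreInterval]) -> List[str]:
    """Optimistic ordering in the spirit of UCT: highest upper bound first."""
    return sorted(intervals, key=lambda move: intervals[move].high, reverse=True)

def evaluate(intervals: Dict[str, ScoreInterval], move: str) -> float:
    """Evaluation uses the middle of the interval, not the optimistic bound."""
    return intervals[move].mid

intervals = {
    "Nf3":   ScoreInterval(0.375, 0.625),  # quiet move, narrow interval
    "Bxh7+": ScoreInterval(0.125, 0.875),  # sacrifice, very uncertain
    "a3":    ScoreInterval(0.375, 0.5),
}
print(order_moves(intervals))        # ['Bxh7+', 'Nf3', 'a3']
print(evaluate(intervals, "Bxh7+"))  # 0.5
```

The uncertain sacrifice is ordered first even though its midpoint equals the quiet move's. Presumably this is how an interval could give speculative moves more attention than a single point estimate would, though, as Michel says, it is untested.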

This is going to annoy you more but I am going to repeat: Whether it is a good/bad policy NN, it is going to fail. It better be good, but even in that case it is going to fail because traps are very common in chess.

Daniel

Michel · Post by **Michel** » Mon May 07, 2018 11:43 am

Daniel wrote: This is going to annoy you more but I am going to repeat: Whether it is a good/bad policy NN, it is going to fail. It better be good, but even in that case it is going to fail because traps are very common in chess.

It doesn't annoy me.

I just find your argument not as compelling as you do. But time will tell if lczero will really hit a wall concerning tactics or not. For that it is important for the main lczero project to continue its present course. Otherwise we will never know.

Daniel Shawul · Post by **Daniel Shawul** » Mon May 07, 2018 6:24 pm

Michel wrote: ↑Mon May 07, 2018 11:43 am
Daniel wrote: This is going to annoy you more but I am going to repeat: Whether it is a good/bad policy NN, it is going to fail. It better be good, but even in that case it is going to fail because traps are very common in chess.
It doesn't annoy me. I just find your argument not as compelling as you do. But time will tell if lczero will really hit a wall concerning tactics or not. For that it is important for the main lczero project to continue its present course. Otherwise we will never know.

I want the lczero project to continue its course too, so that we can find out how the result is achieved. It is still at 2100-ish Elo on a single core but scales better with hardware. On the other hand, the best version of scorpio-mcts may be 2700 Elo CCRL (Graham is testing it right now) on 1 core. The lczero project is highly dependent on very high-end hardware. Given a 180x hardware advantage in terms of TFLOPs, any tactical deficiency could perhaps be overcome, especially when combined with the powerful tool of cherry-picking :)

It is unbelievable to me that an engine which still falls for 2-ply tactics didn't let Stockfish win a game in a 100-game match.

AlvaroBegue · Post by **AlvaroBegue** » Tue May 08, 2018 9:49 pm

Daniel Shawul wrote: ↑Tue May 01, 2018 10:32 pm I keep actual scores in centipawns but that is irrelevant to the minmax/averaging stuff.

Sorry, I haven't followed the whole thread, but this statement makes no sense to me. Since the conversion from "expected result" space (i.e., [-1,1]) to "score in centipawns" is not linear, you can't expect averaging to give identical results.

I can build an example to make this more clear. But perhaps I didn't understand something.
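Alvaro's point can be checked numerically. The following is a minimal sketch that assumes a logistic mapping with a 400-centipawn scale into the [-1, 1] expected-result range. The scale, the function names, and the two child scores are illustrative assumptions, not any particular engine's. Averaging two child scores in centipawns gives one answer. Averaging them in expected-result space and converting back gives a clearly different one.

```python
import math

SCALE = 400.0  # assumed centipawn scale; engines use different constants

def cp_to_expected(cp: float) -> float:
    """Map centipawns to an expected result in [-1, 1] (logistic, rescaled)."""
    return 2.0 / (1.0 + 10.0 ** (-cp / SCALE)) - 1.0

def expected_to_cp(e: float) -> float:
    """Inverse mapping: expected result in (-1, 1) back to centipawns."""
    return SCALE * math.log10((1.0 + e) / (1.0 - e))

children_cp = [0.0, 800.0]  # two hypothetical child scores

avg_in_cp = sum(children_cp) / len(children_cp)
avg_in_expected = sum(cp_to_expected(c) for c in children_cp) / len(children_cp)

print(avg_in_cp)                               # 400.0
print(round(expected_to_cp(avg_in_expected)))  # 186
```

Because the mapping flattens out near ±1, the 800 cp child contributes much less in expected-result space than its centipawn value suggests. The two averages therefore disagree by more than 200 cp here.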

Daniel Shawul · Post by **Daniel Shawul** » Tue May 08, 2018 10:02 pm

AlvaroBegue wrote: ↑Tue May 08, 2018 9:49 pm
Daniel Shawul wrote: ↑Tue May 01, 2018 10:32 pm I keep actual scores in centipawns but that is irrelevant to the minmax/averaging stuff.
Sorry, I haven't followed the whole thread, but this statement makes no sense to me. Since the conversion from "expected result" space (i.e., [-1,1]) to "score in centipawns" is not linear, you can't expect averaging to give identical results.

I can build an example to make this more clear. But perhaps I didn't understand something.

I actually do the score/winning-percentage conversions using logistic and logit functions before averaging; it is just that I store the centipawn score in the Node data structure, which is irrelevant as long as I do the averaging with winning percentage instead of centipawn score. By the way, conversion is not necessary for mcts-min and alpha-beta rollouts, which is why I decided to store the actual centipawn score.
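The storage scheme Daniel describes can be sketched as follows. A node keeps its score in centipawns. The averaging backup converts to win probability (logistic), takes the mean, and converts back (logit). The minimax backup needs no conversion, because taking a max commutes with any strictly increasing function. The `Node` layout, the 400 cp scale, the visit weighting, and the function names are hypothetical and not taken from Scorpio. All scores are from one fixed side's point of view, with this node maximizing.

```python
import math
from dataclasses import dataclass, field

SCALE = 400.0  # assumed logistic scale in centipawns (illustrative, not Scorpio's value)

def cp_to_win(cp: float) -> float:
    """Logistic: centipawns -> winning probability in (0, 1)."""
    return 1.0 / (1.0 + 10.0 ** (-cp / SCALE))

def win_to_cp(p: float) -> float:
    """Logit (inverse of cp_to_win): winning probability -> centipawns."""
    return SCALE * math.log10(p / (1.0 - p))

@dataclass
class Node:
    """Hypothetical node that stores its score in centipawns.

    For simplicity, all scores are from one fixed side's point of view and
    this node picks the maximum (no negamax sign flips).
    """
    score_cp: float = 0.0
    visits: int = 0
    children: list = field(default_factory=list)

def backup_average(node: Node) -> None:
    """Averaging backup: to win probability, visit-weighted mean, back to centipawns."""
    total = sum(c.visits for c in node.children)
    mean_p = sum(c.visits * cp_to_win(c.score_cp) for c in node.children) / total
    node.score_cp = win_to_cp(mean_p)

def backup_minimax(node: Node) -> None:
    """Minimax backup: no conversion needed, since the logistic is strictly increasing."""
    node.score_cp = max(c.score_cp for c in node.children)

root = Node(children=[Node(-50.0, 2), Node(120.0, 5), Node(30.0, 1)])
backup_minimax(root)
print(root.score_cp)  # 120.0
backup_average(root)
print(root.score_cp)  # lies between -50 and 120 cp
```

Storing centipawns therefore costs one conversion per averaging backup and nothing for the minimax-style backups.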
Daniel
