mvanthoor wrote: ↑Thu Nov 05, 2020 11:37 am
maksimKorzh wrote: ↑Tue Nov 03, 2020 12:21 am
Hi guys, after getting criticized for embedding the SF NNUE into my engine, I decided to change the direction of exploration
and am now learning the MCTS algorithm.
Cool
I hope you intend to make videos about it; I look forward to your implementations and explanations.
Even though I personally don't really like the coding style you use (to each their own; it's your code and you are the one who has to maintain or use it), I think your videos and explanations are very good. You don't explain any new stuff, but sometimes you explain things just a bit differently than the well-known sources. It has happened a few times that one of your videos switched my understanding of a topic from "I think I understand this... but I could be wrong, so I need to read more about it" to "I understand this, and it works like..."
MCTS is something I want to study as well, later on. Maybe make it an option. At some point I'll probably also look into NNUE, but only if I can write the code myself and train my own networks using my own engine.
All of that stuff will take some time yet
Even though I personally don't really like the coding style you use
Well)))) I prefer "not best practice" style code that you can fully rely on to understand how it works, rather than "best practice/production/the cool way the pros do it" code where you can't figure out what's going on, because that code forces you to get clear on the best practices first and only then, hopefully, on the subject itself. I always get stuck on that first part and usually never get to the subject; that's the reason my videos exist.
Cool
I hope you intend to make videos about it
Without the intent to make a video series I wouldn't even have started))) I would never have created BBC if it weren't for the video series.
It has happened a few times that one of your videos switched my understanding of a topic from "I think I understand this... but I could be wrong, so I need to read more about it" to "I understand this, and it works like..."
Now you know the reason why I make these videos)
MCTS is something I want to study as well, later on. Maybe make it an option. At some point I'll probably also look into NNUE, but only if I can write the code myself and train my own networks using my own engine.
Well, first of all, MCTS is just another search algorithm, as was already mentioned earlier in this thread. I'm going (already working!) to create a 1000-times-simplified Leela-type engine. Note that even though NNUE is a neural net, it's conceptually completely different from Leela's net. According to my code monkey's understanding, SF NNUE is a REGRESSION network which takes a board position as input and gives an estimated evaluation value in centipawns as output (I spell out what regression is not because you don't know it, but to settle the difference between classification and regression problems in my own head - code monkeys need LOTS of repetitions before they can claim they understand something and start making use of it).

On the other hand, a Leela-type net has two outputs: one is a logit probability for every move, which feeds into the formula (UCT) balancing exploration and exploitation, and the other is a winning probability, which is NOT in centipawns but just a value expressing how likely the side to move is to win the current position. Another thing I've finally realized is that AlphaGo used 2 nets - one output logit probabilities (to influence the move ordering) and the other output the winning probability. AlphaZero and Leela already use 1 NN with 2 outputs... In my dumb implementation I would rather go for 2 nets because it's easier for me to understand how they operate.
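Just to make the "policy output feeds the exploration/exploitation formula" part concrete, here is a minimal sketch of the AlphaZero-style PUCT variant of that formula (the exact formula and the constant c_puct vary between AlphaZero and Leela; this is an illustration, not either engine's actual code):

```python
import math

def puct_score(child_total_value, child_visits, parent_visits, prior,
               c_puct=1.5):
    """AlphaZero-style PUCT score for one child node.

    child_total_value: sum of backed-up values for this child
    child_visits:      how many times this child was visited
    parent_visits:     visit count of the parent node
    prior:             the policy network's probability for the move
                       leading to this child (this is where the
                       "logit probability for every move" comes in)
    """
    # Exploitation term: average value seen so far (0 for unvisited nodes).
    exploitation = child_total_value / child_visits if child_visits > 0 else 0.0
    # Exploration term: scaled by the network's prior, so moves the
    # policy head likes get explored first.
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return exploitation + exploration

# During tree descent, the child with the highest PUCT score is selected.
```

The key difference from plain UCT is the `prior` factor: in UCT every unvisited move looks equally attractive, while in PUCT the policy head biases the search toward moves it already believes are good.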
And one last thing - NNUE can be used instead of the eval NN in Leela-type engines, but I guess (someone please correct me if I'm wrong) it takes all the fun out of the idea of reinforcement learning. I mean, the idea behind the Leela type of NN is that it improves through self-play, while NNUE, according to CPW, is trained "using a mixture of supervised and reinforcement learning methods".
Anyway, at the moment I've implemented a tic-tac-toe game with MCTS in Python (and I'm making a series on that as well, btw), and the next step is to create 2 NNs to use instead of random rollouts. So, as far as I currently understand the idea, it's the job of MCTS to generate the training data (logit probabilities for move ordering and the winning probability), so we can then compare the network's outputs with the actual visit statistics from the search along with the actual game results. As they claim in one of the online tutorials: "it's all we need to train our network(s)"
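The "MCTS generates the training data" step can be sketched like this, assuming the AlphaZero-style recipe (the function name and the dict-based move representation are mine, for illustration): the policy target is the normalized visit distribution at the root, and the value target is the final game outcome.

```python
def training_targets(root_visit_counts, game_result):
    """Turn one finished MCTS search into a training example.

    root_visit_counts: dict mapping move -> visit count at the root node
    game_result:       +1 / 0 / -1 from the perspective of the side to move

    Returns (policy_target, value_target): the policy head learns to
    predict the search's visit distribution, and the value head learns
    to predict the eventual game outcome.
    """
    total = sum(root_visit_counts.values())
    policy_target = {move: n / total for move, n in root_visit_counts.items()}
    value_target = game_result
    return policy_target, value_target

# Example: a root search gave "e4" 70 visits and "d4" 30, and the game
# was eventually won by the side to move:
policy, value = training_targets({"e4": 70, "d4": 30}, +1)
# policy == {"e4": 0.7, "d4": 0.3}, value == 1
```

This is exactly the comparison described above: the network's raw move probabilities and winning estimate are trained toward what the search and the actual game result said.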