hgm wrote: ↑Thu Jul 28, 2022 10:22 pm
The idea of tuning a NN is rather simple, and doesn't really require any advanced math. You just determine the effect of a small change in each weight on the Mean Square Error of the output. Then you change the weights in proportion to how much effect they have, to make the MSE go in the desired direction.
This!
Some things are linked here that take 40 pages to describe how NNs work.
NN math and training fits on a half napkin.
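To make that concrete, here is a minimal C++ sketch of the idea hgm describes: nudge each weight, measure the effect on the MSE (a finite difference), then move every weight in proportion to its effect. The linear one-neuron predict function, the toy dataset, and the step sizes are all placeholder assumptions, not anyone's actual tuner.

```cpp
#include <vector>

// Placeholder model: a single linear neuron; any evaluator with
// tunable weights would do here.
double predict(const std::vector<double>& w, const std::vector<double>& x) {
    double s = 0.0;
    for (size_t i = 0; i < w.size(); ++i) s += w[i] * x[i];
    return s;
}

// Mean Square Error of the outputs over a (toy) dataset.
double mse(const std::vector<double>& w,
           const std::vector<std::vector<double>>& xs,
           const std::vector<double>& ys) {
    double sum = 0.0;
    for (size_t i = 0; i < xs.size(); ++i) {
        double e = predict(w, xs[i]) - ys[i];
        sum += e * e;
    }
    return sum / xs.size();
}

// One tuning step: determine each weight's effect on the MSE with a
// small change eps, then update all weights in proportion to that
// effect so the MSE moves in the desired direction.
void tune_step(std::vector<double>& w,
               const std::vector<std::vector<double>>& xs,
               const std::vector<double>& ys,
               double eps = 1e-4, double lr = 0.1) {
    const double base = mse(w, xs, ys);
    std::vector<double> effect(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        w[i] += eps;                              // small change in one weight
        effect[i] = (mse(w, xs, ys) - base) / eps; // its effect on the MSE
        w[i] -= eps;
    }
    for (size_t i = 0; i < w.size(); ++i)
        w[i] -= lr * effect[i];                   // move the MSE downhill
}
```

Backpropagation computes the same per-weight effects analytically in a single pass instead of one MSE evaluation per weight, which is what makes training scale to large networks.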
dangi12012 wrote: ↑Thu Jul 28, 2022 11:30 pm
NN math and training fits on a half napkin.
Sure, but writing something in the fewest possible words and teaching someone the same thing are two very different things. Especially if you consider different starting points: maybe someone has never thought about a multivariable gradient before. Additionally, for truly understanding a concept (and eventually building new things upon it), it is often much better to comprehend how people thought of it and created it in the first place.
dangi12012 wrote: ↑Thu Jul 28, 2022 11:30 pm
Some things are linked here that take 40 pages to describe how NNs work.
NN math and training fits on a half napkin.
True, but the background knowledge that's required to properly understand said math doesn't really fit.
For instance, I encountered the mathematics behind neural networks a couple of years ago, and while it was all neat and self-contained, I didn't have the necessary background to understand the concepts involved. I had no clue how any of it worked. It wasn't until a couple of years later, after completing my multivariable calculus course, that I could actually go back, look at the math, and realize how all of the pieces of the puzzle fit together.
The idea of how to train a neural network only seems relatively straightforward to me now because of the many math courses I have taken over the past several years.
RedBedHed wrote: ↑Wed Jul 27, 2022 5:39 am
But I am a total novice when it comes to neural networks.
...
Does anyone out there know of resources that provide full technical explanations of the theory, algorithms, or data structures used for this kind of learning? Textbooks, papers, videos? Relevant types of math? Machine learning libraries that are compatible with a C++ engine? Experience with training on a budget?
I found the following series of YouTube videos very educational when I first started to learn how neural networks work:
- What is a gradient? Part 1 and part 2
- Playlist that introduces and explains neural networks; it is based on this free online book, which I can also recommend. I admittedly had to watch the videos multiple times and read the book closely before I fully grasped how everything worked. But I still think that this is probably one of the best ways to learn about neural networks.
This covers only supervised learning, but as far as I know, most other training methods are based on it and differ only in how they generate the loss from which the gradient is calculated.
If you don't already know the basics of calculus and linear algebra (or even if you do, but you want to refresh them with nice intuitive explanations), there are these YouTube playlists:
- Linear algebra
- Calculus
For C++ the go-to library is probably PyTorch, which is best known as a neural network and tensor library for Python, but its core is written in C++ and it also exposes a C++ API (LibTorch) that closely mirrors the Python one.
Of course, you could implement a small neural network library yourself, but unless you are satisfied with very simple networks (i.e. fully connected with simple activation functions), it will be a lot of work, and finding bugs can be very difficult, as neural networks often just perform worse than they should when you have made a subtle coding mistake.
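To give an idea of what that looks like, here is a minimal, hypothetical LibTorch training loop. The network shape, the learning rate, and the random tensors standing in for position features and target evaluations are all placeholders:

```cpp
#include <torch/torch.h>

int main() {
    // Tiny fully connected network; the layer sizes are arbitrary.
    torch::nn::Sequential net(
        torch::nn::Linear(768, 64),
        torch::nn::ReLU(),
        torch::nn::Linear(64, 1));

    torch::optim::SGD opt(net->parameters(), /*lr=*/0.01);

    // Random stand-ins for a batch of position features and targets.
    auto x = torch::rand({32, 768});
    auto y = torch::rand({32, 1});

    for (int step = 0; step < 100; ++step) {
        opt.zero_grad();                                  // reset gradients
        auto loss = torch::mse_loss(net->forward(x), y);  // supervised loss
        loss.backward();                                  // backpropagate
        opt.step();                                       // gradient descent update
    }
}
```

The loop structure (zero_grad, forward, loss, backward, step) is the same as in the Python API, so the Python-oriented tutorials remain easy to follow while writing C++.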
Thank you so much for these awesome resources! I'll go check them out!
dangi12012 wrote: ↑Thu Jul 28, 2022 11:30 pm
NN math and training fits on a half napkin.
Sure, but writing something in the fewest possible words and teaching someone the same thing are two very different things. Especially if you consider different starting points: maybe someone has never thought about a multivariable gradient before. Additionally, for truly understanding a concept (and eventually building new things upon it), it is often much better to comprehend how people thought of it and created it in the first place.
I agree. I first looked at neural networks about 3-4 years ago, but I was too afraid of the math to start playing around with them.
It wasn't until I took multivariable calculus (about 2 years ago) that the idea of minimizing loss with gradient descent actually clicked. But even after taking all of the intermediate math courses, I am still pretty scared to dive into neural networks.
I think that, at the end of the day, fear really is the biggest roadblock to understanding. And for me, studying and absorbing lots of little pieces of information about the bigger picture is how I keep from feeling overwhelmed when I see it all at once.
RedBedHed wrote: ↑Wed Jul 27, 2022 6:41 am
I probably should add that the engine currently uses only two threads: one for the search and one for garbage collection. I'd estimate that it expands the search tree at about 500,000-700,000 nodes per second with little-to-no search optimization at this point. In 20 seconds it allocates about 12-13 million nodes, performing an evaluation with each allocation and searching as deep as 250 plies. The tree policy is pure UCT with 1.42 as the exploration constant.
Below is a 20 sec search of my basic UCT implementation in Stockfish.
No rollouts, NNUE eval only. No special move-ordering, no quiescence search. (C = 2.26)
Joerg Oster wrote: ↑Fri Jul 29, 2022 5:31 pm
Below is a 20 sec search of my basic UCT implementation in Stockfish.
How did you implement UCB1?
Do you pick the highest scoring move or the most often visited node?
I'm using the standard UCB1 formula.
Currently I'm choosing the robust child (most visited root move) as the best move, although it is not clear to me if this is always the best choice for the game of chess.
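For reference, here is the standard UCB1 rule as a generic C++ sketch (not the actual Stockfish patch); c is the exploration constant, like the 1.42 and 2.26 values mentioned above:

```cpp
#include <cmath>
#include <limits>

struct Child {
    double score_sum = 0.0; // accumulated result from this child's perspective
    int    visits    = 0;
};

// UCB1 = mean score + c * sqrt(ln(parent visits) / child visits).
// The exploration bonus shrinks as a child soaks up visits,
// balancing exploitation against exploration.
double ucb1(const Child& ch, int parent_visits, double c) {
    if (ch.visits == 0)  // unvisited children are always tried first
        return std::numeric_limits<double>::infinity();
    double mean    = ch.score_sum / ch.visits;
    double explore = c * std::sqrt(std::log(static_cast<double>(parent_visits)) / ch.visits);
    return mean + explore;
}
```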
Joerg Oster wrote: ↑Fri Jul 29, 2022 6:56 pm
Currently I'm choosing the robust child (most visited root move) as the best move, although it is not clear to me if this is always the best choice for the game of chess.
Yes! That was the intention behind my comment. The literature says that the most often visited node should be picked.
So suppose one node has a 50% win rate with 5001 visits, and another has a 90% win rate with only 2500 visits. MCTS will play the 50% win-rate move.
I was asking because - do you have the means to quickly test this with cutechess?
One engine build picks the robust child; the other picks the move with the highest win rate.
Which is the stronger engine? And how big is the difference?
If you need compute power for cutechess I can provide it.
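To make the proposed A/B test concrete, here is a generic sketch of the two root-move selection rules being compared (hypothetical code, not taken from either engine build):

```cpp
#include <vector>

struct RootMove {
    double score_sum = 0.0;
    int    visits    = 0;
    double win_rate() const { return visits ? score_sum / visits : 0.0; }
};

// "Robust child": play the most visited root move.
size_t pick_robust(const std::vector<RootMove>& moves) {
    size_t best = 0;
    for (size_t i = 1; i < moves.size(); ++i)
        if (moves[i].visits > moves[best].visits) best = i;
    return best;
}

// "Max child": play the root move with the highest win rate.
size_t pick_max(const std::vector<RootMove>& moves) {
    size_t best = 0;
    for (size_t i = 1; i < moves.size(); ++i)
        if (moves[i].win_rate() > moves[best].win_rate()) best = i;
    return best;
}
```

In the example above, pick_robust returns the 5001-visit / 50% move while pick_max returns the 2500-visit / 90% move; a match between the two builds would show which rule is stronger in practice.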