However, I still have some really basic questions that aren't covered in the GitHub article:
I assume the positions used to train the network are "stable" and the best move isn't a capture that changes the material balance — is this correct?
How do you deal with positions where there is a forced tactical sequence (e.g., a check, or a fork that then wins a piece)? Ignore them? Only use positions where the PV ends with the same material balance it started with?
Is it best practice to train the network on win / draw / loss logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb would be helpful.
From what I've tried, I think it is best to train the network on "stable" positions.
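Such a "stable" filter can be sketched as follows; the Position fields and the 50 cp margin are illustrative assumptions, not something from this thread, and in practice you would take these flags from the data-generation search:

```python
from dataclasses import dataclass

CP_MARGIN = 50  # assumed threshold for static eval vs. qsearch agreement


@dataclass
class Position:
    # Hypothetical per-position flags, as a real exporter would
    # record them from the engine's search:
    in_check: bool
    best_move_is_capture: bool
    static_eval_cp: int
    qsearch_cp: int


def is_quiet(pos: Position) -> bool:
    """Keep only positions with no pending tactics: not in check,
    best move is not a capture, and the static eval agrees with the
    quiescence score to within the margin."""
    return (not pos.in_check
            and not pos.best_move_is_capture
            and abs(pos.static_eval_cp - pos.qsearch_cp) <= CP_MARGIN)
```

A data generator would run this over every position it emits and simply skip the ones that fail.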
In many training implementations I've seen, you feed a shallow-search centipawn (cp) score through a logistic function to squash it between -1 and 1. Then you train the network on a weighted sum of that and the win / draw / loss score. Usually, the scaled shallow-search cp score is weighted more.
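A minimal sketch of that target construction (the 400 cp scale and the 0.75 weight are assumed values for illustration, not fixed by this thread):

```python
import math


def cp_to_score(cp, scale=400.0):
    # Logistic squash of a centipawn score into (-1, 1).
    # The 400 cp scale is an assumed constant; engines tune it.
    return 2.0 / (1.0 + math.exp(-cp / scale)) - 1.0


def training_target(search_cp, wdl, weight=0.75):
    # Weighted sum of the squashed shallow-search score and the game
    # result wdl (+1 win, 0 draw, -1 loss); weight > 0.5 puts more
    # emphasis on the search score, as described above.
    return weight * cp_to_score(search_cp) + (1.0 - weight) * wdl
```

For example, a drawn-looking position (search score 0 cp) from a game that was eventually won gets a target of 0.75 * 0 + 0.25 * 1 = 0.25.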
In order to obtain a decent NN evaluation, I think you need anywhere from several hundreds of millions to a few billion positions. However, this is dependent on network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But, if you want to index by king square or king bucket, you may need at least a billion.
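For reference, a 2x768-128-1 evaluation can be sketched in plain Python as below. The weights here are random placeholders, and a real NNUE uses quantized integer weights and updates the accumulators incrementally move by move rather than recomputing them:

```python
import random

# Toy 2x768-128-1 network: 768 piece-square features per perspective
# (2 colors x 6 piece types x 64 squares), a 128-wide accumulator per
# perspective, and a single output neuron.
N_FEATURES, N_HIDDEN = 768, 128
rnd = random.Random(0)
W1 = [[rnd.uniform(-0.01, 0.01) for _ in range(N_HIDDEN)]
      for _ in range(N_FEATURES)]           # shared first-layer weights
B1 = [0.0] * N_HIDDEN
W2 = [rnd.uniform(-0.01, 0.01) for _ in range(2 * N_HIDDEN)]
B2 = 0.0


def accumulate(active_features):
    # Sum the first-layer rows of the active (one-hot) features --
    # this is the accumulator that NNUE maintains incrementally.
    acc = B1[:]
    for f in active_features:
        row = W1[f]
        for i in range(N_HIDDEN):
            acc[i] += row[i]
    return acc


def evaluate(white_features, black_features):
    # Clipped ReLU on both accumulators, concatenate, apply output layer.
    hidden = [min(max(x, 0.0), 1.0)
              for x in accumulate(white_features) + accumulate(black_features)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2
```

Each side's feature list holds the indices of the pieces actually on the board (around 30 active features out of 768), which is why the sparse accumulator update is so cheap.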
Is it best practice to train the network on win / draw / loss logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
The "lambda" parameter controls this. 0 means tune on results only, 1.0 means tune on eval only, intermediate values use a weighted average. Arasan's latest tuning run used the nodchip tuner with lambda=0.75. I believe recent Stockfish versions use lambda=1.0.
Steve Maughan wrote: ↑Wed Dec 28, 2022 5:25 pm
How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb would be helpful.
Steve
alvinypeng wrote: ↑Wed Dec 28, 2022 11:28 pm
In order to obtain a decent NN evaluation, I think you need anywhere from several hundreds of millions to a few billion positions. However, this is dependent on network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But, if you want to index by king square or king bucket, you may need at least a billion.
For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
Witek wrote: ↑Thu Dec 29, 2022 12:45 am
For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
That's interesting. I had heard that you need billions of annotated positions, and that scared me away from looking into NNUE so far.
So this question goes to all the NNUE developers: Are there any other sources/books/tutorials you can recommend? How did you get started? Did you use proven NN architectures, weights, tuners, and datasets first? Or everything from scratch?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
HalfKA nets such as those found in Stockfish have 32 times as many inputs as nets without king-relative features. So if a net with no king-relative features requires ~100M positions, a rough estimate for a HalfKA net is 32 * 100M, which is a few billion.
I think the nnue-pytorch docs already go into plenty of detail. There's also a section in Neural Networks for Chess that talks a little bit about NNUE, though I haven't read it.
After I understood how NNUE worked, I was able to write a simple training pipeline in TensorFlow. Using a machine-learning framework is a lot easier than writing training code completely from scratch.
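A toy version of such a pipeline, with the framework swapped out for plain Python so it stays self-contained (the sparse features and tanh targets are invented stand-ins for real positions and blended scores):

```python
import math
import random

# Minimal training-pipeline stand-in: positions become sparse feature
# vectors, targets are scores in (-1, 1), and we fit a single linear
# layer with per-sample SGD on an MSE loss. A real pipeline does the
# same thing in tensorflow/pytorch over hundreds of millions of
# positions with a full network instead of one layer.
rnd = random.Random(42)
N_FEAT, N_POS = 64, 256

true_w = [rnd.uniform(-0.5, 0.5) for _ in range(N_FEAT)]
X = [[1.0 if rnd.random() < 0.15 else 0.0 for _ in range(N_FEAT)]
     for _ in range(N_POS)]                         # sparse "positions"
Y = [math.tanh(sum(w * x for w, x in zip(true_w, row))) for row in X]

w = [0.0] * N_FEAT


def mse():
    return sum((sum(wi * xi for wi, xi in zip(w, row)) - y) ** 2
               for row, y in zip(X, Y)) / N_POS


start = mse()
lr = 0.1
for _ in range(100):
    for row, y in zip(X, Y):
        err = sum(wi * xi for wi, xi in zip(w, row)) - y
        for i, xi in enumerate(row):
            if xi:                                  # sparse update
                w[i] -= lr * err * xi
end = mse()
```

The loss should drop well below its starting value after a few epochs; swapping the hand-written SGD loop for a framework optimizer is exactly where tools like TensorFlow save the effort.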