However, I still have some really basic questions that aren't covered in the GitHub article:
I assume the positions used to train the network are "stable" and the best move isn't a capture that changes the material balance — is this correct?
How do you deal with positions where there is a forced tactical sequence (e.g., a check, or a fork that then wins a piece)? Ignore them? Only use positions where the PV ends with the same material balance it started with?
Is it best practice to train the network on win / draw / loss logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb would be helpful.
From what I've tried, I think it is best to train the network on "stable" positions.
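Such a "stable" filter can be sketched as follows; the Position fields and the 50 cp margin are illustrative assumptions, not something from this thread, and in practice you would take these flags from the data-generation search:

```python
from dataclasses import dataclass

CP_MARGIN = 50  # assumed threshold for static eval vs. qsearch agreement


@dataclass
class Position:
    # Hypothetical per-position flags, as a real exporter would
    # record them from the engine's search:
    in_check: bool
    best_move_is_capture: bool
    static_eval_cp: int
    qsearch_cp: int


def is_quiet(pos: Position) -> bool:
    """Keep only positions with no pending tactics: not in check,
    best move is not a capture, and the static eval agrees with the
    quiescence score to within the margin."""
    return (not pos.in_check
            and not pos.best_move_is_capture
            and abs(pos.static_eval_cp - pos.qsearch_cp) <= CP_MARGIN)
```

A data generator would run this over every position it emits and simply skip the ones that fail.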
In many training implementations I've seen, you feed a shallow-search centipawn (cp) score through a logistic function to squash it between -1 and 1. Then you train the network on a weighted sum of that and the win / draw / loss score. Usually, the scaled shallow-search cp score is weighted more.
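A minimal sketch of that target construction (the 400 cp scale and the 0.75 weight are assumed values for illustration, not fixed by this thread):

```python
import math


def cp_to_score(cp, scale=400.0):
    # Logistic squash of a centipawn score into (-1, 1).
    # The 400 cp scale is an assumed constant; engines tune it.
    return 2.0 / (1.0 + math.exp(-cp / scale)) - 1.0


def training_target(search_cp, wdl, weight=0.75):
    # Weighted sum of the squashed shallow-search score and the game
    # result wdl (+1 win, 0 draw, -1 loss); weight > 0.5 puts more
    # emphasis on the search score, as described above.
    return weight * cp_to_score(search_cp) + (1.0 - weight) * wdl
```

For example, a drawn-looking position (search score 0 cp) from a game that was eventually won gets a target of 0.75 * 0 + 0.25 * 1 = 0.25.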
In order to obtain a decent NN evaluation, I think you need anywhere from several hundreds of millions to a few billion positions. However, this is dependent on network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But, if you want to index by king square or king bucket, you may need at least a billion.
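For reference, a 2x768-128-1 evaluation can be sketched in plain Python as below. The weights here are random placeholders, and a real NNUE uses quantized integer weights and updates the accumulators incrementally move by move rather than recomputing them:

```python
import random

# Toy 2x768-128-1 network: 768 piece-square features per perspective
# (2 colors x 6 piece types x 64 squares), a 128-wide accumulator per
# perspective, and a single output neuron.
N_FEATURES, N_HIDDEN = 768, 128
rnd = random.Random(0)
W1 = [[rnd.uniform(-0.01, 0.01) for _ in range(N_HIDDEN)]
      for _ in range(N_FEATURES)]           # shared first-layer weights
B1 = [0.0] * N_HIDDEN
W2 = [rnd.uniform(-0.01, 0.01) for _ in range(2 * N_HIDDEN)]
B2 = 0.0


def accumulate(active_features):
    # Sum the first-layer rows of the active (one-hot) features --
    # this is the accumulator that NNUE maintains incrementally.
    acc = B1[:]
    for f in active_features:
        row = W1[f]
        for i in range(N_HIDDEN):
            acc[i] += row[i]
    return acc


def evaluate(white_features, black_features):
    # Clipped ReLU on both accumulators, concatenate, apply output layer.
    hidden = [min(max(x, 0.0), 1.0)
              for x in accumulate(white_features) + accumulate(black_features)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2
```

Each side's feature list holds the indices of the pieces actually on the board (around 30 active features out of 768), which is why the sparse accumulator update is so cheap.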
Is it best practice to train the network on win / draw / loss logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
The "lambda" parameter controls this. 0 means tune on results only, 1.0 means tune on eval only, intermediate values use a weighted average. Arasan's latest tuning run used the nodchip tuner with lambda=0.75. I believe recent Stockfish versions use lambda=1.0.
Steve Maughan wrote: ↑Wed Dec 28, 2022 5:25 pm
How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb would be helpful.
Steve
alvinypeng wrote: ↑Wed Dec 28, 2022 11:28 pm
In order to obtain a decent NN evaluation, I think you need anywhere from several hundreds of millions to a few billion positions. However, this is dependent on network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But, if you want to index by king square or king bucket, you may need at least a billion.
For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
Witek wrote: ↑Thu Dec 29, 2022 12:45 am
For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
That's interesting. I had heard that you need billions of annotated positions, and that scared me away from looking into NNUE so far.
So this question goes to all the NNUE developers: Are there any other sources/books/tutorials you can recommend? How did you get started? Did you use proven NN architectures, weights, tuners, and datasets first? Or everything from scratch?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
HalfKA nets such as those found in Stockfish have 32 times as many inputs as nets without king-relative features. So if a net with no king-relative features requires ~100M positions, a rough estimate for a HalfKA net is 32 * 100M, which is a few billion.
I think the nnue-pytorch docs already go into plenty of detail. There's also a section in Neural Networks for Chess that talks a little bit about NNUE, though I haven't read it.
After I understood how NNUE worked, I was able to write a simple training pipeline in TensorFlow. Using a machine-learning framework is a lot easier than writing training code completely from scratch.
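A toy version of such a pipeline, with the framework swapped out for plain Python so it stays self-contained (the sparse features and tanh targets are invented stand-ins for real positions and blended scores):

```python
import math
import random

# Minimal training-pipeline stand-in: positions become sparse feature
# vectors, targets are scores in (-1, 1), and we fit a single linear
# layer with per-sample SGD on an MSE loss. A real pipeline does the
# same thing in tensorflow/pytorch over hundreds of millions of
# positions with a full network instead of one layer.
rnd = random.Random(42)
N_FEAT, N_POS = 64, 256

true_w = [rnd.uniform(-0.5, 0.5) for _ in range(N_FEAT)]
X = [[1.0 if rnd.random() < 0.15 else 0.0 for _ in range(N_FEAT)]
     for _ in range(N_POS)]                         # sparse "positions"
Y = [math.tanh(sum(w * x for w, x in zip(true_w, row))) for row in X]

w = [0.0] * N_FEAT


def mse():
    return sum((sum(wi * xi for wi, xi in zip(w, row)) - y) ** 2
               for row, y in zip(X, Y)) / N_POS


start = mse()
lr = 0.1
for _ in range(100):
    for row, y in zip(X, Y):
        err = sum(wi * xi for wi, xi in zip(w, row)) - y
        for i, xi in enumerate(row):
            if xi:                                  # sparse update
                w[i] -= lr * err * xi
end = mse()
```

The loss should drop well below its starting value after a few epochs; swapping the hand-written SGD loop for a framework optimizer is exactly where tools like TensorFlow save the effort.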