NNUE — Newbie Questions...

Discussion of chess software programming and technical issues.

Moderator: Ras

Steve Maughan
Posts: 1273
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

NNUE — Newbie Questions...

Post by Steve Maughan »

I'm thinking of adding NNUE to Maverick (along with a complete rewrite). I found the following resource incredibly helpful:

https://github.com/glinscott/nnue-pytor ... cs/nnue.md

However, I still have some really basic questions that aren't covered in the GitHub article:
  • I assume the positions used to train the network are "stable" and the best move isn't a capture that changes the material balance — is this correct?
  • How do you deal with positions where there is a forced tactical sequence, e.g. a check or a fork followed by winning a piece? Ignore them? Or only use positions where the PV ends with the same material balance it started with?
  • Is it best practice to train the network on the win / draw / loss result mapped through a logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
  • How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb will be helpful
All help appreciated — thanks!

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: NNUE — Newbie Questions...

Post by alvinypeng »

From what I've tried, I think it is best to train the network on "stable" positions.

In many training implementations I've seen, you feed a shallow-search centipawn score through a logistic function to map it into the range -1 to 1. Then you train the network on a weighted sum of that and the win / draw / loss score. Usually, the scaled shallow-search score is weighted more heavily.
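
As a minimal sketch of that target construction (the 400 cp scaling constant and the 0.75 weighting below are illustrative assumptions, not values taken from this thread):

import math

CP_SCALE = 400.0  # assumed scaling constant for the logistic mapping

def scaled_score(cp: int) -> float:
    """Map a centipawn search score into (-1, 1) with a logistic curve."""
    return 2.0 / (1.0 + math.exp(-cp / CP_SCALE)) - 1.0

def training_target(cp: int, wdl: int, score_weight: float = 0.75) -> float:
    """Blend the scaled search score with the game result (+1 / 0 / -1)."""
    return score_weight * scaled_score(cp) + (1.0 - score_weight) * wdl

# Example: a position scored +120 cp by a shallow search, from a drawn game.
target = training_target(cp=120, wdl=0)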

In order to obtain a decent NN evaluation, I think you need anywhere from several hundred million to a few billion positions. However, this depends on the network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But if you want to index by king square or king bucket, you may need at least a billion.
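
One way to read the 2x768-128-1 shorthand is the PyTorch sketch below: a single 768-to-128 feature transformer applied to each perspective, the two accumulators concatenated, then one output neuron. Sharing the feature transformer between perspectives and using a clipped ReLU are common choices I'm assuming here, not something specified in the post.

import torch
import torch.nn as nn

class SmallNNUE(nn.Module):
    """Rough sketch of a 2x768-128-1 network."""

    def __init__(self):
        super().__init__()
        self.ft = nn.Linear(768, 128)    # shared feature transformer
        self.out = nn.Linear(2 * 128, 1)

    def forward(self, stm_features, nstm_features):
        # stm_features / nstm_features: (batch, 768) piece-square planes
        # for the side to move and its opponent
        stm = torch.clamp(self.ft(stm_features), 0.0, 1.0)   # clipped ReLU
        nstm = torch.clamp(self.ft(nstm_features), 0.0, 1.0)
        return self.out(torch.cat([stm, nstm], dim=1))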
Steve Maughan
Posts: 1273
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: NNUE — Newbie Questions...

Post by Steve Maughan »

Thanks Alvin — this is helpful!

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: NNUE — Newbie Questions...

Post by jdart »

Steve Maughan wrote: Wed Dec 28, 2022 5:25 pm I assume the positions used to train the network are "stable" and the best move isn't a capture that changes the material balance — is this correct?
The --smart-fen-skipping option to the trainer skips positions whose best move is a capture and positions in which the king is in check. This is recommended.
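
Purely as an illustration of that filtering idea (the data structure and field names below are hypothetical assumptions about what a data-generation pipeline might record, not the trainer's actual format):

from dataclasses import dataclass

@dataclass
class LabeledPosition:
    fen: str
    best_move_is_capture: bool
    in_check: bool
    score_cp: int
    result: int  # +1 / 0 / -1 from the side to move's point of view

def keep_for_training(pos: LabeledPosition) -> bool:
    """Skip tactical positions: captures change the material balance and
    in-check positions give noisy evaluation labels."""
    return not (pos.best_move_is_capture or pos.in_check)

# usage: filtered = [p for p in positions if keep_for_training(p)]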
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: NNUE — Newbie Questions...

Post by jdart »

Steve Maughan wrote: Wed Dec 28, 2022 5:25 pm Is it best practice to train the network on the win / draw / loss result mapped through a logistic function (i.e., +1 / 0 / -1), or on the (hand-tuned evaluation) score from a shallow search?
The "lambda" parameter controls this. 0 means tune on results only, 1.0 means tune on eval only, intermediate values use a weighted average. Arasan's latest tuning run used the nodchip tuner with lambda=0.75. I believe recent Stockfish versions use lambda=1.0.
Witek
Posts: 87
Joined: Thu Oct 07, 2021 12:48 am
Location: Warsaw, Poland
Full name: Michal Witanowski

Re: NNUE — Newbie Questions...

Post by Witek »

Steve Maughan wrote: Wed Dec 28, 2022 5:25 pm How many positions do you need in the training set to obtain a decent NN evaluation? Any rules of thumb will be helpful
alvinypeng wrote: Wed Dec 28, 2022 11:28 pm In order to obtain a decent NN evaluation, I think you need anywhere from several hundred million to a few billion positions. However, this depends on the network architecture. For instance, if you are training a small 2x768-128-1 network, you can train a decent net with a couple hundred million positions. But if you want to index by king square or king bucket, you may need at least a billion.
For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: NNUE — Newbie Questions...

Post by lithander »

Witek wrote: Thu Dec 29, 2022 12:45 am For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
That's interesting. I had heard that you need billions of annotated positions, which had scared me away from looking into NNUE so far.

So this question goes to all the NNUE developers: Are there any other sources/books/tutorials you can recommend? How did you get started? Did you use proven NN architectures, weights, tuners, and datasets first? Or everything from scratch?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: NNUE — Newbie Questions...

Post by alvinypeng »

lithander wrote: Thu Dec 29, 2022 4:33 am
Witek wrote: Thu Dec 29, 2022 12:45 am For smaller nets (~768 inputs, no king-relative features) you'll need ~100M positions. In Caissa, I managed to get a decent net with just 80M positions.
That's interesting. I had heard that you need billions of annotated positions, which had scared me away from looking into NNUE so far.

So this question goes to all the NNUE developers: Are there any other sources/books/tutorials you can recommend? How did you get started? Did you use proven NN architectures, weights, tuners, and datasets first? Or everything from scratch?
HalfKA nets such as those found in Stockfish have 32 times as many inputs as nets without king-relative features. So if a net with no king-relative features requires ~100M positions, a rough estimate of how many positions a HalfKA net requires is 32 * 100M, which is a few billion.
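
A back-of-envelope version of that arithmetic, assuming the 32x factor comes from a horizontally mirrored king-square indexing (the exact HalfKA feature layout in Stockfish differs in its details):

plain_inputs = 12 * 64                    # 768: piece type/colour x square
king_buckets = 32                         # assumed mirrored king squares
king_relative_inputs = king_buckets * plain_inputs   # 24576 per perspective

positions_for_plain_net = 100_000_000
rough_estimate = king_buckets * positions_for_plain_net  # ~3.2 billion
print(king_relative_inputs, rough_estimate)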

I think the nnue-pytorch docs already go into plenty of detail. There's also a section in Neural Networks for Chess that talks a little bit about NNUE, though I haven't read it.

After I understood how NNUE worked, I was able to write a simple training pipeline in TensorFlow. Using a machine learning framework is a lot easier than writing training code completely from scratch.
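
For a sense of what such a pipeline boils down to, here is a minimal hypothetical training step (shown with PyTorch purely for brevity rather than TensorFlow, and not the poster's actual code; data loading, batching and engine-side quantisation are all omitted):

import torch
import torch.nn as nn

def train_one_epoch(model: nn.Module, loader, optimizer: torch.optim.Optimizer) -> None:
    loss_fn = nn.MSELoss()
    for stm, nstm, target in loader:       # tensors produced by your data pipeline
        optimizer.zero_grad()
        prediction = model(stm, nstm)      # e.g. the two-perspective sketch earlier in the thread
        loss = loss_fn(prediction.squeeze(1), target)
        loss.backward()
        optimizer.step()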