NNUE training set generation

Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

NNUE training set generation

Post by Edsel Apostol »

I have a noob question regarding NNUE training set generation. Would games/positions generated through a random mover be sufficient compared to actual games played with a depth or time constraint? Random mover games should be fast. You could just run games from an opening book, like the one used in your SPRT testing for example, but all moves after that would be random.

My next question: if we use, let's say, Leela training games/positions but run our own eval on those positions, would the net be similar to the other nets trained on the same set of positions but with their own initial eval?

Would the nets become similar after a few generations? Let's say the generation 1 net is trained from the initial HCE, then the generation 2 net is trained from the generation 1 net, and so on. Would those nets now be closer to the nets of other engines that use the same sets but started out with their own HCE eval?

Is the latest SF using Leela games and evals in its training, or just the positions, with its own eval applied to those positions?
connor_mcmonigle
Posts: 533
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE training set generation

Post by connor_mcmonigle »

I have a noob question regarding NNUE training set generation. Would games/positions generated through a random mover be sufficient compared to actual games played with a depth or time constraint? Random mover games should be fast. You could just run games from an opening book, like the one used in your SPRT testing for example, but all moves after that would be random.
I've never tested this myself, but I believe one of the Koivisto authors did some testing with training on positions from random play and saw poor results. The goal is to achieve a good balance between balanced and imbalanced positions in your training set. Too many insanely imbalanced positions isn't desirable, as you'll almost immediately get a cutoff in such positions when they're reached in your search (exactly how winning or losing a random position is matters little in light of this). The usual approach is to generate a book consisting of all-random moves up to ply N (N=4 is typical) and then play out fixed-depth self-play games from those positions to generate data. The resulting positions are sufficiently varied, giving a good representation of both imbalanced and balanced positions.
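For concreteness, here is a rough sketch of that generation loop, assuming the python-chess library and a hypothetical UCI engine binary at "./myengine"; the ply count, playout depth, and mate score are illustrative values only.

import random
import chess
import chess.engine

RANDOM_PLIES = 4    # N: plies of random moves forming the "book"
PLAYOUT_DEPTH = 8   # fixed search depth for the self-play playout

def random_opening(plies=RANDOM_PLIES):
    # Play `plies` uniformly random legal moves from the start position.
    board = chess.Board()
    for _ in range(plies):
        moves = list(board.legal_moves)
        if not moves:
            break
        board.push(random.choice(moves))
    return board

def self_play_game(engine, board, depth=PLAYOUT_DEPTH):
    # Play the position out at fixed depth, recording (fen, score_cp) samples.
    samples = []
    while not board.is_game_over():
        result = engine.play(board, chess.engine.Limit(depth=depth),
                             info=chess.engine.INFO_SCORE)
        score = result.info.get("score")
        if score is not None:
            cp = score.pov(board.turn).score(mate_score=32000)
            samples.append((board.fen(), cp))
        board.push(result.move)
    return samples, board.result()

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci("./myengine") as engine:
        positions, outcome = self_play_game(engine, random_opening())
        print(f"{len(positions)} positions, game result {outcome}")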
My next question: if we use, let's say, Leela training games/positions but run our own eval on those positions, would the net be similar to the other nets trained on the same set of positions but with their own initial eval?
I'd guess not. Provided your existing evaluation function is original and the rest of your implementation is your own as well, I think you can expect a fairly unique network. Note that rescoring with fixed-depth search results doesn't really save any time compared to just playing out fixed-depth games, which produces both unique labels and unique positions simultaneously.
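As a toy illustration of what such rescoring would look like (again assuming python-chess, a hypothetical UCI binary at "./myengine", and a placeholder list of FENs pulled from someone else's data):

import chess
import chess.engine

def rescore(fens, engine_path="./myengine", depth=8):
    # Re-label existing positions with our own engine's fixed-depth search score.
    labels = []
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            cp = info["score"].pov(board.turn).score(mate_score=32000)
            labels.append((fen, cp))
    return labels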
Would the nets become similar after a few generations? Let's say the generation 1 net is trained from the initial HCE, then the generation 2 net is trained from the generation 1 net, and so on. Would those nets now be closer to the nets of other engines that use the same sets but started out with their own HCE eval?
I'm unsure. In practice, if solely using evaluations for training, you'll find that only a few iterations can be completed before the entire process destabilizes anyway. However, many iterations can be performed when using the fixed-depth self-play approach outlined above and mixing in the game result. I'd guess that this self-play process might come pretty close to converging, provided the same network architecture/topology is used across otherwise independent runs...
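For reference, "mixing in the game result" usually means interpolating the training target between the search score (squashed to a win probability) and the final game outcome; the scaling constant and lambda below are illustrative placeholders, not values from any particular trainer.

import math

SCALE = 400.0  # centipawn-to-probability scaling constant (assumed, engine-specific)
LAM = 0.7      # weight on the search score vs. the game result (assumed)

def training_target(score_cp, game_result, lam=LAM):
    # score_cp: fixed-depth search score from the side to move's point of view.
    # game_result: 1.0 win, 0.5 draw, 0.0 loss, also from the side to move.
    wdl_from_eval = 1.0 / (1.0 + math.exp(-score_cp / SCALE))  # sigmoid squash
    return lam * wdl_from_eval + (1.0 - lam) * game_result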
Is the latest SF using Leela games and evals in its training, or just the positions, with its own eval applied to those positions?
The latest Stockfish is trained on a mix of pretty much raw Leela data (with some processing to convert it to the Stockfish team's packed binary data format) and some Stockfish self-play data generated with the fixed-depth playouts from a random book, as outlined above.
connor_mcmonigle
Posts: 533
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE training set generation

Post by connor_mcmonigle »

To clarify, for the majority of the positions the Stockfish 14 network is trained upon, the labels are supplied by Lc0.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: NNUE training set generation

Post by MikeB »

Edsel Apostol wrote: Sat Jul 03, 2021 2:12 am I have a noob question regarding NNUE training set generation. Would games/positions generated through a random mover be sufficient compared to actual games played with a depth or time constraint? Random mover games should be fast. You could just run games from an opening book, like the one used in your SPRT testing for example, but all moves after that would be random.

My next question: if we use, let's say, Leela training games/positions but run our own eval on those positions, would the net be similar to the other nets trained on the same set of positions but with their own initial eval?

Would the nets become similar after a few generations? Let's say the generation 1 net is trained from the initial HCE, then the generation 2 net is trained from the generation 1 net, and so on. Would those nets now be closer to the nets of other engines that use the same sets but started out with their own HCE eval?

Is the latest SF using Leela games and evals in its training, or just the positions, with its own eval applied to those positions?
The best-performing training nets for Stockfish recently have been trained on a combination binpack of Leela/Lc0 games mixed with a binpack in which the static eval does not represent the searched eval, commonly known as the "wrong" binpack. All of the recent SF nets making master have been trained using a prior net as the base (not from scratch), and the two most recent nets were prior nets enhanced through SPSA tuning; see https://github.com/official-stockfish/S ... e686412681.

Based on the values that now change in the net when tuned, we are either stuck or close to the optimized net: the values that matter in the net usually just flip back and forth by 1. We may need more (deeper) "wrong" NNUEs, or a greater number of digits in the NNUE file. Most of the values that matter are just three digits, so if we make them six digits, will that matter? In the short term, we can probably get more Elo if we can move to a smaller, more efficiently updatable NNUE. With respect to binpacks, it is not just the depth but also the quality of the binpack positions that counts: positions with real practical relevance rather than simply randomly generated ones, plus a sufficient number of positions where the static eval is not a true representation of the searched eval of the position.
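As a loose illustration of selecting positions where the static eval disagrees with the searched eval (the idea behind the "wrong" binpack), here is a filter over a made-up (fen, static_cp, search_cp) tuple layout; the threshold is an arbitrary assumption, not any project's actual criterion.

def select_wrong_positions(samples, threshold_cp=100):
    # samples: iterable of (fen, static_cp, search_cp) tuples (hypothetical layout).
    # Keep positions whose static eval differs markedly from the searched eval.
    return [s for s in samples if abs(s[1] - s[2]) >= threshold_cp]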