Pytorch NNUE training

gladius · Post by **gladius** » Sun Nov 08, 2020 9:56 pm

I started an implementation of the SF NNUE training in PyTorch: https://github.com/glinscott/nnue-pytorch. The training process itself is working quite well. The big remaining gap is exporting data that matches the quantization that the SF nodchip trainer does.

I've implemented two quantization approaches at the moment:
1. nodchip - https://github.com/glinscott/nnue-pytor ... rialize.py - which tries to exactly match the nodchip implementation. So far it results in some fairly busted evaluations though, even for nets that train to relatively low loss. Notably, the net is taught to directly predict the SF internal score (very roughly 0x100 for a pawn, 0x300 for a knight, etc.). The relu implementation is also quite non-standard, clamping the output to [0, 1]. This makes training pretty challenging as well, although the nodchip trainer appears to use some interesting weight/bias initialization to help with this. PyTorch makes the implementation of this so simple though, it's really an awesome framework (https://github.com/glinscott/nnue-pytor ... py#L23-L31):

Code:

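  def forward(self, us, them, w_in, b_in):
    # The same feature transformer is applied to both input halves.
    w = self.input(w_in)
    b = self.input(b_in)
    # us/them select the ordering of the two halves.
    l0_ = (us * torch.cat([w, b], dim=1)) + (them * torch.cat([b, w], dim=1))
    # Clipped relu: every activation is clamped to [0, 1].
    l0_ = torch.clamp(l0_, 0.0, 1.0)
    l1_ = torch.clamp(self.l1(l0_), 0.0, 1.0)
    l2_ = torch.clamp(self.l2(l1_), 0.0, 1.0)
    x = self.output(l2_)
    return x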

2. PyTorch quantization - this uses the PyTorch framework (https://github.com/glinscott/nnue-pytor ... py#L60-L87), which does quantization based on statistics of the weights/biases and the activations that are seen in practice. This should theoretically result in a more accurate net after quantization, but it only uses 8-bit weights and activations. The nodchip implementation has some really interesting choices there, using 16 bits for the feature transformer layer, and 8 bits for the fully connected layers. The tricky part here would be porting Facebook's fbgemm (https://engineering.fb.com/2018/11/07/m ... ns/fbgemm/) implementation into SF, but I think this approach has more headroom to grow. It also avoids the clipped relu training issues, as it's just a normal relu.
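
To make the difference between the two approaches concrete, here is a minimal sketch in plain Python (no PyTorch needed). `quantize_fixed` mimics the fixed-scale style, and `quantize_affine_uint8` mimics statistics-based quantization, where the scale and zero point come from the observed min/max. All function names are made up for illustration, and the scale of 64 is only an example value, not something taken from either code base.

```python
def quantize_fixed(values, scale, bits):
    """Fixed-scale quantization: round(v * scale), saturated to a signed range."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def quantize_affine_uint8(values):
    """Statistics-based quantization to 8 bits, derived from the observed min/max."""
    vmin = min(min(values), 0.0)  # widen the range so 0.0 maps onto an integer
    vmax = max(max(values), 0.0)
    scale = (vmax - vmin) / 255
    if scale == 0.0:  # all values are zero; any scale works
        scale = 1.0
    zero_point = round(-vmin / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Map the 8-bit codes back to floats."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights; 2.5 is deliberately too large for 8 bits at scale 64.
weights = [-0.5, 0.0, 0.25, 2.5]
fixed8 = quantize_fixed(weights, scale=64, bits=8)
fixed16 = quantize_fixed(weights, scale=64, bits=16)
q, scale, zero_point = quantize_affine_uint8(weights)
restored = dequantize_affine(q, scale, zero_point)
```

With a fixed scale the large weight saturates in 8 bits but fits in 16 bits, while the statistics-based version stretches its 8 bits over whatever range it observed, at the cost of a coarser step size.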

Fascinating stuff - and huge thanks to Sopel, who rewrote the entire data pipeline to give an over 100x speed up by using sparse tensors.

gladius · Post by **gladius** » Mon Nov 09, 2020 5:51 pm

I finally managed to get a network that gets reasonable evals exported - there were two key points that were required:
1. Emulating the clipped relu - this was mentioned in the first post
2. Scaling the network output by 600! This is snuck into the nodchip trainer as `kPonanzaConstant` - https://github.com/nodchip/Stockfish/bl ... r.cpp#L212, and for computing the training loss, the network output is multiplied by 600. This is super important, or the network has to learn huge biases/weights for the final 32->1 layer to get reasonable scores out (which makes sense, as the values are clipped to 1 in float space, or 127 in the integer implementation).
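
A small worked example of both points, in plain Python. The function names and the target score of 1200 are made up for illustration; only the clamp to [0, 1] (127 in integers), the 32->1 final layer and the factor 600 come from the posts above.

```python
OUTPUT_SCALE = 600        # the factor mentioned above (kPonanzaConstant)
ACTIVATION_MAX_INT = 127  # integer counterpart of 1.0, as stated above


def clipped_relu(x):
    """Float version: clamp to [0, 1]."""
    return max(0.0, min(1.0, x))


def clipped_relu_int(x):
    """Integer version: clamp to [0, 127]."""
    return max(0, min(ACTIVATION_MAX_INT, x))


def mean_weight_needed(target_score, num_inputs, output_scale):
    """Average final-layer weight needed to reach target_score when every
    input is saturated at 1.0 and the bias is zero (the best case)."""
    return target_score / output_scale / num_inputs


# The two clamps agree once the float activation is scaled by 127.
for x in (-0.3, 0.0, 0.5, 1.0, 1.7):
    as_float = round(clipped_relu(x) * ACTIVATION_MAX_INT)
    as_int = clipped_relu_int(round(x * ACTIVATION_MAX_INT))
    assert as_float == as_int

unscaled = mean_weight_needed(1200, 32, output_scale=1)
scaled = mean_weight_needed(1200, 32, output_scale=OUTPUT_SCALE)
print(unscaled, scaled)  # 37.5 0.0625
```

So even in the best case, the 32->1 layer would need weights around 37.5 on average to output a score of 1200 directly, versus about 0.06 when the output is multiplied by 600.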

Evaluation output (from the awesome https://hxim.github.io/Stockfish-Evaluation-Guide/):

Guenther · Post by **Guenther** » Mon Nov 09, 2020 10:06 pm

Thanks for all the interesting stuff Gary!

sedicla · Post by **sedicla** » Tue Nov 10, 2020 2:05 pm

Hi Gary,

Thank you for making this available (and sf, and lc0, and fishtest, etc, etc...)

I have a question: from what I understood, this trainer loads the data from binary files. How do you generate those files? Using nodchip's SF trainer code?
The output is generated in a format for SF. If I want to use it in my engine, I would have to port it, right?
Currently I'm doing some work on my engine with a smaller net in TensorFlow, but in the future I may test a bigger net like SF.

Thanks.

Martin · Post by **Martin** » Tue Nov 10, 2020 6:18 pm

sedicla wrote: ↑Tue Nov 10, 2020 2:05 pm
I have a question: from what I understood, this trainer loads the data from binary files. How do you generate those files? Using nodchip's SF trainer code?
The output is generated in a format for SF. If I want to use it in my engine, I would have to port it, right?

If you want to generate training data in SF format based on your engine's search/eval, you can take a look at this script: https://github.com/bmdanielsson/marvin- ... nuedata.py . It only relies on standard UCI commands, so it should be easy to adapt to any engine. It can output data in both plain and bin formats.
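
For anyone adapting this idea to their own engine, the engine-independent part mostly comes down to reading the score out of standard UCI `info` lines. The following is only a sketch of that one step, not code from the script linked above; `parse_info_score` is a made-up name.

```python
def parse_info_score(line):
    """Return ("cp", n) or ("mate", n) from a UCI info line, or None."""
    tokens = line.split()
    if not tokens or tokens[0] != "info" or "score" not in tokens:
        return None
    i = tokens.index("score")
    if i + 2 >= len(tokens):
        return None  # "score" without a kind and a value
    kind, value = tokens[i + 1], tokens[i + 2]
    if kind not in ("cp", "mate"):
        return None
    try:
        return kind, int(value)
    except ValueError:
        return None


# Example lines in the shape an engine prints during "go depth N".
cp_line = "info depth 5 seldepth 6 score cp 34 nodes 1000 pv e2e4 e7e5"
mate_line = "info depth 9 score mate -3 nodes 5000 pv h7h8"
print(parse_info_score(cp_line))
print(parse_info_score(mate_line))
```

The score of the last `info` line before `bestmove` is what a data generator would typically record for the position.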

sedicla · Post by **sedicla** » Wed Nov 11, 2020 1:36 am

Martin wrote: ↑Tue Nov 10, 2020 6:18 pm
If you want to generate training data in SF format based on your engine's search/eval, you can take a look at this script: https://github.com/bmdanielsson/marvin- ... nuedata.py . It only relies on standard UCI commands, so it should be easy to adapt to any engine. It can output data in both plain and bin formats.

Hi Martin,
Maybe I am missing something but this script generates the fen position in the output file, right? I'm looking at write_position method.
I think my question is what is the format of the bin file used for training? I guess I can figure it out from the data loader, but wondering if there's something already.
Thanks

gladius · Post by **gladius** » Wed Nov 11, 2020 2:40 am

sedicla wrote: ↑Tue Nov 10, 2020 2:05 pm
Hi Gary,

Thank you for making this available (and sf, and lc0, and fishtest, etc, etc...)

I have a question: from what I understood, this trainer loads the data from binary files. How do you generate those files? Using nodchip's SF trainer code?
The output is generated in a format for SF. If I want to use it in my engine, I would have to port it, right?
Currently I'm doing some work on my engine with a smaller net in TensorFlow, but in the future I may test a bigger net like SF.

Thanks.

Yup, the .bin format is generated by the nodchip gensfen command, e.g. here is the one I’m running right now:

Code:

#!/bin/bash

DEPTH=5
GAMES=10000000

options="
uci
setoption name PruneAtShallowDepth value false
setoption name Use NNUE value true
setoption name Threads value 4
setoption name Hash value 1024
isready
gensfen set_recommended_uci_options ensure_quiet depth $DEPTH loop $GAMES output_file_name d${DEPTH}_${GAMES}"

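# Pass the commands as data rather than as the printf format string, and end with a newline.
printf '%s\n' "$options" | ./stockfish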

There is a simpler format as well, called .plain, which is just the fens as text. It is a lot slower to parse though, which is why I’m not using it. The training code in nodchip has a convert function between all three types, and I believe the library is independent of SF.
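
On the question of what a .bin record looks like: the sketch below assumes each record is a fixed 40-byte struct (32 bytes of packed position, then a 16-bit score, a 16-bit move, a 16-bit game ply, an 8-bit game result and one padding byte, all little-endian). That layout is an assumption that should be checked against the `PackedSfenValue` definition in the nodchip trainer source before relying on it. The function name is made up, and decoding the packed position itself is not shown.

```python
import struct

# Assumed layout (little-endian): 32-byte packed position, int16 score,
# uint16 move, uint16 game ply, int8 game result, 1 padding byte.
RECORD = struct.Struct("<32shHHbx")


def read_records(data):
    """Yield (packed_position, score, move, ply, result) per 40-byte record.
    A trailing partial record, if any, is ignored."""
    usable = len(data) - len(data) % RECORD.size
    for offset in range(0, usable, RECORD.size):
        yield RECORD.unpack_from(data, offset)


# Two synthetic records with placeholder positions, just to exercise the parser.
sample = (
    RECORD.pack(bytes(32), 57, 0x1234, 12, 1)
    + RECORD.pack(bytes(32), -130, 0x0ABC, 13, -1)
)
records = list(read_records(sample))
for packed_position, score, move, ply, result in records:
    print(score, hex(move), ply, result)
```

A fixed-size binary record like this can be read with simple offset arithmetic, which fits with the .bin format being much faster to parse than the text-based .plain format.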

sedicla · Post by **sedicla** » Wed Nov 11, 2020 3:29 am

ok, I will take a look. I can either port to my engine or create a utility pgn->bin as suggested by Martin.

Thanks!

Martin · Post by **Martin** » Wed Nov 11, 2020 4:53 pm

sedicla wrote: ↑Wed Nov 11, 2020 1:36 am
Maybe I am missing something but this script generates the fen position in the output file, right? I'm looking at write_position method.
I think my question is what is the format of the bin file used for training? I guess I can figure it out from the data loader, but wondering if there's something already.

Sorry, I forgot to push the bin support to GitHub. It's there now.

sedicla · Post by **sedicla** » Thu Nov 12, 2020 1:04 am

Martin wrote: ↑Wed Nov 11, 2020 4:53 pm
sedicla wrote: ↑Wed Nov 11, 2020 1:36 am
Maybe I am missing something but this script generates the fen position in the output file, right? I'm looking at write_position method.
I think my question is what is the format of the bin file used for training? I guess I can figure it out from the data loader, but wondering if there's something already.

Sorry, I forgot to push the bin support to GitHub. It's there now.

Thanks a lot Martin, looks great !

Alcides.
