Tensorflow NNUE training
Posted: Wed Nov 11, 2020 12:57 am
Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth.
The training is done with the existing code I already use for training regular ResNets.
The input to NNUE is 384 (32x12) channels of 8x8 boards. Note that I exploit the vertical symmetry of the king, so there are only 32 squares for the king,
and I have 12 piece types including both kings, instead of the 10 that SF-NNUE uses.
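To make the 32x12 layout concrete, here is a minimal sketch of how a (king square, piece, piece square) triple could map to one of the 384 channels. The square numbering (a1=0 ... h8=63), the piece ordering, the bucket ordering and the function names are all assumptions for illustration; the real layout is whatever nnue.py uses.

```python
NUM_KING_BUCKETS = 32   # king restricted to files a-d by mirroring
NUM_PIECES = 12         # 6 piece types x 2 colors, both kings included
NUM_CHANNELS = NUM_KING_BUCKETS * NUM_PIECES  # 384

def king_bucket(king_sq):
    """Map a king square (0..63, a1=0, h8=63) to one of 32 buckets.

    Assumption: when the king is on files e-h the board is mirrored
    left-to-right, so only files a-d need their own bucket.
    Returns (bucket, mirrored).
    """
    rank, file = divmod(king_sq, 8)
    mirrored = file >= 4
    if mirrored:
        file = 7 - file
    return rank * 4 + file, mirrored

def feature_index(king_sq, piece, piece_sq):
    """Return (channel, square) of the input plane cell set for one piece.

    `piece` is 0..11 in a hypothetical ordering; the piece square is
    mirrored together with the king so the position stays consistent.
    """
    bucket, mirrored = king_bucket(king_sq)
    if mirrored:
        rank, file = divmod(piece_sq, 8)
        piece_sq = rank * 8 + (7 - file)
    return bucket * NUM_PIECES + piece, piece_sq
```

For example, with the king on e1 (square 4) the board is mirrored, the king lands in bucket 3, and piece 11 on a1 sets channel 3*12+11 = 47 at square h1.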
First things first: TensorFlow C++ inference is darn slow with such a tiny net. This is mainly due to TensorFlow's per-call overhead of about 20 ms.
My hand-written inference code is 300x faster with AVX2 and INT8 quantization. FP32 is about 2x slower than INT8.
Quantization is done post-training, i.e. weights are saved in FP32 and a constant scale factor of 64 is used for all weights.
It may be better to do dynamic calibration with a dataset, which is what I do for ResNets.
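As a rough illustration of the fixed-scale scheme, the sketch below quantizes FP32 weights to INT8 with the constant factor of 64 mentioned above. Saturating at [-127, 127] and round-to-nearest are assumptions, not something taken from nncpu.cpp. The `calibrate_scale` helper is a hypothetical, simplest-possible reading of "calibration with a dataset" (pick the scale from the largest magnitude observed), not the method used for the ResNets.

```python
SCALE = 64  # constant scale factor applied to all weights

def quantize(weights, scale=SCALE):
    """Round FP32 values to INT8 with a fixed scale.

    Assumption: values are saturated to [-127, 127].
    """
    return [max(-127, min(127, round(w * scale))) for w in weights]

def dequantize(quantized, scale=SCALE):
    """Recover approximate FP32 values from INT8 values."""
    return [q / scale for q in quantized]

def calibrate_scale(samples):
    """Hypothetical calibration: choose the scale so that the largest
    observed magnitude maps to 127, instead of using a fixed constant."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return float(SCALE)  # fall back to the fixed scale
    return 127.0 / peak

q = quantize([0.5, -0.25, 3.0, 0.0])   # 3.0 * 64 = 192 saturates to 127
```

With a scale of 64 the representable range is roughly ±1.98 in steps of 1/64, so any weight outside that range is clipped; a data-dependent scale trades some resolution for range.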
Training:
https://github.com/dshawul/nn-train/blo ... rc/nnue.py
Inference:
https://github.com/dshawul/nncpu-probe/ ... /nncpu.cpp