I've implemented two quantization approaches at the moment:

1. nodchip - https://github.com/glinscott/nnue-pytor ... rialize.py - which tries to match the nodchip implementation exactly. So far it produces some fairly busted evaluations, even for nets that train to a relatively low loss. Notably, the net is taught to directly predict the SF internal score (very roughly 0x100 for a pawn, 0x300 for a knight, etc.). The ReLU implementation is also quite non-standard: it clamps the output to [0, 1] instead of only clipping at zero. This makes training pretty challenging as well, since any unit pushed outside that range gets no gradient, although the nodchip trainer appears to use some interesting weight/bias initialization to help with this. PyTorch makes the implementation so simple though, it's really an awesome framework (https://github.com/glinscott/nnue-pytor ... py#L23-L31):

```
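def forward(self, us, them, w_in, b_in):
    # Both perspectives (presumably white and black features) share one input layer.
    w = self.input(w_in)
    b = self.input(b_in)
    # us/them pick the concatenation order, so the side to move comes first.
    l0_ = (us * torch.cat([w, b], dim=1)) + (them * torch.cat([b, w], dim=1))
    # Clipped ReLU: every activation is clamped to [0, 1].
    l0_ = torch.clamp(l0_, 0.0, 1.0)
    l1_ = torch.clamp(self.l1(l0_), 0.0, 1.0)
    l2_ = torch.clamp(self.l2(l1_), 0.0, 1.0)
    x = self.output(l2_)
    return x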
```
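
To make the training difficulty a bit more concrete, here is a minimal sketch of the clamped activation and its derivative in plain Python (no PyTorch needed to run it). The function names and the sample values are made up for illustration and are not from the repo. The boundary behaviour of the derivative is a convention, and this sketch simply uses 0 there.

```python
def clipped_relu(x, lo=0.0, hi=1.0):
    """Clamp x into [lo, hi], the same shape as torch.clamp(x, 0.0, 1.0)."""
    return min(max(x, lo), hi)


def clipped_relu_grad(x, lo=0.0, hi=1.0):
    """Derivative of clipped_relu: 1 strictly inside (lo, hi), 0 elsewhere."""
    return 1.0 if lo < x < hi else 0.0


# Hypothetical pre-activation values, chosen only for illustration.
pre_activations = [-0.5, 0.25, 0.75, 1.5, 3.0]

outputs = [clipped_relu(x) for x in pre_activations]
grads = [clipped_relu_grad(x) for x in pre_activations]

for x, y, g in zip(pre_activations, outputs, grads):
    print(f"x={x:+.2f}  clamp={y:.2f}  grad={g:.0f}")
```

Unlike a plain ReLU, the unit saturates on both sides, so a pre-activation of 1.5 or 3.0 passes back no gradient at all. That is probably why the initialization matters so much here.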

Fascinating stuff - and huge thanks to Sopel, who rewrote the entire data pipeline to give a more than 100x speedup by using sparse tensors.
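
For anyone wondering why sparse inputs help so much, here is a rough sketch of the general idea in plain Python. It is not the actual pipeline code: the sizes, the `active` indices, and both function names are hypothetical. When only a handful of input features are 1 and the rest are 0, the first layer reduces to summing a few weight rows instead of multiplying by a mostly-zero vector.

```python
import random

NUM_FEATURES = 1000  # hypothetical input size
HIDDEN = 4           # hypothetical first-layer width

random.seed(0)
# weights[f] is the row of first-layer weights for input feature f.
weights = [[random.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
           for _ in range(NUM_FEATURES)]

# Hypothetical indices of the few features that are set to 1.
active = [3, 17, 256, 999]


def dense_forward(active_indices):
    """Build the full 0/1 input vector, then do a complete matrix product."""
    x = [0.0] * NUM_FEATURES
    for i in active_indices:
        x[i] = 1.0
    return [sum(x[f] * weights[f][h] for f in range(NUM_FEATURES))
            for h in range(HIDDEN)]


def sparse_forward(active_indices):
    """Only touch the weight rows of the active features."""
    out = [0.0] * HIDDEN
    for f in active_indices:
        for h in range(HIDDEN):
            out[h] += weights[f][h]
    return out


dense = dense_forward(active)
sparse = sparse_forward(active)
print(all(abs(d - s) < 1e-9 for d, s in zip(dense, sparse)))
```

Both versions give the same result, but the sparse one does work proportional to the number of active features, not to the full input size.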