jonkr wrote: ↑Sat Jan 23, 2021 5:29 am
For my tensorflow script I use the NumPy loading of a binary file with 1 byte per on/off input; it was surprisingly quick even for 10+ gigabyte files. I export the binary training data from my dev slow.exe, which includes my auto-trainer code. My first try with tensorflow loaded and converted text, but that was about 100 times slower.
I have about 10M positions for my general net; training takes around 5 minutes, which is nice for experimenting. I'm sure that in general more is better; my position library was slowly growing as I ran my tests/training.

I started with 1-byte inputs as you do in your tensorflow script (which I also used for some draughts experiments, by the way). For the libTorch trainer I switched to a file format with 1-bit inputs, not because it is faster but because I can hold almost eight times more positions in memory (the labels, of course, are still floats). libTorch (which is based on Caffe2) only accepts float inputs, so for each batch I have to expand the 1-bit inputs to floats, which is very fast in C++. The biggest slowdown comes from converting those floats to tensors and loading them onto the GPU.
libTorch also supports dynamically quantized layers (still experimental, which usually means bugs); maybe these can help remove the slowdown caused by the float inputs. With these layers it is possible to specify different types for the inputs, the layer weights, and the activations. I still have to dive into this.