More experiments with neural nets


Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: More experiments with neural nets

Post by Joost Buijs »

jonkr wrote: Sat Jan 23, 2021 5:29 am For my TensorFlow script I use NumPy to load a binary file with 1 byte per on/off input; it was surprisingly quick even for 10+ gigabyte files. I export the binary training data from my dev slow.exe, which includes my auto-trainer code. On my first try with TensorFlow I loaded and converted text, but that was about 100 times slower.
I have about 10M positions for my general net; I think it takes about 5 minutes for me to train, which is nice to play around with. I'm sure that in general more is better; my position library was slowly growing as I ran my tests/training.
I started with 1-byte inputs like you do in your TensorFlow script (which I used for some draughts experiments, by the way). For the libTorch trainer I switched to a file format with 1-bit inputs, not because it is faster but because I can load almost 8 times more positions in memory (the labels are of course still float). libTorch (which is based on Caffe2) only supports float inputs, so for each batch I have to translate the 1-bit inputs to float, which goes very fast in C++. The biggest slowdown is caused by translating the floats to tensors and loading them onto the GPU.
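For illustration, a minimal C++ sketch of that per-batch translation: unpack the bit-packed inputs into a float buffer, wrap it in a tensor, and move it to the GPU. The function name, bit order, and buffer layout are my own assumptions here, not the actual trainer code.

Code: Select all

#include <torch/torch.h>
#include <cstdint>
#include <vector>

// Unpack bit-packed inputs (1 bit per feature, 8 features per byte,
// LSB first) into floats, then upload the batch to the device.
torch::Tensor batch_to_device(const uint8_t* packed, int64_t batch_size,
                              int64_t num_inputs, torch::Device device) {
    const int64_t bytes_per_pos = (num_inputs + 7) / 8;
    std::vector<float> floats(batch_size * num_inputs);
    for (int64_t p = 0; p < batch_size; ++p) {
        const uint8_t* row = packed + p * bytes_per_pos;
        for (int64_t i = 0; i < num_inputs; ++i)
            floats[p * num_inputs + i] =
                ((row[i >> 3] >> (i & 7)) & 1) ? 1.0f : 0.0f;
    }
    // from_blob does not own the buffer, so clone() before it goes out of scope.
    auto t = torch::from_blob(floats.data(), {batch_size, num_inputs},
                              torch::kFloat32).clone();
    // The host-to-GPU copy is the slow part; non_blocking only overlaps
    // with compute when the host buffer is pinned.
    return t.to(device, /*non_blocking=*/true);
}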

libTorch also supports dynamically quantized layers (still experimental, which usually means bugs); maybe these can help solve the slowdown caused by the float inputs. With these layers it is possible to define different types for the input, the layer, and the activation. I still have to dive into this; a sketch of the idea follows below.
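To make the "different types" point concrete, here is a plain-C++ sketch of what a dynamically quantized linear layer does conceptually: int8 weights, activations quantized on the fly per batch, int32 accumulation, float output. This is just the arithmetic, not the libTorch API, and all names are made up for illustration.

Code: Select all

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// y = dequant(int8_weights * quant(x)); x has `in` elements, w is [out x in].
std::vector<float> dynamic_quant_linear(const std::vector<float>& x,
                                        const std::vector<int8_t>& w,
                                        float w_scale, int in, int out) {
    // "Dynamic" part: the activation scale is taken from this batch's range.
    float max_abs = 1e-8f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    const float x_scale = max_abs / 127.0f;

    std::vector<int8_t> xq(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        xq[i] = static_cast<int8_t>(
            std::lround(std::clamp(x[i] / x_scale, -127.0f, 127.0f)));

    std::vector<float> y(out);
    for (int o = 0; o < out; ++o) {
        int32_t acc = 0;                 // accumulate in int32 to avoid overflow
        for (int i = 0; i < in; ++i)
            acc += int32_t(w[o * in + i]) * int32_t(xq[i]);
        y[o] = acc * w_scale * x_scale;  // dequantize back to float
    }
    return y;
}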
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: More experiments with neural nets

Post by Joost Buijs »

I've been looking at the PyTorch/libTorch quantized layers; unfortunately they are not supported on the GPU yet. What they do have is quantization-aware training, which is done entirely with float32 arithmetic. In this respect PyTorch/libTorch is still behind TensorFlow.
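The float32 trick behind quantization-aware training is "fake quantization": the forward pass simulates int8 rounding, while the backward pass treats the rounding as the identity (straight-through estimator). A minimal libTorch sketch, assuming a fixed per-tensor scale for simplicity:

Code: Select all

#include <torch/torch.h>

// Forward: round/clamp to the int8 grid, then dequantize.
// Backward: the detach() trick makes d(output)/d(x) = 1, so gradients
// flow through as if no quantization happened.
torch::Tensor fake_quantize(const torch::Tensor& x, double scale,
                            int64_t qmin = -128, int64_t qmax = 127) {
    auto q = torch::clamp(torch::round(x / scale),
                          static_cast<double>(qmin),
                          static_cast<double>(qmax)) * scale;
    return x + (q - x).detach();
}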

The last week I've been working on updating my libTorch trainer, with file-mapping for the bit-inputs and a multi-threaded data-loader. While the program calculates the gradients for one batch, the data for the next batch is prepared in parallel, which gives a nice speedup. I still have to add a sparse tensor for the inputs, which I hope to get working this week; a sketch of both ideas is below.
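A C++ sketch of the two ideas, under my own assumptions: `load_batch` is a placeholder for reading the file-mapped bit-inputs, the shapes are invented, and double buffering is shown with std::async rather than a hand-rolled worker thread.

Code: Select all

#include <torch/torch.h>
#include <cstdint>
#include <future>
#include <vector>

struct Batch { torch::Tensor inputs, targets; };

// Build a sparse COO tensor straight from the positions of the set bits;
// for mostly-zero inputs this avoids materialising the dense float matrix.
torch::Tensor sparse_batch(const std::vector<int64_t>& rows,
                           const std::vector<int64_t>& cols,
                           int64_t batch_size, int64_t num_inputs) {
    const int64_t nnz = static_cast<int64_t>(rows.size());
    auto idx = torch::empty({2, nnz}, torch::kInt64);
    auto acc = idx.accessor<int64_t, 2>();
    for (int64_t k = 0; k < nnz; ++k) { acc[0][k] = rows[k]; acc[1][k] = cols[k]; }
    return torch::sparse_coo_tensor(idx, torch::ones({nnz}, torch::kFloat32),
                                    {batch_size, num_inputs});
}

// Placeholder: the real trainer would decode batch `step` from the
// file-mapped training data here (e.g. via sparse_batch above).
Batch load_batch(int step) {
    return { torch::zeros({1024, 768}), torch::zeros({1024, 1}) };
}

// Double buffering: while the gradients for the current batch are being
// computed, the next batch is already being decoded on another thread.
void train_loop(int num_steps) {
    auto next = std::async(std::launch::async, load_batch, 0);
    for (int step = 0; step < num_steps; ++step) {
        Batch batch = next.get();
        next = std::async(std::launch::async, load_batch, step + 1);
        // ... forward pass, loss, backward(), optimizer step on `batch` ...
    }
}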

Last but not least, I bit the bullet and ordered a Gigabyte RTX 3090 Turbo, which will arrive tomorrow. Hopefully this will give an additional speedup over the RTX 2060 Super that I'm currently using. According to Tim Dettmers https://timdettmers.com/2020/09/07/whic ... -learning/ the RTX 3090 should be almost 3 times as fast. Neural nets are a matter of trial and error: the quicker you can train, the quicker you have results.