Neural Networks weights type

 Posts: 172
 Joined: Fri Apr 11, 2014 8:45 am
 Full name: Fabio Gobbato
 Contact:
Neural Networks weights type
I have seen that Stockfish NNUE uses integer types for the network weights instead of floating-point types. One advantage is surely speed, but there could also be some drawbacks. What are the differences between integer and floating-point networks? Is it possible to build a good net that runs on the CPU with floating-point weights, or is it better to use integer weights?

 Posts: 436
 Joined: Mon Apr 24, 2006 6:06 pm
 Contact:
Re: Neural Networks weights type
8-bit precision is often accurate enough, and it is faster than floating point.
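To make the 8-bit claim concrete, here is a minimal sketch of symmetric linear quantization applied to a single dot product. The function names (`quantize`, `int_dot`) and the sample numbers are made up for illustration; this is not the scheme of any particular engine, only one common way to map floats to small integers with a per-vector scale.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization: floats -> signed integers plus one scale.

    Hypothetical helper for illustration; real engines choose scales
    differently (often fixed at training time).
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / qmax
    return [round(v / scale) for v in values], scale


def int_dot(qa, qb):
    """Dot product using integer arithmetic only."""
    return sum(a * b for a, b in zip(qa, qb))


# Made-up weights and inputs for one neuron.
weights = [0.42, -1.30, 0.07, 0.95]
inputs = [1.00, 0.60, -0.25, 0.80]

qw, sw = quantize(weights)
qx, sx = quantize(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int_dot(qw, qx) * sw * sx   # rescale the integer result once at the end

print("float result:    ", exact)
print("quantized result:", approx)
print("absolute error:  ", abs(exact - approx))
```

The inner loop touches only small integers, which is what lets SIMD instructions process many weights per operation; the floating-point rescale happens once per output rather than once per weight.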
The tensor cores of the most recent NVIDIA GPUs can do 4-bit calculations (in addition to 8-bit integer and 16-bit float). The next generation will also support sparsity, which is another big potential performance improvement. Training a sparse 4-bit neural network is a bit tricky, though.
Some even use 1-bit neural networks:
https://jmlr.csail.mit.edu/papers/v18/16456.html
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio; 18(187):1−30, 2018.
Abstract
We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6-bits as well which enables gradients computation using only bit-wise operation. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
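The abstract's point about replacing arithmetic with bitwise operations can be sketched as follows. With weights and activations restricted to +1/-1, a dot product reduces to an XOR and a population count. The helper names (`pack_signs`, `binary_dot`) and the sample vectors are hypothetical, and this is a plain-Python illustration of the idea rather than the paper's GPU kernel.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 for +1, 0 for -1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors given their packed bits.

    Positions where the bits agree contribute +1, positions where they
    differ contribute -1, so the result is n - 2 * popcount(a XOR b).
    """
    differing = bin(a_bits ^ b_bits).count("1")   # popcount of the XOR
    return n - 2 * differing


# Made-up binary weights and activations.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))          # ordinary multiply-add
bitwise = binary_dot(pack_signs(a), pack_signs(b), len(a))

print(reference, bitwise)
```

On real hardware the same trick handles 64 or more weights per XOR/popcount pair, which is where the memory and power savings described in the abstract come from.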
Re: Neural Networks weights type
It seems as though NNs don't require much precision. A couple of data types they tend to use:
* half precision (16-bit float)
* brain float (bfloat16)
TPUs are a bit like graphics cards, but they use low-precision arithmetic, which lets them do much more NN work for the same amount of hardware. Having them use integers seems a natural next step.
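The difference between the two 16-bit formats listed above can be seen with a small stdlib-only sketch. Half precision has 5 exponent bits and 10 mantissa bits; bfloat16 has 8 exponent bits and 7 mantissa bits, so it trades precision for the same range as a 32-bit float. The helper names are hypothetical, and `to_bfloat16` truncates instead of rounding to nearest, which is a simplification made here for brevity.

```python
import math
import struct


def to_float16(x):
    """Round x to IEEE 754 half precision; values out of range become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # Too large for the 5-bit exponent of half precision.
        return math.copysign(math.inf, x)


def to_bfloat16(x):
    """Convert x to bfloat16 by keeping the upper 16 bits of its float32 pattern.

    Assumption: plain truncation; hardware typically rounds to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (0.1, 1e10):
    print(value, "-> float16:", to_float16(value), " bfloat16:", to_bfloat16(value))
```

For a small value such as 0.1, half precision lands closer to the original; for a large value such as 1e10, half precision overflows while bfloat16 still represents it to within about one percent.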