Binary neural network for chess?
Moderator: Ras

Aaron Li
- Posts: 80
- Joined: Fri Jul 29, 2022 1:30 am

Hey all,
Out of curiosity, has anyone tried using a binary neural network (BNN) for chess, i.e. one with weights quantized to 1 bit? I'd imagine it would be fast enough for NNUE, especially since the bitboard representation can be fed in natively. I'm not well versed on the subject, but I believe convolutions can also be quantized down to 1 bit. Perhaps an interesting idea to try.

Daniel Infuehr
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
Re: Binary neural network for chess?
Yes, CUDA supports experimental 1-bit (b1) convolutions. The expected performance is more than 2 peta-ops. A 1-bit dot product reduces to popcnt(xnor(x, y)); thresholding it as popcnt(xnor(x, y)) < 32 is the same test as popcnt(xor(x, y)) > 32, since popcnt(xnor(x, y)) = 64 - popcnt(xor(x, y)) for 64-bit words.
But you need to read the "vertical bits", so without hardware like CUDA, an FPGA, or CPUs with the AVX Galois-field (GFNI) extensions it goes nowhere.
Training is hard, because with 1-bit weights, what is the gradient?
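(In the BNN literature this is usually handled by keeping latent real-valued weights and pushing gradients through the binarization with a straight-through estimator.) As an aside, here is a minimal scalar sketch of the xnor/popcount dot product above, assuming 64 one-bit inputs packed into a uint64_t; this is my own illustration, not code from the repo below:
Code:
#include <bit>
#include <cstdint>

// Sign test for a 64-element {-1,+1} dot product, with inputs x and
// weights w packed one bit per element (bit set = +1).
// Matching bits contribute +1 and differing bits contribute -1, so the
// sum is popcnt(xnor(x,w)) - popcnt(xor(x,w)) = 64 - 2*popcnt(x ^ w).
inline bool binary_neuron_fires(std::uint64_t x, std::uint64_t w) {
    // Positive sum <=> popcnt(x ^ w) < 32 <=> popcnt(~(x ^ w)) > 32.
    return std::popcount(x ^ w) < 32;
}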
But I tackled this for a move generator to prove general viability:
https://github.com/Gigantua/Chess_BinaryNeuralNetwork
I even found a way to do it quite fast on normal CPUs by computing 32x4 popcounts in a handful of instructions.
Code to get the up to 14 bits attacked by a rook:
Code:
// Requires <immintrin.h> (AVX2) and <bit> (std::popcount).
// One 256-bit weight row per output bit: XOR the input with the row,
// flag each of the 32 bytes whose popcount is below 4
// (ChessBNN::popcount8x32_SmallerThan4), and set the output bit when
// more than 16 of the 32 bytes match closely - a per-byte majority vote.
uint32_t result = 0;
for (int i = 0; i < 14; ++i) {
    const __m256i row   = _mm256_load_si256(reinterpret_cast<const __m256i*>(weights + i * 32));
    const __m256i diff  = _mm256_xor_si256(input, row);
    const __m256i close = ChessBNN::popcount8x32_SmallerThan4(diff);
    const int votes     = std::popcount<uint32_t>(static_cast<uint32_t>(_mm256_movemask_epi8(close)));
    result |= static_cast<uint32_t>(votes > 16) << i;
}
In summary: a very interesting topic, especially for the first layer.

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Michal Witanowski
- Posts: 87
- Joined: Thu Oct 07, 2021 12:48 am
- Location: Warsaw, Poland
Re: Binary neural network for chess?
I think it could be useful in the first one or two layers, where it would help with recognizing patterns. But you still need a rather smooth and continuous output from the network.
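A minimal sketch of that hybrid layout, under my own hypothetical names and sizes (occupancy, w1, w2, N_HIDDEN are all illustrative): a 1-bit first layer whose integer popcount outputs feed a small real-valued layer, so the final evaluation stays smooth:
Code:
#include <bit>
#include <cstdint>

// Hypothetical hybrid: binary first layer, smooth second layer.
constexpr int N_HIDDEN = 16;

float evaluate(std::uint64_t occupancy,
               const std::uint64_t (&w1)[N_HIDDEN],  // 1-bit first-layer rows
               const float (&w2)[N_HIDDEN],          // real-valued second layer
               float bias) {
    float acc = bias;
    for (int i = 0; i < N_HIDDEN; ++i) {
        // Integer similarity of input and weight row: popcount of the xnor,
        // i.e. 64 minus the popcount of the xor.
        const int match = 64 - std::popcount(occupancy ^ w1[i]);
        acc += w2[i] * static_cast<float>(match);    // continuous output
    }
    return acc;
}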
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
Daniel Infuehr
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
Re: Binary neural network for chess?
Exactly. This would be the way to go from a bitboard to a second layer without needing to expand to a 64x4-bit nibble representation first.
cuBLAS/CUTLASS templates do not support it; you have to go one level deeper, which is very hard for non-NVIDIA engineers. You need to call the tensor-op intrinsics "by hand".
You get these types:
https://docs.nvidia.com/cuda/cuda-c-pro ... ma-subbyte
And the code will look like page 6 of:
https://arxiv.org/pdf/2006.16578.pdf
And you need to build an NN from that. Doable, but a few weeks of (full-time) work IMO.
But that is the way to go to get maximum performance out of binary neural networks in the year 2022 on consumer machines.
A naive approach will get you nowhere fast.
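For a feel of what calling the tensor-op intrinsics "by hand" means, here is a minimal single-tile sketch in the spirit of the wmma-subbyte docs linked above (CUDA C++). The 8x8x128 b1 tile shape and the XOR+POPC accumulate come from those docs; the kernel name and everything around it are illustrative, untested sketch code rather than a drop-in implementation:
Code:
#include <mma.h>
using namespace nvcuda;

// One warp computes an 8x8 int tile of C, accumulating popc(A XOR B)
// over k = 128 one-bit elements via the experimental b1 tensor-core path
// (requires sm_75 or newer).
__global__ void bmma_8x8x128(const unsigned* a, const unsigned* b, int* c) {
    using b1 = wmma::experimental::precision::b1;
    wmma::fragment<wmma::matrix_a, 8, 8, 128, b1, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 8, 8, 128, b1, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 8, 8, 128, int> fc;

    wmma::fill_fragment(fc, 0);
    wmma::load_matrix_sync(fa, a, 128);  // leading dimension in (bit) elements
    wmma::load_matrix_sync(fb, b, 128);
    wmma::bmma_sync(fc, fa, fb, fc);     // defaults: XOR bit-op, POPC accumulate
    wmma::store_matrix_sync(c, fc, 8, wmma::mem_row_major);
}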
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer