256 in NNUE?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

kinderchocolate
Posts: 454
Joined: Mon Nov 01, 2010 6:55 am
Full name: Ted Wong

256 in NNUE?

Post by kinderchocolate »

https://www.chessprogramming.org/Stockfish_NNUE. In the NNUE network architecture, I can see 256 neurons in the first hidden layer, and an int16 is needed to store each of its weights.

Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? Also, what's the reason for keeping it at 256 and not some other number? It looks as if reducing the neurons by just one would let the weight vectors drop sharply to int8.
tomitank
Posts: 276
Joined: Sat Mar 04, 2017 12:24 pm
Location: Hungary

Re: 256 in NNUE?

Post by tomitank »

kinderchocolate wrote: Thu Jan 28, 2021 8:16 pm https://www.chessprogramming.org/Stockfish_NNUE. In the NNUE network architecture, I can see 256 neurons in the first hidden layer, and an int16 is needed to store each of its weights.

Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? Also, what's the reason for keeping it at 256 and not some other number? It looks as if reducing the neurons by just one would let the weight vectors drop sharply to int8.
Yes, you can reduce it. 256 is just a number that tested well (I think). There are some rules of thumb for choosing the number of neurons, but there is no perfect recipe.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: 256 in NNUE?

Post by BeyondCritics »

kinderchocolate wrote: Thu Jan 28, 2021 8:16 pm ...
Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? ...
The number of weights tells you something about the dimensionality of your problem; the size of the integers tells you something about your precision. Obviously there is no direct correspondence between them. You can of course try to vary both, as already mentioned. AFAIK there is still no recommended guideline for preferring one setting over another, except in corner cases.
AndrewGrant
Posts: 1752
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: 256 in NNUE?

Post by AndrewGrant »

You can have a network with 2x128 neurons in L1 as opposed to 2x256 like SF. It's what I do in Ethereal, and it works fine.
The number of weights has nothing to do with the size of the individual weights.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
kinderchocolate
Posts: 454
Joined: Mon Nov 01, 2010 6:55 am
Full name: Ted Wong

Re: 256 in NNUE?

Post by kinderchocolate »

Thanks! Follow-up questions:

1.) Why int16 for the weights? Shouldn't the weights be floating-point numbers?
2.) Why int16 in the first layer but int8 in the rest of the network?
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 256 in NNUE?

Post by hgm »

My guess is that this is just trial and error, heavily influenced by what the hardware can do in terms of SIMD instructions. The weights of the first layer never have to be multiplied (as the inputs are just 0/1), and they are furthermore updated incrementally, which in practice means only adding / subtracting a very small fraction of them (those for which the input toggled) to the KPST sums. So I suppose there is very little gain in reducing their precision to 8 bits. Which, of course, would not be enough of a reason to refrain from doing it if making them 16 bits served no purpose at all.
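To illustrate what that incremental update looks like in code (just a minimal sketch with made-up names and sizes, not Stockfish's actual implementation):

Code: Select all

// Sketch of an incrementally updated first-layer accumulator.
// Sizes and names are illustrative, not Stockfish's actual constants.
#include <cstdint>

constexpr int HIDDEN   = 256;    // first-layer neurons per perspective
constexpr int FEATURES = 41024;  // HalfKP-style boolean input features

// weights[f][h]: int16 weight from input feature f to hidden neuron h
static int16_t weights[FEATURES][HIDDEN];

struct Accumulator {
    int16_t sum[HIDDEN];         // running first-layer pre-activations
};

// Inputs are 0/1, so "multiplying" by a weight is just adding it.
// A move only toggles a handful of features, so only those rows are touched.
void add_feature(Accumulator& acc, int feature) {
    for (int h = 0; h < HIDDEN; ++h)
        acc.sum[h] += weights[feature][h];
}

void remove_feature(Accumulator& acc, int feature) {
    for (int h = 0; h < HIDDEN; ++h)
        acc.sum[h] -= weights[feature][h];
}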
derjack
Posts: 16
Joined: Fri Dec 27, 2019 8:47 pm
Full name: Jacek Dermont

Re: 256 in NNUE?

Post by derjack »

It's called neural network quantization. You convert the weights into int16 or int8 and lose some precision, which may worsen the net a little bit, but SIMD calculations can then be faster because more weights fit into the registers. As for int16 in the first layer, this reduces memory by 2x (floats are 32 bits), so more of the neural network, or even all of it, can fit into cache, which again makes it faster.
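A minimal sketch of that kind of quantization (the scale factor and clamping here are illustrative assumptions, not the actual NNUE constants):

Code: Select all

// Post-training quantization: scale float weights and round them to int16.
// The default scale factor is an illustrative assumption.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int16_t> quantize_int16(const std::vector<float>& w, float scale = 64.0f) {
    std::vector<int16_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        // Scale, round to nearest integer, then clamp into the int16 range.
        float v = std::round(w[i] * scale);
        v = std::clamp(v, -32768.0f, 32767.0f);
        q[i] = static_cast<int16_t>(v);
    }
    return q;
}

With scale = 64, for example, a float weight of 0.5 becomes the integer 32: you lose resolution finer than 1/64, but each weight takes half the space of a 32-bit float.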
AndrewGrant
Posts: 1752
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: 256 in NNUE?

Post by AndrewGrant »

kinderchocolate wrote: Sat Jan 30, 2021 11:06 am Thanks! Follow-up questions:

1.) Why int16 for the weights? Shouldn't the weights be floating-point numbers?
2.) Why int16 in the first layer but int8 in the rest of the network?
https://chess.stackexchange.com/questio ... 3736#33736
That answers why the input layer is 16 bits and the others 8 bits. TL;DR: it's due to the way the first layer is computed _without_ multiplication.

As for floats -- with integers you can pack more values into each register. Also, you remove the possibility of variance in how compilers and platforms treat floats; not all floating-point expressions will be evaluated the same way on every compiler / platform.
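To make the packing point concrete, here is a rough sketch of an int8 dot product (assuming AVX2, uint8 activations in [0,127] and int8 weights, as in typical NNUE-style quantization; the function name is made up). Each 256-bit register holds 32 byte-sized values, against only 8 with 32-bit floats:

Code: Select all

// Dot product of uint8 activations with int8 weights using AVX2.
// Assumes n is a multiple of 32 and activations stay in [0, 127],
// so the pairwise int16 sums cannot overflow.
#include <cstdint>
#include <immintrin.h>

int32_t dot_int8_avx2(const uint8_t* act, const int8_t* w, int n) {
    __m256i acc = _mm256_setzero_si256();
    const __m256i ones = _mm256_set1_epi16(1);
    for (int i = 0; i < n; i += 32) {
        __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(act + i));
        __m256i b = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(w + i));
        // 32 uint8*int8 products, pairwise-summed into 16 int16 lanes...
        __m256i prod = _mm256_maddubs_epi16(a, b);
        // ...then widened to 8 int32 lanes and accumulated.
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(prod, ones));
    }
    // Horizontal sum of the 8 int32 lanes.
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s  = _mm_add_epi32(lo, hi);
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1));
    return _mm_cvtsi128_si32(s);
}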
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: 256 in NNUE?

Post by Sesse »

AndrewGrant wrote: Sat Jan 30, 2021 3:21 pm As for floats -- with integers you can pack more values into each register. Also, you remove the possibility of variance in how compilers and platforms treat floats; not all floating-point expressions will be evaluated the same way on every compiler / platform.
But they most certainly will, especially now that x87 is effectively dead and gone. (The biggest differences are in library functions like exp() and the like.)