I have reread what the chessprogramming wiki has to say about NNUE (https://www.chessprogramming.org/NNUE & https://www.chessprogramming.org/Stockfish_NNUE).
I'd guess that the general NNUE is what we would use in our program, while the Stockfish NNUE is how it is used in one specific program. Although it is probably worth keeping an eye on it.
I have some very basic questions about NNUE. Why only one hidden layer? From theory we know that we need two layers for approximating any function.
Why not more conventional activation functions, like, say, tanh? Is that because of the binary input?
Also (per my earlier question) I have noticed that having a binary input layer makes an RBFNN not that usable, unless I have missed something.
Even though one layer may suffice in orthodox chess, maybe for fairy chess it would not.
Amoeba : A Modern TSCP Type Engine
Moderator: Ras
-
- Posts: 32
- Joined: Thu Feb 16, 2023 12:56 pm
- Full name: Florea Aurelian
-
- Posts: 28321
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Amoeba : A Modern TSCP Type Engine
I don't follow Stockfish development, and IIRC the original Shogi NNUE had two hidden layers. So I suppose that for Chess one of these layers could be optimized away.
I think ReLU activation functions are used for efficiency reasons. They only require the sum of the inputs to be clipped to a smaller number of bits, in a saturating way. Intel-architecture CPUs have instructions that make it possible to do this for many integers packed in a single word simultaneously. They don't have instructions that calculate a large number of tanh functions simultaneously in a single clock cycle.
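For illustration, a minimal sketch of that saturating clip, assuming AVX2 and 16-bit accumulator values (the function name, sizes, and the bound 127 are my assumptions, not any particular engine's code):

```cpp
#include <immintrin.h>
#include <cstdint>

// Clipped ReLU over a block of 16-bit accumulator values:
// out[i] = min(max(acc[i], 0), 127), 16 values per AVX2 instruction.
// n is assumed to be a multiple of 16 here.
void clipped_relu(const int16_t* acc, int16_t* out, int n) {
    const __m256i zero = _mm256_setzero_si256();
    const __m256i cap  = _mm256_set1_epi16(127);
    for (int i = 0; i < n; i += 16) {
        __m256i v = _mm256_loadu_si256((const __m256i*)(acc + i));
        v = _mm256_max_epi16(v, zero);   // clamp below at 0
        v = _mm256_min_epi16(v, cap);    // saturate above at 127
        _mm256_storeu_si256((__m256i*)(out + i), v);
    }
}
```

A tanh would instead need a per-element polynomial or table lookup, which is exactly the kind of work these packed min/max instructions avoid.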
I don't think the input layer applies any activation function; it is just a fancy name for the data fed to the network. It is binary because chess allows only 0 or 1 piece on each square. The first layer of weights produces a piece-square sum, which can assume many values over a wide range, so applying different activation functions to that sum really makes a difference.
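As a sketch of that first layer: with binary inputs the matrix product degenerates into summing one weight row per piece on the board, which is also what makes incremental updates cheap. All names and sizes here are illustrative (768 = 12 piece types x 64 squares):

```cpp
#include <cstdint>

constexpr int INPUTS = 768, HIDDEN = 256;
int16_t weight[INPUTS][HIDDEN];   // first-layer weights, trained offline
int16_t accumulator[HIDDEN];      // running piece-square sums

// Because each input is 0 or 1, "multiplying" by the input just means
// adding (or removing) the weight row of a feature that switches on (or off).
void add_feature(int feature) {
    for (int i = 0; i < HIDDEN; ++i) accumulator[i] += weight[feature][i];
}
void remove_feature(int feature) {
    for (int i = 0; i < HIDDEN; ++i) accumulator[i] -= weight[feature][i];
}

// A quiet move toggles only two features, so the first layer is
// refreshed in O(HIDDEN) instead of O(INPUTS * HIDDEN).
void make_quiet_move(int fromFeature, int toFeature) {
    remove_feature(fromFeature);
    add_feature(toFeature);
}
```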
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Amoeba : A Modern TSCP Type Engine
HGM, is this project still in active development?
-
- Posts: 28321
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Amoeba : A Modern TSCP Type Engine
In principle it is, but I have not been active in the area of chess programming these past weeks, because of other pressing business. I hope to resume it soon; I was already close to having a version that would be able to play a game of chess using PST-only evaluation.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Amoeba : A Modern TSCP Type Engine
hgm wrote: ↑Tue Mar 18, 2025 7:25 pm In principle it is, but I have not been active in the area of chess programming these past weeks, because of other pressing business. I hope to resume it soon; I was already close to having a version that would be able to play a game of chess using PST-only evaluation.

-
- Posts: 912
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Amoeba : A Modern TSCP Type Engine
catugocatugocatugo wrote: ↑Tue Mar 11, 2025 9:46 am I have reread what the chessprogramming wiki has to say about NNUE […]
You need to be able to run millions of evals per second. Everything that speeds up the eval may be worth a sacrifice in accuracy. If you consider how well PSQTs already work, and that each neuron in the hidden layer is basically its own set of PSQTs, it is no surprise that hundreds of them can already provide a quite good eval for chess.
With flexible trainers like bullet you can relatively easily experiment with different network architectures and try a 2nd layer if you want.
What seems to be more promising is to use buckets, e.g. have multiples of the 768 inputs. But for a learning engine I would keep it simple, like (768->256)x2->1.
Another thing that is done for the sake of speed at the cost of accuracy is quantization. So instead of using floats you map them to [0..255] (8 bits) and then use shorts (16 bits) as much as possible while accumulating. To get some accuracy back, SCReLU (squared clipped ReLU) is actually better than CReLU, as it basically allocates more precision (bits) to smaller values. When you switch from CReLU to SCReLU you can gain some 20 Elo, but only if you take extra care that you don't have to widen the shorts to integers too early and halve your throughput. You can get around this by ordering the required operations carefully. (See Lizard's optimization described in the wiki.)
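A scalar sketch of that quantized pipeline, assuming the (768->256)x2->1 layout mentioned above and an activation scale QA = 255. The constants, names, and the simplified rescaling are my assumptions, not Lizard's actual code:

```cpp
#include <cstdint>
#include <algorithm>

constexpr int     HIDDEN = 256;
constexpr int32_t QA     = 255;             // activation quantization scale

int16_t accOwn[HIDDEN], accOpp[HIDDEN];     // accumulators, both perspectives
int16_t outWeight[2 * HIDDEN];              // quantized output-layer weights

// SCReLU: square of the value clipped to [0, QA]. Squaring spreads the
// available integer precision toward the smaller activations.
inline int32_t screlu(int32_t x) {
    int32_t c = std::clamp(x, (int32_t)0, QA);
    return c * c;
}

int32_t evaluate() {
    int64_t sum = 0;
    for (int i = 0; i < HIDDEN; ++i) {
        // Naive order: widen to 32 bits before every multiply. The faster
        // ordering computes (clip * weight) while it still fits in 16 bits,
        // then does one widening multiply by clip (e.g. _mm256_madd_epi16).
        sum += (int64_t)screlu(accOwn[i]) * outWeight[i];
        sum += (int64_t)screlu(accOpp[i]) * outWeight[HIDDEN + i];
    }
    return (int32_t)(sum / (QA * QA));      // undo the scales (simplified)
}
```

The point of the reordering in the comment is that clip(x) <= 255 and an 8-bit weight keep clip(x)*w inside a short, so the squaring can ride along in the widening multiply-add instead of forcing 32-bit lanes throughout.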
-
- Posts: 32
- Joined: Thu Feb 16, 2023 12:56 pm
- Full name: Florea Aurelian
Re: Amoeba : A Modern TSCP Type Engine
Just a small caveat, mister lithander. This engine is meant to also play some simple, or maybe even more complex, fairy chess. And that is actually my main interest here. But I do think that your advice still holds. It is just that maybe the nets should be bigger. Thanks for your help!
-
- Posts: 32
- Joined: Thu Feb 16, 2023 12:56 pm
- Full name: Florea Aurelian
Re: Amoeba : A Modern TSCP Type Engine
So if I understand correctly, for the AI, piece values + PST + NNUE is enough for a good engine. I am not sure, though, how the NNUE will adapt for the fairy chess games, where not much work has been done on establishing the piece values. I am not talking about gross errors, but errors of less than 1.5 pawns, as I think that within 1.5 pawns I can approximate the value of any piece. The same goes for the PST. There the discussion is even more complex.
-
- Posts: 28321
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Amoeba : A Modern TSCP Type Engine
I don't think NNUE engines use piece values or PST. Just NNUE. It will learn the piece values and their average dependence on location during training.
A problem with variants could be the amount of training required. To grasp a simple concept like piece values, the training material should contain a large enough representative selection of each material combination. (Large enough to average out all positional factors.) With only 5 piece types (the King is always present), that is not too hard. Now try to extend that to, say, Chu Shogi, where you have some 36 different piece types (distinguishing promoted and promotable types). How many more material combinations will have to be learned now?
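A rough back-of-the-envelope version of that explosion (my numbers, only meant to show the scale):

```latex
% Per side in orthodox chess, ignoring promotions:
% Q in {0,1}, R,B,N each in {0,1,2}, P in {0,...,8}
N_{\text{chess}} \approx (2 \cdot 3 \cdot 3 \cdot 3 \cdot 9)^2 = 486^2 \approx 2.4 \times 10^5
% In Chu Shogi, even a binary present/absent flag for each of the
% 36 piece types per side already gives
N_{\text{chu}} \gtrsim \left(2^{36}\right)^2 = 2^{72} \approx 4.7 \times 10^{21}
```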
-
- Posts: 32
- Joined: Thu Feb 16, 2023 12:56 pm
- Full name: Florea Aurelian
Re: Amoeba : A Modern TSCP Type Engine
hgm wrote: ↑Thu Apr 03, 2025 8:30 am I don't think NNUE engines use piece values or PST. Just NNUE. It will learn the piece values and their average dependence on location during training. […]
Please pardon my misunderstanding about the piece values and PST.
Anyway, it is not just the large sample that is needed, but also a larger hidden layer (accumulator). The two games I am working on both have 14 piece types and 100 squares. This means an input layer of 2800 values. So an accumulator for this should be at least 4096, I'd say. The question I'm pondering is whether there should be another hidden layer, a 64-neuron second hidden layer. Probably not initially, but for a stronger engine this is certainly required, especially if it turns out that there are some nested concepts in the game.
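For scale, a rough count of what that proposal costs in parameters (my arithmetic, based on the numbers above):

```latex
% First layer of the proposed fairy net: 2800 inputs x 4096 accumulator neurons
W_1 = 2800 \times 4096 \approx 1.15 \times 10^7 \ \text{weights}
% At 16 bits per quantized weight that is roughly 23 MB, versus
% 768 \times 256 \approx 2 \times 10^5 weights (about 0.4 MB) for the
% (768->256)x2->1 net mentioned earlier, so training data and
% update cost grow accordingly.
```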