Pawn King Neural Network

Discussion of chess software programming and technical issues.

tomitank
Posts: 258
Joined: Sat Mar 04, 2017 11:24 am
Location: Hungary

Pawn King Neural Network

Post by tomitank » Thu Nov 26, 2020 11:38 am

I'm trying to add a Pawn King Neural Network (like Ethereal) to my JavaScript chess engine.
What I do:
1.) Extract the pawn-king eval for 0.7M positions at depth 6 (cp eval for MG and EG).
2.) Before training, evaluate all positions with qsearch, the same as above.
3.) During step 2, fill an input array with the pawn and king positions (0 for an empty square, 1 for a piece).
4.) During step 2, fill an output (target) array with the activated eval difference:

Code: Select all

var output = [
	// Equivalent to Activations.chess(evalMG - currEval.MG)
	Activations.chess((currEval.MG > evalMG ? -1 : 1) * Math.abs(evalMG - currEval.MG)),
	// Equivalent to Activations.chess(evalEG - currEval.EG)
	Activations.chess((currEval.EG > evalEG ? -1 : 1) * Math.abs(evalEG - currEval.EG))
];
Activations.chess is:

Code: Select all

1 / (1 + Math.pow(10, (-x / 400)))
5.) Train with my own hand-written Neural Network [224x32x2].
6.) After training I get the output in cp, because I don't use an activation on the output layer (the hidden layer does use one).
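Written out as a runnable function (a sketch; the names and the inverse helper are mine, not from the engine), the Activations.chess mapping in step 4 squashes a centipawn difference onto the Elo logistic curve:

```javascript
// The logistic curve used for Elo: x centipawns -> expected score in (0, 1).
function chessActivation(x) {
    return 1 / (1 + Math.pow(10, -x / 400));
}

// Inverse mapping, handy for sanity-checking targets: score -> centipawns.
function chessActivationInverse(p) {
    return -400 * Math.log10(1 / p - 1);
}
```

With this scaling, a 400 cp advantage maps to roughly a 91% expected score.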

The training is currently running with batch size 1024 and the Adam optimizer.
The network is fully connected, works with matrices and is probably stable; I tested it with a few logic gates.
I don't know the results yet.
My question would be: has anyone tried anything like this before? (What results can I expect?)
What should be changed?
Best practices?

(I'm new to NNs and I don't want to use a machine-learning platform.)

Thanks,
Tamás

jdart
Posts: 4101
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Pawn King Neural Network

Post by jdart » Thu Nov 26, 2020 3:41 pm

Overview available here: https://www.chessprogramming.org/NNUE

For speed, Stockfish does an incremental update of (I believe) the first network layer during move do/undo.
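The idea can be sketched like this (a minimal illustration of the technique, not Stockfish's actual code; all names are hypothetical): the first-layer pre-activations live in an accumulator, and do/undo only adds or subtracts the weight columns of the features that changed:

```javascript
// Accumulator holds bias[j] + sum of W[i][j] over all active input features i.
function makeAccumulator(bias) {
    return bias.slice();
}

// A feature became active (e.g. a pawn appeared on a square): add its column.
function addFeature(acc, W, i) {
    for (let j = 0; j < acc.length; j++) acc[j] += W[i][j];
}

// A feature became inactive (the pawn moved away): subtract its column.
function removeFeature(acc, W, i) {
    for (let j = 0; j < acc.length; j++) acc[j] -= W[i][j];
}
```

A quiet move then costs one removeFeature and one addFeature instead of recomputing the whole first layer.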

AndrewGrant
Posts: 1030
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant
Contact:

Re: Pawn King Neural Network

Post by AndrewGrant » Fri Nov 27, 2020 5:45 am

tomitank wrote:
Thu Nov 26, 2020 11:38 am
(I'm new to NNs and I don't want to use a machine-learning platform.)
Firstly, good on you. You don't understand NN, so you are taking steps on your own to program the trainer, in order to learn and understand. That puts you a mile ahead of far too many on these forums who know nothing of what they have in their engines.
tomitank wrote:
Thu Nov 26, 2020 11:38 am
The training is currently running with batch size 1024 and the Adam optimizer.
The network is fully connected, works with matrices and is probably stable; I tested it with a few logic gates.
I don't know the results yet.
My question would be: has anyone tried anything like this before? (What results can I expect?)
What should be changed?
Best practices?
My PK networks were a bit strange in how they were trained. They were trained to output a correction that is added to the static eval of a position, using a sigmoid loss function. I never actually replaced my King+Pawn evaluation with the Neural Network, I simply augmented it.

Ideally, the Network could entirely replace the King+Pawn efforts. I will say that you can probably use a larger size quite easily. I trained my Network on ~32M positions, but I have billions now if I cared to improve upon it. Your 700k dataset will likely not be super great, and will be easy to overtrain on.

Adam optimization for these NNs (+NNUE) is, in my testing, far, far better than other optimizers. My custom trainer, at one time, had support for Adagrad, Adadelta, Momentum, pure SGD, a few others, and Adam. Adam won by a _very_ large margin. Nothing compared. I used a batch size of 8192 to train my PK networks, and I now use batch sizes of 16384 for my NNUE efforts, due to optimization techniques.
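For reference, one Adam step over a flat weight array looks roughly like this (a textbook sketch with the usual default hyperparameters, not anyone's actual trainer code):

```javascript
// state = { t: 0, m: zeros, v: zeros }; call once per batch with the batch gradient.
function adamStep(w, grad, state, lr = 0.001, b1 = 0.9, b2 = 0.999, eps = 1e-8) {
    state.t += 1;
    for (let i = 0; i < w.length; i++) {
        // Exponential moving averages of the gradient and its square.
        state.m[i] = b1 * state.m[i] + (1 - b1) * grad[i];
        state.v[i] = b2 * state.v[i] + (1 - b2) * grad[i] * grad[i];
        // Bias correction, then the update.
        const mHat = state.m[i] / (1 - Math.pow(b1, state.t));
        const vHat = state.v[i] / (1 - Math.pow(b2, state.t));
        w[i] -= lr * mHat / (Math.sqrt(vHat) + eps);
    }
}
```

The per-weight adaptive step size is what makes it so robust across the very differently-scaled inputs of a chess network.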

tomitank
Posts: 258
Joined: Sat Mar 04, 2017 11:24 am
Location: Hungary

Re: Pawn King Neural Network

Post by tomitank » Fri Nov 27, 2020 8:26 am

Hi!
AndrewGrant wrote:
My PK networks were a bit strange in how they were trained. They were trained to output a correction that is added to the static eval of a position, using a sigmoid loss function. I never actually replaced my King+Pawn evaluation with the Neural Network, I simply augmented it.

Yes, I would also train on the output, but only the difference. I think this is similar to your solution, but -->
Ideally, the Network could entirely replace the King+Pawn efforts.
Yes, you're right. If the first phase succeeds, then I will try that.

What I noticed in your solution was that you didn't add the passed-pawn base eval (which depends only on the pawns) to pkeval. What is the reason for this?

Another thing is that you used ReLU. How do you relate this to Elo or cp? How do you set the target value? (Only if it's no secret.)

ReLU is much more efficient than Sigmoid if we update the network inputs incrementally.

AndrewGrant
Posts: 1030
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant
Contact:

Re: Pawn King Neural Network

Post by AndrewGrant » Tue Dec 01, 2020 5:10 pm

tomitank wrote:
Fri Nov 27, 2020 8:26 am
What I noticed in your solution was that you didn't add the passed-pawn base eval (which depends only on the pawns) to pkeval. What is the reason for this?
In Ethereal, ``pkeval`` refers to the evaluation of Pawns + Kings, without knowledge of any Rooks, Knights, Bishops, or Queens on the board. It is a function of only Pawns and Kings, and as a result can be placed into the Pawn+King Hash table. Passed pawns take into account other factors, like safety, ability to advance, etc, which include knowledge of other pieces.
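The cacheable part can be sketched as follows (names hypothetical; a real engine would use a fixed-size table indexed by the hash, a Map just keeps the sketch short):

```javascript
// Cache keyed by a Zobrist hash of only the pawns and kings: any position
// with the same pawn+king structure reuses the stored score.
const pkTable = new Map();

function probePkEval(pkHash, computePkEval) {
    let score = pkTable.get(pkHash);
    if (score === undefined) {
        score = computePkEval(); // full pawn+king evaluation, done once per structure
        pkTable.set(pkHash, score);
    }
    return score;
}
```

This only stays sound if nothing stored in the table depends on the other pieces, which is exactly why passed-pawn terms that consider piece safety or blockers must live outside pkeval.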

tomitank wrote:
Fri Nov 27, 2020 8:26 am
Another thing is that you used ReLU. How do you relate this to Elo or cp? How do you set the target value? (Only if it's no secret.)
ReLU is much more efficient than Sigmoid if we update the network inputs incrementally.
So from the "Texel tuning" days (quotes because my paper is quite removed from "Texel tuning"), the way I map centipawns onto the WDL space is with a sigmoid S = 1/(1+e^(-Kx)). In my case, I've found that a K value near ~2.38 maps CP into WDL with the lowest MSE. Before I explain how I do the training, know that what I did is probably not at all optimal. It was the first and only attempt at it. It worked, but who is to say what the limits are.

So a given training sample contains a Static Evaluation Y, and a game result R {0.0, 0.5, 1.0}. The NN is trained to output two neurons at the end, MG and EG. To compute the loss, we mix the MG and EG to form a phased evaluation E. One could skip this step and simply output a single neuron.

Then, to compute the loss for that sample, we take [R - sigmoid(E+Y)]^2. This trains the Network to roughly output a score in Centipawns meant to adjust the initial evaluation towards the expected result.
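Put together, the per-sample loss might look like this (a sketch; the phase mixing and the way K is scaled against the evaluation units are my assumptions, not Ethereal's exact code):

```javascript
const K = 2.38;

// Maps an evaluation onto the WDL space; with K near 2.38 the input is
// presumably in pawn units rather than centipawns.
function sigmoid(x) {
    return 1 / (1 + Math.exp(-K * x));
}

// mg/eg: the two network outputs; phase in [0, 256]; Y: static eval; R: game result.
function sampleLoss(mg, eg, phase, Y, R) {
    const E = (mg * phase + eg * (256 - phase)) / 256; // phased network output
    const err = R - sigmoid(E + Y);
    return err * err; // [R - sigmoid(E + Y)]^2
}
```

Since Y is fixed per sample, the gradient only flows into E, so the network learns just the correction on top of the static eval.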

tomitank
Posts: 258
Joined: Sat Mar 04, 2017 11:24 am
Location: Hungary

Re: Pawn King Neural Network

Post by tomitank » Tue Dec 01, 2020 6:42 pm

AndrewGrant wrote:
Tue Dec 01, 2020 5:10 pm
In Ethereal, ``pkeval`` refers to the evaluation of Pawns + Kings, without knowledge of any Rooks, Knights, Bishops, or Queens on the board. It is a function of only Pawns and Kings, and as a result can be placed into the Pawn+King Hash table. Passed pawns take into account other factors, like safety, ability to advance, etc, which include knowledge of other pieces.
Oh, now I see:
https://github.com/AndyGrant/Ethereal/b ... te.c#L1034
I started from my own evaluation. I add a score which is based only on the rank of the passed pawn.
AndrewGrant wrote:
Tue Dec 01, 2020 5:10 pm
Then, to compute the loss for that sample, we take [R - sigmoid(E+Y)]^2. This trains the Network to roughly output a score in Centipawns meant to adjust the initial evaluation towards the expected result.
That is the key! I saw something similar in nodchip's trainer.
And if I understand correctly, I can use ReLU on the hidden layer(s).
The sigmoid goes on the output only, and it disappears when the network is used.

Thanks!

For now, I'll try my own version, with more examples.
The 0.7M positions were enough to win ~5 Elo.
