Tensorflow NNUE training

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Tensorflow NNUE training

Post by Daniel Shawul »

Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth.
The training is done with the existing training code I have for training regular ResNets.
The input to NNUE is 384 (32x12) channels of 8x8 boards. Note that I exploit the vertical symmetry of the king, so there are only 32 king squares,
and I use 12 piece types including both kings, instead of the 10 pieces SF-NNUE uses.
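
In code terms, each (king square, piece, square) triple maps to one of 32x12x64 = 24576 inputs, roughly like this (a simplified sketch of the idea only - the mirroring convention and index order here are illustrative, see nnue.py for the real layout):

Code:

def feature_index(king_sq, piece, piece_sq):
    """king_sq, piece_sq in 0..63; piece in 0..11 (6 types x 2 colors).
    Illustrative sketch; the linked nnue.py is the authoritative version."""
    kfile, krank = king_sq % 8, king_sq // 8
    if kfile >= 4:          # vertical symmetry: mirror files e-h onto d-a
        kfile = 7 - kfile
        piece_sq ^= 7       # mirror the piece square's file as well
    king_idx = krank * 4 + kfile                    # 32 king squares
    return (king_idx * 12 + piece) * 64 + piece_sq  # 24576 inputs total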

First things first: TensorFlow C++ inference is darn slow with such a tiny net. This is mainly due to TensorFlow's per-call overhead of about 20ms.
My hand-written inference code is 300x faster with AVX2 and INT8 quantization. FP32 is about 2x slower than INT8.
Quantization is done post-training, i.e. weights are saved in FP32 and a constant scale factor of 64 is used for all weights.
It may be better to do dynamic calibration with a dataset -- I do this for ResNets, for example.
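
In other words, something along these lines (a simplified sketch - the clipping choice is my own illustration, the linked code has the details):

Code:

import numpy as np

def quantize_weights(w_fp32, scale=64):
    """Post-training quantization with a constant scale factor:
    scale the FP32 weights, round, and clip to the INT8 range."""
    q = np.round(w_fp32 * scale)
    return np.clip(q, -128, 127).astype(np.int8)

# At inference time the INT8 dot products are scaled back down by `scale`
# (or the rescaling is folded into the next layer).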

Training:
https://github.com/dshawul/nn-train/blo ... rc/nnue.py

Inference:
https://github.com/dshawul/nncpu-probe/ ... /nncpu.cpp
castlehaven
Posts: 5
Joined: Thu Jun 18, 2020 9:22 pm
Full name: Andrew Metrick

Re: Tensorflow NNUE training

Post by castlehaven »

I have read some of the TensorFlow vs. PyTorch debates but still come out feeling confused. Is there any reason to prefer one of these environments over the other for NNUE development? Or is it 99% driven by whichever one you are most familiar with? I ask because I am looking to start from scratch, happily unburdened by any prior knowledge :-), but don't know which one to learn.
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Tensorflow NNUE training

Post by maksimKorzh »

Daniel Shawul wrote: Wed Nov 11, 2020 12:57 am [quoted above]

Hi Daniel, I'm trying to make a simple proof-of-concept test:
1. Convert board to input matrix
2. Predict eval score using single perceptron model with no hidden layers (I understand the linear separability limitation)

Here's the code where I try to convert board position to a matrix:

Code:

import chess
import numpy as np

board = chess.Board()
board_matrix = []

piece_vectors = {
    'None': [0, 0, 0, 0, 0, 0, 0],
    'P': [1, 0, 0, 0, 0, 0, 0],
    'N': [0, 1, 0, 0, 0, 0, 0],
    'B': [0, 0, 1, 0, 0, 0, 0],
    'R': [0, 0, 0, 1, 0, 0, 0],
    'Q': [0, 0, 0, 0, 1, 0, 0],
    'K': [0, 0, 0, 0, 0, 1, 0],
    'p': [1, 0, 0, 0, 0, 0, 1],
    'n': [0, 1, 0, 0, 0, 0, 1],
    'b': [0, 0, 1, 0, 0, 0, 1],
    'r': [0, 0, 0, 1, 0, 0, 1],
    'q': [0, 0, 0, 0, 1, 0, 1],
    'k': [0, 0, 0, 0, 0, 1, 1]
}

for row in range(8):
    row_vectors = []
    for col in range(8):
        square = row * 8 + col
        piece = str(board.piece_at(square))

        for value in piece_vectors[piece]:
            row_vectors.append(value)
    
    print(len(row_vectors))
    board_matrix.append(row_vectors)

board_matrix = np.array(board_matrix)

weights = np.random.uniform(-1, 1, size=(56, 8))
And the output should be, say, 55.
I was trying something like:

Code:

def sig(x):
    return 1 / (1 + np.exp(-x))

def deriv(x):
    return x * (1 - x)


for i in range(10000):
    out = sig(np.dot(board_matrix, weights))
    error = 55 - out
    weights += np.dot(board_matrix.T, error * deriv(out))
but it adjusts the weights only once and gives an output like this:

Code:

[[1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]]
I found NN classification tutorials, but this seems to be a regression problem, and I couldn't find any tutorials on anything similar to this issue.
Could you please kindly explain how I can train a single-layer perceptron model using the board matrix as input and a score as output?

P.S. My board-to-matrix transformation is most likely horribly wrong; could you please show the proper way of transforming the board into a matrix as well?
I'm not trying to make something decent from a chess-strength perspective, just the simplest thing possible.
I feel desperate, lost, confused and stuck at a dead point due to complete dumbness, please help.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Tensorflow NNUE training

Post by Henk »

maksimKorzh wrote: Fri Nov 13, 2020 6:55 am [quoted above]

Maybe I'm stupid to react to this, but here goes. If you have the source code, then implement XOR or something far simpler than what you are using now. Step through the code and see what happens: does it change all the weights more than once? Calculate an example by hand, etc., so you can check that all the weights are similar/equal to what you expected.

If you don't have the source code, I would quit. I wrote every statement myself; if you do the same, then you have the source code and you know exactly what it does. Otherwise you need to read a tutorial very carefully, and hopefully it contains a simple example you can reproduce.
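
For example, a minimal NumPy XOR net of the kind I mean might look like this (an illustrative sketch, not code from any of the linked repos):

Code:

import numpy as np

# Tiny XOR network: 2 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.uniform(-1, 1, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, (4, 1)); b2 = np.zeros((1, 1))

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.5
for _ in range(10000):
    h = sig(X @ W1 + b1)                  # forward pass
    out = sig(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)   # backprop through output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)    # backprop through hidden sigmoid
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # should approach [[0], [1], [1], [0]]

Work through one iteration of this by hand and compare against what the code computes - that is the kind of check I mean.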
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Tensorflow NNUE training

Post by maksimKorzh »

Henk wrote: Fri Nov 13, 2020 8:19 pm [quoted above]

XOR is probably the only thing I can reproduce/understand, but when it comes to anything more complicated I feel lost and stop understanding what's going on. I wish I had a working example, but all the examples use deep learning. I can't find simple code that does what I need - that's the whole problem. Everyone around is too smart... I'm very close to deciding to drop this NN stuff forever and never come back - I've been trying/learning for 3 weeks now - read the theory, tried example code - but when it comes to chess, I'm doomed. Probably this NN stuff is just for much smarter people than me. I hate topics that "everyone understands and discusses" but that nobody can explain to a "five year old kid".
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Tensorflow NNUE training

Post by Henk »

maksimKorzh wrote: Fri Nov 13, 2020 9:04 pm [quoted above]

Maybe it's something to do with normalizing the (initial) weights, mini-batches, or using the right activation function.

Oh wait - start by using smaller learning rates. Sorry, I was busy with this stuff two or three years ago, but it looks like I've forgotten almost all of it.
I remember I managed to make it learn 1000-10000 training examples. That's all. Maybe I will look up my source code and the YouTube videos which I used, but it looks like I am not that interested in repeating it all again. I used those Stanford university YouTube videos, if I remember right.
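
Concretely, the reason your loop updates only once is that a sigmoid output can never reach a target of 55: the error stays around 54, the huge first update saturates the sigmoid, and deriv(out) = out*(1-out) becomes 0, so all further updates vanish. Here is a rough, untested sketch (my rewrite, not anyone's official code) that treats it as plain linear regression instead - linear output with no sigmoid, a small learning rate, and the board flattened to one 448-dim vector using the same encoding as your piece_vectors dict:

Code:

import chess
import numpy as np

# Same encoding as the piece_vectors dict above:
# 6 one-hot piece-type slots plus a color bit per square -> 7 x 64 = 448.
TYPES = 'PNBRQK'

def board_to_vector(board):
    vec = []
    for sq in range(64):
        v = [0.0] * 7
        piece = board.piece_at(sq)
        if piece is not None:
            v[TYPES.index(piece.symbol().upper())] = 1.0
            if piece.color == chess.BLACK:
                v[6] = 1.0              # color bit, as in the original dict
        vec.extend(v)
    return np.array(vec)                # shape (448,)

x = board_to_vector(chess.Board())
target = 55.0                           # the desired eval score

rng = np.random.default_rng(0)
w = rng.uniform(-0.01, 0.01, size=448)  # one weight per input feature
b = 0.0
lr = 1e-3                               # small learning rate

for _ in range(10000):
    out = x @ w + b                     # linear output, no sigmoid
    err = out - target
    w -= lr * err * x                   # gradient of 0.5 * err**2
    b -= lr * err

print(out)                              # converges towards 55.0

With more than one training position you would loop over (x, target) pairs the same way; sigmoids (or clipped ReLUs) belong on hidden layers, not on a regression output.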
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Tensorflow NNUE training

Post by maksimKorzh »

Henk wrote: Fri Nov 13, 2020 9:28 pm [quoted above]

Please share your sources if you find any.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Tensorflow NNUE training

Post by Henk »

maksimKorzh wrote: Fri Nov 13, 2020 10:09 pm [quoted above]

If you study those Stanford university YouTube videos about neural networks, you'll know everything I could add.