Tensorflow NNUE training

Daniel Shawul
Posts: 4101
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Tensorflow NNUE training

Post by Daniel Shawul » Tue Nov 10, 2020 11:57 pm

Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth.
The training is done with the existing training code I have for training regular ResNets.
The input to NNUE is 384 (32x12) channels of 8x8 boards. Note that I exploit the vertical symmetry of the king, so only 32 king squares are needed,
and I also use 12 piece types including both kings, instead of the 10 pieces SF-NNUE uses.
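For illustration, here is a minimal sketch of one way to index such (king square, piece, square) features; the exact ordering is illustrative, not necessarily what the linked code does:

Code:

def feature_index(king_sq, piece_type, piece_sq):
    """king_sq, piece_sq in 0..63; piece_type in 0..11 (both kings included)."""
    # Mirror horizontally when the king is on files e-h, so only
    # 32 king squares (files a-d) ever occur.
    if (king_sq & 7) > 3:
        king_sq ^= 7   # flip the file: a<->h, b<->g, c<->f, d<->e
        piece_sq ^= 7  # mirror every piece square consistently
    king_index = (king_sq >> 3) * 4 + (king_sq & 7)  # 0..31
    plane = king_index * 12 + piece_type             # 0..383 channels
    return plane * 64 + piece_sq                     # 0..24575 inputs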

First things first: TensorFlow C++ inference is darn slow with such a tiny net, mainly due to a per-call TensorFlow overhead of about 20 ms.
My hand-written inference code is 300x faster with AVX2 and INT8 quantization; FP32 is about 2x slower than INT8.
Quantization is done post-training, i.e. the weights are saved as FP32 and a constant scale factor of 64 is used for all of them.
It may be better to do dynamic calibration with a dataset -- I do this for ResNets, for example.
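For illustration, a minimal sketch of that post-training scheme (the rounding/clamping details here are illustrative; see the linked nncpu.cpp for the real thing):

Code:

import numpy as np

SCALE = 64  # constant scale factor for all weights

def quantize(w_fp32):
    # Post-training: scale, round to nearest, clamp to the INT8 range.
    return np.clip(np.round(w_fp32 * SCALE), -128, 127).astype(np.int8)

def dequantize(acc_int32):
    # Undo the scale after the integer dot product.
    return acc_int32.astype(np.float32) / SCALE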

Training:
https://github.com/dshawul/nn-train/blo ... rc/nnue.py

Inference:
https://github.com/dshawul/nncpu-probe/ ... /nncpu.cpp

castlehaven
Posts: 5
Joined: Thu Jun 18, 2020 7:22 pm
Full name: Andrew Metrick

Re: Tensorflow NNUE training

Post by castlehaven » Thu Nov 12, 2020 5:28 am

I have read some of the TensorFlow vs. PyTorch debates but still come out feeling confused. Is there any reason to prefer one of these environments over the other for NNUE development? Or is it 99% driven by whichever one you are most familiar with? I ask because I am looking to start from scratch, happily unburdened by any prior knowledge :-), but don't know which one to learn.

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 5:55 am

Daniel Shawul wrote:
Tue Nov 10, 2020 11:57 pm
Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth. [...]
Hi Daniel, I'm trying to make one simple proof-of-concept test:
1. Convert the board to an input matrix
2. Predict an eval score using a single-perceptron model with no hidden layers (I understand the linear separability limitation)

Here's the code where I try to convert board position to a matrix:

Code:

import chess
import numpy as np

board = chess.Board()
board_matrix = []

# 7-bit encoding per square: bits 0-5 one-hot the piece type,
# bit 6 flags a black piece; an empty square maps to all zeros
piece_vectors = {
    'None': [0, 0, 0, 0, 0, 0, 0],
    'P': [1, 0, 0, 0, 0, 0, 0],
    'N': [0, 1, 0, 0, 0, 0, 0],
    'B': [0, 0, 1, 0, 0, 0, 0],
    'R': [0, 0, 0, 1, 0, 0, 0],
    'Q': [0, 0, 0, 0, 1, 0, 0],
    'K': [0, 0, 0, 0, 0, 1, 0],
    'p': [1, 0, 0, 0, 0, 0, 1],
    'n': [0, 1, 0, 0, 0, 0, 1],
    'b': [0, 0, 1, 0, 0, 0, 1],
    'r': [0, 0, 0, 1, 0, 0, 1],
    'q': [0, 0, 0, 0, 1, 0, 1],
    'k': [0, 0, 0, 0, 0, 1, 1]
}

for row in range(8):
    row_vectors = []
    for col in range(8):
        square = row * 8 + col
        # piece_at() returns None for empty squares; str(None) == 'None'
        # matches the all-zero entry above
        piece = str(board.piece_at(square))

        for value in piece_vectors[piece]:
            row_vectors.append(value)

    print(len(row_vectors))  # 8 squares x 7 bits = 56 per row
    board_matrix.append(row_vectors)

board_matrix = np.array(board_matrix)  # shape (8, 56)

weights = np.random.uniform(-1, 1, size=(56, 8))
And the target output should be, say, 55.
I was trying something like:

Code:

def sig(x):
    return 1 / (1 + np.exp(-x))

def deriv(x):
    # sigmoid derivative, assuming x is already sig(...)
    return x * (1 - x)


for i in range(10000):
    out = sig(np.dot(board_matrix, weights))  # (8, 8) matrix, each entry in (0, 1)
    error = 55 - out
    weights += np.dot(board_matrix.T, error * deriv(out))
but it adjusts the weights only once and gives output like this:

Code:

[[1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]]
I found NN classification tutorials, but this seems to be a regression problem, and I couldn't find any tutorials on anything similar to this issue.
Could you please kindly explain how I can train a single-layer perceptron model using the board matrix as input and the score as output?

P.S. My board-to-matrix transformation is most likely horribly wrong, so could you please show the proper way of transforming the board into a matrix as well?
I'm not trying to make something decent from the chess strength perspective, just the simplest thing possible.
I feel desperate, lost, confused and stuck at a dead point due to complete dumbness, please help.
Wukong Xiangqi (Chinese chess engine + apps to embed into 3rd party websites):
https://github.com/maksimKorzh/wukong-xiangqi

Chess programming YouTube channel:
https://www.youtube.com/channel/UCB9-pr ... KKqDgXhsMQ

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 7:19 pm

maksimKorzh wrote:
Fri Nov 13, 2020 5:55 am
Hi Daniel, I'm trying to make one simple proof-of-concept test... [...]
Being so stupid as to react to this: if you have source code, then implement an XOR or something far simpler than what you are using now. Step
through the code and see what happens: does it change all the weights more than once? Calculate an example by hand, etc., so you can check that all the weights are similar/equal to what you expected.

If you don't have source code I would quit. I wrote each statement myself. If you do the same, then you have the source code and you know exactly what it does. Otherwise you need to read a tutorial very carefully and hope it contains a simple example you can reproduce.
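For reference, here is a minimal XOR network in the same numpy style as the code above (a sketch of the kind of exercise suggested here, not Henk's actual code): one hidden layer of four sigmoid units trained with plain gradient descent.

Code:

import numpy as np

np.random.seed(0)

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sig(x):
    return 1 / (1 + np.exp(-x))

# One hidden layer of 4 sigmoid units, with biases.
w1 = np.random.uniform(-1, 1, (2, 4)); b1 = np.zeros(4)
w2 = np.random.uniform(-1, 1, (4, 1)); b2 = np.zeros(1)
lr = 0.5

for i in range(10000):
    h = sig(X @ w1 + b1)        # hidden activations, shape (4, 4)
    out = sig(h @ w2 + b2)      # predictions, shape (4, 1)
    # Backprop: the sigmoid derivative is out * (1 - out).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ w2.T) * h * (1 - h)
    w2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    w1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]

Stepping through this by hand on one input pair, as suggested above, is a good way to check that every weight moves the way you expect.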

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 8:04 pm

Henk wrote:
Fri Nov 13, 2020 7:19 pm
If you have source code, then implement an XOR or something far simpler than what you are using now. [...]
XOR is probably the only thing I can reproduce/understand, but when it comes to anything more complicated I feel lost and stop understanding what's going on. I wish I had an existing example, but all the examples use deep learning. I can't find simple code that does what I need - that's the whole problem. Everyone around is too smart... I'm very close to deciding to drop this NN stuff forever and never come back - I've been trying/learning for 3 weeks now - read theory, tried example code - but when it comes to chess, I'm doomed. Probably this NN stuff is just for much smarter people than me. I hate topics "everyone understands and discusses" but can't explain to a "five year old kid".

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 8:28 pm

maksimKorzh wrote:
Fri Nov 13, 2020 8:04 pm
XOR is probably the only thing I can reproduce/understand, but when it comes to anything more complicated I feel lost... [...]
Maybe it has something to do with normalizing the (initial) weights, mini-batches, or using the right activation function.

Oh wait: start by using a smaller learning rate. Sorry, I was busy with this stuff two or three years ago but it looks like I've forgotten almost all of it.
I remember I managed to make it learn 1000-10000 training examples. That's all. Maybe I will look up my source code and the YouTube videos I used, but it looks like I am not so interested in repeating it again. I used the Stanford university YouTube videos, if I remember right.
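Concretely, something like this might work (a hedged sketch, not tested against the exact setup above). The sigmoid in the earlier loop saturates at 1 while chasing a target of 55, so its derivative out*(1-out) collapses to zero after the first update; a linear output with a small learning rate lets the regression converge.

Code:

import numpy as np

# Stand-in for the 8x56 board_matrix from the earlier post.
board_matrix = np.random.randint(0, 2, size=(8, 56))

x = board_matrix.reshape(-1).astype(float)   # flatten to one 448-vector
w = np.random.uniform(-0.01, 0.01, x.size)
target, lr = 55.0, 1e-4

for i in range(10000):
    out = x @ w                # linear output, unbounded
    err = out - target
    w -= lr * err * x          # gradient step on 0.5 * err**2

print(x @ w)                   # approaches 55.0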

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 9:09 pm

Henk wrote:
Fri Nov 13, 2020 8:28 pm
Maybe it has something to do with normalizing the (initial) weights, mini-batches, or using the right activation function. [...]
Please share your sources if you find any.

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 9:24 pm

maksimKorzh wrote:
Fri Nov 13, 2020 9:09 pm
Please share your sources if you find any.
If you study the Stanford university YouTube videos about neural networks, you'll know everything I could add.
