Tensorflow NNUE training

Daniel Shawul
Posts: 4101
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Tensorflow NNUE training

Post by Daniel Shawul » Tue Nov 10, 2020 11:57 pm

Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth.
The training is done with the existing training code I have for training regular ResNets.
The input to NNUE is 384 (32x12) channels of 8x8 boards. Note that I exploit the vertical symmetry of the king, so only 32 king squares are needed,
and I also use 12 piece types including both kings, instead of the 10 pieces SF-NNUE uses.
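For illustration, here is a minimal sketch of one way to index such (king square, piece, square) features; the exact ordering is illustrative, not necessarily what the linked code does:

Code:

def feature_index(king_sq, piece_type, piece_sq):
    """king_sq, piece_sq in 0..63; piece_type in 0..11 (both kings included)."""
    # Mirror horizontally when the king is on files e-h, so only
    # 32 king squares (files a-d) ever occur.
    if (king_sq & 7) > 3:
        king_sq ^= 7   # flip the file: a<->h, b<->g, c<->f, d<->e
        piece_sq ^= 7  # mirror every piece square consistently
    king_index = (king_sq >> 3) * 4 + (king_sq & 7)  # 0..31
    plane = king_index * 12 + piece_type             # 0..383 channels
    return plane * 64 + piece_sq                     # 0..24575 inputs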

First things first: TensorFlow C++ inference is darn slow with such a tiny net, mainly due to a per-call TensorFlow overhead of about 20 ms.
My hand-written inference code is 300x faster with AVX2 and INT8 quantization; FP32 is about 2x slower than INT8.
Quantization is done post-training, i.e. the weights are saved as FP32 and a constant scale factor of 64 is used for all of them.
It may be better to do dynamic calibration with a dataset -- I do this for ResNets, for example.
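For illustration, a minimal sketch of that post-training scheme (the rounding/clamping details here are illustrative; see the linked nncpu.cpp for the real thing):

Code:

import numpy as np

SCALE = 64  # constant scale factor for all weights

def quantize(w_fp32):
    # Post-training: scale, round to nearest, clamp to the INT8 range.
    return np.clip(np.round(w_fp32 * SCALE), -128, 127).astype(np.int8)

def dequantize(acc_int32):
    # Undo the scale after the integer dot product.
    return acc_int32.astype(np.float32) / SCALE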

Training:
https://github.com/dshawul/nn-train/blo ... rc/nnue.py

Inference:
https://github.com/dshawul/nncpu-probe/ ... /nncpu.cpp

castlehaven
Posts: 5
Joined: Thu Jun 18, 2020 7:22 pm
Full name: Andrew Metrick

Re: Tensorflow NNUE training

Post by castlehaven » Thu Nov 12, 2020 5:28 am

I have read some of the TensorFlow vs. PyTorch debates but still come out feeling confused. Is there any reason to prefer one of these environments over the other for NNUE development? Or is it 99% driven by whichever one you are most familiar with? I ask because I am looking to start from scratch, happily unburdened by any prior knowledge :-), but don't know which one to learn.

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 5:55 am

Daniel Shawul wrote:
Tue Nov 10, 2020 11:57 pm
Mirroring Gary's post, here is my announcement of NNUE training code using TensorFlow, for whatever it is worth. [...]
Hi Daniel, I'm trying to make one simple proof-of-concept test:
1. Convert the board to an input matrix
2. Predict an eval score using a single-perceptron model with no hidden layers (I understand the linear separability limitation)

Here's the code where I try to convert board position to a matrix:

Code:

import chess
import numpy as np

board = chess.Board()
board_matrix = []

# 7-bit encoding per square: bits 0-5 one-hot the piece type,
# bit 6 flags a black piece; an empty square maps to all zeros
piece_vectors = {
    'None': [0, 0, 0, 0, 0, 0, 0],
    'P': [1, 0, 0, 0, 0, 0, 0],
    'N': [0, 1, 0, 0, 0, 0, 0],
    'B': [0, 0, 1, 0, 0, 0, 0],
    'R': [0, 0, 0, 1, 0, 0, 0],
    'Q': [0, 0, 0, 0, 1, 0, 0],
    'K': [0, 0, 0, 0, 0, 1, 0],
    'p': [1, 0, 0, 0, 0, 0, 1],
    'n': [0, 1, 0, 0, 0, 0, 1],
    'b': [0, 0, 1, 0, 0, 0, 1],
    'r': [0, 0, 0, 1, 0, 0, 1],
    'q': [0, 0, 0, 0, 1, 0, 1],
    'k': [0, 0, 0, 0, 0, 1, 1]
}

for row in range(8):
    row_vectors = []
    for col in range(8):
        square = row * 8 + col
        # piece_at() returns None for empty squares; str(None) == 'None'
        # matches the all-zero entry above
        piece = str(board.piece_at(square))

        for value in piece_vectors[piece]:
            row_vectors.append(value)

    print(len(row_vectors))  # 8 squares x 7 bits = 56 per row
    board_matrix.append(row_vectors)

board_matrix = np.array(board_matrix)  # shape (8, 56)

weights = np.random.uniform(-1, 1, size=(56, 8))
And the target output should be, say, 55.
I was trying something like:

Code:

def sig(x):
    return 1 / (1 + np.exp(-x))

def deriv(x):
    # sigmoid derivative, assuming x is already sig(...)
    return x * (1 - x)


for i in range(10000):
    out = sig(np.dot(board_matrix, weights))  # (8, 8) matrix, each entry in (0, 1)
    error = 55 - out
    weights += np.dot(board_matrix.T, error * deriv(out))
but it adjusts the weights only once and gives output like this:

Code:

[[1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1.  1.  1.  1. ]
 [1.  1.  1.  1.  1.  1.  1.  1. ]]
I found NN classification tutorials, but this seems to be a regression problem, and I couldn't find any tutorials on anything similar to this issue.
Could you please kindly explain how I can train a single-layer perceptron model using the board matrix as input and the score as output?

P.S. My board-to-matrix transformation is most likely horribly wrong, so could you please show the proper way of transforming the board into a matrix as well?
I'm not trying to make something decent from the chess strength perspective, just the simplest thing possible.
I feel desperate, lost, confused and stuck at a dead point due to complete dumbness, please help.
Wukong Xiangqi (Chinese chess engine + apps to embed into 3rd party websites):
https://github.com/maksimKorzh/wukong-xiangqi

Chess programming YouTube channel:
https://www.youtube.com/channel/UCB9-pr ... KKqDgXhsMQ

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 7:19 pm

maksimKorzh wrote:
Fri Nov 13, 2020 5:55 am
Hi Daniel, I'm trying to make one simple proof-of-concept test... [...]
Being so stupid as to react to this: if you have source code, then implement an XOR or something far simpler than what you are using now. Step
through the code and see what happens: does it change all the weights more than once? Calculate an example by hand, etc., so you can check that all the weights are similar/equal to what you expected.

If you don't have source code I would quit. I wrote each statement myself. If you do the same, then you have the source code and you know exactly what it does. Otherwise you need to read a tutorial very carefully and hope it contains a simple example you can reproduce.
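For reference, here is a minimal XOR network in the same numpy style as the code above (a sketch of the kind of exercise suggested here, not Henk's actual code): one hidden layer of four sigmoid units trained with plain gradient descent.

Code:

import numpy as np

np.random.seed(0)

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sig(x):
    return 1 / (1 + np.exp(-x))

# One hidden layer of 4 sigmoid units, with biases.
w1 = np.random.uniform(-1, 1, (2, 4)); b1 = np.zeros(4)
w2 = np.random.uniform(-1, 1, (4, 1)); b2 = np.zeros(1)
lr = 0.5

for i in range(10000):
    h = sig(X @ w1 + b1)        # hidden activations, shape (4, 4)
    out = sig(h @ w2 + b2)      # predictions, shape (4, 1)
    # Backprop: the sigmoid derivative is out * (1 - out).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ w2.T) * h * (1 - h)
    w2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    w1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]

Stepping through this by hand on one input pair, as suggested above, is a good way to check that every weight moves the way you expect.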

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 8:04 pm

Henk wrote:
Fri Nov 13, 2020 7:19 pm
If you have source code, then implement an XOR or something far simpler than what you are using now. [...]
XOR is probably the only thing I can reproduce/understand, but when it comes to anything more complicated I feel lost and stop understanding what's going on. I wish I had an existing example, but all the examples use deep learning. I can't find simple code that does what I need - that's the whole problem. Everyone around is too smart... I'm very close to deciding to drop this NN stuff forever and never come back - I've been trying/learning for 3 weeks now - read theory, tried example code - but when it comes to chess, I'm doomed. Probably this NN stuff is just for much smarter people than me. I hate topics "everyone understands and discusses" but can't explain to a "five year old kid".

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 8:28 pm

maksimKorzh wrote:
Fri Nov 13, 2020 8:04 pm
XOR is probably the only thing I can reproduce/understand, but when it comes to anything more complicated I feel lost... [...]
Maybe it has something to do with normalizing the (initial) weights, mini-batches, or using the right activation function.

Oh wait: start by using a smaller learning rate. Sorry, I was busy with this stuff two or three years ago but it looks like I've forgotten almost all of it.
I remember I managed to make it learn 1000-10000 training examples. That's all. Maybe I will look up my source code and the YouTube videos I used, but it looks like I am not so interested in repeating it again. I used the Stanford university YouTube videos, if I remember right.
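Concretely, something like this might work (a hedged sketch, not tested against the exact setup above). The sigmoid in the earlier loop saturates at 1 while chasing a target of 55, so its derivative out*(1-out) collapses to zero after the first update; a linear output with a small learning rate lets the regression converge.

Code:

import numpy as np

# Stand-in for the 8x56 board_matrix from the earlier post.
board_matrix = np.random.randint(0, 2, size=(8, 56))

x = board_matrix.reshape(-1).astype(float)   # flatten to one 448-vector
w = np.random.uniform(-0.01, 0.01, x.size)
target, lr = 55.0, 1e-4

for i in range(10000):
    out = x @ w                # linear output, unbounded
    err = out - target
    w -= lr * err * x          # gradient step on 0.5 * err**2

print(x @ w)                   # approaches 55.0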

maksimKorzh
Posts: 630
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh
Contact:

Re: Tensorflow NNUE training

Post by maksimKorzh » Fri Nov 13, 2020 9:09 pm

Henk wrote:
Fri Nov 13, 2020 8:28 pm
Maybe it has something to do with normalizing the (initial) weights, mini-batches, or using the right activation function. [...]
Please share your sources if you find any.

Henk
Posts: 6768
Joined: Mon May 27, 2013 8:31 am

Re: Tensorflow NNUE training

Post by Henk » Fri Nov 13, 2020 9:24 pm

maksimKorzh wrote:
Fri Nov 13, 2020 9:09 pm
Please share your sources if you find any.
If you study the Stanford university YouTube videos about neural networks, you'll know everything I could add.
