Tony P. wrote: ↑Mon Oct 05, 2020 9:44 pm
One of the enigmas of CC is why people were spending crazy amounts on GPUs to train Leela nets instead of figuring out an NN architecture that would play superstrong on CPUs and save the users so much hardware cost that they'd even be better off paying for the engine.
They do that because it's easy to throw crazy amounts of hardware at a problem and hard to do anything else.
A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 57
 Joined: Sat Nov 09, 2019 2:24 pm
 Full name: .
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

 Posts: 117
 Joined: Wed Aug 15, 2007 10:18 am
 Location: Munich
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
Gary Internet wrote: ↑Tue Oct 06, 2020 5:47 am
This is what Andrew is talking about. Nemorino 6 has, probably in the last couple of months, pretty much caught up to where Houdini 6.03 was.
Look at the head-to-head results from Stefan Pohl posted below.
It's also worth noting that Igel is probably around the same level of strength as Nemorino at the moment, or certainly catching up fast.
I love that! Nemorino and Igel are fantastic engines with a long history.
Code: Select all
14 Nemorino 6.00 avx2   : 3432   7000 (+1943,=3574,-1483), 53.3 %
   Ethereal 12.62 avx2  :        1000 (+ 268,= 589,- 143), 56.3 %
   Slow Chess 2.3 popc  :        1000 (+ 360,= 512,- 128), 61.6 %
   Stockfish 12 200902  :        1000 (+   3,= 412,- 585), 20.9 %
   Komodo 14 bmi2       :        1000 (+ 207,= 569,- 224), 49.1 %
   Xiphos 0.6 bmi2      :        1000 (+ 432,= 459,- 109), 66.2 %
   Fire 7.1 popc        :        1000 (+ 454,= 470,-  76), 68.9 %
   Houdini 6 pext       :        1000 (+ 219,= 563,- 218), 50.0 %
Torsten
 maksimKorzh
 Posts: 628
 Joined: Sat Sep 08, 2018 3:37 pm
 Location: Ukraine
 Full name: Maksim Korzh
 Contact:
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
Milos wrote: ↑Sat Oct 03, 2020 2:42 am
AndrewGrant wrote: ↑Fri Oct 02, 2020 10:38 pm
I'm working again now - for chess.com actually. It's a juggling act, one which I'm losing, since I can't do 60 hours of Ethereal and 40 hours of work and 20 hours of family and 40 hours of sleep. Something has to go, and I'm pretty keen on all 4.
Why not? A week, after all, has 168 hours, so there are still 8 extra hours left to do other stuff.
Jokes aside, when one is your age, one can actually afford to have a 60h/week hobby, whether it's playing video games, hanging out with friends, gambling, smoking weed, or being a chess programming geek. Once you get a bit older, priorities in life usually change.
What is kind of interesting about this computer chess hobby is that it's a hobby of mainly either quite young or quite old people (and there are a few in between who are trying to earn some money out of it).
What is kind of interesting about this computer chess hobby is that it's a hobby of mainly either quite young or quite old people (and there are a few in between who are trying to earn some money out of it).
OMG! I'm 32 and I've just been exposed.
Wukong Xiangqi (Chinese chess engine + apps to embed into 3rd party websites):
https://github.com/maksimKorzh/wukongxiangqi
Chess programming YouTube channel:
https://www.youtube.com/channel/UCB9pr ... KKqDgXhsMQ
 maksimKorzh
 Posts: 628
 Joined: Sat Sep 08, 2018 3:37 pm
 Location: Ukraine
 Full name: Maksim Korzh
 Contact:
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
fabianVDW wrote: ↑Mon Oct 05, 2020 2:50 pm
mvanthoor wrote: ↑Mon Oct 05, 2020 2:08 pm
The solution is simple. Split the rating lists:
1. One for 'classic' engines without any neural networks.
2. One for 'hybrid' engines that use classic stuff but also some sort of neural network add-on.
3. One for 'full' neural network engines such as Leela.
Then everyone can compete in exactly the space they want; people who want to compete with classic engines with handcrafted evaluation (with eval tuning as the only automated option) can do so without getting frustrated by seeing other engines pass their own just because they got NNUE added; if they do, those engines go to the 2nd rating list. People who are into generating and researching different networks can compete with other networks on the third list.
Where is the line? A PSQT is a perceptron that is incrementally updated in NNUE fashion. Or do we only start calling it a NN when it has multiple layers? Or perhaps when the inputs are more sparse, like a KingxPiece ([64][64]) table (which I've recently added to FabChess)?
A PSQT is a perceptron that is incrementally updated in NNUE fashion.
I'm hearing this all the time. Could you please explain WHY this is so? I have difficulties understanding it because I don't know what a "perceptron" is, or what it means to "have multiple layers". Could you please give an example of a PSQT with multiple layers?
THANKS IN ADVANCE!
Wukong Xiangqi (Chinese chess engine + apps to embed into 3rd party websites):
https://github.com/maksimKorzh/wukongxiangqi
Chess programming YouTube channel:
https://www.youtube.com/channel/UCB9pr ... KKqDgXhsMQ
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
I don't know enough about NNUE, and my knowledge of neural nets is also a little rusty, but I'll try to explain it as I understand it.
A neuron in a net is a unit with some inputs that can be activated or not (0, 1) and an output that is also 0 or 1. It calculates its output as a function of the sum of its weighted inputs. Typically this function is triggered gradually or abruptly at some threshold. The output then goes to the neurons in the next layer. There may be an extra input with the constant value 1, and there are more complicated versions of this simple neuron.
A perceptron is a simple neural net with one input layer and one output layer of neurons, with weights on all possible input-to-output connections. For example, it can be used for simple image processing of a raster of pixels, where the output neurons indicate some image features or "recognise" something. It can compute the logical AND function on bits, or the OR function, but it has its restrictions and cannot compute the XOR function.
This is similar to piece square tables: the bitboard representations of the pieces on the chessboard form the input layer, the weights to the output layer are the PST values, and the evaluation score is combined in the output neuron(s).
Modern, more advanced (deep) neural nets have hidden layers between the input and output layers, with different sizes and topologies. There may be shortcuts or backward connections. There are also lots of training methods that describe how the initially random weights of the network are slowly trained with known data or other stimuli, and how the errors are propagated backwards through the network layers to correct them.
For chess, the input layer neurons may encode pieces on squares and other information from the last few positions of the game. The output layer neurons may represent scores, best moves, search hints, or whatever.
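The AND/OR/XOR point above can be sketched in a few lines of C. The weights and biases below are hand-picked for illustration, not trained:

```c
#include <assert.h>

/* A single perceptron: weighted sum of binary inputs plus a bias,
   passed through a hard threshold (fires iff the sum is >= 0). */
static int perceptron(int x1, int x2, double w1, double w2, double b) {
    return (w1 * x1 + w2 * x2 + b) >= 0.0 ? 1 : 0;
}

/* AND: fires only when both inputs are 1 (w1 = w2 = 1, b = -1.5). */
static int and_gate(int x1, int x2) { return perceptron(x1, x2, 1.0, 1.0, -1.5); }

/* OR: fires when at least one input is 1 (w1 = w2 = 1, b = -0.5). */
static int or_gate(int x1, int x2)  { return perceptron(x1, x2, 1.0, 1.0, -0.5); }

/* No single (w1, w2, b) reproduces XOR: its four input points are not
   linearly separable, which is the classic perceptron limitation. */
```
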
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
I thought the perceptron is a binary classifier?
One can indeed view the "classical" evaluation function as a NN without hidden layers, where the input (eval function features) maps to 2 output neurons based on game phase (opening, endgame). Each PSQT is represented with 64 inputs, but I wouldn't really tie this to PSQTs; the input can also be the mobility of individual pieces and so on.
Martin Sedlak
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
That's a binary neuron, typically used on resource-constrained devices like mobile, where full-accuracy NNs consume too much memory. The most commonly used neurons take floating-point numbers as inputs and produce a floating-point output. There are also complex-, quaternion- and even octonion-valued neural networks, though they seem to rival real-valued ones only on 3D and physics problems.
A neural layer multiplies the vector of inputs by a matrix, applies a nonlinear 'activation function' componentwise to the product, and feeds the resulting vector into the next layer. The matrix used in the output layer usually has 1 row (so the product is a number, a linear combination of the input components, instead of a vector) or relatively few rows.
A common activation is the rectified linear unit: ReLU(x) = 0 if x < -b, and x + b if x >= -b (where b is a 'bias'; different components may use different biases). Its graph looks like _/
The PSQT is a linear regressor that's equivalent to one multiplication of a matrix by a sparse 'one-hot encoding' vector (the vector has 1s in the components corresponding to the piece-square occupancies present in the position and 0s in the other components). There's no activation at all, or if you wish, the activation is the identity function. It can be called a (1-layer) neural network if the goal is to ride the AI hype train and tell a tale to customers or investors.
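To illustrate that equivalence, here is a toy PSQT evaluated both as a plain table lookup and as a dot product with a one-hot vector; the table size and values are made up for the example:

```c
#include <assert.h>

enum { FEATURES = 6 };           /* tiny "board": 6 piece-square features */

/* Illustrative PSQT weights, one per feature. */
static const double psqt[FEATURES] = { 10, 20, 30, -5, 15, 25 };

/* Classical view: add the table value of each occupied feature. */
static double eval_table(const int *occupied, int count) {
    double s = 0.0;
    for (int i = 0; i < count; i++) s += psqt[occupied[i]];
    return s;
}

/* "1-layer network" view: dot product of the weight row with a
   one-hot input vector (1 where the feature is present, else 0). */
static double eval_onehot(const int *x) {
    double s = 0.0;
    for (int i = 0; i < FEATURES; i++) s += psqt[i] * x[i];
    return s;
}
```

Both functions compute the same linear combination; the table form just skips the zero terms.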
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
Yes, the output range of a neuron is determined by the activation function used:
for a sigmoid this is 0..1, for ReLU it is 0..infinity.
A simple feed-forward NN is nothing but a bunch of dot products (or a matrix-vector multiplication if you want, where the matrix holds the weights and the vector is the output of the previous layer) plus a bias, which is then fed through the activation function.
There are always m*n weights, where m is the number of outputs (=neurons) of the previous layer and n is the number of neurons in the current layer, plus n biases, so the whole network can be viewed as a simple 1-d vector of weights (including biases) with a given topology.
For sparse input a simple trick can be used, namely to collect the indices of all nonzero inputs and then do a reduced dot product for each of the n neurons => for an 8x8 board this is the same as iterating over a sparse bitset the way we already do: while (bb) { idx = pop_bit(bb); append(idx); }
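A minimal sketch of such a layer, in both the dense and the sparse-index variants. The function names and the row-major weight layout are my own choices for the example, not from any particular engine:

```c
#include <assert.h>

/* One dense feed-forward layer, as described: for each of the n output
   neurons, a dot product of m weights with the previous layer's m
   outputs, plus a bias, fed through ReLU. Weights are row-major. */
static void layer_forward(const double *w, const double *bias,
                          const double *in, int m,
                          double *out, int n) {
    for (int j = 0; j < n; j++) {
        double s = bias[j];
        for (int i = 0; i < m; i++) s += w[j * m + i] * in[i];
        out[j] = s > 0.0 ? s : 0.0;     /* ReLU activation */
    }
}

/* Sparse first layer: the input is 0/1, so instead of multiplying by
   every component we just sum the weight columns of the active
   indices - the "reduced dot product" trick from the post. */
static void layer_forward_sparse(const double *w, const double *bias,
                                 const int *active, int count,
                                 double *out, int n, int m) {
    for (int j = 0; j < n; j++) out[j] = bias[j];
    for (int k = 0; k < count; k++)
        for (int j = 0; j < n; j++) out[j] += w[j * m + active[k]];
    for (int j = 0; j < n; j++) if (out[j] < 0.0) out[j] = 0.0;
}
```

For a one-hot/binary input the two variants produce identical outputs; the sparse one only touches count columns instead of all m.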
Martin Sedlak
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
Thanks for correcting my blunder: from the programming perspective, it's indeed better to view the biases as a constant vector added to the matrix-vector product, as then one can make use of SIMD if the activation function is the same for all the components (or if there are only a few different functions in a layer).
Another optimization for SIMD is the use of CNNs (convolutional NNs), where the rows of the matrices are mostly shifted versions of one another.
Last edited by Tony P. on Wed Oct 07, 2020 9:07 pm, edited 1 time in total.
Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance
As for incremental updates, the idea is also simple:
if we remember a previous dot product a1*b1 + a2*b2 + a3*b3 + ... and we want to set a1 to 0, we simply subtract a1(old value)*b1.
Similarly, we can do the same for, say, a5 going from 0 to n by adding n*b5.
Of course, this only works for the first layer, but that layer typically contains the vast majority of the weights anyway.
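The same trick can be sketched for a whole first-layer accumulator (one running sum per output neuron). The weights and the tiny dimensions are illustrative only:

```c
#include <assert.h>

enum { OUT = 2, IN = 4 };
static const double w[OUT][IN] = { {1, 2, 3, 4}, {5, 6, 7, 8} };

/* Full recomputation of the accumulator for a 0/1 input vector. */
static void accum_full(const int *x, double *acc) {
    for (int j = 0; j < OUT; j++) {
        acc[j] = 0.0;
        for (int i = 0; i < IN; i++) acc[j] += w[j][i] * x[i];
    }
}

/* Input feature i turned off: subtract its weight column. */
static void accum_remove(double *acc, int i) {
    for (int j = 0; j < OUT; j++) acc[j] -= w[j][i];
}

/* Input feature i turned on: add its weight column. */
static void accum_add(double *acc, int i) {
    for (int j = 0; j < OUT; j++) acc[j] += w[j][i];
}
```

A "move" that clears one feature and sets another then costs two column updates instead of a full pass over all IN inputs, and the incrementally updated accumulator matches a recomputation from scratch.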
Martin Sedlak