A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Collingwood
Posts: 89
Joined: Sat Nov 09, 2019 3:24 pm
Full name: .

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by Collingwood »

Tony P. wrote: Mon Oct 05, 2020 11:44 pm One of the enigmas of CC is why people were spending crazy amounts on GPUs to train Leela nets instead of figuring out an NN architecture that would play superstrong on CPUs and save the users so much hardware cost that they'd even be better off paying for the engine
They do that because it's easy to throw crazy amounts of hardware at a problem and hard to do anything else.
Dokterchen
Posts: 133
Joined: Wed Aug 15, 2007 12:18 pm
Location: Munich

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by Dokterchen »

Gary Internet wrote: Tue Oct 06, 2020 7:47 am This is what Andrew is talking about. Nemorino 6 has, probably within the last couple of months, pretty much caught up to where Houdini 6.03 was.

Look at the head-to-head results from Stefan Pohl posted below.

It's also worth noting that Igel is probably around the same level of strength as Nemorino at the moment, or certainly catching up fast.

Code: Select all

14 Nemorino 6.00 avx2    : 3432 7000 (+1943,=3574,-1483), 53.3 %

Ethereal 12.62 avx2      : 1000 (+268,=589,-143), 56.3 %
Slow Chess 2.3 popc      : 1000 (+360,=512,-128), 61.6 %
Stockfish 12 200902      : 1000 (+  3,=412,-585), 20.9 %
Komodo 14 bmi2           : 1000 (+207,=569,-224), 49.1 %
Xiphos 0.6 bmi2          : 1000 (+432,=459,-109), 66.2 %
Fire 7.1 popc            : 1000 (+454,=470,- 76), 68.9 %
Houdini 6 pext           : 1000 (+219,=563,-218), 50.0 %
I love that! Nemorino and Igel are fantastic engines with a long history.
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by maksimKorzh »

Milos wrote: Sat Oct 03, 2020 4:42 am
AndrewGrant wrote: Sat Oct 03, 2020 12:38 am I'm working again now -- for chess.com actually. It's a juggling act, one which I'm losing, since I can't do 60 hours of Ethereal and 40 hours of work and 20 hours of family and 40 hours of sleep. Something has to go, and I'm pretty keen on all 4.
Why not? A week, after all, has 168 hours, so there are still 8 extra hours left to do other stuff ;).
Jokes aside, when one is your age, one can actually afford to have a 60h/week hobby, whether it's playing video games, hanging out with friends, gambling, smoking weed, or being a chess programmer geek. Once you get a bit older, priorities in life usually change.
What is kind of interesting about this computer chess hobby is that it's a hobby of mainly either quite young or quite old people (and there are a few in between who are trying to earn some money out of it). ;)
What is kind of interesting about this computer chess hobby is that it's a hobby of mainly either quite young or quite old people (and there are a few in between who are trying to earn some money out of it). ;)
OMG! I'm 32 and I was just revealed :lol:
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by maksimKorzh »

fabianVDW wrote: Mon Oct 05, 2020 4:50 pm
mvanthoor wrote: Mon Oct 05, 2020 4:08 pm
The solution is simple. Split the rating lists:

1. One for 'classic' engines without any neural networks.
2. One for 'hybrid' engines that use classic stuff but also some sort of neural network add-on.
3. One for 'full' neural network engines such as Leela.

Then everyone can compete in exactly the space they want; people who want to compete with classic engines with hand-crafted evaluation (with eval tuning as the only automated option) can do so without getting frustrated by seeing other engines pass their own just because they got an NNUE added; if they do, those engines go to the 2nd rating list. People who are into generating and researching different networks can compete with other networks on the third list.
Where is the line? A PSQT is a perceptron that is incrementally updated in NNUE fashion. Or do we only start calling it an NN when it has multiple layers? Or perhaps when the inputs are more sparse, like a KingxPiece ([64][64]) table (which I've recently added to FabChess)?
A PSQT is a perceptron that is incrementally updated in NNUE fashion
I hear this all the time. Could you please explain WHY this is so? I have difficulty understanding it because I don't know what a "perceptron" is, or what it means to "have multiple layers". Could you please give an example of a PSQT with multiple layers?

THANKS IN ADVANCE!
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by Harald »

I don't know enough about NNUE, and my knowledge of neural nets is also a little rusty, but I'll try to explain it as I understand it.

A neuron in a net is a unit with some inputs that can be activated or not (0, 1) and an output that is also 0 or 1. It calculates its output as a function of the sum of its weighted inputs. Typically this function switches either gradually or abruptly at some threshold. The output then goes to the neurons in the next layer. There may be an extra input with the constant value 1, and there may be more complicated versions of this simple neuron.

A perceptron is a simple neural net with one input layer and one output layer of neurons, and a weight for every possible input-to-output connection. For example, it can be used for simple image processing of a bit raster of pixels, where the output neurons indicate some image features or "recognise" something. It can compute the logical AND or OR function on bits, but it has its restrictions and cannot represent the XOR function.
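
As a tiny concrete illustration of that (my own sketch, not from the thread): with suitable weights and a threshold, one binary neuron computes AND or OR, but no choice of weights makes it compute XOR.

Code: Select all

// Sketch: a single binary "neuron" with two 0/1 inputs. With the weights
// below it computes AND; with w1 = w2 = 1 and threshold = 1 it computes OR.
// No weights/threshold make it compute XOR (XOR is not linearly separable).
int neuron_and(int x1, int x2) {
    const int w1 = 1, w2 = 1, threshold = 2;     // illustrative weights only
    return (w1 * x1 + w2 * x2 >= threshold) ? 1 : 0;
}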

This is similar to a piece-square table: the bitboard representation of the pieces on the chessboard is the input layer, the weights to the output layer are the PST values, and the output neuron(s) accumulate the evaluation score.
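
A minimal sketch of that view (my illustration, not code from any particular engine; the input encoding and array sizes are assumptions):

Code: Select all

// Sketch: a piece-square-table evaluation written as a one-layer "network":
// 768 binary inputs (piece type, colour, square), one output neuron,
// identity activation.
#include <cstdint>

constexpr int NUM_INPUTS = 12 * 64;   // 6 piece types x 2 colours x 64 squares

int psqt_eval(const std::uint8_t inputs[NUM_INPUTS],      // 0/1 occupancy flags
              const std::int16_t weights[NUM_INPUTS]) {   // the usual PST values
    int score = 0;
    for (int i = 0; i < NUM_INPUTS; ++i)
        score += inputs[i] * weights[i];                  // weighted sum = one dot product
    return score;
}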

Modern and more advanced (deep) neural nets have some hidden layers between the input and output layers, with different sizes and topologies. There may be shortcuts or backward connections. Then there are lots of training methods that describe how the initially random weights in the network are slowly trained with known data or other stimuli, and how errors are corrected backwards through the network layers.

For chess, the input layer neurons may encode pieces on squares and other information from the last few positions in the game. The output layer neurons may represent scores, best moves, search hints, or whatever.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by mar »

I thought the perceptron was a binary classifier?
one can indeed view the "classical" evaluation function as an NN without hidden layers, where the input (eval fn features) maps to 2 output neurons based on game phase (opening, endgame), and where each PSQT is represented with 64 inputs. so I wouldn't really tie it to PSQTs; the input can be mobility of individual pieces and so on
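
A minimal sketch of that two-output view (an editorial illustration only; the feature encoding, weights, and phase blend are assumptions, not taken from any engine):

Code: Select all

// Sketch: the "classical" evaluation viewed as a linear layer with two
// output neurons, one per game phase (middlegame, endgame), no hidden layers.
#include <cstddef>
#include <vector>

int evaluate(const std::vector<int>& features,   // how often each eval feature fires
             const std::vector<int>& w_mg,       // weights of the "middlegame" neuron
             const std::vector<int>& w_eg,       // weights of the "endgame" neuron
             int phase) {                        // 0 = pure endgame .. 256 = opening
    int mg = 0, eg = 0;
    for (std::size_t i = 0; i < features.size(); ++i) {
        mg += features[i] * w_mg[i];             // dot product, neuron 1
        eg += features[i] * w_eg[i];             // dot product, neuron 2
    }
    return (mg * phase + eg * (256 - phase)) / 256;   // the usual tapered blend
}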
Martin Sedlak
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by Tony P. »

Harald wrote: Wed Oct 07, 2020 10:04 pm A neuron in a net is a unit with some inputs that can be activated or not (0, 1) and an output that is also 0 or 1.
That's a binary neuron, typically used on resource-constrained devices like mobile, where full-accuracy NNs consume too much memory. The most commonly used neurons take floating-point numbers as inputs and produce a floating-point output. There are also complex-, quaternion- and even octonion-valued neural networks, though they seem to rival real-valued ones only on 3D and physics problems.

A neural layer multiplies the vector of inputs by a matrix, a nonlinear 'activation function' is applied component-wise to the product, and the resulting vector is fed into the next layer. The matrix used in the output layer usually has 1 row (or relatively few rows), so the product is a number (a linear combination of the input components) instead of a vector.

A common activation is the rectified linear unit: ReLU(x) = 0 if x < -b, and x + b if x >= -b, i.e. max(0, x + b), where b is a 'bias' (different components may use different biases). Its graph looks like _/
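
A minimal sketch of one such layer (plain loops, no SIMD; the names and the use of float are illustrative assumptions):

Code: Select all

// Sketch: one dense layer, y = ReLU(W x + b), with the activation applied
// component-wise.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<float> dense_relu(const std::vector<std::vector<float>>& W,  // one row per output
                              const std::vector<float>& b,               // one bias per output
                              const std::vector<float>& x) {             // input vector
    std::vector<float> y(W.size());
    for (std::size_t row = 0; row < W.size(); ++row) {
        float sum = b[row];
        for (std::size_t col = 0; col < x.size(); ++col)
            sum += W[row][col] * x[col];                   // matrix-vector product
        y[row] = std::max(0.0f, sum);                      // ReLU: max(0, x + bias)
    }
    return y;
}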

The PSQT is a linear regressor that's equivalent to one multiplication of a matrix by a sparse 'one-hot encoding' vector (the vector has 1s in the components corresponding to the piece-square occupancies present in the position and 0s in the other components). There isn't even an activation, or if you wish, the activation is the identity function. It can be called a (1-layer) neural network if the goal is to ride the AI hype train and tell a tale to customers or investors :mrgreen:
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by mar »

yes, the output of a neuron is determined by the activation function used
for sigmoid, this would be 0..1, for ReLU 0..n
a simple feedforward NN is nothing but a bunch of dot products (or matrix-vector multiplication if you want, where the matrix is the weights and vector is the output of the previous layer) plus a bias, which is then fed through the activation function
there are always m*n weights, where m is the number of outputs (= neurons) of the previous layer and n is the number of neurons in the current layer, plus n biases, so the whole network can be viewed as a simple 1d vector of weights (including biases) with a given topology
for sparse input a simple trick can be used, namely to collect the indices of all non-zero inputs and then do a reduced dot product for each of the n neurons => for an 8x8 board this would be the same as iterating over a sparse bitset the way we already do: while (bits) { idx = pop_bit(bits); append(idx); }
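
A minimal sketch of that reduced dot product for a first layer with binary inputs (my illustration; HIDDEN, the array layout, and the GCC/Clang __builtin_ctzll intrinsic are assumptions):

Code: Select all

// Sketch: first-layer outputs for binary inputs. Instead of the full
// matrix-vector product, only the weight columns of the active inputs
// are accumulated.
#include <cstdint>

constexpr int HIDDEN = 256;   // assumed first-layer width

// Index of the lowest set bit, then clear it (same idea as the pop_bit loop above).
inline int pop_lsb(std::uint64_t& bb) {
    const int idx = __builtin_ctzll(bb);
    bb &= bb - 1;
    return idx;
}

void first_layer(std::uint64_t active,                      // bitset of non-zero inputs
                 const std::int16_t weights[][HIDDEN],      // [input index][neuron]
                 const std::int16_t biases[HIDDEN],
                 std::int32_t out[HIDDEN]) {
    for (int j = 0; j < HIDDEN; ++j)
        out[j] = biases[j];
    while (active) {
        const int idx = pop_lsb(active);
        for (int j = 0; j < HIDDEN; ++j)
            out[j] += weights[idx][j];                      // input is 1, so just add its column
    }
}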
Martin Sedlak
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by Tony P. »

Thanks for correcting my blunder: from the perspective of programming, it's indeed better to view biases as a constant vector added to the matrix-vector product, as then one can make use of SIMD if the activation function is the same for all the components (or if there are only a few different functions in a layer).
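
A minimal sketch of that bias-as-vector step (assuming AVX intrinsics are available and the layer width is a multiple of 8; purely illustrative):

Code: Select all

// Sketch: add the bias vector to an already computed matrix-vector product
// and apply the same ReLU to every component, 8 floats at a time.
#include <immintrin.h>

void bias_relu(float* out, const float* bias, int n) {      // n assumed multiple of 8
    const __m256 zero = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(out + i);
        v = _mm256_add_ps(v, _mm256_loadu_ps(bias + i));    // biases as one vector add
        v = _mm256_max_ps(v, zero);                         // ReLU on all 8 lanes at once
        _mm256_storeu_ps(out + i, v);
    }
}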

Another optimization for SIMD is the use of CNNs (convolutional NNs), where the rows of the matrices are mostly shifted versions of one another.
Last edited by Tony P. on Wed Oct 07, 2020 11:07 pm, edited 1 time in total.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: A Crossroad in Computer Chess; Or Desperate Flailing for Relevance

Post by mar »

as for incremental updates, the idea is also simple:
if we remember a previous dot product a1*b1 + a2*b2 + a3*b3 + ... and we want to set a1 to 0, we simply subtract a1 (old value) * b1
similarly, we can do the same for, say, a5 going from 0 to n by adding n*b5
of course, this only holds for the first layer, but it typically contains the vast majority of the weights anyway
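
A minimal sketch of such an incremental update for a first layer with binary inputs (an illustration only; HIDDEN and the weight layout are the same assumptions as in the earlier sketch):

Code: Select all

// Sketch: incrementally updating a remembered first-layer accumulator when
// one binary input turns off and another turns on (e.g. a piece moves).
#include <cstdint>

constexpr int HIDDEN = 256;   // assumed first-layer width

void update_accumulator(std::int32_t acc[HIDDEN],
                        const std::int16_t weights[][HIDDEN],  // [input index][neuron]
                        int removed_input,                     // input going 1 -> 0
                        int added_input) {                     // input going 0 -> 1
    for (int j = 0; j < HIDDEN; ++j) {
        acc[j] -= weights[removed_input][j];   // subtract the old contribution
        acc[j] += weights[added_input][j];     // add the new contribution
    }
}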
Martin Sedlak