Leela is more Chess-domain specific than regular Chess engines

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

AdminX
Posts: 6339
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Leela is more Chess-domain specific than regular Chess engines

Post by AdminX »

Gian-Carlo Pascutto wrote: Fri Aug 10, 2018 10:33 am
Branko Radovanovic wrote: Thu Aug 09, 2018 11:46 pm Looking at TCEC Div4 games, what is striking to me is not that NN eval is simply "too optimistic", as many have noted - it's that it often rises significantly as the game progresses, without a meaningful improvement of the position.
This has more to do with the temperature used in training than anything, I believe. It is trained to expect opponent blunders.

Having watched many games, I am not sure if it is bad or not. She wins many games she shouldn't have, because she's better at judging when the opponent's position is harder to play.
That seems to make sense, as I was wondering why DeusX evals seemed so high in certain positions relative to LC0. Given that it (DeusX) was trained on human games, that would explain a lot.
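As a side note on the temperature point above: in AlphaZero-style self-play, moves are sampled from the root visit counts raised to the power 1/T instead of always playing the most-visited move, so with T > 0 weaker replies stay in the training data. A minimal Python sketch with made-up visit counts:

import numpy as np

visits = np.array([800.0, 150.0, 50.0])  # toy root visit counts for 3 moves

def move_probs(T):
    p = visits ** (1.0 / T)              # temperature-scaled visit counts
    return p / p.sum()

print(move_probs(1.0))  # [0.8  0.15 0.05] -> "blunders" stay in the data
print(move_probs(0.1))  # ~[1. 0. 0.]      -> near-greedy play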
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Leela is more Chess-domain specific than regular Chess engines

Post by Laskos »

Michel wrote: Fri Aug 10, 2018 5:49 am It's a bit disappointing but not unexpected.

http://nautil.us/blog/this-neural-net-h ... ates-sheep

If a NN is confronted with a type of data different from what it is trained on, it will fail spectacularly.
Yes, I'm a bit disappointed.

Thanks for the link!

With image recognition, people have shown that deep learning can be easily fooled. Scramble a relative handful of pixels, and the technology can mistake a frog for a rifle or a parking sign for a washing machine. That's similar to the "single pixel attack" described by Branko, and is probably inherent to DNNs.
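As a toy illustration of why a single pixel can be enough (a linear stand-in for a real DNN, not the attack from the article): if the classifier's score is a weighted sum of pixels, one well-chosen pixel with a large weight flips the label.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)            # weights of a toy 8x8 "classifier"
x = rng.normal(size=64)            # a toy input "image"
score = w @ x                      # the label is the sign of the score
i = int(np.argmax(np.abs(w)))      # the most sensitive pixel
x_adv = x.copy()
# push that one pixel just far enough to flip the sign of the score
x_adv[i] -= np.sign(score) * np.sign(w[i]) * (abs(score) / abs(w[i]) + 1.0)
print(score > 0, (w @ x_adv) > 0)  # labels differ: one pixel flipped it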

It seems that the patterns extracted by deep learning are more superficial than they initially appear. That scary multi-layered black box built from a humongous amount of data might seem to bring abstraction and inference, but it behaves more like an expert system, or a classifier book with variations upon it. Deep learning results seem to be confined narrowly to fields where those huge data sets are available and the tasks are well defined, like labeling images or making moves in chess. They are even narrower than hand-crafted knowledge of the specific problem, as shown here with chess. That is, meaning, inference, and common-sense or holistic knowledge completely elude these black boxes. There was much enthusiasm about deep learning software beating medical diagnosis performed by humans, but again, it was very specific in its scope, and had little to do with how a good general practitioner holistically assesses someone's health. For the time being, the job of "good general practitioner" is safe.

Maybe it's still better for my personal fun to fool around in Prolog (I managed to install my old Borland Turbo Prolog from my childhood in DOSBox). It was designed for the reasoning and knowledge representation side of AI, which processes facts and concepts, and tries to complete tasks that are not always well defined. Well, it's not very romantic either, as it is mostly used for expert systems too, but at least I remember I had some fun with its abstractions and sometimes unexpected inferences and generalizations.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela is more Chess-domain specific than regular Chess engines

Post by corres »

If we want a perfect chess engine, we need a 32-man database.
A 32-man database cannot be mapped onto even a 20x256 NN.
So it is obvious that no matter how many training games Leela plays, there will always be a lot of holes in its evaluation.
The question is how big an NN needs to be to map a 32-man database perfectly, theoretically and (mainly) technically.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Leela is more Chess-domain specific than regular Chess engines

Post by Milos »

corres wrote: Fri Aug 10, 2018 1:11 pm If we want a perfect chess engine, we need a 32-man database.
A 32-man database cannot be mapped onto even a 20x256 NN.
So it is obvious that no matter how many training games Leela plays, there will always be a lot of holes in its evaluation.
The question is how big an NN needs to be to map a 32-man database perfectly, theoretically and (mainly) technically.
A perfect 32-man TB would have WDL, or better yet DTM, for every possible position in chess. There are no shortcuts to that if you want a perfect solution.
And it requires absurd storage space: on the order of 10^44 legal positions (fewer than the ~10^80 atoms in the observable universe, but still astronomically beyond any hardware that could ever be built).
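Back-of-envelope numbers in Python (my figures: Tromp's estimate of ~4.8e44 legal positions, and ~1e23 bytes as a rough order of magnitude for all storage ever manufactured):

positions = 4.8e44              # estimated legal chess positions (Tromp)
bytes_needed = positions / 8    # 1 bit of WDL info per position, no index
global_storage = 1e23           # very rough: all drives ever built, in bytes
print(f"{bytes_needed:.1e} bytes needed")                   # 6.0e+43
print(f"{bytes_needed / global_storage:.1e}x all storage")  # 6.0e+20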
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Leela is more Chess-domain specific than regular Chess engines

Post by corres »

Milos wrote: Fri Aug 10, 2018 2:18 pm
corres wrote: Fri Aug 10, 2018 1:11 pm If we want a perfect chess engine, we need a 32-man database.
A 32-man database cannot be mapped onto even a 20x256 NN.
So it is obvious that no matter how many training games Leela plays, there will always be a lot of holes in its evaluation.
The question is how big an NN needs to be to map a 32-man database perfectly, theoretically and (mainly) technically.
A perfect 32-man TB would have WDL, or better yet DTM, for every possible position in chess. There are no shortcuts to that if you want a perfect solution.
And it requires absurd storage space: on the order of 10^44 legal positions (fewer than the ~10^80 atoms in the observable universe, but still astronomically beyond any hardware that could ever be built).
It is not an accident that the Alpha0 team experimentally found that a 20x256 NN seems to be usable for a strong engine. The MC technique helps to patch the gaps, but if there are too many holes, MC cannot help enough.
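A minimal sketch of that point in Python (toy numbers, not Leela's actual search): MCTS backups average many leaf evaluations, so isolated holes get washed out, but the average stays biased in proportion to how many leaves are wrong.

import random
random.seed(1)
TRUE_VALUE = 0.6
HOLE_RATE = 0.1                       # fraction of leaves hitting a "hole"

def noisy_leaf_eval():
    if random.random() < HOLE_RATE:
        return -1.0                   # a badly wrong evaluation
    return TRUE_VALUE + random.gauss(0.0, 0.05)

backups = [noisy_leaf_eval() for _ in range(800)]
print(sum(backups) / len(backups))    # ~0.44: averaging helps, bias remains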
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: Leela is more Chess-domain specific than regular Chess engines

Post by Branko Radovanovic »

Gian-Carlo Pascutto wrote: Fri Aug 10, 2018 10:34 am
Branko Radovanovic wrote: Thu Aug 09, 2018 11:46 pm Also, your conclusion seems to confirm my feeling that Leela would suffer greatly in Chess960. Obviously, its opening-phase advantage over other engines would be nullified, but it's probably much more than that.
If you train it on Chess960 too, it'd work fine.

If I write a classical engine and make it assume castling goes to c1 and g1, it's going to have some problems in Chess960 too.
Of course, I'm sure that if Leela were specifically trained for Chess960, things would be quite different, and conventional engines would suffer against her since they are not tuned for Chess960. (Whoever wants to beat Stockfish with an NN, Chess960 may be the way to go!) However, that's precisely Kai's point: the "knowledge transfer" (i.e. the ability to apply what was learned to a different domain) seems to be lower for current NN engines. Surely there will be new approaches in the future; after all, this is only the first generation of mainstream chess-playing NNs.
oreopoulos
Posts: 110
Joined: Fri Apr 25, 2008 10:56 pm

Re: Leela is more Chess-domain specific than regular Chess engines

Post by oreopoulos »

This is why deep residual neural networks perform better. They combine many convolution filters.

They require better hardware to run on though.
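For the curious, a minimal sketch of such a residual block (assuming PyTorch; 256 channels as in LC0's 20x256 nets, with the conv-BN-ReLU wiring from the AlphaZero paper):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)          # the skip connection: the "residual"

block = ResidualBlock()
print(block(torch.randn(1, 256, 8, 8)).shape)  # torch.Size([1, 256, 8, 8])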
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: Leela is more Chess-domain specific than regular Chess engines

Post by Branko Radovanovic »

oreopoulos wrote: Fri Aug 10, 2018 4:43 pm This is why deep residual neural networks perform better. They combine many convolution filters.

They require better hardware to run on though.
Yes - the problem with Leela is that it relies on an NN architecture which was used primarily for image recognition (where it performed brilliantly) and was re-purposed to play chess. (I guess Google worked with what they had, and that was experience with the ImageNet competition, for example.) Image recognition is classification, not evaluation, and, aside from that, there are two major differences. In image recognition, small changes (a couple of pixels) do not (or should not, at any rate) affect the classification, whereas in chess this is not the case. The other is that image recognition works well without "connecting" pixels on opposite sides of the image, while chess evaluation is non-local in that respect, hence these problems with the 3x3 convolutions.

This is why other approaches may be more promising - but only 6 years have passed since the ImageNet breakthrough, it's still early days for DNNs.

BTW these limitations affect both the value head and the policy head, but how exactly and to what extent is anyone's guess. While I have a degree of understanding of how the VH works, I can't say I really understand the PH. Given their low eval speed, human GMs obviously must have extremely good candidate move selection (PH); this is surely where NNs are lagging behind.
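For orientation, a rough sketch of the two heads in Python (assuming PyTorch; the value-head sizes follow the AlphaGo Zero paper, 1858 is LC0's flat move encoding, and the 2-filter policy stem is a simplifying assumption):

import torch
import torch.nn as nn

class Heads(nn.Module):
    def __init__(self, channels=256, moves=1858):
        super().__init__()
        self.value = nn.Sequential(              # VH: board stack -> scalar
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())        # evaluation in [-1, 1]
        self.policy = nn.Sequential(             # PH: board stack -> logits
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(128, moves))               # one logit per move
    def forward(self, x):
        return self.value(x), self.policy(x)

v, p = Heads()(torch.randn(1, 256, 8, 8))
print(v.shape, p.shape)  # torch.Size([1, 1]) torch.Size([1, 1858])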
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: Leela is more Chess-domain specific than regular Chess engines

Post by jorose »

Branko Radovanovic wrote: Fri Aug 10, 2018 8:34 pm
oreopoulos wrote: Fri Aug 10, 2018 4:43 pm This is why deep residual neural networks perform better. They combine many convolution filters.

They require better hardware to run on though.
Yes - the problem with Leela is that it relies on an NN architecture which was used primarily for image recognition (where it performed brilliantly) and was re-purposed to play chess. (I guess Google worked with what they had, and that was experience with the ImageNet competition, for example.) Image recognition is classification, not evaluation, and, aside from that, there are two major differences. In image recognition, small changes (a couple of pixels) do not (or should not, at any rate) affect the classification, whereas in chess this is not the case. The other is that image recognition works well without "connecting" pixels on opposite sides of the image, while chess evaluation is non-local in that respect, hence these problems with the 3x3 convolutions.

This is why other approaches may be more promising - but only 6 years have passed since the ImageNet breakthrough, it's still early days for DNNs.

BTW these limitations affect both the value head and the policy head, but how exactly and to what extent is anyone's guess. While I have a degree of understanding of how the VH works, I can't say I really understand the PH. Given their low eval speed, human GMs obviously must have extremely good candidate move selection (PH); this is surely where NNs are lagging behind.
You guys can't keep throwing around the claim that the issues have to do with filter sizes if you have nothing to back it up.

While it is true that CNNs deal much better with local features than with features spanning an entire image, it is important to realize that in the case of chess literally everything is local. NVIDIA has some really impressive results in a paper from ICLR this year where they generate some amazing images of size 1024x1024 without any filters larger than 3x3. If 3x3 filters can deal with features spanning a 1024x1024 image, then they can surely deal with features on a small 8x8 board.
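Quick arithmetic behind that (my numbers, not from the post): with stride-1 3x3 convolutions the receptive field grows by two squares per layer, so even a modest stack sees the whole board.

def receptive_field(layers, kernel=3):
    return 1 + layers * (kernel - 1)  # stride-1 stacking, no pooling

for n in (1, 4, 7, 40):               # 40 conv layers = a 20-block tower
    print(n, receptive_field(n))      # 3, 9, 15, 81
# a corner-to-corner feature (7 squares away) is in view from n = 7 on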

The only real difference when the precise position of features matters is that we can't use pooling. Without pooling, no information is lost, and a CNN can easily deal with problems that are sensitive to precise pixel positions.

Some state-of-the-art approaches actually go even further and decompose 3x3 convolutions into 1x3 convolutions followed by 3x1 convolutions. Many architectures even use 1x1 convolutions for things like feature compression within the network.
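The parameter savings are easy to count (channel count C is my assumption):

C = 256
print("3x3:      ", 9 * C * C)   # 589824 weights per layer
print("1x3 + 3x1:", 6 * C * C)   # 393216, a one-third saving
print("1x1:      ", 1 * C * C)   # 65536, cheap feature compression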
-Jonathan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Leela is more Chess-domain specific than regular Chess engines

Post by Laskos »

jorose wrote: Sat Aug 11, 2018 12:00 am
Branko Radovanovic wrote: Fri Aug 10, 2018 8:34 pm
oreopoulos wrote: Fri Aug 10, 2018 4:43 pm This is why deep residual neural networks perform better. They combine many convolution filters.

They require better hardware to run on though.
Yes - the problem with Leela is that it relies on an NN architecture which was used primarily for image recognition (where it performed brilliantly) and was re-purposed to play chess. (I guess Google worked with what they had, and that was experience with the ImageNet competition, for example.) Image recognition is classification, not evaluation, and, aside from that, there are two major differences. In image recognition, small changes (a couple of pixels) do not (or should not, at any rate) affect the classification, whereas in chess this is not the case. The other is that image recognition works well without "connecting" pixels on opposite sides of the image, while chess evaluation is non-local in that respect, hence these problems with the 3x3 convolutions.

This is why other approaches may be more promising - but only 6 years have passed since the ImageNet breakthrough, it's still early days for DNNs.

BTW these limitations affect both the value head and the policy head, but how exactly and to what extent is anyone's guess. While I have a degree of understanding of how the VH works, I can't say I really understand the PH. Given their low eval speed, human GMs obviously must have extremely good candidate move selection (PH); this is surely where NNs are lagging behind.
You guys can't keep throwing around the claim that the issues have to do with filter sizes if you have nothing to back it up.
Then why the problems with gliders? I just checked with a very recent net, ID10575. First, regular chess: it is a very strong net, one of the strongest as of now against AB engines. This is the baseline. From the odd non-glider starting position, this net under-performs by 150 +/- 30 Elo points. From the odd heavy-pieces starting position, it under-performs by 350 +/- 50 Elo points. So,

1/ Leela under-performs in odd positions
2/ It under-performs significantly more with lots of gliders

It seems the 3x3 patterns used in consecutive layers are not adapted extremely well to chess. There are no patterns along board rays, while these are known to be important in chess due to gliders (X-ray protection, discovered threats, etc.). These all have to be built by cascading 3x3 patterns across consecutive layers, which takes a lot of learning and data.
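To make that concrete, a small Python sketch (assuming PyTorch; the 1x8 "ray" filter is my illustration, not something Leela uses): a lone piece on a1 reaches h1 in one step with a rank-sized filter, but only after seven stacked 3x3 layers.

import torch
import torch.nn as nn

board = torch.zeros(1, 1, 8, 8)
board[0, 0, 0, 0] = 1.0                     # a lone piece on a1

ray = nn.Conv2d(1, 1, kernel_size=(1, 8), bias=False)
print(ray(board).shape)                     # (1, 1, 8, 1): one hit per rank

local = nn.Conv2d(1, 1, 3, padding=1, bias=False)
x = board
for _ in range(7):                          # influence moves 1 square/layer
    x = local(x)
print(x[0, 0, 0, 7].item() != 0.0)          # True: h1 reached only now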
While it is true that CNNs deal much better with local features than with features spanning an entire image, it is important to realize that in the case of chess literally everything is local. NVIDIA has some really impressive results in a paper from ICLR this year where they generate some amazing images of size 1024x1024 without any filters larger than 3x3. If 3x3 filters can deal with features spanning a 1024x1024 image, then they can surely deal with features on a small 8x8 board.

The only real difference when the precise position of features matters is that we can't use pooling. Without pooling, no information is lost, and a CNN can easily deal with problems that are sensitive to precise pixel positions.

Some state-of-the-art approaches actually go even further and decompose 3x3 convolutions into 1x3 convolutions followed by 3x1 convolutions. Many architectures even use 1x1 convolutions for things like feature compression within the network.