LC0 on the WAC test suite, bad results


Uri Blass

Re: LC0 on the WAC test suite, bad results

Post by Uri Blass »

Dann Corbit wrote:
jkiliani wrote:OK, and what's your conclusion from all that?

It's no secret that Leela's weakest point is tactics, especially if you just give her FENs instead of positions with move history. If you want a tactics solver, you're simply using the wrong engine here.
I think it is important to understand why tactics stink so bad.
The performance in openings (by contrast) is spectacular.
If the issue is simply that most tactical positions involve a sacrifice, that would give engines a strategy against the NN format.

I guess that if you trained on tactical sets, you would get a tactical monster that stinks at real chess, but that is pure speculation on my part.

It might conclude that sacrifices in general are a good idea, and of course they are usually a bad idea.
Sacrifices are part of tactics, but tactics are not only sacrifices.

I believe that the problem is that LC0 is not trained correctly, the way AlphaZero was, and I see no reason to support LC0 when it uses a relatively stupid NN (for example, I read it uses only 3x3 filters and not 8x1 lines, only because AlphaZero did it).
jkiliani

Re: LC0 on the WAC test suite, bad results

Post by jkiliani »

Uri Blass wrote:
Dann Corbit wrote:I think it is important to understand why tactics stink so bad.
The performance in openings (by contrast) is spectacular.
If the issue is simply that most tactical positions involve a sacrifice, that would give engines a strategy against the NN format.

I guess that if you trained on tactical sets, you would get a tactical monster that stinks at real chess, but that is pure speculation on my part.

It might conclude that sacrifices in general are a good idea, and of course they are usually a bad idea.
Sacrifices are part of tactics, but tactics are not only sacrifices.

I believe that the problem is that LC0 is not trained correctly, the way AlphaZero was, and I see no reason to support LC0 when it uses a relatively stupid NN (for example, I read it uses only 3x3 filters and not 8x1 lines, only because AlphaZero did it).
There's nothing wrong with the neural network architecture used by LCZero; for the 128x10 network, its performance relative to human play is actually very similar to that of Leela Zero (Go) on the same network size. Also, DeepMind write in their AlphaZero paper: "Other representations could have been used; in our experiments the training algorithm worked robustly for many reasonable choices."

The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
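
To make the residual-tower argument concrete, here is a minimal sketch in Python/Keras of the kind of 3x3 block being described. The 128 filters are an assumption taken from the "128x10" size mentioned above; the real Lc0 training code differs in detail.
[code]
# Minimal sketch of one 3x3 residual block (illustrative, not the Lc0 code).
import tensorflow as tf

FILTERS = 128  # assumption: the "128" in "128x10" = filters per layer

def residual_block(x):
    y = tf.keras.layers.Conv2D(FILTERS, 3, padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(FILTERS, 3, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([x, y])   # skip connection
    return tf.keras.layers.ReLU()(y)

# Build one block on an 8x8 board representation.
inp = tf.keras.layers.Input(shape=(8, 8, FILTERS))
out = residual_block(inp)

# Each 3x3 convolution lets information spread one square in every direction,
# so roughly 7 convolutions span the 8x8 board once and 20 span it about
# three times, which is the propagation argument made above.
[/code]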
frankp

Re: LC0 on the WAC test suite, bad results

Post by frankp »

It is useful to remember that these are still early days for the project, which has not been running for very long at all.
The structure of the NN is not yet finalised, and its training is far from over.
The 40M-game point should be an interesting time to compare its strength to that of A0, in so far as this can be done.
Early success - but it is still early.
It seems to me to have immense potential as a new way of creating a very interesting chess engine. The dismissive comments at this stage of the project are interesting, to say the least.
Milos

Re: LC0 on the WAC test suite, bad results

Post by Milos »

jkiliani wrote:The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
The reason is actually that the only guy who knows how to create a working DNN for a board game is Gian-Carlo, and you just copied most of LC0 from L0. The rest you copied from SF, and that part was so full of bugs for so long that it is simply embarrassing.
I don't think any of the LC0 contributors has any actual expertise with DNNs beyond playing around in Keras. So give me a break with your "explanations".
hgm

Re: LC0 on the WAC test suite, bad results

Post by hgm »

jkiliani wrote:The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
8x1 and 1x8 filters are not any less efficient, are they? I still think it is a bad mistake (if you don't have unlimited resources) to allow some of the possible moves to fall outside the 'viewing field' of all filters. It needlessly drives up the number of residual blocks required for recognizing relevant patterns, and the number of filters per block required to pass on simple information to deeper layers.
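
A toy calculation in Python (my own back-of-the-envelope, nothing from the Lc0 code) makes the 'viewing field' point concrete: with 3x3 filters a move along a whole rank or file only enters the receptive field after several layers, while an 8x1/1x8 filter sees it in one.
[code]
# Toy receptive-field count (illustrative only): how many stacked layers
# until a square `distance` files away is inside the viewing field?
def layers_to_see(distance, reach_per_layer):
    layers, seen = 0, 0
    while seen < distance:
        seen += reach_per_layer
        layers += 1
    return layers

# A 3x3 filter extends the view by one square per layer in each direction;
# an 8x1 or 1x8 filter covers a whole file or rank in a single layer.
print(layers_to_see(7, 1))  # 3x3 stack: 7 layers before a corner-to-corner
                            # rook move even enters the viewing field
print(layers_to_see(7, 7))  # 8x1 / 1x8 filters: 1 layer
[/code]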
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

hgm wrote: 8x1 and 1x8 filters are not any less efficient, are they?
Not sure, is there a practical Winograd reformulation for them? There won't be one in cuDNN, for sure.

I'm not sure exactly what design you guys have in mind, but most image recognition research moved from bigger and more varied filters (11x11, 17x17, etc., implemented using FFTs) to pure 3x3 stacks (implemented using Winograd filtering). Apparently that just works better. It's up to the backprop to determine whether the extra layers are used for vision around the board or for higher-level abstractions.

Of course, some of the underlying reasons for the choice of symmetric/square filters in vision applications don't necessarily hold true for board games. But this seems like the kind of thing that needs empirical testing.
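
For readers wondering why 3x3 is the computational sweet spot, here is a back-of-the-envelope multiply count in Python for the standard Winograd F(2x2, 3x3) transform (my numbers, not anything specific to cuDNN's implementation):
[code]
# Rough multiply count per 2x2 output tile, per input/output channel pair,
# for the classic Winograd F(2x2, 3x3) transform (Lavin & Gray).
direct_muls   = 4 * 9   # 4 outputs, 9 multiplies each for a 3x3 kernel
winograd_muls = 4 * 4   # one 4x4 element-wise product in the transformed domain
print(direct_muls / winograd_muls)  # 2.25x fewer multiplies

# As noted above, there is no comparable off-the-shelf transform in cuDNN for
# 8x1 / 1x8 kernels, which is the practical concern with rank/file filters.
[/code]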
Milos

Re: LC0 on the WAC test suite, bad results

Post by Milos »

Gian-Carlo Pascutto wrote:
hgm wrote: 8x1 and 1x8 filters are not any less efficient, are they?
Not sure, is there a practical Winograd reformulation for them? There won't be one in cuDNN, for sure.

I'm not sure exactly what design you guys have in mind, but most image recognition research moved from bigger and more varied filters (11x11, 17x17, etc., implemented using FFTs) to pure 3x3 stacks (implemented using Winograd filtering). Apparently that just works better. It's up to the backprop to determine whether the extra layers are used for vision around the board or for higher-level abstractions.

Of course, some of the underlying reasons for the choice of symmetric/square filters in vision applications don't necessarily hold true for board games. But this seems like the kind of thing that needs empirical testing.
You at least need hyperparameter exploration of various filter sizes. Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

Milos wrote: You at least need hyperparameter exploration of various filter sizes. Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Sure. You should be able to do this with the existing training data.

Change the TF code that constructs the network, and if at the end of the training you end up with a lower loss or a comparable loss and a faster forward pass, you win.

But if we're going to add game-specific tweaks to the Zero construction, I can suggest a few more productive places than the network layout, I think...
Milos wrote: Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Well, HGM asked if 1x8 & 8x1 filters were less efficient to compute, and at least in practice, the answer should be yes. It's possible you can then drop enough layers to catch up, but for the reasons already outlined, this is not something I'd want to guess at or consider obvious.
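
As a sketch of the experiment being suggested (hypothetical Python/Keras code, not the actual training pipeline), the only change needed when constructing the tower is the kernel shape, so the two variants can be trained on the same data and compared on loss and forward-pass speed:
[code]
import tensorflow as tf

# Hypothetical sketch: build two otherwise identical towers that differ only
# in kernel shape. INPUT_PLANES is a placeholder; the real number is whatever
# encoding the training data uses.
INPUT_PLANES = 18
FILTERS = 128

def conv_block(x, kernel):
    x = tf.keras.layers.Conv2D(FILTERS, kernel, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

def tower(kernels):
    inp = tf.keras.layers.Input(shape=(8, 8, INPUT_PLANES))
    x = inp
    for k in kernels:
        x = conv_block(x, k)
    return tf.keras.Model(inp, x)

square_tower = tower([(3, 3)] * 20)          # current-style 3x3 stack
cross_tower  = tower([(1, 8), (8, 1)] * 10)  # rank/file filters instead
# Train both on the same positions and compare the policy/value loss and the
# forward-pass time; whichever wins that comparison is the better layout.
[/code]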
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

[D]1r4k1/1q2bp2/3p2p1/2pP4/p1N4R/2P2QP1/1P3PK1/8 w - - 0 1
[D]rn3rk1/pbppq1pp/1p2pb2/4N2Q/3PN3/3B4/PPP2PPP/R3K2R w KQ - 0 1

Both are trivial tactics for even weak engines, yet basically unsolvable for current Leela networks.

It's very uneven. Saccing the queen here isn't a problem, and she finds it faster than some Deep Sjeng versions do:
[D]2r1k2r/2pn1pp1/1p3n1p/p3PP2/4q2B/P1P5/2Q1N1PP/R4RK1 w q - 0 1

Some of the tactical weakness is due to a tweak DeepMind made to the UCT algorithm: they scale the exploration term by the policy prior. This means that low-policy moves aren't explored at all, unlike regular UCT, which would force some brute-force exploration.

But fixing that (by adding an additional, non-policy-weighted term) does not seem to be the solution: I gained something like 5 Elo when I did so. That is probably enough for an SPRT pass if this were Stockfish, but I think the LC0 people are going to laugh their pants off at introducing that complexity for so little gain. There may be better tweaks possible, of course.
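
For concreteness, here is the difference being described, sketched in Python (a paraphrase of the published AlphaZero PUCT formula plus a hypothetical extra term; the constants are illustrative and this is not Lc0's actual code):
[code]
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    # AlphaZero-style selection: the exploration term is scaled by the policy
    # prior, so a move the net dislikes is almost never searched.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def puct_with_floor(q, prior, parent_visits, child_visits,
                    c_puct=1.5, c_uct=0.2):
    # Hypothetical tweak in the spirit described above: add a small term that
    # is NOT weighted by the prior, so every move gets a trickle of visits,
    # the way plain UCT would force. Constants are illustrative.
    u_policy = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    u_plain  = c_uct * math.sqrt(math.log(parent_visits + 1) / (1 + child_visits))
    return q + u_policy + u_plain
[/code]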
Daniel Shawul

Re: LC0 on the WAC test suite, bad results

Post by Daniel Shawul »

jkiliani wrote:At some point in the future, one could probably try training Leela (or a fork of her) on a mix of self-play games and tactical suites like this one. There's a chance that this would make the net at least more familiar with these types of positions, which is what's required to solve such puzzles. It may take quite a bit of trial and error to do that without weakening the general gameplay of the neural net, though, and it will definitely require a very LARGE database of tactical puzzles to train on.
I am dumbfounded by people who think its tactics could be improved by training. Tactics are dynamic and need to be calculated precisely; they are not something suitable for a universal approximator like an NN. If you train on WAC, it is not going to help you anywhere else. My guess is that its tactics are going to suck forever unless elements of alpha-beta are introduced, such as using minimax backups, alpha-beta rollouts, etc.
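
A toy Python example (not from any engine) of why the backup rule matters for tactics: MCTS averages child values, so a single refutation gets diluted by quiet siblings, while a minimax-style backup lets one forcing line decide the node.
[code]
# Toy comparison of the two backup rules mentioned above (illustrative only).
def mcts_backup(child_values):
    # averaging: one refutation is washed out by many quiet children
    return sum(child_values) / len(child_values)

def minimax_backup(child_values):
    # minimax: the single best (or worst) line dominates the parent's value
    return max(child_values)

values = [0.05, 0.02, 0.95, 0.01]   # one child is a forced tactic
print(mcts_backup(values))           # ~0.26: the tactic is diluted
print(minimax_backup(values))        # 0.95: the tactic decides the node
[/code]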

About the 3x3 filters: why does LCZero have its own code for inference anyway? My first take on an NN chess engine would probably use the C++ TensorFlow backend for inference, which uses cuDNN underneath and probably has better-optimized algorithms than hand-written ones. Is this done to avoid a dependency on TensorFlow, or am I missing something?

Daniel