LC0 on the WAC test suite, bad results


Uri Blass

Re: LC0 on the WAC test suite, bad results

Post by Uri Blass »

Dann Corbit wrote:
jkiliani wrote:OK, and what's your conclusion from all that?

It's no secret that Leela's weakest point is tactics, especially if you just give her FENs instead of positions with move history. If you want a tactics solver, you're simply using the wrong engine here.
I think it is important to understand why tactics stink so bad.
The performance in openings (by contrast) is spectacular.
If the issue is simply that most tactical positions involve a sacrifice, that would give engines a strategy against the NN format.

I guess that if you trained on tactical sets, you would get a tactical monster that stinks at real chess, but that is pure speculation on my part.

It might conclude that sacrifices in general are a good idea, and of course they are usually a bad idea.
Sacrifices are part of tactics, but tactics are not only sacrifices.

I believe that the problem is that LC0 is not trained correctly, the way AlphaZero was, and I see no reason to support LC0 when it uses a relatively stupid NN (for example, I read it uses only 3x3 filters and not 8x1 lines, only because AlphaZero did it).
jkiliani

Re: LC0 on the WAC test suite, bad results

Post by jkiliani »

Uri Blass wrote:
Dann Corbit wrote:I think it is important to understand why tactics stink so bad.
The performance in openings (by contrast) is spectacular.
If the issue is simply that most tactical positions involve a sacrifice, that would give engines a strategy against the NN format.

I guess that if you trained on tactical sets, you would get a tactical monster that stinks at real chess, but that is pure speculation on my part.

It might conclude that sacrifices in general are a good idea, and of course they are usually a bad idea.
Sacrifices are part of tactics, but tactics are not only sacrifices.

I believe that the problem is that LC0 is not trained correctly, the way AlphaZero was, and I see no reason to support LC0 when it uses a relatively stupid NN (for example, I read it uses only 3x3 filters and not 8x1 lines, only because AlphaZero did it).
There's nothing wrong with the neural network architecture used by LCZero; for the 128x10 network, its performance relative to human play is actually very similar to that of Leela Zero (Go) on the same network size. Also, DeepMind write in their AlphaZero paper: "Other representations could have been used; in our experiments the training algorithm worked robustly for many reasonable choices."

The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
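
To make the residual-tower argument concrete, here is a minimal sketch in Python/Keras of the kind of 3x3 block being described. The 128 filters are an assumption taken from the "128x10" size mentioned above; the real Lc0 training code differs in detail.
[code]
# Minimal sketch of one 3x3 residual block (illustrative, not the Lc0 code).
import tensorflow as tf

FILTERS = 128  # assumption: the "128" in "128x10" = filters per layer

def residual_block(x):
    y = tf.keras.layers.Conv2D(FILTERS, 3, padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(FILTERS, 3, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([x, y])   # skip connection
    return tf.keras.layers.ReLU()(y)

# Build one block on an 8x8 board representation.
inp = tf.keras.layers.Input(shape=(8, 8, FILTERS))
out = residual_block(inp)

# Each 3x3 convolution lets information spread one square in every direction,
# so roughly 7 convolutions span the 8x8 board once and 20 span it about
# three times, which is the propagation argument made above.
[/code]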
frankp

Re: LC0 on the WAC test suite, bad results

Post by frankp »

It is useful to remember that these are still early days for the project, which has not been running for very long at all.
The structure of the NN is not yet finalised, and its training is far from over.
The 40M-game point should be an interesting time to compare its strength to that of A0, in so far as this can be done.
Early success - but it is still early.
It seems to me to have immense potential as a new way of creating a very interesting chess engine. The dismissive comments at this stage of the project are interesting, to say the least.
Milos

Re: LC0 on the WAC test suite, bad results

Post by Milos »

jkiliani wrote:The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
The reason is actually that the only guy who knows how to create a working DNN for a board game is Gian-Carlo, and you just copied most of LC0 from L0. The rest you copied from SF, and that part was so full of bugs for so long that it is simply embarrassing.
I don't think any of the LC0 contributors has any actual expertise with DNNs beyond playing around in Keras. So give me a break with your "explanations".
hgm

Re: LC0 on the WAC test suite, bad results

Post by hgm »

jkiliani wrote:The reason for 3x3 filters is that they are computationally very efficient for residual neural nets, and since we're already using 20 convolutional layers, that is enough for moves by any piece to propagate across the board three times. But if you know of a better implementation, we're looking forward to your pull request for Lc0 (provided that you implement and train it, and prove the strength benefit).
8x1 and 1x8 filters are not any less efficient, are they? I still think it is a bad mistake (if you don't have unlimited resources) to allow some of the possible moves to fall outside the 'viewing field' of all filters. It needlessly drives up the number of residual blocks required for recognizing relevant patterns, and the number of filters per block required to pass on simple information to deeper layers.
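
A toy calculation in Python (my own back-of-the-envelope, nothing from the Lc0 code) makes the 'viewing field' point concrete: with 3x3 filters a move along a whole rank or file only enters the receptive field after several layers, while an 8x1/1x8 filter sees it in one.
[code]
# Toy receptive-field count (illustrative only): how many stacked layers
# until a square `distance` files away is inside the viewing field?
def layers_to_see(distance, reach_per_layer):
    layers, seen = 0, 0
    while seen < distance:
        seen += reach_per_layer
        layers += 1
    return layers

# A 3x3 filter extends the view by one square per layer in each direction;
# an 8x1 or 1x8 filter covers a whole file or rank in a single layer.
print(layers_to_see(7, 1))  # 3x3 stack: 7 layers before a corner-to-corner
                            # rook move even enters the viewing field
print(layers_to_see(7, 7))  # 8x1 / 1x8 filters: 1 layer
[/code]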
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

hgm wrote: 8x1 and 1x8 filters are not any less efficient, are they?
Not sure, is there a practical Winograd reformulation for them? There won't be one in cuDNN, for sure.

I'm not sure exactly what design you guys have in mind, but most image recognition research moved from bigger and more varied filters (11x11, 17x17, etc., implemented using FFTs) to pure 3x3 stacks (implemented using Winograd filtering). Apparently that just works better. It's up to the backprop to determine whether the extra layers are used for vision around the board or for higher-level abstractions.

Of course, some of the underlying reasons for the choice of symmetric/square filters in vision applications don't necessarily hold true for board games. But this seems like the kind of thing that needs empirical testing.
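
For readers wondering why 3x3 is the computational sweet spot, here is a back-of-the-envelope multiply count in Python for the standard Winograd F(2x2, 3x3) transform (my numbers, not anything specific to cuDNN's implementation):
[code]
# Rough multiply count per 2x2 output tile, per input/output channel pair,
# for the classic Winograd F(2x2, 3x3) transform (Lavin & Gray).
direct_muls   = 4 * 9   # 4 outputs, 9 multiplies each for a 3x3 kernel
winograd_muls = 4 * 4   # one 4x4 element-wise product in the transformed domain
print(direct_muls / winograd_muls)  # 2.25x fewer multiplies

# As noted above, there is no comparable off-the-shelf transform in cuDNN for
# 8x1 / 1x8 kernels, which is the practical concern with rank/file filters.
[/code]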
Milos

Re: LC0 on the WAC test suite, bad results

Post by Milos »

Gian-Carlo Pascutto wrote:
hgm wrote: 8x1 and 1x8 filters are not any less efficient, are they?
Not sure, is there a practical Winograd reformulation for them? There won't be one in cuDNN, for sure.

I'm not sure exactly what design you guys have in mind, but most image recognition research moved from bigger and more varied filters (11x11, 17x17, etc., implemented using FFTs) to pure 3x3 stacks (implemented using Winograd filtering). Apparently that just works better. It's up to the backprop to determine whether the extra layers are used for vision around the board or for higher-level abstractions.

Of course, some of the underlying reasons for the choice of symmetric/square filters in vision applications don't necessarily hold true for board games. But this seems like the kind of thing that needs empirical testing.
You at least need hyperparameter exploration of various filter sizes. Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

Milos wrote: You at least need hyperparameter exploration of various filter sizes. Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Sure. You should be able to do this with the existing training data.

Change the TF code that constructs the network, and if at the end of the training you end up with a lower loss or a comparable loss and a faster forward pass, you win.

But if we're going to add game-specific tweaks to the Zero construction, I can suggest a few more productive places than the network layout, I think...
Milos wrote: Just pulling 3x3 because it is the fastest to compute and is good for CV makes little sense to me, so I'm with HGM on this.
Well, HGM asked if 1x8 & 8x1 filters were less efficient to compute, and at least in practice, the answer should be yes. It's possible you can then drop enough layers to catch up, but for the reasons already outlined, this is not something I'd want to guess at or consider obvious.
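
As a sketch of the experiment being suggested (hypothetical Python/Keras code, not the actual training pipeline), the only change needed when constructing the tower is the kernel shape, so the two variants can be trained on the same data and compared on loss and forward-pass speed:
[code]
import tensorflow as tf

# Hypothetical sketch: build two otherwise identical towers that differ only
# in kernel shape. INPUT_PLANES is a placeholder; the real number is whatever
# encoding the training data uses.
INPUT_PLANES = 18
FILTERS = 128

def conv_block(x, kernel):
    x = tf.keras.layers.Conv2D(FILTERS, kernel, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

def tower(kernels):
    inp = tf.keras.layers.Input(shape=(8, 8, INPUT_PLANES))
    x = inp
    for k in kernels:
        x = conv_block(x, k)
    return tf.keras.Model(inp, x)

square_tower = tower([(3, 3)] * 20)          # current-style 3x3 stack
cross_tower  = tower([(1, 8), (8, 1)] * 10)  # rank/file filters instead
# Train both on the same positions and compare the policy/value loss and the
# forward-pass time; whichever wins that comparison is the better layout.
[/code]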
Gian-Carlo Pascutto

Re: LC0 on the WAC test suite, bad results

Post by Gian-Carlo Pascutto »

[D]1r4k1/1q2bp2/3p2p1/2pP4/p1N4R/2P2QP1/1P3PK1/8 w - - 0 1
[D]rn3rk1/pbppq1pp/1p2pb2/4N2Q/3PN3/3B4/PPP2PPP/R3K2R w KQ - 0 1

Both are trivial tactics for even weak engines, yet basically unsolvable for current Leela networks.

It's very uneven. Saccing the queen here isn't a problem, and she finds it faster than some Deep Sjeng versions do:
[D]2r1k2r/2pn1pp1/1p3n1p/p3PP2/4q2B/P1P5/2Q1N1PP/R4RK1 w q - 0 1

Some of the tactical weakness is due to a tweak DeepMind made to the UCT algorithm: they scale the exploration term by the policy prior. This means that low-policy moves aren't explored at all, unlike regular UCT, which would force some brute-force exploration.

But fixing that (by adding an additional, non-policy-weighted term) does not seem to be the solution: I gained something like 5 Elo when I did so. That is probably enough for an SPRT pass if this were Stockfish, but I think the LC0 people are going to laugh their pants off at introducing that complexity for so little gain. There may be better tweaks possible, of course.
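
For concreteness, here is the difference being described, sketched in Python (a paraphrase of the published AlphaZero PUCT formula plus a hypothetical extra term; the constants are illustrative and this is not Lc0's actual code):
[code]
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    # AlphaZero-style selection: the exploration term is scaled by the policy
    # prior, so a move the net dislikes is almost never searched.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def puct_with_floor(q, prior, parent_visits, child_visits,
                    c_puct=1.5, c_uct=0.2):
    # Hypothetical tweak in the spirit described above: add a small term that
    # is NOT weighted by the prior, so every move gets a trickle of visits,
    # the way plain UCT would force. Constants are illustrative.
    u_policy = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    u_plain  = c_uct * math.sqrt(math.log(parent_visits + 1) / (1 + child_visits))
    return q + u_policy + u_plain
[/code]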
Daniel Shawul

Re: LC0 on the WAC test suite, bad results

Post by Daniel Shawul »

jkiliani wrote:At some point in the future, one could probably try training Leela (or a fork of her) on a mix of self-play games and tactical suites like this one. There's a chance that this would make the net at least more familiar with these types of positions, which is what's required to solve such puzzles. It may take quite a bit of trial and error to do that without weakening the general gameplay of the neural net, though, and it will definitely require a very LARGE database of tactical puzzles to train on.
I am dumbfounded by people who think its tactics could be improved by training. Tactics are dynamic and need to be calculated precisely; they are not something suitable for a universal approximator like an NN. If you train on WAC, it is not going to help you anywhere else. My guess is that its tactics are going to suck forever unless elements of alpha-beta are introduced, such as using minimax backups, alpha-beta rollouts, etc.
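
A toy Python example (not from any engine) of why the backup rule matters for tactics: MCTS averages child values, so a single refutation gets diluted by quiet siblings, while a minimax-style backup lets one forcing line decide the node.
[code]
# Toy comparison of the two backup rules mentioned above (illustrative only).
def mcts_backup(child_values):
    # averaging: one refutation is washed out by many quiet children
    return sum(child_values) / len(child_values)

def minimax_backup(child_values):
    # minimax: the single best (or worst) line dominates the parent's value
    return max(child_values)

values = [0.05, 0.02, 0.95, 0.01]   # one child is a forced tactic
print(mcts_backup(values))           # ~0.26: the tactic is diluted
print(minimax_backup(values))        # 0.95: the tactic decides the node
[/code]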

About the 3x3 filters: why does LCZero have its own code for inference anyway? My first take on an NN chess engine would probably use the C++ TensorFlow backend for inference, which uses cuDNN underneath and probably has better-optimized algorithms than hand-written ones. Is this done to avoid a dependency on TensorFlow, or am I missing something?

Daniel