LCzero sacs a knight for nothing

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

LCzero sacs a knight for nothing

Post by Daniel Shawul »

[D]3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24

Here LC0 played Ne5 on TCEC's 43-core hardware! Note that this blunder is probably not due to a bug, as it would be in most other engines; the algorithm is working as intended and can produce such tactical blunders even on this massive hardware.

Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?

I suspect the averaging of scores is responsible for this blunder. When a position has a few good moves and the policy network fails to pick them, these things can happen.
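
To illustrate what I mean by averaging, here is a toy sketch (not LC0 code; the leaf scores are invented) contrasting MCTS-style mean backup with minimax backup:

[code]
# Toy illustration: mean backup vs minimax backup.
# Scores are from the side to move's point of view; the opponent
# picks the reply that is worst for us, hence min() for minimax.
move_a = [0.9, 0.9, 0.9, -1.0]   # a trap: one reply refutes it outright
move_b = [0.3, 0.3, 0.3, 0.3]    # solid, no refutation

def mean(xs):
    return sum(xs) / len(xs)

# MCTS backs up the average of simulation results ...
print("averaged:", mean(move_a), "vs", mean(move_b))  # 0.425 vs 0.3 -> picks the trap
# ... while alpha-beta backs up the minimax value.
print("minimax: ", min(move_a), "vs", min(move_b))    # -1.0 vs 0.3 -> avoids it
[/code]

Until enough visits pile up in the refutation, the average stays rosy and the trap move keeps looking best.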
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: LCzero sacs a knight for nothing

Post by Michel »

Duplicate. Cannot delete for some reason.
Last edited by Michel on Thu Apr 19, 2018 9:15 pm, edited 1 time in total.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: LCzero sacs a knight for nothing

Post by Michel »

Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: LCzero sacs a knight for nothing

Post by Dann Corbit »

Since the new neural net setup (0.7) and the participation of Google Colab, it has started to take off again.

In fact, the steep, linear upward slope at this Elo level looks like an exponential learning rate.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LCzero sacs a knight for nothing

Post by Daniel Shawul »

Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover 5-ply tactics, there will come the 8-ply ones and the 15-ply ones etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it makes total blunders like L0 did. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like DeepBlue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio ... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics ...
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: LCzero sacs a knight for nothing

Post by gladius »

Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover 5-ply tactics, there will come the 8-ply ones and the 15-ply ones etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it makes total blunders like L0 did. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like DeepBlue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio ... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics ...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it's being trained to take tactics into account. Even modern chess evaluation functions do this (with, e.g., huge penalties for a queen under threat, and by restricting queen mobility to "safe" squares).
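
As a sketch of what that training target looks like (AlphaZero-style normalized visit counts; the move strings and counts here are just made-up examples):

[code]
# Sketch of the policy training target, assuming we already have the
# root visit counts from an 800-node MCTS search.
visit_counts = {"e2e4": 410, "d2d4": 305, "g1f3": 70, "b1c3": 15}

total = sum(visit_counts.values())  # 800
policy_target = {m: n / total for m, n in visit_counts.items()}

# The policy head is trained (cross-entropy) toward this distribution,
# so moves the search preferred -- including tactical refutations it
# found within those 800 nodes -- get baked into the prior.
print(policy_target)
[/code]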

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible on commonly available GPUs. So why not play with a totally different way of doing things? This is what you do with your engines, after all :).
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: LCzero sacs a knight for nothing

Post by Michel »

That is a one-ply tactic right there!
It is not really one ply, is it? LC0 did not see the rook check on e1, which, to be honest, I also had not seen at first sight.

I guess the policy head will have to learn about checks that deflect a defender.
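
For anyone who wants to verify the line, a quick python-chess check from the diagrammed position (assuming a local Stockfish binary on the PATH; any UCI engine will do):

[code]
import chess
import chess.engine

board = chess.Board("3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24")
board.push_san("Ne5")  # the move LC0 played

with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
    info = engine.analyse(board, chess.engine.Limit(depth=22))
    print(info["score"], info["pv"][:6])  # refutation begins with the simple ...fxe5
[/code]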
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LCzero sacs a knight for nothing

Post by Daniel Shawul »

gladius wrote:
Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover 5-ply tactics, there will come the 8-ply ones and the 15-ply ones etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it makes total blunders like L0 did. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like DeepBlue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio ... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics ...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it's being trained to take tactics into account. Even modern chess evaluation functions do this (with, e.g., huge penalties for a queen under threat, and by restricting queen mobility to "safe" squares).

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible on commonly available GPUs. So why not play with a totally different way of doing things? This is what you do with your engines, after all :).
Gary, first off, I hope you don't take my posts as a negative voice on the L0 project. In fact, I like it a lot, since it lets us prove once and for all how AlphaZero "did it"...

As I posted elsewhere, the policy network may be able to identify things like "don't put your piece where it can be captured" or "move your piece away so that it won't be captured". I don't see it solving precise tactics even at qsearch level. It can only learn those general rules...

Let's then assume it has learned the above kind of rules and has a good policy network. The problem is that a trap, by definition, is something that looks bad but turns out to be good when searched to x plies. So whether the policy network is good or bad, it is not going to help you much -- well, it had better be good at least to look decent, but a tactical engine will find its tactical weakness anyway. This is because the policy network's rules are static, unlike alpha-beta engines, which analyze these tactics dynamically!
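
To make the "static rules" point concrete, here is a toy sketch of AlphaZero-style PUCT selection (invented priors and values) showing how a low prior starves the refutation line of visits:

[code]
import math

def select(children, c_puct=1.5):
    # The +1 avoids sqrt(0) before the first visit.
    n_parent = sum(c["n"] for c in children) + 1
    def score(c):
        q = c["w"] / c["n"] if c["n"] else 0.0
        return q + c_puct * c["p"] * math.sqrt(n_parent) / (1 + c["n"])
    return max(children, key=score)

# The net loves the quiet move (prior 0.95) and nearly ignores the
# refutation line (prior 0.05); toy leaf values are held fixed.
children = [
    {"name": "quiet",  "p": 0.95, "n": 0, "w": 0.0, "value": 0.10},
    {"name": "tactic", "p": 0.05, "n": 0, "w": 0.0, "value": 0.05},
]

for _ in range(800):
    c = select(children)
    c["n"] += 1
    c["w"] += c["value"]

for c in children:
    print(c["name"], "visits:", c["n"])  # the low-prior move gets only a few dozen visits
[/code]

With only a few dozen of the 800 simulations spent on the low-prior move, a trap that needs a deep line to expose never gets searched far enough to flip its value.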

LC0 is missing some tactics in almost every TCEC game.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: LCzero sacs a knight for nothing

Post by Karlo Bala »

gladius wrote:
Don't you think that the network can learn to predict tactics?
1. Feed-forward NN - maybe, very shallow tactics
2. Recurrent NN - one day perhaps, but not today, not tomorrow,...
Best Regards,
Karlo Balla Jr.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: LCzero sacs a knight for nothing

Post by gladius »

Daniel Shawul wrote:
gladius wrote:
Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover 5-ply tactics, there will come the 8-ply ones and the 15-ply ones etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it makes total blunders like L0 did. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like DeepBlue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio ... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics ...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it's being trained to take tactics into account. Even modern chess evaluation functions do this (with, e.g., huge penalties for a queen under threat, and by restricting queen mobility to "safe" squares).

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible on commonly available GPUs. So why not play with a totally different way of doing things? This is what you do with your engines, after all :).
Gary, first off, I hope you don't take my posts as a negative voice on the L0 project. In fact, I like it a lot, since it lets us prove once and for all how AlphaZero "did it"...

As I posted elsewhere, the policy network may be able to identify things like "don't put your piece where it can be captured" or "move your piece away so that it won't be captured". I don't see it solving precise tactics even at qsearch level. It can only learn those general rules...

Let's then assume it has learned the above kind of rules and has a good policy network. The problem is that a trap, by definition, is something that looks bad but turns out to be good when searched to x plies. So whether the policy network is good or bad, it is not going to help you much -- well, it had better be good at least to look decent, but a tactical engine will find its tactical weakness anyway. This is because the policy network's rules are static, unlike alpha-beta engines, which analyze these tactics dynamically!

LC0 is missing some tactics in almost every TCEC game.
Not at all - I think you've raised some very interesting points! MCTS averaging does seem fundamentally mismatched to chess. That's why I was so amazed A0 actually worked.

Once it gets a lot better, I think it will be pretty fascinating to do raw network evaluation of tactical positions and see how it does :). At the very least, we can then start playing with additional inputs/structure and see if we can make it better.
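
Something like this harness would do it (a sketch, assuming an lc0 binary set up with a network file; a 1-node search is roughly the raw policy/value output, and the suite entry is a trivial mate-in-one placeholder):

[code]
import chess
import chess.engine

# Tactical test positions with their best moves (placeholder suite;
# substitute any real tactics set).
suite = [
    ("6k1/5ppp/8/8/8/8/5PPP/R5K1 w - - 0 1", "a1a8"),  # Ra8 mate
]

with chess.engine.SimpleEngine.popen_uci("lc0") as engine:
    solved = 0
    for fen, best in suite:
        board = chess.Board(fen)
        # nodes=1: no tree search to bail the raw network out.
        result = engine.play(board, chess.engine.Limit(nodes=1))
        solved += result.move.uci() == best
    print(f"raw-net solved {solved}/{len(suite)}")
[/code]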