Playing the endgame like a boss !!

Post by **hgm** » Thu Mar 14, 2019 12:14 pm

To me this is the most amazing thing about this Zero-business: that an engine with so many glaring weaknesses can be so good at winning actual games, by learning to avoid positions where its weaknesses would be tested.

It demonstrates the more general principle that testing on game results (i.e. Elo) actually says pitifully little about how good an engine is at analyzing arbitrary positions. In this respect Elo seems a completely irrelevant quantity. And this could just as well hold for the conventional engines, although there it is less obvious to humans. It is generally believed that Stockfish or Komodo would be much better for analysis than, say, The Baron. But this hasn't really been tested, and might very well not be true. It could be that Stockfish and Komodo, like LC0, are just tuned to avoid exposing their own weaknesses (except that they would have entirely different weaknesses).

Post by **jp** » Thu Mar 14, 2019 2:37 pm

hgm wrote (Thu Mar 14, 2019 9:45 am):
> Obviously CCC was not using the EGT to adjudicate.

CCC uses TBs to adjudicate ... draws (but not wins).

I'm sure they intend all engines to use TBs, so if Lc0 wasn't, it would have been an accident.

Post by **ernest** » Thu Mar 14, 2019 5:57 pm

I wonder if somebody using Lc0 could replay that game ending, with and without syzygys.
But perhaps move randomness would prevent any definite conclusions.

Post by **mwyoung** » Fri Mar 15, 2019 2:08 am

hgm wrote (Thu Mar 14, 2019 12:14 pm):
> To me this is the most amazing thing about this Zero-business: that an engine with so many glaring weaknesses can be so good at winning actual games, by learning to avoid positions where its weaknesses would be tested.
>
> It demonstrates the more general principle that testing on game results (i.e. Elo) actually says pitifully little about how good an engine is at analyzing arbitrary positions. In this respect Elo seems a completely irrelevant quantity. And this could just as well hold for the conventional engines, although there it is less obvious to humans. It is generally believed that Stockfish or Komodo would be much better for analysis than, say, The Baron. But this hasn't really been tested, and might very well not be true. It could be that Stockfish and Komodo, like LC0, are just tuned to avoid exposing their own weaknesses (except that they would have entirely different weaknesses).

+1

I agree. We think we know, but most of this is assumption.

Post by **mwyoung** » Fri Mar 15, 2019 5:06 am

ernest wrote (Thu Mar 14, 2019 5:57 pm):
> I wonder if somebody using Lc0 could replay that game ending, with and without syzygys.
> But perhaps move randomness would prevent any definite conclusions.

I have seen Lc0 play in a style where it does not care how long it takes to mate, as long as it wins. This makes perfect sense if you learned the game from ZERO, where all that matters is wins, losses and draws: you get no bonus for finding the shortest win.
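To make this concrete: in a pure Zero-style setup every position of a finished game gets the final result as its value target, so a 20-move mate and a 50-move mate produce identical targets. A minimal sketch, assuming results encoded as +1/0/-1; `outcome_targets` is a hypothetical name for illustration, not Lc0's training code:

```python
def outcome_targets(result: int, num_plies: int) -> list[int]:
    """Zero-style value target for every ply of one finished game.

    result: final outcome from White's point of view (+1 win, 0 draw, -1 loss).
    Each target is expressed from the side to move's point of view.
    The length of the game plays no role at all.
    """
    if result not in (-1, 0, 1):
        raise ValueError("result must be -1, 0 or +1")
    # White is to move on even plies, Black on odd plies.
    return [result if ply % 2 == 0 else -result for ply in range(num_plies)]


# A quick win (40 plies) and a slow win (100 plies) give White the same +1.
quick = outcome_targets(1, 40)
slow = outcome_targets(1, 100)
print(quick[0], slow[0])  # 1 1
```

Nothing in these targets rewards the shorter game, which is exactly the missing "bonus" described above.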

If you fixed this issue, I don't know if you could still call Lc0 "Zero".

In my next test match, I will disable the resign feature for the test games and make Lc0 play to mate, to see what kind of rating change and craziness the Zero aspect has on play.

With the resign set to +9 for both engines, play has looked ok for the most part.

Post by **Uri Blass** » Fri Mar 15, 2019 7:24 am

hgm wrote (Thu Mar 14, 2019 12:14 pm):
> To me this is the most amazing thing about this Zero-business: that an engine with so many glaring weaknesses can be so good at winning actual games, by learning to avoid positions where its weaknesses would be tested.
>
> It demonstrates the more general principle that testing on game results (i.e. Elo) actually says pitifully little about how good an engine is at analyzing arbitrary positions. In this respect Elo seems a completely irrelevant quantity. And this could just as well hold for the conventional engines, although there it is less obvious to humans. It is generally believed that Stockfish or Komodo would be much better for analysis than, say, The Baron. But this hasn't really been tested, and might very well not be true. It could be that Stockfish and Komodo, like LC0, are just tuned to avoid exposing their own weaknesses (except that they would have entirely different weaknesses).

It seems that the weakness is mainly winning slowly and not finding the fastest mate.
In normal positions, when it is not clear that one side is winning, I think it is clear that Stockfish, Komodo or Lc0 are better for analysis than engines like The Baron.

I also think that in positions with a forced mate, The Baron is weaker than the top engines (except Lc0), and they are going to find the mate faster in most cases.

It is easy to test.

Take a big PGN file and ask the engines to analyze every position for 1 second, to see how many mate scores each of them can find.
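The counting step of this test is easy to automate. The sketch below assumes each position's analysis has already been collected as the list of UCI `info` lines the engine printed (for example after `go movetime 1000`); the UCI protocol reports scores as `score cp <x>` or `score mate <y>`. The helper names are hypothetical, and a position counts as "found" if the last reported score is a mate score:

```python
def last_score(info_lines):
    """Return ('cp', n) or ('mate', n) from the last UCI info line with a score, else None."""
    score = None
    for line in info_lines:
        tokens = line.split()
        if not tokens or tokens[0] != "info" or "score" not in tokens:
            continue
        if len(tokens) > 1 and tokens[1] == "string":
            continue  # free-form text, not a real score report
        i = tokens.index("score")
        if i + 2 < len(tokens) and tokens[i + 1] in ("cp", "mate"):
            score = (tokens[i + 1], int(tokens[i + 2]))
    return score


def count_mate_scores(analyses):
    """Count positions whose final reported score is a mate score."""
    count = 0
    for lines in analyses:
        score = last_score(lines)
        if score is not None and score[0] == "mate":
            count += 1
    return count


analyses = [
    ["info depth 18 score cp 35 pv e2e4", "info depth 20 score cp 41 pv e2e4"],
    ["info depth 12 score cp 900 pv h5f7", "info depth 14 score mate 3 pv h5f7"],
    ["info depth 22 score mate -5 pv g8h8"],
]
print(count_mate_scores(analyses))  # 2
```

Running the same positions through each engine and comparing the counts gives the comparison suggested above.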

Post by **Eduard** » Fri Mar 15, 2019 8:35 am

I've seen on chess.com how Leela checkmated with the queen, and also with knight and bishop, but always before the 50th move. The problem must therefore be related to the 50-move rule.
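For reference, the rule in question: a draw can be claimed once 50 moves by each side (100 half-moves) have been played without a capture or a pawn move. A minimal sketch of that counter, assuming a hypothetical move encoding of `(is_pawn_move, is_capture)` pairs; the function names are illustrative only:

```python
def halfmove_clocks(moves):
    """Follow the 50-move-rule counter over a sequence of half-moves.

    Each move is a tuple (is_pawn_move, is_capture). The counter resets to 0
    after a pawn move or a capture and otherwise goes up by one.
    """
    clock = 0
    history = []
    for is_pawn_move, is_capture in moves:
        clock = 0 if (is_pawn_move or is_capture) else clock + 1
        history.append(clock)
    return history


def draw_claimable(clock):
    """50 moves by each side (100 half-moves) without a capture or pawn move."""
    return clock >= 100


# One capture, then 100 quiet piece moves: the draw becomes claimable.
moves = [(False, True)] + [(False, False)] * 100
clocks = halfmove_clocks(moves)
print(clocks[-1], draw_claimable(clocks[-1]))  # 100 True
```

In a pawnless mating ending the counter runs from the last capture, so the winning side has 50 of its own moves from that point to deliver mate.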

Post by **hgm** » Fri Mar 15, 2019 9:07 am

mwyoung wrote (Fri Mar 15, 2019 5:06 am):
> I have seen Lc0 play in a style where it does not care how long it takes to mate, as long as it wins. This makes perfect sense if you learned the game from ZERO, where all that matters is wins, losses and draws: you get no bonus for finding the shortest win.
>
> If you fixed this issue, I don't know if you could still call Lc0 "Zero".

I don't agree. In any game a faster win is preferable over a slower win. That is no more domain-specific knowledge than that a win is preferable over a loss. It is not enough to know how to reach a position from which you can theoretically force a win if you do not know how to actually convert it. You have to train game-playing entities how to make progress towards a win, especially in the Zero approach.

LC0 is just trained for the wrong thing. And this very likely slows down its training considerably: in many of its training examples it will not be able to recognize that it did something good, because its inability to convert the won position will mask it.
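hgm does not say how game length should enter the reward. One standard reinforcement-learning option, assumed here purely for illustration (it is not stated to be what Lc0 does), is to discount the final result by a factor gamma per half-move. The function name and the value `gamma=0.995` are hypothetical choices:

```python
def discounted_target(result: int, plies_to_end: int, gamma: float = 0.995) -> float:
    """Hypothetical value target that prefers faster wins and slower losses.

    result: final outcome from the side to move's point of view (+1, 0, -1).
    plies_to_end: number of half-moves from this position to the end of the game.
    With gamma < 1, distant results shrink toward 0: a win in 40 plies scores
    higher than a win in 100 plies, a slow loss scores higher (less negative)
    than a quick one, and a draw stays 0.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    return result * gamma ** plies_to_end


print(discounted_target(1, 40) > discounted_target(1, 100))  # True
```

Because any win still scores above 0 and any loss below 0, the ordering win > draw > loss is preserved; the discount only separates results of the same kind by their distance.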

Post by **megamau** » Fri Mar 15, 2019 12:11 pm

hgm wrote (Fri Mar 15, 2019 9:07 am):
> I don't agree. In any game a faster win is preferable over a slower win. That is no more domain-specific knowledge than that a win is preferable over a loss. It is not enough to know how to reach a position from which you can theoretically force a win if you do not know how to actually convert it. You have to train game-playing entities how to make progress towards a win, especially in the Zero approach.
>
> LC0 is just trained for the wrong thing. And this very likely slows down its training considerably: in many of its training examples it will not be able to recognize that it did something good, because its inability to convert the won position will mask it.

Not true.
In any game a more certain win is preferable over a less certain win. The length has no impact.
Would you rather make a dubious sacrifice to deliver a fast mate that could turn into a loss, or simplify into a long, technically won endgame that you are certain to convert?

Leela is trained on the correct metric (probability of win).
The problem is that when the position is 100% certain win (i.e. any move wins), she has no backup metric to differentiate between "quick" mates and "trolling" long mates.
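Selecting moves on win probability alone leaves the choice arbitrary once every move is a certain win. A minimal sketch of adding a backup metric as a tie-breaker, assuming the network also outputs a (hypothetical) estimate of the remaining game length; `Candidate` and `pick_move` are illustrative names, not Lc0 code:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str
    win_prob: float    # estimated probability of winning after this move
    moves_left: float  # hypothetical estimate of remaining game length


def pick_move(candidates, tie_eps=1e-6):
    """Maximise win probability; break (near-)ties by the shorter expected game."""
    best_p = max(c.win_prob for c in candidates)
    tied = [c for c in candidates if best_p - c.win_prob <= tie_eps]
    return min(tied, key=lambda c: c.moves_left)


# All three moves are "certain" wins, so only the backup metric separates them.
cands = [
    Candidate("Kf2", 1.0, 45.0),
    Candidate("Qh7", 1.0, 4.0),
    Candidate("Rb8", 1.0, 12.0),
]
print(pick_move(cands).move)  # Qh7
```

Win probability stays the primary criterion here; the length estimate only matters among moves that are tied on it.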
But as long as she eventually converts there is no real issue.

Post by **hgm** » Fri Mar 15, 2019 1:36 pm

The problem with that is that when you do not know how to convert 'certain wins', they suddenly become a lot less certain...

Your statement also bypasses the fact that win probabilities as determined by the NN are not infinitely accurate, but are necessarily polluted by a great deal of noise. So in practice, if you have the choice between going for an estimated win probability of 90.1% with an estimated remaining game length of 50 moves, and an estimated win probability of 90.0% with an estimated remaining length of 20 moves, when the estimation noise is 3%, it would be really foolish to go for the 50 moves as if that extra 0.1% were real. The estimated remaining length is likely a much more reliable indicator of whether you are dealing with a 90%+3% case rather than a 90.1%-3% case.

So what I am basically saying is that withholding the duration info from the NN during training will severely degrade the accuracy with which it can eventually estimate the win probabilities. And taking the theoretically best decision based on compromised data will in practice often lead to the wrong decision. Had game length been folded into the reward function during training, the NN would, in the example above, probably not have said 90.1% vs 90% +/- 3%, but 88% vs 92% +/- 1%. That would enable it to go for what is the highest win probability in reality, rather than the imagined one based on inaccuracy.
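The effect of the noise levels in this example can be checked with a little arithmetic. A minimal sketch, assuming each NN estimate equals the true win probability plus independent Gaussian noise with the stated standard deviation, and a flat prior over true values; under those assumptions the chance that one option is truly better is the normal CDF of the estimated difference divided by sqrt(2) times the noise. The function name is hypothetical:

```python
from math import erf, sqrt


def prob_truly_better(est_a, est_b, noise_sd):
    """Probability that option A's true win rate exceeds B's, given noisy estimates.

    Assumes each estimate = true value + independent Gaussian noise with
    standard deviation noise_sd, and a flat prior over the true values.
    """
    diff_sd = sqrt(2) * noise_sd  # sd of the difference of two independent errors
    z = (est_a - est_b) / diff_sd
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z


# 90.1% vs 90.0% with 3% noise: barely better than a coin flip (about 0.51).
p_noisy = prob_truly_better(0.901, 0.900, 0.03)
# 92% vs 88% with 1% noise: nearly decisive (about 0.998).
p_sharp = prob_truly_better(0.92, 0.88, 0.01)
print(round(p_noisy, 2), round(p_sharp, 3))
```

Under these assumptions the 0.1% edge carries almost no information, while the sharper estimates make the better choice close to certain, which is the contrast described above.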
