AlphaZero - Tactical Abilities

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 1:49 pm

hgm wrote:
Lyudmil Tsvetkov wrote: So, no knowledge in Alpha; it was all outsearching.

It is so funny when people still believe Alpha has achieved some breakthrough. No breakthrough, just tremendous computer power.
So Stockfish was outsearched by an opponent that searched a tree ~1000 times smaller (80 knps for AlphaZero vs 70 Mnps for Stockfish).
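Taking the two quoted search speeds at face value, the "~1000 times" figure works out as follows:

```latex
\frac{70\,000\,000\ \text{nodes/s}}{80\,000\ \text{nodes/s}} = 875 \approx 10^{3}
```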

Shouldn't that count as a breakthrough?

Lyudmil Tsvetkov wrote: But then, the verdict would have been: "It barely beat SF".
That still doesn't sound very bad for something that 9 hours earlier had sub-zero Elo, only knew the rules and was never taught anything to improve it...

How can one have sub-zero Elo?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 1:50 pm

syzygy wrote:
Lyudmil Tsvetkov wrote: So they are using evaluation, after all.
Didn't they claim their approach has nothing to do with alpha-beta?
Duh?

You have failed the Turing test.

More specifically?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 1:59 pm

hgm wrote: NN = neural network.

The NN produces an evaluation, in terms of winning probability (or actually score expectation), and move recommendations for searching.

The NN was trained by showing it positions from the self-play games, together with the result of that game, so that it learns to predict results from patterns in the position. The network was initialized randomly, but since it recognizes many patterns, there will always be some that correlate with winning, and these will then be enhanced during the training. Exactly which patterns the fully trained network recognizes is completely unknown, and would be very hard to find out, because the network is humongously large.
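A minimal sketch of the two outputs described above and of the per-position training loss, following the form given in the AlphaZero paper (squared error on the game result plus a cross-entropy term; the weight-decay term is left out). In the published setup the move recommendations are trained towards the move distribution the search produced in that position, in addition to the value being trained on the game result. The function names, and the assumption that the value output lies in [-1, 1], are illustrative, not taken from any released code.

```python
import math

def softmax(logits):
    """Turn raw move scores into a probability distribution over the legal moves."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(v):
    """Map a value output v in [-1, 1] (loss .. win) to a score expectation in [0, 1]."""
    return (v + 1.0) / 2.0

def position_loss(v, z, move_logits, search_probs):
    """AlphaZero-style loss for one training position (weight-decay term omitted).

    v            -- value output for the position, in [-1, 1]
    z            -- result of the self-play game for the side to move: -1, 0 or +1
    move_logits  -- raw move-recommendation outputs, one per legal move
    search_probs -- move distribution produced by the search in that position (sums to 1)
    """
    p = softmax(move_logits)
    value_loss = (z - v) ** 2  # how far the evaluation was from the actual game result
    # cross-entropy: pushes the move recommendations towards what the search preferred
    policy_loss = -sum(t * math.log(q) for t, q in zip(search_probs, p) if t > 0)
    return value_loss + policy_loss
```

For example, `position_loss(0.2, 1, [0.0, 0.0], [0.5, 0.5])` is 0.64 + ln 2, about 1.33: the evaluation was too pessimistic for a game that was won, and the network was already recommending both moves equally, as the search did.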

So, it is using Monte Carlo only during training, but in actual game play plain alpha-beta with evaluation.
Why do they claim it is not alpha-beta then, but Monte Carlo?

But that is the real question: what those patterns are, and how many.

I guess it is obvious: no one can tune more than 1000 good chess-knowledge terms, so either they are tuning fewer, or their patterns are completely meaningless.

But I don't care at all what the patterns of a 2800 engine are; I know those pretty well from looking at an engine like Fruit, for example. Wasn't Fruit already 2800 a decade ago? Those guys are a decade and a half behind in development...

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 2:02 pm

hgm wrote: This is very precisely described in the AG0 paper. The NN has many layers. The first layer breaks up the board into overlapping 3x3 areas, and in each such area 256 patterns are recognized. But then many more layers follow (like 19 or 39), which can recognize 'patterns in the patterns'; in many cases these are no doubt used to build patterns covering a larger area, and eventually entire board rays.
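A minimal sketch of what "recognizing a pattern in overlapping 3x3 areas" means for a single pattern on a single 8x8 input plane, assuming zero padding at the board edge. A real layer applies 256 such detectors across all input planes and adds a nonlinearity; `conv3x3` is a hypothetical helper written for illustration. Stacking a second 3x3 layer on the output makes each square depend on a 5x5 area, which is how deeper layers can reach patterns of larger area.

```python
def conv3x3(plane, kernel):
    """Slide one 3x3 pattern detector over an 8x8 board plane (zero padding at the edges).

    plane  -- 8x8 list of lists, e.g. 1.0 where a white pawn stands, else 0.0
    kernel -- 3x3 list of lists of weights; this is the 'pattern' being matched
    Returns an 8x8 map of how strongly the pattern matches around each square.
    """
    size = 8
    out = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < size and 0 <= cc < size:  # squares off the board count as 0
                        acc += kernel[dr + 1][dc + 1] * plane[rr][cc]
            out[r][c] = acc
    return out
```

With a single piece in the middle of the board and an all-ones kernel, one pass lights up the 9 squares around it; feeding that output through the same detector again lights up a 5x5 block of 25 squares.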

Those are not patterns, this is just random guessing.

Ras · Post by **Ras** » Tue Dec 19, 2017 2:14 pm

Lyudmil Tsvetkov wrote: Those are not patterns, this is just random guessing.

At first, it is. That's what gets dealt with in the learning phase. Those guesses that happen to yield good output are reinforced, while those with bad output are weakened. Over time, the network will generate better output.
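A minimal sketch of "good guesses get reinforced, bad ones weakened", using a single-layer logistic model as a stand-in for the network. It learns one weight per binary pattern from game results by stochastic gradient descent. The function, the toy data and the learning rate are hypothetical; weights start at zero here only so the run is reproducible, whereas a real network starts from random values.

```python
import math

def train_pattern_weights(samples, epochs=200, lr=0.1):
    """Learn one weight per binary pattern from game results (logistic regression, no bias).

    samples -- list of (features, result): features is a list of 0/1 pattern flags,
               result is 1.0 for a win and 0.0 for a loss
    """
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in samples:
            pred = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            err = y - pred  # positive if the win chance was underestimated
            for i in range(n):
                w[i] += lr * err * x[i]  # only patterns present in this position are adjusted
    return w
```

With `samples = [([1, 1], 1.0), ([1, 0], 1.0), ([0, 1], 0.0), ([0, 0], 0.0)]`, pattern 0 (present in both wins, absent in both losses) ends with a clearly positive weight, while pattern 1 (present in one win and one loss) ends up far lower.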

Pretty much like what you did in your childhood when you first touched a hot oven plate. The random network output "good idea to touch that" pretty quickly got weakened.

shrapnel · Post by **shrapnel** » Tue Dec 19, 2017 2:22 pm

Ras wrote: Pretty much like what you did in your childhood when you first touched a hot oven plate. The random network output "good idea to touch that" pretty quickly got weakened.

Nice analogy, but wasted on Tsvetkov, as he doesn't do neural networks, only alpha-beta search!
He probably enjoyed touching the hot plate so much that he ended up sitting on it to enjoy it better.

hgm · Post by **hgm** » Tue Dec 19, 2017 2:35 pm

Lyudmil Tsvetkov wrote: So, it is using Monte Carlo only during training, but in actual game play plain alpha-beta with evaluation.
Why do they claim it is not alpha-beta then, but Monte Carlo?

No, they also used (simulated) Monte-Carlo during the matches. Self-play and matches were largely the same, except that the time per move was much longer in the matches.
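A minimal sketch of the PUCT selection rule described in the AlphaGo Zero and AlphaZero papers, which is where the network's move recommendations steer the search. Leaf positions are scored by the network's evaluation rather than by random play-outs, presumably why the Monte-Carlo is called "simulated" here. `Edge`, `select_move`, `backup` and the constant `c_puct = 1.5` are illustrative assumptions (the published AlphaZero pseudocode lets this constant grow slowly with the visit count), and the sign flip between plies is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float            # P(s, a): move recommendation from the network
    visits: int = 0         # N(s, a): simulations that went through this move
    value_sum: float = 0.0  # W(s, a): sum of the values backed up through this move

    @property
    def q(self) -> float:
        """Mean value Q(s, a) of the simulations through this move (0 if unvisited)."""
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(edges, c_puct=1.5):
    """Pick the move maximising Q + U; U favours high-prior, rarely visited moves."""
    total = sum(e.visits for e in edges.values())

    def score(e):
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))

def backup(edge, value):
    """Record one simulation: value is the leaf evaluation, seen from the mover's side."""
    edge.visits += 1
    edge.value_sum += value
```

With equal results so far, the move with the higher prior is explored; once a low-prior move has returned clearly better results, its Q term takes over. The search is therefore very selective without ignoring unlikely moves entirely.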

But that is the real question: what those patterns are, and how many.

Indeed, I guess everyone would like to know that. But it will be hard to find out, if it can be done at all. Apparently the NN was able to find patterns that allowed it to outsearch Stockfish with only a fraction of the nodes.

I guess it is obvious: no one can tune more than 1000 good chess-knowledge terms, so either they are tuning fewer, or their patterns are completely meaningless.

Most of the patterns will indeed be completely meaningless, and training will then either alter them into something useful or teach the network to ignore them. You have to have such 'spare' capacity in an NN to make it sufficiently general. Perhaps these useless patterns would have made all the difference if you had been training it for Go or Draughts.

But I don't care at all what the patterns of a 2800 engine are; I know those pretty well from looking at an engine like Fruit, for example. Wasn't Fruit already 2800 a decade ago? Those guys are a decade and a half behind in development...

But good old Fruit cannot outsearch Stockfish with only 1/10 of the nodes... You still seem to think this is about evaluation. It is not. The breakthrough is selective search that (according to you) causes 3500+ Elo play with only a 2800 Elo evaluation.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 8:30 pm

Ras wrote:
Lyudmil Tsvetkov wrote: Those are not patterns, this is just random guessing.
At first, it is. That's what gets dealt with in the learning phase. Those guesses that happen to yield good output are reinforced, while those with bad output are weakened. Over time, the network will generate better output.

Pretty much like what you did in your childhood when you first touched a hot oven plate. The random network output "good idea to touch that" pretty quickly got weakened.

Chess is much more complex than touching a hot oven plate.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Dec 19, 2017 8:39 pm

hgm wrote:
Lyudmil Tsvetkov wrote: So, it is using Monte Carlo only during training, but in actual game play plain alpha-beta with evaluation.
Why do they claim it is not alpha-beta then, but Monte Carlo?
No, they also used (simulated) Monte-Carlo during the matches. Self-play and matches were largely the same, except that the time per move was much longer in the matches.

But that is the real question: what those patterns are, and how many.
Indeed, I guess everyone would like to know that. But it will be hard to find out, if it can be done at all. Apparently the NN was able to find patterns that allowed it to outsearch Stockfish with only a fraction of the nodes.

I guess it is obvious: no one can tune more than 1000 good chess-knowledge terms, so either they are tuning fewer, or their patterns are completely meaningless.
Most of the patterns will indeed be completely meaningless, and training will then either alter them into something useful or teach the network to ignore them. You have to have such 'spare' capacity in an NN to make it sufficiently general. Perhaps these useless patterns would have made all the difference if you had been training it for Go or Draughts.

But I don't care at all what the patterns of a 2800 engine are; I know those pretty well from looking at an engine like Fruit, for example. Wasn't Fruit already 2800 a decade ago? Those guys are a decade and a half behind in development...
But good old Fruit cannot outsearch Stockfish with only 1/10 of the nodes... You still seem to think this is about evaluation. It is not. The breakthrough is selective search that (according to you) causes 3500+ Elo play with only a 2800 Elo evaluation.

Isn't alpha-beta precisely the same: simulating play-outs, except that the play-outs end somewhere with a heuristic score instead of a known result?
What would be the cardinal change?
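A minimal sketch of one difference often pointed out between the two: how a node's value is computed from its children. Minimax (which alpha-beta computes, only with fewer nodes) takes the best child. In AlphaZero-style MCTS a node's value is, roughly, the average of all evaluations gathered below it (ignoring the node's own first evaluation), so each move counts in proportion to how often it was visited. Child values are assumed to be given from the point of view of the player choosing among them; the function names are hypothetical.

```python
def minimax_backup(child_values):
    """Minimax / alpha-beta style: the node is worth exactly its best child."""
    return max(child_values)

def mcts_backup(child_values, child_visits):
    """MCTS style: the node is worth the visit-weighted average of what was found below it."""
    total = sum(child_visits)
    return sum(v * n for v, n in zip(child_values, child_visits)) / total
```

For two moves scoring 0.5 and -0.2 with 9 and 1 visits, minimax gives 0.5, while the averaged backup gives 0.43: the rarely visited bad move still pulls the value down slightly, and since the search keeps visiting the best-looking move, the average converges towards it.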

Outsearching was due to hardware, not to selective algorithms.
Btw., what kind of advanced algorithms could they apply in an MC search?

Their NN is obviously BS, but again, they stress their achievement in the NN and not the search.
Why so?

Ras · Post by **Ras** » Tue Dec 19, 2017 8:42 pm

Lyudmil Tsvetkov wrote: Chess is much more complex than touching a hot oven plate.

Completely irrelevant to the argument, please re-read.
