AlphaZero - Tactical Abilities

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: AlphaZero - Tactical Abilities

Post by CheckersGuy »

Lyudmil Tsvetkov wrote:
hgm wrote:
Lyudmil Tsvetkov wrote:So, it is using Monte Carlo only during training, but in actual game play plain alpha-beta with evaluation.
Why do they claim it is not alpha-beta then, but Monte Carlo?
No, they also used (simulated) Monte-Carlo during the matches. Self-play and matches were largely the same, except that the time per move was much longer in the matches.
But that is the real question: what those patterns are, and how many.
Indeed, I guess everyone would like to know that. But it will be hard to find out, if it can be done at all. Apparently the NN was able to find patterns that allowed it to outsearch Stockfish with only a fraction of the nodes.
I guess it is obvious, no one can tune more than 1000 good chess knowledge terms, so either they are tuning less, or their patterns are completely meaningless.
Most of the patterns will indeed be completely meaningless, and the NN will then be trained to either alter them into something useful or ignore them. You have to have such 'spare' capacity in an NN to make it sufficiently general. Perhaps these useless patterns would have made all the difference if you had been training it for Go or Draughts.
But I don't care at all what the patterns of a 2800 engine are; I know that pretty well, looking at an engine like Fruit, for example. Wasn't Fruit 2800 back a decade ago? Those guys are a decade and a half behind in development...
But good old Fruit cannot outsearch Stockfish with only 1/10 of the nodes... You still seem to think this is about evaluation. It is not. The breakthrough is selective search that (according to you) causes 3500+ Elo play with only a 2800 Elo evaluation.
Is not alpha-beta precisely the same: simulating play-outs, only that the play-outs end somewhere with an heuristic score instead of a known result?
What would be the cardinal change?

Outsearching was due to hardware, not to selective algorithms.
Btw., what kind of advanced algorithms could they apply in a MC search?

Their NN is obviously BS, but again, they stress their achievement in the NN and not the search.
Why so?
Then tell me why NNs are BS? Why are they used practically everywhere, but according to you would horribly fail at learning a chess evaluation function? Neural networks can learn to drive cars, but they can't learn anything about chess? :lol: I seriously suspect that you are a troll.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero - Tactical Abilities

Post by hgm »

Lyudmil Tsvetkov wrote:Is not alpha-beta precisely the same: simulating play-outs, only that the play-outs end somewhere with an heuristic score instead of a known result?
What would be the cardinal change?
It is not exactly the same, but similar. I have seen claims that MCTS is less sensitive to precise evaluation, because, unlike alpha-beta, it does not derive its score from a single final position at the end of the PV, but from the average of many positions that might materialize. I am not sure whether that is an advantage or a disadvantage, though. It cannot be excluded that using a NN to make the search selective would work just as well for alpha-beta as it does for MCTS.
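The difference between the two backup rules can be made concrete with a toy experiment. This is purely a hypothetical illustration (the move values and noise model are made up, and neither program's actual search looks like this): an alpha-beta-style backup trusts the single, noisy evaluation of one leaf, while an MCTS-style backup averages many sampled evaluations, which smooths out evaluation noise.

```python
import random

# Toy contrast: single-leaf backup vs averaged backup.
# All values here are invented for illustration.
random.seed(42)

true_values = [0.40, 0.60, 0.50]          # hidden worth of three candidate moves

def noisy_eval(v):
    # A noisy static evaluation of a position of true worth v.
    return v + random.gauss(0.0, 0.2)

# Alpha-beta style: one noisy evaluation per move.
single_leaf = [noisy_eval(v) for v in true_values]

# MCTS style: average of 500 noisy samples per move.
averaged = [sum(noisy_eval(v) for _ in range(500)) / 500 for v in true_values]

best_by_average = max(range(3), key=lambda i: averaged[i])
```

With a noise standard deviation of 0.2, a single evaluation misranks the moves fairly often, while the 500-sample average reliably picks the truly best move (index 1). Whether this robustness is worth the extra samples is exactly the open question in the post above.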
Outsearching was due to hardware, not to selective algorithms.
That is of course an utterly silly statement. Hardware has no effect on search, other than that it gives you more or fewer nodes. But it is only the number of nodes that counts. Stockfish searching a million nodes on a Commodore 64 is exactly as strong as Stockfish searching a million nodes on a 64-core Haswell machine overclocked to 6GHz. A million nodes is a million nodes.

A thousand times smaller tree would never be able to 'outsearch' the larger tree if it wasn't far more selective.
Btw., what kind of advanced algorithms could they apply in a MC search?
The trained NN.
Their NN is obviously BS, but again, they stress their achievement in the NN and not the search.
Why so?
Because the BS is between your ears. The NN is an integral part of the search.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: AlphaZero - Tactical Abilities

Post by Ovyron »

Lyudmil Tsvetkov wrote:How can one have sub-zero elo?
Elo is relative.

The only reason Stockfish has a 3400 rating on the CCRL is that at some point the SSDF calibrated their ratings to match humans, and at the beginning of the CCRL they calibrated their engines to the SSDF.

What if instead of doing that they had started at 0 Elo?

Well, Stockfish would show a rating of 800, Gogobello 1.3 64-bit would be right in the middle with a rating of 0, and Ziggurat 0.18 64-bit would show a rating of -800.

All engines weaker than Gogobello would be "sub-zero elo" engines.

Elo depends on calibration.

So, how can we have "sub-zero Elo" engines while keeping Stockfish at 3400? It's complicated, but not too much.

First, let's assume very weak chess entities exist, but they still don't lose all their games, I'll make some of them up on the spot:

Ziggurat has an 1800 rating on the CCRL. <- real engine.

Enu has a result of +0 =5 -995 against Ziggurat.

Owt has a result of +0 =5 -995 against Enu.

Eerht has a result of +0 =5 -995 against Owt.

Ruof has a result of +0 =5 -995 against Eerht.

The ratings:

1800 Ziggurat
880 Enu
-40 Owt
-960 Eerht
-1880 Ruof

So an engine that loses badly against an engine that loses badly against an engine that loses badly against an engine that loses badly against Ziggurat can reach a sub-zero elo of -1880.
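The chain works because the logistic Elo formula can be inverted: a match score implies a rating gap. A minimal sketch follows (the engines below Ziggurat are invented in the post; note that the post's 920-point steps are rounded, since for a +0 =5 -995 result the textbook formula actually gives a gap of roughly 1040 Elo):

```python
import math

def elo_gap(score):
    """Rating difference implied by a match score (fraction of points, 0 < score < 1).

    Negative result: the player scoring `score` is that many Elo weaker."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# +0 =5 -995 over 1000 games -> 2.5 points -> score of 0.0025
gap = elo_gap(2.5 / 1000)   # about -1040 Elo
even = elo_gap(0.5)         # 0: a 50% score implies equal strength
```

Chaining four such results therefore stacks the gaps, which is how a made-up engine can end up thousands of Elo below zero while the anchor engine keeps its rating.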

I guess they'd need to lose on purpose for this, so down there you're getting engines that get destroyed by random movers.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero - Tactical Abilities

Post by hgm »

The Elo scale for engines below ~1000 Elo is a bit poorly defined. The reason is that the normal Elo model doesn't correctly describe the behavior of buggy engines, and engines in this range are almost always buggy. Standard rating-extraction programs would attach enormous significance to just a single win of A against B, and would think it totally excludes that A is, say, 1000 Elo weaker than B, because in the normal Elo model wins over a 1000-Elo-stronger opponent just do not occur. But buggy programs do lose against arbitrarily weak opponents, because they make illegal moves, forfeit on time, etc. Surely such defects should be held against the program that exhibits them. But the problem is that with standard Elo computation the opponent that got the point for free profits from it, and this is totally undeserved.
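The "single win is huge evidence" problem follows directly from the logistic expected-score curve. A minimal sketch (the function is the standard symmetric model, not any particular rating tool):

```python
def expected_score(diff):
    """Expected score of a player rated `diff` Elo below the opponent,
    under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

p = expected_score(1000)   # roughly 0.003: about one point in 320 games
```

Under this model a 1000-Elo-weaker player scores about 0.3%, so even a single bug-induced forfeit handed to a very weak opponent is treated by a standard rating fit as strong evidence that the gap cannot really be that large, which is exactly why an asymmetric model or a restricted opponent pool is needed.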

So you would either have to use a specialized Elo extractor, which uses an asymmetric Elo model, or base the list entirely on games between players that are not too different in strength (so that the free points they get because of opponent bugs are not so heavily 'overweighted'). But it is very difficult to cover the entire sub-1000 range with engines without leaving big Elo gaps anywhere. There is especially a big gap between searching and non-searching engines.

My best attempts in this direction resulted in POS getting an Elo of around -50, and a random mover (against which POS scores some 75%) somewhat lower. But I have always suspected that these ratings would be pushed down a lot if more players could be placed in the gap, as for now they were based on (very poor, but non-zero) results against much stronger opponents above the gap.

The graphs in the AlphaZero paper show that the initial rating (in the random-mover stage) is thousands of Elo below zero. This is lower than I expected, but entirely credible. Of course, in the process of its training they generated an almost continuous range of players, perfectly covering the entire range. And they are not buggy, just weak. So they can easily determine Elo in the standard way. The only caveat is that it would be Elo determined by 'self-play' (newer versions of the same engine against older ones), which is known to exaggerate Elo differences by about a factor 2.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: AlphaZero - Tactical Abilities

Post by Michel »

hgm wrote:which is known to exaggerate Elo differences by about a factor 2.
I have always been curious why that is (supposedly) the case. Since the draw ratio goes up in self-play, one would expect (logistic) Elo to actually be _lower_ under self-play.

The most plausible explanation of the Elo "exaggeration" is that it is just a case of overfitting during tuning. In that case the self-play aspect is a red herring. The exaggeration would then also occur when tuning against a single foreign engine (and this is something that can be tested!).

If it is not overfitting then something more exotic (like a hypothetical "mind reading effect") must be at play. It would be interesting to know this too.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero - Tactical Abilities

Post by hgm »

I have always thought that it was because the search trees are more similar. If two players search trees with large non-overlapping sub-trees, they can find good results there that the opponent has not seen, and would not avoid until it is too late. This gives extra wins to both sides, which decreases the Elo difference. If the engine with the larger tree completely covers your own tree, it will be very hard to surprise it.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: AlphaZero - Tactical Abilities

Post by Ovyron »

hgm wrote:Of course, in the process of its training they generated an almost continuous range of players, perfectly covering the entire range.
That's actually really cool. Over the years programmers have really struggled to weaken their engines so they can play against humans, suffering from the "Engine plays at Super-GM strength most of the game but gives away a rook now and then" syndrome, and similar problems that make the engine play too artificially (no chance of passing the Turing test).

If A0 is able to play at any desired strength, from "strongest chess entity on the planet" to "thousands of Elo below zero", they have inadvertently solved this problem of getting software to play naturally at the user's strength.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: AlphaZero - Tactical Abilities

Post by Michel »

hgm wrote:I have always thought that it was because the search trees are more similar. If two players search trees with large non-overlapping sub-trees, they can find good results there that the opponent has not seen, and would not avoid until it is too late. This gives extra wins to both sides, which decreases the Elo difference. If the engine with the larger tree completely covers your own tree, it will be very hard to surprise it.
Something like this could be true. The easily observed fact that the draw ratio is drastically higher in self-play establishes beyond doubt that mind reading does indeed have a measurable effect.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: AlphaZero - Tactical Abilities

Post by Lyudmil Tsvetkov »

Ras wrote:
Lyudmil Tsvetkov wrote:Chess is much more complex than touching a hot oven plate.
Completely irrelevant to the argument, please re-read.
What do you mean?
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: AlphaZero - Tactical Abilities

Post by Ras »

Lyudmil Tsvetkov wrote:
Ras wrote:
Lyudmil Tsvetkov wrote:Chess is much more complex than touching a hot oven plate.
Completely irrelevant to the argument, please re-read.
What do you mean?
That your argument is irrelevant and you should re-read my argument in order to understand why. Though, honestly, I have read nothing but nonsense by you with regard to NNs, and I don't think anymore that this will change anytime soon.