Daniel Shawul wrote: ↑Sat Nov 23, 2019 7:46 pm
> chrisw wrote: ↑Sat Nov 23, 2019 7:07 pm
> > Well, it does get the rules of the game told to it, but not in a conventional manner. And it doesn't "learn" the rules in any sort of conventional manner either. It doesn't really learn chess, it creates an internal model of chess, where the rules are probabilistic, not firm as in normal chess.
> The ply=0 is an exception for the sake of complying with game-play rules, not something they actually needed.

That's not how I read it; this seems to be a quite specific masking off of illegals.
From the paper:

> Actions available. AlphaZero used the set of legal actions obtained from the simulator to mask the prior produced by the network everywhere in the search tree. MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.
It does get the game rules, but in a different form from being told "bishops move on diagonals" and so on. At the root it is told which moves are legal and which are not: "MuZero only masks legal actions at the root of the search tree where the environment can be queried."
Within the tree it can play any move, including Kh1-d8 if it wants, but since it "learns" from the games it produces, Kh1-d8 won't occur and "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".
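The root-only masking the paper describes can be sketched like this (a minimal illustration with made-up numbers and a hypothetical helper, not the actual implementation):

```python
def masked_root_priors(priors, legal_mask):
    # Zero out illegal actions and renormalise. This is done only at the
    # root, where the real environment can report which moves are legal.
    # (Hypothetical helper for illustration, not from the paper's code.)
    masked = [p * m for p, m in zip(priors, legal_mask)]
    total = sum(masked)
    return [p / total for p in masked]

# Toy action space of 8 "moves", only 3 of them legal at the root.
priors = [0.2, 0.1, 0.05, 0.05, 0.3, 0.1, 0.1, 0.1]
legal  = [1, 0, 0, 1, 0, 0, 1, 0]
root_priors = masked_root_priors(priors, legal)
# Inside the tree no mask is applied: the raw network priors are used as-is,
# so "Kh1-d8" style actions are only suppressed insofar as the network has
# learned never to predict them.
```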
It creates a probabilistic model of chess with probable rules, and the model is hidden in network weights, as per usual. It ends up being able to simulate playing chess. Which is fine. Why not? Especially if it works and plays strong. It will never actually play illegal moves (because illegal moves are masked away at ply zero), but it will play all manner of illegal things in the search tree. From the paper:

> MuZero does not give special treatment to terminal nodes and always uses the value predicted by the network. Inside the tree, the search can proceed past a terminal node - in this case the network is expected to always predict the same value.
Presumably then, in the tree, states can exist where one or both sides have no king, for example.
I have no idea what this means:

> This is achieved by treating terminal states as absorbing states during training.
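For what it's worth, one common reading of "absorbing state" is that once a game ends, training unrolls that run past the end just keep repeating the terminal state and its value, so the targets remain well defined. A toy sketch of that padding (my interpretation and my own function name, not MuZero's code):

```python
def training_targets(values, terminal_index, unroll_steps):
    # Build value targets for an unroll that may run past the terminal state.
    # Past the end, clamp to the terminal state: it "absorbs" all later steps,
    # so the network learns to keep predicting the same value there.
    targets = []
    for k in range(unroll_steps):
        targets.append(values[min(k, terminal_index)])
    return targets

# Game ends at step 3 with value +1; unrolling 6 steps just repeats the +1.
print(training_targets([0.1, -0.2, 0.4, 1.0], 3, 6))
# [0.1, -0.2, 0.4, 1.0, 1.0, 1.0]
```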
So it is not quite true to say MuZero doesn't get told the rules of the game: it is told at ply zero. The games it generates in self-play will all be legal games (because each move can only be selected from the legal-move list known at ply zero). Each self-play game will also terminate legally, so it will get to understand termination rules via that ply-zero move knowledge.
I'm not sure if it gets told the game score; I'd guess so. That would be more rules information.
For example, in Shogi they could have let it play illegal moves at ply=0 too, AND the rules of Shogi allow it: the rule is that if you make an illegal move you immediately lose the game. They mention in the paper that the way pieces move and legal-move generation are learned quickly, so it could be making illegal moves 1 in 1000 for all we know.
> MuZero only masks legal actions at the root of the search tree where the environment can be queried.

so it will never play an illegal move (at ply zero) and never be presented with an illegal PGN. I think there's quite a difference between "never seeing an illegal trajectory" and seeing illegal trajectories and learning bad scores for each of them. There are, what, 4096 from-to action slots against only 30 or 40 legal moves, so in the latter "learn bad scores" scenario only about 1 in 100 learns is a useful one. Big advantage to the former "never see an illegal trajectory" approach, which comes from knowing the ply-zero move rules.
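Spelling out that back-of-envelope arithmetic with the post's own toy numbers (rough figures, not measured data):

```python
# Back-of-envelope for the legal/illegal ratio (toy numbers from the post):
action_slots = 4096   # 64 x 64 from-to squares
legal_moves = 35      # typical branching factor in a middlegame position
illegal = action_slots - legal_moves
useful_share = legal_moves / action_slots
print(illegal)                       # 4061 illegal action slots
print(round(useful_share * 100, 1))  # ~0.9% of "learn a bad score" updates useful
```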
I’m not trying to trash the MuZero concept here, btw, just trying to understand it. But there’s a big philosophical difference between learning the rules entirely by observation, and being helped.