New DeepMind paper

GregNeto · Post by **GregNeto** » Thu Nov 21, 2019 8:52 am

DeepMind published a new paper.
It might be interesting for the experts. It's a new approach to learning, which they evaluated on several board games.
https://arxiv.org/pdf/1911.08265.pdf

IanO · Post by **IanO** » Fri Nov 22, 2019 2:51 pm

I can't believe the whole forum isn't a-buzz about this! The new algorithm MuZero is extending the zero knowledge concept by not even pre-programming the rules of the game! It is building the model from scratch based solely on board state transitions and expected rewards. All it is retaining is the tree search.

They tested it on the three AlphaZero games (chess, go, and shogi) plus screen-scraping 57 classic Atari video games. It matched or exceeded the state-of-the-art in every domain, often with drastically fewer training resources!

shrapnel · Post by **shrapnel** » Fri Nov 22, 2019 4:01 pm

IanO wrote: ↑Fri Nov 22, 2019 2:51 pm I can't believe the whole forum isn't a-buzz about this!

For the simple reason that it's another nail in the coffin of the AB engines, and since most of the old die-hards here swear by AB engines, you can't realistically expect them to be ecstatic about this new development.

Jouni · Post by **Jouni** » Fri Nov 22, 2019 10:47 pm

I don't believe that you can learn to play without the rules of the game. Not possible.

Daniel Shawul · Post by **Daniel Shawul** » Fri Nov 22, 2019 11:08 pm

shrapnel wrote: ↑Fri Nov 22, 2019 4:01 pm
IanO wrote: ↑Fri Nov 22, 2019 2:51 pm I can't believe the whole forum isn't a-buzz about this!
For the simple reason that it's another nail in the coffin of the AB engines, and since most of the old die-hards here swear by AB engines, you can't realistically expect them to be ecstatic about this new development.

There was a lot of discussion about it on the lc0 Discord yesterday, and it is probably still ongoing. AB is not dead; Stockfish is still a contender for the top spot.
Also, if Stockfish's tactics pass a certain threshold, NNs could die away, so it is 50-50 for me.

Jouni wrote: I don't believe that you can learn to play without the rules of the game. Not possible.

Better believe it

It learned to generate legal moves for all pieces (though not always), it learned when to stop the game at mate or stalemate (again, not always), and it even learned the 50-move rule. That is the whole point of the paper, and they were able to improve upon AlphaGo and match AlphaZero in chess on a fixed-nodes test.

towforce · Post by **towforce** » Sat Nov 23, 2019 4:53 pm

I would think that each human player has this experience. Most of us weren't told 'you may not start playing chess until you have passed a test to prove you know the rules'.

chrisw · Post by **chrisw** » Sat Nov 23, 2019 7:07 pm

Jouni wrote: ↑Fri Nov 22, 2019 10:47 pm I don't believe that you can learn to play without the rules of the game. Not possible.

Well, it does get the rules of the game told to it, but not in a conventional manner. And it doesn't "learn" the rules in any sort of conventional manner either. It doesn't really learn chess; it creates an internal model of chess, where the rules are probabilistic, not firm as in normal chess.

From the paper:
Actions available. AlphaZero used the set of legal actions obtained from the simulator to mask the prior
produced by the network everywhere in the search tree. MuZero only masks legal actions at the root of the
search tree where the environment can be queried, but does not perform any masking within the search tree.
This is possible because the network rapidly learns not to predict actions that never occur in the trajectories
it is trained on

It does get game rules, but in a different form to being told "bishops move on diagonals" and so on. At the root, it is told which moves are legal and which not. MuZero only masks legal actions at the root of the search tree where the environment can be queried.

Within the tree it can play any move, including Kh1-d8 if it wants, but since it "learns" from the games it produces, Kh1-d8 won't occur and "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".
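
A minimal Python sketch of that difference, under stated assumptions. It is not code from the paper: `root_priors` and `interior_priors` are hypothetical names, and the logits stand in for whatever the policy head outputs over a fixed action encoding.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def root_priors(logits, legal_actions):
    """At the root the environment can be queried, so illegal actions
    are masked out before normalising (they get probability 0)."""
    legal = set(legal_actions)
    if not legal:
        raise ValueError("need at least one legal action at the root")
    masked = [x if a in legal else float("-inf")
              for a, x in enumerate(logits)]
    return softmax(masked)


def interior_priors(logits):
    """Inside the tree no legality information is available, so the raw
    network prior is used; illegal actions only become unlikely through
    training, never impossible."""
    return softmax(logits)


logits = [2.0, 1.0, 0.5, -1.0]          # made-up policy outputs
print(root_priors(logits, [1, 3]))      # actions 0 and 2 are masked to 0.0
print(interior_priors(logits))          # every action keeps some probability
```

The point of the sketch is that only the first function needs a legal move list, which is the sole place the rules enter the search.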

It creates a probabilistic model of chess with probable rules, and the model is hidden in network weights, as per usual. It ends up being able to simulate playing chess. Which is fine. Why not? Especially if it works and plays strong. It will never actually play illegal moves (because illegal moves are masked away at ply zero), but it will play all manner of illegal things in the search tree. From the paper again: "MuZero does not give special treatment to terminal nodes and always uses the value predicted by the network. Inside the tree, the search can proceed past a terminal node - in this case the network is expected to always predict the same value."

Presumably then, in the tree, states can exist where one or both sides have no king, for example.

I have no idea what this means:
This is achieved by treating terminal states as absorbing states during training.
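
One possible reading, sketched in Python. In the textbook sense, an absorbing state is one that transitions only to itself and yields zero reward from then on. The sketch below pads a finished episode that way so it can be unrolled past its end; `pad_with_absorbing` is a hypothetical helper and this is an assumption about what the sentence means, not DeepMind's training code.

```python
def pad_with_absorbing(states, rewards, unroll_steps):
    """Extend a finished episode past its terminal state.

    states:  [s_0, ..., s_T], where s_T is terminal
    rewards: [r_1, ..., r_T], one reward per transition
    Every padded step stays in s_T (the absorbing state) and earns
    zero reward, so the position, and hence its value, never changes.
    """
    padded_states = list(states) + [states[-1]] * unroll_steps
    padded_rewards = list(rewards) + [0.0] * unroll_steps
    return padded_states, padded_rewards


states, rewards = pad_with_absorbing(["s0", "s1", "mate"], [0.0, 1.0], 3)
print(states)   # the terminal state repeats for the padded steps
print(rewards)  # padded transitions carry zero reward
```

Trained on sequences like this, a learned model would be pushed to map a terminal position back onto itself whatever action is applied, which fits the quoted remark that the network should keep predicting the same value past a terminal node.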

So, it is not quite true to say MuZero doesn't get told the rules of the game. It is told at ply zero. And the games that it generates in self-play will all be legal games (because each move can only be selected from the legal move list known at ply zero). Each self-play game will also terminate legally, so it will get to understand termination rules via the ply one move knowledge.
I'm not sure if it gets told the game score, but I'd guess so. That would be more rules information.

smatovic · Post by **smatovic** » Sat Nov 23, 2019 7:32 pm

I guess another step would be to abandon the tree search completely and handle it by the neural network alone....

--
Srdja

Daniel Shawul · Post by **Daniel Shawul** » Sat Nov 23, 2019 7:46 pm

chrisw wrote: ↑Sat Nov 23, 2019 7:07 pm
Jouni wrote: ↑Fri Nov 22, 2019 10:47 pm I don't believe that you can learn to play without the rules of the game. Not possible.
Well, it does get the rules of the game told to it, but not in a conventional manner. And it doesn't "learn" the rules in any sort of conventional manner either. It doesn't really learn chess; it creates an internal model of chess, where the rules are probabilistic, not firm as in normal chess.

From the paper:
Actions available. AlphaZero used the set of legal actions obtained from the simulator to mask the prior
produced by the network everywhere in the search tree. MuZero only masks legal actions at the root of the
search tree where the environment can be queried, but does not perform any masking within the search tree.
This is possible because the network rapidly learns not to predict actions that never occur in the trajectories
it is trained on

It does get game rules, but in a different form to being told "bishops move on diagonals" and so on. At the root, it is told which moves are legal and which not. MuZero only masks legal actions at the root of the search tree where the environment can be queried.

Within the tree it can play any move, including Kh1-d8 if it wants, but since it "learns" from the games it produces, Kh1-d8 won't occur and "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".

It creates a probabilistic model of chess with probable rules, and the model is hidden in network weights, as per usual. It ends up being able to simulate playing chess. Which is fine. Why not? Especially if it works and plays strong. It will never actually play illegal moves (because illegal moves are masked away at ply zero), but it will play all manner of illegal things in the search tree. From the paper again: "MuZero does not give special treatment to terminal nodes and always uses the value predicted by the network. Inside the tree, the search can proceed past a terminal node - in this case the network is expected to always predict the same value."

Presumably then, in the tree, states can exist where one or both sides have no king, for example.

I have no idea what this means:
This is achieved by treating terminal states as absorbing states during training.

So, it is not quite true to say MuZero doesn't get told the rules of the game. It is told at ply zero. And the games that it generates in self-play will all be legal games (because each move can only be selected from the legal move list known at ply zero). Each self-play game will also terminate legally, so it will get to understand termination rules via the ply one move knowledge.
I'm not sure if it gets told the game score, but I'd guess so. That would be more rules information.

The masking at ply=0 is an exception made for the sake of complying with the rules of game play, not something they actually needed. For example, in shogi they could have let it play illegal moves at ply=0 too, and the rules of shogi allow it: if you make an illegal move, you immediately lose the game. They mention in the paper that the way pieces move and legal move generation are learned quickly, so it could be making illegal moves 1 in 1000 for all we know.
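
To make that alternative concrete, here is a hedged Python sketch of an environment step with no masking at all, where an illegal move simply ends the game as a loss for the mover. `play`, `apply_move` and the reward values are hypothetical and only illustrate the idea; nothing like this is described as implemented in the paper.

```python
def play(action, legal_actions, apply_move):
    """One environment step without any root masking.

    Returns (next_state, reward, done). An illegal action is not
    rejected: it ends the game immediately with a loss for the mover,
    which is the shogi rule mentioned above.
    """
    if action not in legal_actions:
        return None, -1.0, True       # illegal move: immediate loss
    return apply_move(action), 0.0, False


# Toy usage with a stand-in move function.
legal = {"Kh1-g1", "Kh1-h2"}
print(play("Kh1-d8", legal, lambda a: "position after " + a))
print(play("Kh1-g1", legal, lambda a: "position after " + a))
```

Under this setup the agent would learn legality purely from the penalty, at the cost of occasionally throwing a game away.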

It does get game rules, but in a different form to being told "bishops move on diagonals"

That would be a huge feeding of the rules of chess, and I don't think they do anything like that. They even went to the trouble of learning the 50-move rule by expanding the history planes to 100!! I don't think history is that important.

Daniel Shawul · Post by **Daniel Shawul** » Sat Nov 23, 2019 7:58 pm

smatovic wrote: ↑Sat Nov 23, 2019 7:32 pm I guess another step would be to abandon the tree search completely and handle it by the neural network alone....

--
Srdja

That will probably never happen, as search is about tactics rather than strategy. First you need a function approximator that is more accurate than SEE, which I think current NNs will probably not match in terms of tactics.
