Jouni wrote: ↑Fri Nov 22, 2019 10:47 pm
I don't believe, that You can learn to play without the rules of the game. Not possible.
Well, it does get the rules of the game told to it, but not in a conventional manner. And it doesn't "learn" the rules in any conventional manner either. It doesn't really learn chess; it creates an internal model of chess in which the rules are probabilistic, not firm as in normal chess.
From the paper:
Actions available. AlphaZero used the set of legal actions obtained from the simulator to mask the prior
produced by the network everywhere in the search tree. MuZero only masks legal actions at the root of the
search tree where the environment can be queried, but does not perform any masking within the search tree.
This is possible because the network rapidly learns not to predict actions that never occur in the trajectories
it is trained on.
It does get game rules, but in a different form to being told "bishops move on diagonals" and so on. At the root, it is told which moves are legal and which not.
MuZero only masks legal actions at the root of the search tree where the environment can be queried.
Within the tree it can play any move, including Kh1-d8 if it wants, but since it "learns" from the games it produces, Kh1-d8 won't occur and "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".
It creates a probabilistic model of chess with probable rules, and the model is hidden in the network weights, as per usual. It ends up being able to simulate playing chess. Which is fine. Why not? Especially if it works and plays strong. It will never actually play illegal moves in a game (because illegal moves are masked away at ply zero), but it can play all manner of illegal things in the search tree.
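To make the root-versus-tree difference concrete, here is a rough sketch of how I read that passage. It is not the paper's code: the function names, the three-move action list and the logit values are all made up for illustration. At the root the prior is masked with the legal-move list and renormalised; inside the tree the raw network prior is used as-is, so an illegal move keeps a small non-zero probability.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def root_prior(logits, legal_mask):
    # Ply zero: illegal moves get -inf, so their probability is exactly 0.
    # Assumes at least one move is legal.
    masked = [x if ok else float("-inf") for x, ok in zip(logits, legal_mask)]
    return softmax(masked)

def inner_prior(logits):
    # Inside the tree: no masking, the network's prior is trusted as-is.
    return softmax(logits)

# Hypothetical action list and network outputs, for illustration only.
moves = ["e2e4", "g1f3", "h1d8"]   # h1d8 stands in for the illegal Kh1-d8
logits = [2.0, 1.0, -3.0]          # a trained net scores the illegal move low
legal = [True, True, False]        # legal-move list from the environment

at_root = root_prior(logits, legal)
in_tree = inner_prior(logits)
print(dict(zip(moves, at_root)))
print(dict(zip(moves, in_tree)))
```

In the unmasked case Kh1-d8 is only improbable, not impossible, which is the sense in which the learned rules are probabilistic rather than firm.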
MuZero does not give special treatment to terminal nodes and always uses the value predicted by the network. Inside the tree, the search can proceed past a terminal node - in this case the network is expected to always predict the same value.
Presumably then, in the tree, states can exist where one or both sides have no king, for example.
I have no idea what this means:
This is achieved by treating terminal states as absorbing states during training.
So, it is not quite true to say MuZero doesn't get told the rules of the game. It is told at ply zero. And the games that it generates in self-play will all be legal games (because each move can only be selected from the legal move list known at ply zero). Each self-play game will also terminate legally, so it will get to understand termination rules via the ply one move knowledge.
I'm not sure if it gets told the game score; I'd guess so. That would be more rules information.