Google's AlphaGo team has been working on chess

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 7:07 pm

Henk wrote: At the start of training only random moves will be played, so all games will end in a fifty-move draw. How, then, do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?

Why do you think that all of those games will end by the 50-move rule? Some of them will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 7:18 pm

CheckersGuy wrote:
Henk wrote: At the start of training only random moves will be played, so all games will end in a fifty-move draw. How, then, do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?
Why do you think that all of those games will end by the 50-move rule? Some of them will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.

I am not sure where the loss-win asymmetry comes from, or even what it means in the context of an engine playing itself.

A few years ago (I believe it was 2013) I trained a neural network that would compute an evaluation function for Spanish checkers, starting from random moves and learning through reinforcement learning. This worked very well, but I did have to implement smallish (6-man) EGTBs for the process to work well, because the search I was using to generate games wasn't strong enough to discover some important facts about how much of an advantage is enough to win the game.

hgm · Post by **hgm** » Wed Dec 13, 2017 7:18 pm

I would think there are as many losses as wins in self-play.

I dug up a posting on random games from the late Steven Edwards:

sje wrote: Random game mating probabilities

Some data from 24,478,109 games, each made from randomly generated moves:

There were 3,747,489 checkmates (15.31%)
Of the checkmates, 1,872,426 were White getting checkmated (49.96%).
Of the checkmates, 1,875,063 were Black getting checkmated (50.04%).

There were 1,499,382 stalemates (6.13%)
Of the stalemates, 754,025 were White getting stalemated (50.29%).
Of the stalemates, 745,357 were Black getting stalemated (49.71%).

And another one:

sje wrote: One billion random games:

0.153051   checkmate
0.193435   fiftymoves
0.56713   insufficient
0.0251883   repetition
0.0611956   stalemate

mean length: 334.354
limit: 1000000000
usage: 185920
frequency: 5378.64
period: 0.00018592

15% wins is a good starting point. The NN would probably quickly develop a preference for pushing pawns, as promotions would give it more material and thus a larger chance to accidentally checkmate. This tendency would strongly suppress the 50-move draws in positions that aren't really draws.
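The figures in the two quoted posts can be cross-checked with a few lines of Python. This is pure arithmetic on the posted numbers; reading `usage` as elapsed seconds is an assumption, since the post gives no units. The check reproduces the rounded percentages, shows that mates and stalemates split almost exactly evenly between White and Black, and shows that the five outcome categories of the billion-game run sum to 1 and give the same 15.31% checkmate rate as the smaller run.

```python
# Raw counts from the first quoted post (24,478,109 random games).
GAMES = 24_478_109
WHITE_MATED, BLACK_MATED = 1_872_426, 1_875_063
WHITE_STALEMATED, BLACK_STALEMATED = 754_025, 745_357

# Outcome fractions from the second quoted post (one billion random games).
BILLION_RUN = {
    "checkmate": 0.153051,
    "fiftymoves": 0.193435,
    "insufficient": 0.56713,
    "repetition": 0.0251883,
    "stalemate": 0.0611956,
}
LIMIT = 1_000_000_000  # games played
USAGE = 185_920        # assumption: elapsed time in seconds


def pct(part, whole):
    """Percentage rounded to two decimals, as in the post."""
    return round(100 * part / whole, 2)


mates = WHITE_MATED + BLACK_MATED
stalemates = WHITE_STALEMATED + BLACK_STALEMATED
print("checkmates", pct(mates, GAMES), "% of games; White mated",
      pct(WHITE_MATED, mates), "%, Black mated", pct(BLACK_MATED, mates), "%")
print("stalemates", pct(stalemates, GAMES), "% of games; White",
      pct(WHITE_STALEMATED, stalemates), "%, Black", pct(BLACK_STALEMATED, stalemates), "%")

total = sum(BILLION_RUN.values())
games_per_second = LIMIT / USAGE  # the post's "frequency"
seconds_per_game = USAGE / LIMIT  # the post's "period"
print(f"billion-game run: fractions sum to {total:.6f}, "
      f"checkmates {BILLION_RUN['checkmate']:.2%}")
print(f"{games_per_second:.2f} games/s, {seconds_per_game:.8f} s/game")
```

The period matches the posted value exactly, and the computed frequency agrees with the posted 5378.64 to within about 0.02 games per second.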

trulses · Post by **trulses** » Wed Dec 13, 2017 7:31 pm

You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

Milos · Post by **Milos** » Wed Dec 13, 2017 8:41 pm

trulses wrote: You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

It seems you got it wrong.
There are not 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCTS run would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 8:48 pm

Milos wrote:
trulses wrote: You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.
There are not 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCTS run would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This completely depends on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to each of the two candidate moves, you could get quite deep searches.
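A rough way to see why the number of candidate moves the network favors matters for depth: if a search spread its playouts perfectly evenly over k candidates at every node (a simplifying assumption; real PUCT search is not this even), each playout adds one node and the tree fills level by level. The hypothetical helper `balanced_depth` below computes the depth of such a complete tree. For any tree in which each node has at most k expanded children, this is the minimum possible depth, so uneven searches only go deeper.

```python
def balanced_depth(nodes, branching):
    """Depth (root = 0) of a complete tree with `branching` children per node
    once it holds at least `nodes` nodes."""
    depth, level_size, total = 0, 1, 1
    while total < nodes:
        depth += 1
        level_size *= branching
        total += level_size
    return depth


for k in (2, 5, 30):
    print(f"{k:>2} effective candidates: depth {balanced_depth(800, k)} with 800 nodes")
```

With two candidates per node, 800 nodes already reach depth 9; with 30 roughly equal candidates the same budget only reaches depth 2.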

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 9:19 pm

Milos wrote:
trulses wrote: You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.

No, he is precisely correct.

There are not 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCTS run would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

In any case, his comment about the quality of play being better than random moves does apply: if there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from a random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be White victories. Then you will play games where both players are trying to place a bunch of pieces on top of the enemy king, which will be of much, much higher quality than the initial games. Etc.
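To make the 800-playout argument concrete, here is a toy Python sketch of AlphaGo Zero-style PUCT selection (no rollouts) driven by a stand-in for an untrained network that returns uniform priors and tiny random values. Everything about the game is hypothetical: every position has 30 legal moves, and move 7 at the root mates immediately. The constants (`C_PUCT = 1.5`, the value noise range) are assumptions, and this is an illustration of the idea, not AlphaZero's implementation.

```python
import math
import random

BRANCHING = 30     # hypothetical: chess-like number of legal moves in every position
MATING_MOVE = 7    # hypothetical: this root move delivers mate
C_PUCT = 1.5       # assumed exploration constant
SIMULATIONS = 800  # playouts per move, the figure discussed in the thread


class Node:
    def __init__(self, prior, terminal=False):
        self.prior = prior        # P(s,a) from the network
        self.visits = 0           # N(s,a)
        self.value_sum = 0.0      # W(s,a), from the view of the player who made the move
        self.terminal = terminal  # True if the move ends the game
        self.children = {}        # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def untrained_net(rng):
    """Stand-in for a randomly initialised network: uniform priors, tiny random value."""
    return [1.0 / BRANCHING] * BRANCHING, rng.uniform(-0.1, 0.1)


def expand(node, rng, is_root):
    priors, value = untrained_net(rng)
    for move, p in enumerate(priors):
        node.children[move] = Node(p, terminal=is_root and move == MATING_MOVE)
    return value  # evaluation for the side to move at `node`


def select_child(node):
    parent_visits = sum(c.visits for c in node.children.values())

    def puct(child):
        u = C_PUCT * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
        return child.q() + u

    return max(node.children.values(), key=puct)


def simulate(root, rng):
    node, path = root, [root]
    while node.children and not node.terminal:
        node = select_child(node)
        path.append(node)
    if node.terminal:
        value = -1.0  # the side to move has been checkmated
    else:
        value = expand(node, rng, is_root=False)
    # Back up, crediting each node from the view of the player who moved into it.
    for n in reversed(path):
        n.visits += 1
        n.value_sum -= value
        value = -value


def search(seed=0):
    rng = random.Random(seed)
    root = Node(prior=1.0)
    expand(root, rng, is_root=True)
    for _ in range(SIMULATIONS):
        simulate(root, rng)
    return root


root = search()
best = max(root.children, key=lambda m: root.children[m].visits)
print("most visited move:", best, "with", root.children[best].visits, "of", SIMULATIONS, "visits")
```

Even though the stand-in network knows nothing, the mating move collects the large majority of the 800 visits once it has been tried, while the other moves get only a handful of visits each.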

syzygy · Post by **syzygy** » Wed Dec 13, 2017 9:36 pm

CheckersGuy wrote:
syzygy wrote: QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
I think you have something mixed up. The tree is only built up once per game and this tree won't be kept in memory after the root position changes (you or your opponent made a move).

I think it is you who is mixing things up, since your second sentence bears absolutely no relation to what I wrote and you quoted.

AlphaZero actually does carry over the subtree of the move it plays to the next move, as you can read in the paper, but again, that has nothing to do with what I wrote.
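A minimal Python sketch of the subtree reuse syzygy describes, assuming a node type that stores visit counts and children keyed by move. The class, the function `advance_root`, and the example tree are illustrative, not taken from the paper: after a move is played, the matching child becomes the new root with its statistics intact, and the sibling subtrees are discarded.

```python
class Node:
    """Minimal MCTS node: statistics plus children keyed by move."""

    def __init__(self, visits=0):
        self.visits = visits   # N: playouts that passed through this node
        self.value_sum = 0.0   # W: summed backed-up values
        self.children = {}     # move -> Node


def advance_root(root, move):
    """Make the subtree under `move` the new root, keeping its statistics.

    Sibling subtrees become unreachable and can be garbage-collected.
    If `move` was never expanded, the next search starts from an empty node.
    """
    child = root.children.get(move)
    return child if child is not None else Node()


# Hypothetical tree left over from a 200-playout search.
root = Node(visits=200)
root.children["e4"] = Node(visits=120)
root.children["d4"] = Node(visits=80)
root.children["e4"].children["e5"] = Node(visits=90)

root = advance_root(root, "e4")  # we play e4
root = advance_root(root, "e5")  # the opponent replies e5
print(root.visits)  # 90 playouts carry over to the next search
```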

Milos · Post by **Milos** » Wed Dec 13, 2017 9:43 pm

CheckersGuy wrote: Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This completely depends on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to each of the two candidate moves, you could get quite deep searches.

You really don't get how UCT without rollouts operates, do you?
Try simulating it on paper; it would help.
It has nothing to do with the NN output, not even with the probability distribution, because the algorithm tries to explore the tree as evenly as possible.
What I can immediately tell you is that in training games (800 MCTS simulations per move) the depth is hardly over 10, and in the actual games against SF (80,000 MCTS simulations per move) the expected depth reached is 20 at most.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 9:51 pm

Milos wrote:
CheckersGuy wrote: Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This completely depends on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to each of the two candidate moves, you could get quite deep searches.
You really don't get how UCT without rollouts operates, do you?
Try simulating it on paper; it would help.
It has nothing to do with the NN output, not even with the probability distribution, because the algorithm tries to explore the tree as evenly as possible.
What I can immediately tell you is that in training games (800 MCTS simulations per move) the depth is hardly over 10, and in the actual games against SF (80,000 MCTS simulations per move) the expected depth reached is 20 at most.

I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. This bias weakens as the search traverses the tree more often and approaches the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations rather than the text; this is not really hard to understand.

The even distribution only happens after many, many traversals through the tree. If you only do a few searches, the bias of the NN is very strong. This concept is very similar to RAVE and/or UCT, which are commonly used in MCTS.

In the AlphaZero paper they lower the probabilities the following way: P(s,a)/(1+N(s,a)), where N is the number of traversals through the node and P the move probability.

Now give the candidate move 99% probability and draw the picture yourself. It takes a number of iterations before the probability is lowered enough to encourage searching other moves.
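CheckersGuy's 99% example can be worked through numerically. The Python sketch below assumes the AlphaGo Zero form of the selection rule, Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)), and, as a simplifying assumption, holds both moves' Q values equal, so only the priors and visit counts decide which move is searched. The function name `visit_counts` and the value of `C_PUCT` are illustrative.

```python
import math

C_PUCT = 1.5  # assumed; with equal Q values it only scales every score


def visit_counts(priors, simulations, q=None):
    """Root visit counts when each playout picks the child maximising Q + U,
    with U = c_puct * P * sqrt(total visits) / (1 + N)."""
    q = q or [0.0] * len(priors)  # assumption: all moves evaluate equally
    visits = [0] * len(priors)
    for _ in range(simulations):
        total = sum(visits)
        scores = [
            q[a] + C_PUCT * priors[a] * math.sqrt(total) / (1 + visits[a])
            for a in range(len(priors))
        ]
        visits[scores.index(max(scores))] += 1
    return visits


for n in (10, 100, 800, 8000):
    print(n, visit_counts([0.99, 0.01], n))
```

Under these assumptions the 1% move is not searched at all until the favorite has roughly 99 visits, and the visits end up close to proportional to the priors (only about 8 of 800 go to the second move). In a real search the backed-up Q values also change, and that is what lets a low-prior move with a better evaluation overtake the favorite.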
