Ozymandias wrote: The training phase... didn't it consist of 44 million games or something like that? If that's the case, I don't see how they could be played in just four hours.
Just read the paper:
https://arxiv.org/pdf/1712.01815.pdf
The paper reports 9 hours for 44 million self-play games, corresponding to 700,000 training batches of 4096 positions (so about 65 positions per game, which seems reasonable); that works out to a bit less than 20 million games in the first 4 hours. Each position corresponded to an MCTS with 800 "simulations"/NN evaluations.
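For anyone who wants to check those figures, here's the arithmetic spelled out (all numbers are from the paper, as quoted above):

```python
# Back-of-envelope check of the AlphaZero training numbers.
games = 44_000_000        # self-play games in 9 hours
batches = 700_000         # training batches
batch_size = 4096         # positions per batch

positions = batches * batch_size
positions_per_game = positions / games       # ≈ 65
games_in_4_hours = games * 4 / 9             # ≈ 19.6 million

print(positions_per_game)   # ≈ 65.2
print(games_in_4_hours)     # ≈ 19,555,556
```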
At the 4-hour mark (300,000 training batches), AlphaZero became stronger than Stockfish; see Figure 1. The network seems to have reached saturation before then. This could be improved upon by using a bigger network (which would then need still more training, but with Google's resources that would just be a matter of weeks).
Figure 1 was created from the results of a tournament between various iterations of AlphaZero and Stockfish as a base player. This tournament was played at 1 second per move.
Since AlphaZero is a bit stronger than Stockfish at 1 second per move and, apparently, scales better than Stockfish, it is no surprise that AlphaZero beats Stockfish handily at 1 minute per move.
Is 700,000 x 4096 searches in 9 hours possible? Let's see: they used 5000 TPUs, so each TPU had to do 573,440 searches in 9 hours, which is 17.7 positions per second, or 56.5 ms per MCTS. According to the paper, each 800-simulation MCTS took 40 ms.
So there was about 2.5 hours of slack left per TPU!
But not really: they also needed time to process each batch to adjust the weights. The paper doesn't say how much time that took per batch, but I currently have no reason to doubt that those roughly 2.5 hours sufficed.
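The per-TPU budget and the leftover slack can be redone the same way (5000 TPUs and the 40 ms figure are taken from the post/paper above):

```python
# Per-TPU throughput check for the 9-hour training run.
searches = 700_000 * 4096      # total MCTS searches (one per training position)
tpus = 5000
seconds = 9 * 3600

per_tpu = searches / tpus                    # 573,440 searches per TPU
ms_budget = seconds / per_tpu * 1000         # ≈ 56.5 ms available per search
ms_mcts = 40                                 # measured MCTS time from the paper
slack_hours = per_tpu * (ms_budget - ms_mcts) / 1000 / 3600  # ≈ 2.6 h

print(per_tpu, ms_budget, slack_hours)
```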
This is all based on the paper, not on speculation.