In chess, AlphaZero outperformed Stockfish after just 4 hours

From the document - In chess, AlphaZero outperformed Stockfish after just 4 hours. How believable is that?

I believe it as written: 37 votes (54%)
I am sceptical: 21 votes (30%)
I don't (can't) believe it: 8 votes (12%)
I am undecided: 3 votes (4%)

Total votes: 69

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by Lyudmil Tsvetkov »

Vinvin wrote:
Rebel wrote:...
the alleged 4 hours self-play
...
"4 hours" can be misleading because the Google team can use 1 machine or 10 machines or 100 machines or ...
You should not be sceptical. I am fully certain that now, 2 weeks and 84*(4 hours) later, they are already at 6500 Elo.

Congratulations! :lol: :lol: :lol:
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by Henk »

I am waiting for the news that chess already has been solved.

By the way, it is demotivating to work on your old-school engine if you know that the new A0 approach gives so much better results. Pity about all the work you spent on your alpha-beta algorithm.
Ozymandias
Posts: 1535
Joined: Sun Oct 25, 2009 2:30 am

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by Ozymandias »

kranium wrote:
Ozymandias wrote:The training phase... didn't it consist of 44 million games or something like that? If that's the case, I don't see how they could be played in just four hours.
Like Milos said:

"4h on 5000TPUs where each TPU is equivalent to roughly 2 new GV100 or 10 1080Ti which is currently the top of the range graphics card normal individuals can afford. So those 4h of training time is like over 30 years of training on 1080Ti."

This is an enormous resource...self-play usually involves lightning games, sometimes as fast as 1 sec + 1 ms inc.
Just do the math and one can see how it's possible.
I didn't think they could output so many games per second. People argued that the HW difference between the PC used by SF and the machine on which the NN ran wasn't that high. Unless they replaced the machine for the match, it clearly was the case.
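
The conversion in the quoted estimate can be redone in a few lines of Python. This is only a sketch that takes the quoted ratios at face value (1 TPU is roughly 10 1080Ti cards or 2 GV100 cards) and assumes a 365-day year; the function name `gpu_years` is made up for illustration. With just those inputs the result comes out nearer 23 years than 30 for the 1080Ti, so the "over 30 years" figure presumably rests on inputs not stated in the quote.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def gpu_years(tpus, gpus_per_tpu, hours):
    """Years a single GPU would need to match `tpus` TPUs running for `hours`.

    `gpus_per_tpu` is the quoted (rough) equivalence ratio, not a benchmark.
    """
    return tpus * gpus_per_tpu * hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # Ratios as quoted: 1 TPU ~ 10 x 1080Ti ~ 2 x GV100.
    print(f"1080Ti: {gpu_years(5000, 10, 4):.1f} years")
    print(f"GV100:  {gpu_years(5000, 2, 4):.1f} years")
```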
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by syzygy »

Ozymandias wrote:The training phase... didn't it consist of 44 million games or something like that? If that's the case, I don't see how they could be played in just four hours.
Just read the paper: https://arxiv.org/pdf/1712.01815.pdf

9 hours for 44 million self-play games corresponding to 700,000 training batches of 4096 positions (so 65 positions per game, which seems reasonable), so a bit less than 20 million games in 4 hours. Each position corresponded to an MCTS with 800 "simulations"/NN evaluations.

At the 4-hour point (300,000 training batches), AlphaZero became stronger than SF. See Figure 1. It seems the network reached its saturation point before the 4-hour point. This could be improved upon by using a bigger network (which would then need still more training, but with Google's resources that would be just a matter of weeks).

Figure 1 was created from the results of a tournament between various iterations of AlphaZero and Stockfish as a base player. This tournament was played at 1 second per move.

Since AlphaZero is a bit stronger than Stockfish at 1 second per move and, apparently, scales better than Stockfish, it is no surprise that AlphaZero beats Stockfish handily at 1 minute per move.

Is 700,000 x 4096 searches in 9 hours possible? Let's see: they used 5000 TPUs, so each TPU had to do 573440 searches in 9 hours, which is 17.7 positions per second, or 56.5ms per MCTS. According to the paper, each 800-node MCTS took 40ms.

So there were about 2.5 hours left!

But not really: they also needed time to process each batch to adjust the weights. I don't think the paper tells us how much time that took per batch, but I currently have no reason to doubt that those about 2.5 hours sufficed.

This is all based on the paper, not on speculation.
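
For anyone who wants to redo the arithmetic, here is a minimal Python sketch. It uses only the figures quoted in this post and makes two simplifying assumptions that are mine, not the paper's: each training position costs exactly one 800-simulation search, and each TPU runs one search at a time. The function and variable names are made up for illustration.

```python
# Figures as quoted in the post above; nothing here is measured.
BATCHES = 700_000        # training batches over the whole run
BATCH_SIZE = 4096        # positions per batch
GAMES = 44_000_000       # self-play games over the whole run
TPUS = 5000              # TPUs generating self-play games
RUN_HOURS = 9            # length of the whole run
SEARCH_SECONDS = 0.040   # one 800-simulation MCTS


def selfplay_budget():
    """Redo the per-TPU arithmetic under the simplifying assumptions above."""
    positions = BATCHES * BATCH_SIZE
    per_tpu = positions / TPUS
    run_seconds = RUN_HOURS * 3600
    search_hours = per_tpu * SEARCH_SECONDS / 3600
    return {
        "positions_per_game": positions / GAMES,
        "positions_per_tpu": per_tpu,
        "searches_per_second": per_tpu / run_seconds,
        "budget_ms_per_search": 1000 * run_seconds / per_tpu,
        "search_hours": search_hours,
        "spare_hours": RUN_HOURS - search_hours,
        "games_in_4_hours": GAMES * 4 / RUN_HOURS,
    }


if __name__ == "__main__":
    for name, value in selfplay_budget().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions this gives about 65 positions per game, 17.7 searches per second per TPU, a budget of 56.5 ms per search, and a little over 2.6 hours not spent searching.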
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by syzygy »

Ozymandias wrote:I didn't think they could output so many games per second. People argued that the HW difference between the PC used by SF and the machine on which the NN ran wasn't that high. Unless they replaced the machine for the match, it clearly was the case.
They used lots of HW for training.

They used a big PC with 4 TPU expansion cards (each using just 28-40 Watt) for playing. SF likely played on the same big PC but obviously did not use the TPUs.

It's all documented quite well.

(Note that SF also uses lots of HW for tuning *and* a lot of human brains. The AlphaZero approach seems to be far more suitable for massive parallelisation.)
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by CheckersGuy »

Really interesting how people make comments about AlphaZero without taking a look at the paper first :lol:
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by Ovyron »

Rebel wrote:
Ovyron wrote:
Rebel wrote:No mobility, no king safety, no passed pawn evaluation, no castling knowledge, not even piece values?
Yup, I think true Artificial Intelligence has finally arrived, and it can do things like this and others that I would have never imagined to be possible.

Some examples of similar AIs:

AI can extract the style of a photo and turn another photo into that style.
AI can learn how to make paintings in the style of any artist in history and use any image to show how that artist would have painted it.
AI takes text as input and creates new photorealistic images indistinguishable from actual photos.
AI learns how human lips move when talking, so it can sync a video of anybody to any audio of someone talking.
AI learns what celebrities look like and can invent new, fake faces that look real.
AI learns what art looks like, so it can turn your doodles into works of art.
AI learns how video works, so it can predict the future and create videos from still images.
AI learns how images become pixelated when you scale them down and manages to reverse the process, turning pixelated messes into high-resolution images.
AI learns how facial expressions work and can swap the expressions of two people.
AI can turn your sketches into photorealistic images.
AI learns how to play non-deterministic video games just like humans.
All driven by domain specific knowledge, thus off-topic.
The domain-specific knowledge these AIs were given is equivalent to the rules of chess that were given to A0. It's not like A0 started moving the pieces aimlessly until it finally learned how the knight moves and how to castle; that domain-specific knowledge was given to it, together with capturing, promoting, checkmating...

But no more is necessary, because, for all we know, "piece values", "square bonuses", "king safety", etc. may be human constructions that are not necessary to play good chess. With the current approach, sure, take them out of your engine and it will only get weaker, but that's because you're not replacing them with whatever A0 is doing.

Who says a Knight is worth 3 pawns? The only reason a Knight and a Bishop are so close in value is that a piece's value decreases as it tries to avoid being traded. Adding extra moves to either piece (say, diagonal non-capturing moves to the Knight, or sideways non-capturing moves to the Bishop, so it can change colors) doesn't make it much more valuable, since it's still going to be traded for a normal Knight/Bishop, and it's going to be harder to trade it for as many pawns, just as rook sacrifices are rare.

What can be observed from the 10 games:

AlphaZero doesn't really have conventional values for pieces, or they seem very dynamic. If it doesn't think the pieces would allow the pawns to advance, it treats the pawns as worthless and doesn't mind leaving the opponent with many extra pawns. If it thinks the pawns won't let a piece move, it doesn't mind sacrificing the piece for a pawn, especially if the opponent loses mobility in the process.

It's as if A0 learned to play a different game, where a rook may be worthless if it sits at a8 and can't move the entire game, while Stockfish still has to think it begins with a value of 5 pawns and has to decrease its value with penalties.

Perhaps the way we're doing it is wrong, conceptually, from the get-go, and the only reason Stockfish was able to draw 72 of its games is that the conventional approach is so advanced while A0's approach is still in its infancy.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by Ovyron »

kranium wrote:Ivanhoe has a Montecarlo search implementation (with which I'm fairly familiar) and it works quite well.

The default implementation uses a sort of 'searchmoves' algorithm:
go montecarlo cpus 8 min -25 max 325 length 40 depth 10 moves c2c4 d2d4 e2e4 g1f3

Years ago I experimented with a version that would obtain the root move list from current position and actually play a strong game.
If you send it all 20 possible moves from the traditional start position, you'd be amazed how quickly the potential move choices are narrowed down...and it usually plays 1. c4 or 1. e4
I still have it if anyone is interested (but it does crash once in a while).
Yes, I'm interested.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by yurikvelo »

hgm wrote: A0 = Google's Alpha Zero. See the many recent threads about this in the various forum sections.
I've read a lot, but still don't understand: is it strong only in games played from the starting position against a strong opponent (because that was the dataset its NN was trained on), or can it handle any task (an arbitrary FEN) like a 3200+ engine?
Would it beat Stockfish in a one-pawn handicap game (SF playing without its f7 pawn)?

Checkers was solved, but there is no 24-man EGTB for checkers. The solution guarantees at least a draw against a perfect player from the starting position; it does not solve an arbitrary position.
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: In chess,AlphaZero outperformed Stockfish after just 4 h

Post by hgm »

Well, this is not solving Chess, and it is not perfect play. It is just a very strong player. It was trained from the FIDE opening position, and evaluated in 100-game matches from that opening position and 12 others (so 1300 games in total), but the 12 other start positions were all in the opening tree that it trained on.

But I have little doubt that it would also be very strong in positions it never saw before (e.g. Chess960 start positions). If it did not understand the general principles of Chess, it could never have played well enough during the entire game to beat Stockfish.