AlphaZero performance

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

AlphaZero performance

Post by mar »

Just for fun, I generated a virtual PGN of all A0-SF8 games (including the games from the fixed opening positions, so we have a total of 1300 games) and ran it through Ordo.
(after having seen this nonsense: https://www.youtube.com/watch?v=eN7BMWl_mpw)

Input (from A0's POV, all games):

Code: Select all

white:
267W 378D 5L
black:
51W 580D 19L
All games:

Code: Select all


   # PLAYER         : RATING  ERROR   POINTS  PLAYED    (%)
   1 AlphaZero      : 3383.5   14.1    797.0    1300   61.3%
   2 Stockfish 8    : 3300.0   ----    503.0    1300   38.7%

White advantage = 66.25 +/- 6.98
Draw rate (equal opponents) = 50.00 % +/- 0.00
The match:

Code: Select all


   # PLAYER         : RATING  ERROR   POINTS  PLAYED    (%)
   1 AlphaZero      : 3406.8   53.9     64.0     100   64.0%
   2 Stockfish 8    : 3300.0   ----     36.0     100   36.0%

White advantage = 85.72 +/- 26.87
Draw rate (equal opponents) = 50.00 % +/- 0.00
Ordo commandline:

Code: Select all

ordo-win32.exe -a 3300 -A "Stockfish 8" -W -p a0_sf8.pgn -s1000 -o rating_a0.txt
Generated PGN files (result only):
All: http://www.crabaware.com/Test/a0_sf8.pgn
Match: http://www.crabaware.com/Test/a0_sf8_match.pgn
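
For anyone who wants to reproduce the Ordo run, here is a minimal sketch of how such a result-only PGN can be generated from the W/D/L counts above (just an illustration in Python, not the actual script I used; the tags are the bare minimum Ordo needs):

Code: Select all

# Sketch: append result-only PGN entries from W/D/L counts (from White's point of view).
def write_results(path, white, black, wins, draws, losses):
    with open(path, "a") as f:
        for result, count in (("1-0", wins), ("1/2-1/2", draws), ("0-1", losses)):
            for _ in range(count):
                f.write('[White "%s"]\n[Black "%s"]\n[Result "%s"]\n\n%s\n\n'
                        % (white, black, result, result))

# All 1300 games, using the counts reported above (A0's results converted to White's POV):
write_results("a0_sf8.pgn", "AlphaZero", "Stockfish 8", 267, 378, 5)   # A0 with White
write_results("a0_sf8.pgn", "Stockfish 8", "AlphaZero", 19, 580, 51)   # A0 with Black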

So we can probably conclude that A0 is at most 100 Elo stronger than SF8, which is actually a pretty good result for SF (certainly not crushing or devastating).
By that I don't want to play down the amazing achievement of A0 (even if the match had ended the other way it would still be huge).
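
For reference, the raw Elo difference implied by a score percentage (ignoring the white advantage and draw model that Ordo fits on top) is just the logistic formula, which lands in the same ballpark:

Code: Select all

import math

def elo_diff(score):
    # Elo difference implied by a score fraction under the logistic model
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_diff(0.640)))  # the 100-game match: ~100
print(round(elo_diff(0.613)))  # all 1300 games: ~80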

The question remains how far DeepMind could go by training longer and by using more than one 4-TPU machine (assuming it would scale better than SF).

However, DeepMind certainly has much higher ambitions than claiming superiority in computer chess :), I wonder why so many people are upset.

The traditional tedious way of doing computer chess still lives (at least for the top dogs) and we should be grateful to DeepMind for showing us that there's still room for improvement.
User avatar
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: AlphaZero performance

Post by Ovyron »

Thanks. Gotta love results from virtual PGNs, not actually worse than IPON results :)
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero performance

Post by hgm »

The easiest way to improve on Alpha Zero is not to train it longer, or make it search more nodes by using faster hardware during playing. It is starting the learning from scratch with a better NN. E.g. one that is better adapted to Chess, rather than general enough to also do go. By offering it efficiently pre-processed features of the position, such as SEE values for each square, X-rays, pins, etc.
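
Roughly, the idea is to hand the net extra input planes that already encode such features, e.g. something along these lines (just a sketch of the encoding, using the python-chess library; SEE values are omitted here and the exact feature set is of course up for grabs):

Code: Select all

import chess
import numpy as np

def feature_planes(board: chess.Board) -> np.ndarray:
    """Per-square pre-processed features as extra NN input planes (attack counts, pins)."""
    planes = np.zeros((8, 8, 4), dtype=np.float32)
    for sq in chess.SQUARES:
        r, f = chess.square_rank(sq), chess.square_file(sq)
        planes[r, f, 0] = len(board.attackers(chess.WHITE, sq))    # white attackers of this square
        planes[r, f, 1] = len(board.attackers(chess.BLACK, sq))    # black attackers of this square
        planes[r, f, 2] = float(board.is_pinned(chess.WHITE, sq))  # white piece here is pinned
        planes[r, f, 3] = float(board.is_pinned(chess.BLACK, sq))  # black piece here is pinned
    return planes

print(feature_planes(chess.Board()).shape)  # (8, 8, 4)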
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: AlphaZero performance

Post by mar »

hgm wrote:The easiest way to improve on Alpha Zero is not to train it longer, or make it search more nodes by using faster hardware during playing. It is starting the learning from scratch with a better NN. E.g. one that is better adapted to Chess, rather than general enough to also do go. By offering it efficiently pre-processed features of the position, such as SEE values for each square, X-rays, pins, etc.
I hadn't thought about this, and it makes perfect sense. It would be exciting to know how much they could improve with this approach (I guess adding more features might correlate with NN size and thus performance, but that could be tuned as well).

I guess DeepMind will move on to other challenges, so we'll probably have to wait until similarly powerful hardware becomes common (or maybe until someone builds a training network of tens of thousands of volunteers, which seems unlikely).
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaZero performance

Post by hgm »

I am not sure how unrealistic that is. We don't need to train it in 4 hours. One year would be fine.
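
Back of the envelope (using the paper's hardware figures from memory, so treat the numbers as an assumption): the self-play generation reportedly used on the order of 5,000 first-generation TPUs for roughly 9 hours, which spread over a year becomes a much more modest amount of hardware:

Code: Select all

# Rough scaling of the reported training effort to a one-year schedule.
# Figures quoted from memory of the paper, so they are an assumption here.
tpu_hours = 5000 * 9               # ~5000 self-play TPUs for ~9 hours
hours_per_year = 365 * 24
print(tpu_hours / hours_per_year)  # ~5 TPU-equivalents running continuously for a year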
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: AlphaZero performance

Post by CheckersGuy »

Just take a look at the LeelaZero project for inspiration. Training progress is made daily, even though they are using an inferior neural network compared to AlphaGo Zero.
Leo
Posts: 1078
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: AlphaZero performance

Post by Leo »

mar wrote:Just for fun, I generated a virtual PGN of all A0-SF8 games (including the games from the fixed opening positions, so we have a total of 1300 games) and ran it through Ordo.
(after having seen this nonsense: https://www.youtube.com/watch?v=eN7BMWl_mpw)

Input (from A0's POV, all games):

Code: Select all

white:
267W 378D 5L
black:
51W 580D 19L
All games:

Code: Select all


   # PLAYER         : RATING  ERROR   POINTS  PLAYED    (%)
   1 AlphaZero      : 3383.5   14.1    797.0    1300   61.3%
   2 Stockfish 8    : 3300.0   ----    503.0    1300   38.7%

White advantage = 66.25 +/- 6.98
Draw rate (equal opponents) = 50.00 % +/- 0.00
The match:

Code: Select all


   # PLAYER         : RATING  ERROR   POINTS  PLAYED    (%)
   1 AlphaZero      : 3406.8   53.9     64.0     100   64.0%
   2 Stockfish 8    : 3300.0   ----     36.0     100   36.0%

White advantage = 85.72 +/- 26.87
Draw rate (equal opponents) = 50.00 % +/- 0.00
Ordo commandline:

Code: Select all

ordo-win32.exe -a 3300 -A "Stockfish 8" -W -p a0_sf8.pgn -s1000 -o rating_a0.txt
Generated PGN files (result only):
All: http://www.crabaware.com/Test/a0_sf8.pgn
Match: http://www.crabaware.com/Test/a0_sf8_match.pgn

So we can probably conclude that A0 is at most 100 Elo stronger than SF8, which is actually a pretty good result for SF (certainly not crushing or devastating).
By that I don't want to play down the amazing achievement of A0 (even if the match had ended the other way it would still be huge).

The question remains how far DeepMind could go by training longer and by using more than one 4-TPU machine (assuming it would scale better than SF).

However, DeepMind certainly has much higher ambitions than claiming superiority in computer chess :), I wonder why so many people are upset.

The traditional tedious way of doing computer chess still lives (at least for the top dogs) and we should be grateful to DeepMind for showing us that there's still room for improvement.
I didn't like AZ's arrogant, in-your-face attitude in their press releases, plus the fact that they handicapped SF.
Advanced Micro Devices fan.
Leo
Posts: 1078
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: AlphaZero performance

Post by Leo »

mar wrote:
hgm wrote:The easiest way to improve on Alpha Zero is not to train it longer, or make it search more nodes by using faster hardware during playing. It is starting the learning from scratch with a better NN. E.g. one that is better adapted to Chess, rather than general enough to also do go. By offering it efficiently pre-processed features of the position, such as SEE values for each square, X-rays, pins, etc.
I hadn't thought about this, and it makes perfect sense. It would be exciting to know how much they could improve with this approach (I guess adding more features might correlate with NN size and thus performance, but that could be tuned as well).

I guess DeepMind will move on to other challenges, so we'll probably have to wait until similarly powerful hardware becomes common (or maybe until someone builds a training network of tens of thousands of volunteers, which seems unlikely).
I think they have moved on already. I am sure there is enough brainpower among the people in this forum to set up and develop a similar approach.
Advanced Micro Devices fan.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: AlphaZero performance

Post by Rebel »

mar wrote:Just for fun, I generated a virtual PGN of all A0-SF8 games (including the games from the fixed opening positions, so we have a total of 1300 games) and ran it through Ordo. (after having seen this nonsense: https://www.youtube.com/watch?v=eN7BMWl_mpw)
I turned it off halfway through, but the guy at the beginning claims that the diagrams (page 6) are from AZ vs SF games. Looking at the result (64.4%) that seems likely, but from the text in the paper it's not so obvious, rather confusing, YMMV.

Table 2: Analysis of the 12 most popular human openings (played more than 100,000 times in an online database (1)). Each opening is labelled by its ECO code and common name. The plot shows the proportion of self-play training games in which AlphaZero played each opening, against training time. We also report the win/draw/loss results of 100 game AlphaZero vs. Stockfish matches starting from each opening, as either white (w) or black (b), from AlphaZero's perspective. Finally, the principal variation (PV) of AlphaZero is provided from each opening.

So who played who, and can we conclude that each of these 12 matches of 100 games each started from the position shown in the diagram?
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: AlphaZero performance

Post by Albert Silver »

Rebel wrote:
mar wrote:Just for fun, I generated a virtual PGN of all A0-SF8 games (including the games from the fixed opening positions, so we have a total of 1300 games) and ran it through Ordo. (after having seen this nonsense: https://www.youtube.com/watch?v=eN7BMWl_mpw)
I turned it off halfway through, but the guy at the beginning claims that the diagrams (page 6) are from AZ vs SF games. Looking at the result (64.4%) that seems likely, but from the text in the paper it's not so obvious, rather confusing, YMMV.

Table 2: Analysis of the 12 most popular human openings (played more than 100,000 times in an online database (1)). Each opening is labelled by its ECO code and common name. The plot shows the proportion of self-play training games in which AlphaZero played each opening, against training time. We also report the win/draw/loss results of 100 game AlphaZero vs. Stockfish matches starting from each opening, as either white (w) or black (b), from AlphaZero's perspective. Finally, the principal variation (PV) of AlphaZero is provided from each opening.

So who played who, and can we conclude that each of these 12 matches of 100 games each started from the position shown in the diagram?
Not sure I understand the first question 'Who played who?'

I don't think a lot can be concluded from the reported results. Let's start with the assumption that A0 is about 100 Elo stronger, and that it is so in all positions (which I'm certain is not correct): that alone will strongly influence the results, meaning you cannot ascribe the results to the opening's inherent worth. Then there is the opening itself: some openings might simply be objectively worse, or require a more exploitative approach, contrary to the equilibrium one that a NN would inevitably use.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."