Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
RL can be added to a UCI engine in the following manner: the engine keeps a record of the game as it is played, then at the end of the game it determines the outcome for itself and updates its RL database accordingly.
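A minimal sketch of that scheme, assuming a simple hash-keyed score table and hypothetical names (LearnEntry, learn.bin, the reward constants); this is only an illustration of the idea, not RomiChess's or any other engine's actual implementation:

Code:

// Sketch: game-record keeping and end-of-game reinforcement update
// inside a UCI engine. All names here are hypothetical.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

enum Result { WIN = 1, DRAW = 0, LOSS = -1 };

struct LearnEntry {
    int32_t score  = 0;   // accumulated reward for reaching this position
    int32_t visits = 0;   // number of training games through this position
};

// RL "database": maps a position hash to its learned bonus.
static std::map<uint64_t, LearnEntry> learnDb;

// Record of the positions actually visited in the current game.
static std::vector<uint64_t> gameRecord;

// Called after every move played in the game (our own and the opponent's).
void recordPosition(uint64_t zobristKey) {
    gameRecord.push_back(zobristKey);
}

// Called when the engine detects the game is over (mate, stalemate, or a
// result it infers from the final position).
void updateLearning(Result resultForUs) {
    const int base = 4;                       // arbitrary reward unit
    const int n = (int)gameRecord.size();
    for (int i = 0; i < n; ++i) {
        LearnEntry& e = learnDb[gameRecord[i]];
        // Positions nearer the end of the game get a larger share of the reward.
        e.score  += (int)resultForUs * base * (i + 1) / n;
        e.visits += 1;
    }
    gameRecord.clear();
    // Persist so the knowledge survives between games.
    if (FILE* f = std::fopen("learn.bin", "wb")) {
        for (auto& kv : learnDb) {
            std::fwrite(&kv.first, sizeof kv.first, 1, f);
            std::fwrite(&kv.second, sizeof kv.second, 1, f);
        }
        std::fclose(f);
    }
}

// During search, the learned bonus is simply added to the static eval
// (or to the move-ordering score) when the position is found in learnDb.
int learnedBonus(uint64_t zobristKey) {
    auto it = learnDb.find(zobristKey);
    if (it == learnDb.end()) return 0;
    return it->second.score / std::max(1, (int)it->second.visits);
}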
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: Google's AlphaGo team has been working on chess

Post by pilgrimdan »

Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
" ... A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL."

from the paper ...

https://arxiv.org/pdf/1712.01815.pdf

this is what alphazero uses:

non-linear function approximation
deep neural network (NN)
reinforcement learning algorithm (RL)
MCTS (averages over approximation errors)
gradient descent (parameter adjustment)
mean-squared error
cross-entropy
weight regularisation
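
Those last four items are how the network is trained: as a rough reading of the paper, the parameters θ are adjusted by gradient descent on a combined loss of approximately (z − v)^2 − π·log p + c·||θ||^2, i.e. mean-squared error between the predicted value v and the actual game outcome z, cross-entropy between the MCTS search probabilities π and the network's policy output p, plus an L2 weight-regularisation term.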

also from the paper...

prior reinforcement learning in computer chess

NeuroChess
neural network (evaluated positions)
temporal-difference (learning)

KnightCap
neural network (evaluated positions)
temporal-difference (leaf)

Meep
linear evaluation function (evaluated positions)
temporal-difference (TreeStrap)

Giraffe
neural network (evaluated positions)
temporal-difference (leaf) [self-play]

DeepChess
neural network (trained to perform evaluation)

Hex
networks (value)
policy (dual)
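
For anyone unfamiliar with the temporal-difference variants listed above, here is a rough sketch of a TD-leaf(lambda)-style weight update for a linear evaluation, in the spirit of KnightCap and Giraffe; the structure and names are illustrative assumptions, not code from any of those programs:

Code:

// Sketch of a TD-leaf(lambda)-style update for a linear evaluation
// eval(x) = w . x, where x is the feature vector of the principal-variation
// leaf found by the search at each move of a training game.
#include <vector>

struct LeafSample {
    std::vector<double> features;  // feature vector x_t of the PV leaf at move t
    double value;                  // evaluation of that leaf, scaled to (-1, 1)
};

void tdLeafUpdate(std::vector<double>& w,
                  const std::vector<LeafSample>& game,  // one sample per move
                  double finalResult,                   // +1 win, 0 draw, -1 loss
                  double alpha = 0.01,                  // learning rate
                  double lambda = 0.7)                  // temporal-difference decay
{
    const int n = (int)game.size();
    if (n == 0) return;

    // Temporal differences: d_t = V_{t+1} - V_t, with the game result
    // standing in for the value after the last move.
    std::vector<double> d(n);
    for (int t = 0; t + 1 < n; ++t) d[t] = game[t + 1].value - game[t].value;
    d[n - 1] = finalResult - game[n - 1].value;

    // For a linear evaluation the gradient of V_t with respect to w is just
    // x_t (the derivative of any squashing function is omitted here).
    // Each leaf is nudged toward the lambda-discounted future differences.
    for (int t = 0; t < n; ++t) {
        double decayedError = 0.0;
        double decay = 1.0;
        for (int j = t; j < n; ++j) {          // sum over j >= t of lambda^(j-t) * d_j
            decayedError += decay * d[j];
            decay *= lambda;
        }
        for (size_t k = 0; k < w.size(); ++k)  // assumes features.size() == w.size()
            w[k] += alpha * game[t].features[k] * decayedError;
    }
}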

and finally from the paper...

chess programs using traditional MCTS were much weaker than alpha-beta search programs...
while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions...
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

pilgrimdan wrote:
Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
" ... A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL."

from the paper ...

https://arxiv.org/pdf/1712.01815.pdf

this is what alphazero uses:

non-linear function approximation
deep neural network (NN)
reinforcement learning algorithm (RL)
MCTS (averages over approximation errors)
gradient descent (parameter adjustment)
mean-squared error
cross-entropy
weight regularisation

also from the paper...

prior reinforcement learning in computer chess

NeuroChess
neural network (evaluated positions)
temporal-difference (learning)

KnightCap
neural network (evaluated positions)
temporal-difference (leaf)

Meep
linear evaluation function (evaluated positions)
temporal-difference (TreeStrap)

Giraffe
neural network (evaluated positions)
temporal-difference (leaf) [self-play]

DeepChess
neural network (trained to perform evaluation)

Hex
networks (value)
policy (dual)

and finally from the paper...

chess programs using traditional MCTS were much weaker than alpha-beta search programs...
while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions...
I can't tell if this is pro, con, or just information. Maybe a match between Romi and one of the programs mentioned is in order?
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

Michael Sherwin wrote:All I'm pushing is the opinion that A0's strength is due to RL.
And there is about as much need for that as for pushing the claim that the sky is blue, water is wet, or checkmate is a win in Chess. It is what the AlphaZero paper claims in the first place, what the whole experiment was set up to show: that Chess could be self-taught by reinforcement learning, without any knowledge other than the rules being given to it by any other means. Everything it did better than a random mover was due to RL.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Google's AlphaGo team has been working on chess

Post by Rebel »

CheckersGuy wrote:
Michael Sherwin wrote:
mcostalba wrote:I have read the paper: the result is impressive!

Honestly, I didn't think it was possible, because my understanding was that chess is more "computer friendly" than Go... I was wrong.

It is true that SF is not meant to play at its best without a book, and especially the fixed 1 minute per move cuts out the whole time management; it would be more natural to play under tournament conditions. But nevertheless I think these are secondary aspects, and what has been accomplished is huge.
Marco, A0 did not win a match against SF. A0 with RL won a match against SF. Or said another way, A0 won a match against SF because SF does not have RL. Or, thought of a different way, a group of programmers identified a deficiency that exists in a competitive field and took advantage of it by eliminating that deficiency in their own entity. Or one can turn that thought around and say RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm. In that case the A0 vs SF match is a non sequitur and meaningless. Then there is the thought of the fans who want RL but are ignored because they are not important, and what the fan thinks or wants is not meaningful.

But what you can't say is "what has been accomplished is huge" in terms of a chess-playing algorithm. You might say that what A0 has demonstrated in go, chess and shogi is a huge demonstration that an NN with RL may conquer humanity some day. I won't argue against that. Concerning chess, though, the AB algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. AB+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess even one time! :D
That alpha-beta search + reinforcement learning is indeed better than MCTS + NN + reinforcement learning is still something that has to be proven. Assertions and a bulk of text don't help it :lol: Only a match between engines using those two different algorithms can be thought of as a definitive answer. Everything else is just based on certain assumptions.
To simplify things, I suppose it's not hard to imagine that with learning (from a given position) it's quite easy to get a 100% score against your own engine very soon: against a deterministic opponent, once a winning line is found it can simply be repeated.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

CheckersGuy wrote:
Michael Sherwin wrote:
CheckersGuy wrote:
Michael Sherwin wrote:
mcostalba wrote:I have read the paper: the result is impressive!

Honestly, I didn't think it was possible, because my understanding was that chess is more "computer friendly" than Go... I was wrong.

It is true that SF is not meant to play at its best without a book, and especially the fixed 1 minute per move cuts out the whole time management; it would be more natural to play under tournament conditions. But nevertheless I think these are secondary aspects, and what has been accomplished is huge.
Marco, A0 did not win a match against SF. A0 with RL won a match against SF. Or said another way, A0 won a match against SF because SF does not have RL. Or, thought of a different way, a group of programmers identified a deficiency that exists in a competitive field and took advantage of it by eliminating that deficiency in their own entity. Or one can turn that thought around and say RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm. In that case the A0 vs SF match is a non sequitur and meaningless. Then there is the thought of the fans who want RL but are ignored because they are not important, and what the fan thinks or wants is not meaningful.

But what you can't say is "what has been accomplished is huge" in terms of a chess-playing algorithm. You might say that what A0 has demonstrated in go, chess and shogi is a huge demonstration that an NN with RL may conquer humanity some day. I won't argue against that. Concerning chess, though, the AB algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. AB+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess even one time! :D
That alpha-beta search + reinforcement learning is indeed better than MCTS + NN + reinforcement learning is still something that has to be proven. Assertions and a bulk of text don't help it :lol: Only a match between engines using those two different algorithms can be thought of as a definitive answer. Everything else is just based on certain assumptions. Since we won't have a commercial version of AlphaZero anytime soon, it will probably be quite some time until we find out :(
Technically correct but not practically correct. Demonstrably there is strong evidence supporting what I posted. It was demonstrated by R_m_C_e_s_ that hundreds of Elo can be gained from just a very few training games in real competition, and over 1000 Elo in a very restrictive test with even fewer training games. Against a truly massive opening book and against 6 top engines it was demonstrated that 50 Elo per 5,000 games of training is achieved, and the gain was linear over the scope of the test. So unless it is believed that a 2400 Elo engine can benefit this way but a 3400 Elo engine cannot, it can be assumed that the 3400 Elo engine will do quite well. In the case of SF that would mean victory against A0.
Can you point to a link? Were there a sufficient number of test games played? I would like to see the statistics.
From page 21 of Test no.1 (DiscoCheck w32 comparative)

http://www.talkchess.com/forum/viewtopi ... highlight=

As you can see in the capture, after 3600 games in my tournament, the only effective learning functions seem to be implemented in:

-RomiChess P3L
-KnightCap 3.7e
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Google's AlphaGo team has been working on chess

Post by Rebel »

Daniel Shawul wrote:Most of us here suspected that this could happen once Giraffe showed it can beat Stockfish's eval.
Is your opinion based on Giraffe's results in the STS test suite?

From his thesis:

Page 24
Figure 4 shows the result of running the test periodically as training progresses. With the material only bootstrap, it achieves a score of approximately 6000/15000. As training progresses, it gradually improved to approximately 9500/15000, with peaks above 9700/15000, proving that it has managed to gain a tremendous amount of positional understanding.

Page 25
It is clear that Giraffe's evaluation function now has at least comparable positional understanding compared to evaluation functions of top engines in the world

Page 25
Since Giraffe discovered all the evaluation features through self-play, it is likely that it knows about patterns that have not yet been studied by humans, and hence not included in the test suite.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Google's AlphaGo team has been working on chess

Post by Joost Buijs »

The only thing it shows is that Giraffe has a performance comparable with top engines on the 1500 positions of the STS test set.

In the past I used an older version of STS to tune my evaluation function (I never changed it since). I also see a performance on STS comparable with top engines, but I'm pretty sure that my engine doesn't have the same positional understanding as e.g. Stockfish, Komodo and Houdini, to name a few.

I did an experiment once and replaced my evaluation function with the one from an older Stockfish version; it gave me about a 150 Elo gain, though the score on STS remained in the same ballpark. The score on STS doesn't tell you much: 1500 positions (very similar within each of the 15 categories) are way too few to say anything about the positional understanding of the evaluation function.
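
For readers who haven't run STS themselves: the scores quoted above are just points summed over the 1500 positions, at most 10 per position, hence the maximum of 15000. A rough scoring sketch, assuming the usual c0 "move=points" annotation in the EPD files and a hypothetical engine hook and file name:

Code:

// Rough sketch of STS-style scoring. Assumes each EPD record carries a
// c0 field of the form  c0 "move1=10, move2=7, ...".  Purely illustrative.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stub: replace with a call into your own engine's search.
std::string engineBestMove(const std::string& /*epd*/) { return ""; }

// Extract the points awarded for 'move' from a c0 payload like
// "Qa7=10, Ne5=8, Bd4=3"; moves not listed score 0.
int pointsForMove(const std::string& c0, const std::string& move) {
    std::stringstream ss(c0);
    std::string item;
    while (std::getline(ss, item, ',')) {
        size_t eq = item.find('=');
        if (eq == std::string::npos) continue;
        std::string m = item.substr(0, eq);
        m.erase(0, m.find_first_not_of(' '));      // trim leading spaces
        m.erase(m.find_last_not_of(' ') + 1);      // trim trailing spaces
        if (m == move) return std::stoi(item.substr(eq + 1));
    }
    return 0;
}

int main() {
    std::ifstream in("sts_all.epd");               // hypothetical file name
    std::string line;
    int total = 0, positions = 0;
    while (std::getline(in, line)) {
        size_t c0pos = line.find("c0 \"");
        if (c0pos == std::string::npos) continue;
        size_t end = line.find('"', c0pos + 4);
        std::string c0 = line.substr(c0pos + 4, end - c0pos - 4);
        total += pointsForMove(c0, engineBestMove(line));
        ++positions;
    }
    // 1500 positions at 10 points maximum each gives a ceiling of 15000.
    std::cout << "STS score: " << total << "/" << positions * 10 << "\n";
}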