Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
RL can be added to a UCI engine in the following manner: the engine keeps a record of the game as it is played, then at the end of the game it determines the outcome for itself and updates its RL database accordingly.
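A minimal sketch of that scheme, assuming a simple hash-keyed score table and hypothetical names (LearnEntry, learn.bin, the reward constants); this is only an illustration of the idea, not RomiChess's or any other engine's actual implementation:

Code:

// Sketch: game-record keeping and end-of-game reinforcement update
// inside a UCI engine. All names here are hypothetical.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

enum Result { WIN = 1, DRAW = 0, LOSS = -1 };

struct LearnEntry {
    int32_t score  = 0;   // accumulated reward for reaching this position
    int32_t visits = 0;   // number of training games through this position
};

// RL "database": maps a position hash to its learned bonus.
static std::map<uint64_t, LearnEntry> learnDb;

// Record of the positions actually visited in the current game.
static std::vector<uint64_t> gameRecord;

// Called after every move played in the game (our own and the opponent's).
void recordPosition(uint64_t zobristKey) {
    gameRecord.push_back(zobristKey);
}

// Called when the engine detects the game is over (mate, stalemate, or a
// result it infers from the final position).
void updateLearning(Result resultForUs) {
    const int base = 4;                       // arbitrary reward unit
    const int n = (int)gameRecord.size();
    for (int i = 0; i < n; ++i) {
        LearnEntry& e = learnDb[gameRecord[i]];
        // Positions nearer the end of the game get a larger share of the reward.
        e.score  += (int)resultForUs * base * (i + 1) / n;
        e.visits += 1;
    }
    gameRecord.clear();
    // Persist so the knowledge survives between games.
    if (FILE* f = std::fopen("learn.bin", "wb")) {
        for (auto& kv : learnDb) {
            std::fwrite(&kv.first, sizeof kv.first, 1, f);
            std::fwrite(&kv.second, sizeof kv.second, 1, f);
        }
        std::fclose(f);
    }
}

// During search, the learned bonus is simply added to the static eval
// (or to the move-ordering score) when the position is found in learnDb.
int learnedBonus(uint64_t zobristKey) {
    auto it = learnDb.find(zobristKey);
    if (it == learnDb.end()) return 0;
    return it->second.score / std::max(1, (int)it->second.visits);
}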
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: Google's AlphaGo team has been working on chess

Post by pilgrimdan »

Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
" ... A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL."

from the paper ...

https://arxiv.org/pdf/1712.01815.pdf

this is what alphazero uses:

non-linear function approximation
deep neural network (NN)
reinforcement learning algorithm (RL)
MCTS (averages over approximation errors)
gradient descent (parameter adjustment)
mean-squared error
cross-entropy
weight regularisation
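
Those last four items are how the network is trained: as a rough reading of the paper, the parameters θ are adjusted by gradient descent on a combined loss of approximately (z − v)^2 − π·log p + c·||θ||^2, i.e. mean-squared error between the predicted value v and the actual game outcome z, cross-entropy between the MCTS search probabilities π and the network's policy output p, plus an L2 weight-regularisation term.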

also from the paper...

prior reinforcement learning in computer chess

NeuroChess
neural network (evaluated positions)
temporal-difference (learning)

KnightCap
neural network (evaluated positions)
temporal-difference (leaf)

Meep
linear evaluation function (evaluated positions)
temporal-difference (TreeStrap)

Giraffe
neural network (evaluated positions)
temporal-difference (leaf) [self-play]

DeepChess
neural network (trained to perform evaluation)

Hex
networks (value)
policy (dual)
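
For anyone unfamiliar with the temporal-difference variants listed above, here is a rough sketch of a TD-leaf(lambda)-style weight update for a linear evaluation, in the spirit of KnightCap and Giraffe; the structure and names are illustrative assumptions, not code from any of those programs:

Code:

// Sketch of a TD-leaf(lambda)-style update for a linear evaluation
// eval(x) = w . x, where x is the feature vector of the principal-variation
// leaf found by the search at each move of a training game.
#include <vector>

struct LeafSample {
    std::vector<double> features;  // feature vector x_t of the PV leaf at move t
    double value;                  // evaluation of that leaf, scaled to (-1, 1)
};

void tdLeafUpdate(std::vector<double>& w,
                  const std::vector<LeafSample>& game,  // one sample per move
                  double finalResult,                   // +1 win, 0 draw, -1 loss
                  double alpha = 0.01,                  // learning rate
                  double lambda = 0.7)                  // temporal-difference decay
{
    const int n = (int)game.size();
    if (n == 0) return;

    // Temporal differences: d_t = V_{t+1} - V_t, with the game result
    // standing in for the value after the last move.
    std::vector<double> d(n);
    for (int t = 0; t + 1 < n; ++t) d[t] = game[t + 1].value - game[t].value;
    d[n - 1] = finalResult - game[n - 1].value;

    // For a linear evaluation the gradient of V_t with respect to w is just
    // x_t (the derivative of any squashing function is omitted here).
    // Each leaf is nudged toward the lambda-discounted future differences.
    for (int t = 0; t < n; ++t) {
        double decayedError = 0.0;
        double decay = 1.0;
        for (int j = t; j < n; ++j) {          // sum over j >= t of lambda^(j-t) * d_j
            decayedError += decay * d[j];
            decay *= lambda;
        }
        for (size_t k = 0; k < w.size(); ++k)  // assumes features.size() == w.size()
            w[k] += alpha * game[t].features[k] * decayedError;
    }
}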

and finally from the paper...

chess programs using traditional MCTS were much weaker than alpha-beta search programs...
while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions...
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

pilgrimdan wrote:
Michael Sherwin wrote:
syzygy wrote:In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.

It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.

As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.

Have you seen anyone else here that is hijacking the AlphaZero threads by continuously pushing their own program?
I know quite a bit about RL. That comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL. Both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL then what I say will be proven. So attack me and RC all you want, but it does not address what I'm trying to get people to understand.
" ... A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL."

from the paper ...

https://arxiv.org/pdf/1712.01815.pdf

this is what alphazero uses:

non-linear function approximation
deep neural network (NN)
reinforcement learning algorithm (RL)
MCTS (averages over approximation errors)
gradient descent (parameter adjustment)
mean-squared error
cross-entropy
weight regularisation

also from the paper...

prior reinforcement learning in computer chess

NeuroChess
neural network (evaluated positions)
temporal-difference (learning)

KnightCap
neural network (evaluated positions)
temporal-difference (leaf)

Meep
linear evaluation function (evaluated positions)
temporal-difference (TreeStrap)

Giraffe
neural network (evaluated positions)
temporal-difference (leaf) [self-play]

DeepChess
neural network (trained to perform evaluation)

Hex
networks (value)
policy (dual)

and finally from the paper...

chess programs using traditional MCTS were much weaker than alpha-beta search programs...
while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions...
I can't tell if this is pro, con, or just information. Maybe a match between Romi and one of the programs mentioned is in order?
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

Michael Sherwin wrote:All I'm pushing is the opinion that A0's strength is due to RL.
And there is about as much need for that as for pushing the claim that the sky is blue, water is wet, or checkmate is a win in Chess. It is what the AlphaZero paper claims in the first place, what the whole experiment was set up to show: that Chess could be self-taught by reinforcement learning, without any knowledge other than the rules being given to it by any other means. Everything it did better than a random mover was due to RL.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Google's AlphaGo team has been working on chess

Post by Rebel »

CheckersGuy wrote:
Michael Sherwin wrote:
mcostalba wrote:I have read the paper: the result is impressive!

Honestly, I didn't think it was possible, because my understanding was that chess is more "computer friendly" than Go... I was wrong.

It is true that SF is not meant to play at its best without a book, and especially the fixed 1 minute per move cuts out the whole time management; it would be more natural to play under tournament conditions. But nevertheless I think these are secondary aspects, and what has been accomplished is huge.
Marco, A0 did not win a match against SF. A0 with RL won a match against SF. Or said another way, A0 won a match against SF because SF does not have RL. Or, thought of a different way, a group of programmers identified a deficiency that exists in a competitive field and took advantage of it by eliminating that deficiency in their own entity. Or one can turn that thought around and say RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm. In that case the A0 vs SF match is a non sequitur and meaningless. Then there is the thought of the fans who want RL but are ignored because they are not important, and what the fan thinks or wants is not meaningful.

But what you can't say is "what has been accomplished is huge" in terms of a chess-playing algorithm. You might say that what A0 has demonstrated in go, chess and shogi is a huge demonstration that an NN with RL may conquer humanity some day. I won't argue against that. Concerning chess, though, the AB algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. AB+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess even one time! :D
That alpha-beta search + reinforcement learning is indeed better than MCTS + NN + reinforcement learning is still something that has to be proven. Assertions and a bulk of text don't help it :lol: Only a match between engines using those two different algorithms can be thought of as a definitive answer. Everything else is just based on certain assumptions.
To simplify things, I suppose it's not hard to imagine that with learning (from a given position) it's quite easy to get a 100% score against your own engine very soon: against a deterministic opponent, once a winning line is found it can simply be repeated.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Google's AlphaGo team has been working on chess

Post by Michael Sherwin »

CheckersGuy wrote:
Michael Sherwin wrote:
CheckersGuy wrote:
Michael Sherwin wrote:
mcostalba wrote:I have read the paper: the result is impressive!

Honestly, I didn't think it was possible, because my understanding was that chess is more "computer friendly" than Go... I was wrong.

It is true that SF is not meant to play at its best without a book, and especially the fixed 1 minute per move cuts out the whole time management; it would be more natural to play under tournament conditions. But nevertheless I think these are secondary aspects, and what has been accomplished is huge.
Marco, A0 did not win a match against SF. A0 with RL won a match against SF. Or said another way, A0 won a match against SF because SF does not have RL. Or, thought of a different way, a group of programmers identified a deficiency that exists in a competitive field and took advantage of it by eliminating that deficiency in their own entity. Or one can turn that thought around and say RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm. In that case the A0 vs SF match is a non sequitur and meaningless. Then there is the thought of the fans who want RL but are ignored because they are not important, and what the fan thinks or wants is not meaningful.

But what you can't say is "what has been accomplished is huge" in terms of a chess-playing algorithm. You might say that what A0 has demonstrated in go, chess and shogi is a huge demonstration that an NN with RL may conquer humanity some day. I won't argue against that. Concerning chess, though, the AB algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. AB+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess even one time! :D
That alpha-beta search + reinforcement learning is indeed better than MCTS + NN + reinforcement learning is still something that has to be proven. Assertions and a bulk of text don't help it :lol: Only a match between engines using those two different algorithms can be thought of as a definitive answer. Everything else is just based on certain assumptions. Since we won't have a commercial version of AlphaZero anytime soon, it will probably be quite some time until we find out :(
Technically correct but not practically correct. Demonstrably there is strong evidence supporting what I posted. It was demonstrated by R_m_C_e_s_ that hundreds of Elo can be gained from just a very few training games in real competition, and over 1000 Elo in a very restrictive test with even fewer training games. Against a truly massive opening book and against 6 top engines it was demonstrated that 50 Elo per 5,000 games of training is achieved, and the gain was linear over the scope of the test. So unless it is believed that a 2400 Elo engine can benefit this way but a 3400 Elo engine cannot, it can be assumed that the 3400 Elo engine will do quite well. In the case of SF that would mean victory against A0.
Can you point to a link? Were there a sufficient number of test games played? I would like to see the statistics.
From page 21 of Test no.1 (DiscoCheck w32 comparative)

http://www.talkchess.com/forum/viewtopi ... highlight=

As you can see in the capture, after 3600 games in my tournament, the only effective learning functions seem to be implemented in:

-RomiChess P3L
-KnightCap 3.7e
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Google's AlphaGo team has been working on chess

Post by Rebel »

Daniel Shawul wrote:Most of us here suspected that this could happen once Giraffe showed it can beat Stockfish's eval.
Is your opinion based on Giraffe's results in the STS test suite?

From his thesis:

Page 24
Figure 4 shows the result of running the test periodically as training progresses. With the material only bootstrap, it achieves a score of approximately 6000/15000. As training progresses, it gradually improved to approximately 9500/15000, with peaks above 9700/15000, proving that it has managed to gain a tremendous amount of positional understanding.

Page 25
It is clear that Giraffe's evaluation function now has at least comparable positional understanding compared to evaluation functions of top engines in the world

Page 25
Since Giraffe discovered all the evaluation features through self-play, it is likely that it knows about patterns that have not yet been studied by humans, and hence not included in the test suite.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Google's AlphaGo team has been working on chess

Post by Joost Buijs »

The only thing it shows is that Giraffe has a performance comparable with top engines on the 1500 positions of the STS test set.

In the past I used an older version of STS to tune my evaluation function (I never changed it since). I also see a performance on STS comparable with top engines, but I'm pretty sure that my engine doesn't have the same positional understanding as e.g. Stockfish, Komodo and Houdini, to name a few.

I did an experiment once and replaced my evaluation function with the one from an older Stockfish version; it gave me about a 150 Elo gain, though the score on STS remained in the same ballpark. The score on STS doesn't tell you much: 1500 positions (very similar within each of the 15 categories) are way too few to say anything about the positional understanding of the evaluation function.
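
For readers who haven't run STS themselves: the scores quoted above are just points summed over the 1500 positions, at most 10 per position, hence the maximum of 15000. A rough scoring sketch, assuming the usual c0 "move=points" annotation in the EPD files and a hypothetical engine hook and file name:

Code:

// Rough sketch of STS-style scoring. Assumes each EPD record carries a
// c0 field of the form  c0 "move1=10, move2=7, ...".  Purely illustrative.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stub: replace with a call into your own engine's search.
std::string engineBestMove(const std::string& /*epd*/) { return ""; }

// Extract the points awarded for 'move' from a c0 payload like
// "Qa7=10, Ne5=8, Bd4=3"; moves not listed score 0.
int pointsForMove(const std::string& c0, const std::string& move) {
    std::stringstream ss(c0);
    std::string item;
    while (std::getline(ss, item, ',')) {
        size_t eq = item.find('=');
        if (eq == std::string::npos) continue;
        std::string m = item.substr(0, eq);
        m.erase(0, m.find_first_not_of(' '));      // trim leading spaces
        m.erase(m.find_last_not_of(' ') + 1);      // trim trailing spaces
        if (m == move) return std::stoi(item.substr(eq + 1));
    }
    return 0;
}

int main() {
    std::ifstream in("sts_all.epd");               // hypothetical file name
    std::string line;
    int total = 0, positions = 0;
    while (std::getline(in, line)) {
        size_t c0pos = line.find("c0 \"");
        if (c0pos == std::string::npos) continue;
        size_t end = line.find('"', c0pos + 4);
        std::string c0 = line.substr(c0pos + 4, end - c0pos - 4);
        total += pointsForMove(c0, engineBestMove(line));
        ++positions;
    }
    // 1500 positions at 10 points maximum each gives a ceiling of 15000.
    std::cout << "STS score: " << total << "/" << positions * 10 << "\n";
}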