I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:
Michael Sherwin wrote:1. Asked and answered several times. But okay, once more. Part a) yes, in a way, because before the search all prior knowledge is loaded into the hash table; then the search learns from the data and selects a move.
Perhaps it is because I simply don't understand your answer. Just storing data in another form and reading it back to use it is not learning. So if that is what you are doing, it would be a "no", and not a "yes, in a way".
Part b) No self learning was employed against Rybka. However, that is immaterial because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result if Romi is losing.
"No self learning was employed". So what learning was employed, then? How did you fill the learn file, who played the games from which the WDL statistics were taken? I don't understand the secod sentence at all, but I don't think it would answer ay question I have anyway.
2. WDL is learned best line only if stats are good and when that ends and there is absolutely no subtree to load into the hash table then Romi at least has played a line up to then that it has performed better at in the past so Romi is still better off at that point than without the learning.
This sounds like it is just an opening book, and when you are out of book, you are out. You say it is no book, but everything that only works up to a point, and then not at all, is by definition an opening book. Whether you first store the book in the hash table doesn't make it less than a book. That is totally different from AlphaZero learning, which learned to play good moves in any position. If it would not have learned that, it would have reverted to a random mover very early in the game.

I don't understand how you can get very deep into the game this way, even if you play millions of games to learn. Perft 6 (3 moves into the game) is already 119M. OK, not every move is acceptable, but even with just 5 playable moves out of 25 you would only get to move 6 with 100 million games. You can record deeper lines in the book, of course, but there doesn't seem to be any chance you would ever play the same line as Rybka for very long if you were not close in strength to Rybka. (And even then...) Otherwise you would not need the help of the learn file, if you would play all the Rybka moves by yourself.
"Perhaps it is because I simply don't understand your answer. Just storing data in another form and reading it back to use it is not learning. So if that is what you are doing, it would be a "no", and not a "yes, in a way"."

In Romi's RL, for the search to learn from the stored data, that data is loaded into the hash table before the search. Then the search learns from it and produces a different result. If that does not happen, there is no learning.
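Romi's source is not quoted anywhere in this thread, so the following is only a minimal sketch of that idea under my own assumptions; the names (LearnNode, HashEntry, seedHashFromLearnFile, transTable) are hypothetical and are not RomiChess's actual code.

```cpp
// Hypothetical sketch: seed the transposition table with learned scores
// before the search starts, so the normal alpha-beta search "inherits"
// the adjustments accumulated from earlier games.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct LearnNode {
    uint64_t key;        // Zobrist key of the position
    int      rlAdjust;   // accumulated reinforcement bonus/penalty (centipawns)
    int      depth;      // distance from the learn-file root
};

struct HashEntry {
    int  score;
    int  depth;
    bool fromLearnFile;
};

std::unordered_map<uint64_t, HashEntry> transTable;

// Walk the subtree of the learn file hanging below the current game
// position and copy its adjusted scores into the hash table.
void seedHashFromLearnFile(const std::vector<LearnNode>& subtree) {
    for (const LearnNode& n : subtree) {
        transTable[n.key] = HashEntry{ n.rlAdjust, n.depth, true };
    }
}
// The subsequent search probes transTable as usual; wherever it hits a
// seeded entry, the stored adjustment pulls the returned score (and thus
// the chosen move) away from what the static evaluation alone would give.
```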

""No self learning was employed". So what learning was employed, then? How did you fill the learn file, who played the games from which the WDL statistics were taken?"

Romi started with a completely empty learn file against Rybka and played from a themed starting position. So the match games themselves supplied the WDL and RL value. Remember WDL is not used in RL, only the RL value is.

" You say it is no book, but everything that only works up to a point, and then not at all, is by definition an opening book."

I have said numerous times that WDL is not used in RL. There is no book used in RL. It just so happens that a tree structure can be used for a book and RL at the same time. You can't see past the book part.

Engine vs engine matches tend not to explore the full width of the tree. If they did, Romi would never have shown an improvement in Marc LaCrosse's testing or advanced at WBEC. If Romi can differentiate whether 1.d4 or 1.e4 leads to better results for Romi, then Romi has benefited. After a million games that differentiation extends a lot further into the tree than the first move and guides Romi to positions that it scores better at. And in positions where Romi does not do so well, the penalties will shy it away from those lines. In highly trafficked openings Romi can then play very well into even the endgame.
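To make the 1.e4-versus-1.d4 differentiation concrete, here is a toy sketch; the statistics, structure names, and scoring rule are invented for illustration and are not taken from Romi.

```cpp
// Hypothetical sketch: each root move carries accumulated win/draw/loss
// results from past games, and the move with the better record is preferred.
#include <iostream>
#include <string>
#include <vector>

struct MoveStats {
    std::string move;
    int wins, draws, losses;
    double score() const {                       // simple performance measure
        int games = wins + draws + losses;
        return games ? (wins + 0.5 * draws) / games : 0.0;
    }
};

int main() {
    std::vector<MoveStats> rootMoves = {
        { "e4", 420, 300, 280 },                 // fictitious statistics
        { "d4", 510, 310, 180 },
    };
    const MoveStats* best = &rootMoves[0];
    for (const auto& m : rootMoves)
        if (m.score() > best->score()) best = &m;
    std::cout << "prefer 1." << best->move << "\n";  // here: 1.d4
}
```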
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: I can't believe that so many people don't get it!

Post by Daniel Shawul »

Rebel wrote: Maybe you underestimate what can be done by simple hashing. A couple of years ago I created an opening book of 150 million positions (1.6 GB), made from CCRL/CEGT games (max 30 moves) and from positions analysed by Dann Corbit, and got a 102 Elo improvement [link].
No, I don't underestimate the value of book learning, especially with a deterministic opponent. What I am opposed to is the claim of having done what AlphaGo did, when the
only common denominator is "learning". Book learning as everybody knows it (whether it is stored in the book or in the hash table) is specific to a certain position -- with a 64-bit hash key.

AlphaGo's learning (NN training) is learning a general evaluation function. This can be compared to the automatic parameter tuning done in chess programs, with the only difference being that the NN actually constructs the important features, while we have to code in the passed-pawn and king-safety features ourselves.
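A toy illustration of that distinction, with invented feature names and weights (this is not any engine's real evaluation): classical automatic tuning adjusts only the weights of hand-coded features, whereas the network also learns what the features are.

```cpp
// In classical tuning the features (passed pawns, king safety, ...) are
// hand-coded and only the weights are learned by CLOP/SPSA/Texel-style
// methods; the feature definitions themselves never change.
#include <array>

struct Features {
    int passedPawns;   // hand-crafted feature extracted from the position
    int kingSafety;    // hand-crafted feature
    int mobility;      // hand-crafted feature
};

std::array<int, 3> weights = { 20, 35, 4 };   // what automatic tuning adjusts

int evaluate(const Features& f) {
    return weights[0] * f.passedPawns
         + weights[1] * f.kingSafety
         + weights[2] * f.mobility;
}
```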

Learning to avoid losing lines (which most tournament organizers are opposed to, as you know) is a cheap trick and can in no way be compared to AlphaGo's method. Romi's method is simple opponent modeling -- otherwise the learned file should perform equally well against all opponents (deterministic or otherwise).
And this isn't even Reinforcement Learning as Mike showed in his Qxc3! example elsewhere.

If you start a game with Scorpio against SF, giving the first 5 moves of the Ruy Lopez in advance, you will likely lose. You apply Reinforcement Learning and replay the game; maybe you need to do that 100-1000 times, but in the end Scorpio will win. You repeat the process until Scorpio wins that line in all variations, say 500 times in a row.
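A rough sketch of that replay loop; playGame() and applyReinforcement() below are stand-in stubs, not real Scorpio or Stockfish APIs.

```cpp
// Sketch of the repeat-until-won learning loop described above.
#include <string>

enum class Result { Win, Draw, Loss };

// Stub: in reality this would launch an engine-vs-engine game from the
// given position and return the outcome for the learning side.
Result playGame(const std::string& /*startFen*/) { return Result::Win; }

// Stub: in reality this would add bonuses/penalties to the moves of the
// game just played, as discussed earlier in the thread.
void applyReinforcement(Result /*r*/) {}

void learnLine(const std::string& ruyLopezFen) {
    int winsInARow = 0;
    while (winsInARow < 500) {                  // "500 times in a row"
        Result r = playGame(ruyLopezFen);       // replay the same line
        applyReinforcement(r);                  // adjust the learned values
        winsInARow = (r == Result::Win) ? winsInARow + 1 : 0;
    }
}
```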

It's not nonsense.

Hence Tord's complaint makes sense when he was asked, why no opening book?
I can add "why no EGBBs" but i do not think it would have mattered anyway. And the 1GB hash complaint feels like as if a 1MB hash was used for Stockfish.

Daniel
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Rebel wrote: Maybe you remember Mchess 5.0 and how it mated Rebel, Hiarcs and Genius from the opening book. Nice handcrafted work by Sandro Necci. No problem to do it with Reinforcement Learning all automatic.
I have heard of Mchess but all I know of it is the name. What does that imply about the Mchess opening book? What process created it? I haven't a clue.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
User avatar
Ras
Posts: 2696
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: I can't believe that so many people don't get it!

Post by Ras »

Michael Sherwin wrote:Objection your honor, leading question.
I'd say the question hit the nail on the head.
If some engine plays 1.a3 against Romi and Romi has never seen that before, then no, there is no help for Romi from the learn file in that game.
This is not learning at the level of Alpha0, then. Not even remotely.
However, if Romi has seen 1.e4 numerous times and Romi does better with 1. ... c5 than with the 1. ... e5 that the evaluation-only search would return, then the learned reinforcement values will guide the search to choose 1. ... c5 instead of 1. ... e5.
This is a fancy sort of book learning. Obviously it also works after the "actual" book, but it's still a sort of book learning. This answer is also obvious because you do it via the hash tables, which by definition only apply to exact position matches and not to patterns. That's how books work, however.
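A minimal sketch of why such hash-based learning is position-exact, assuming a Zobrist-keyed table; the names are illustrative, not Romi's actual code.

```cpp
// The lookup key is a 64-bit hash of the exact position, so a position one
// tempo or one pawn different produces a completely different key and finds
// nothing learned at all.
#include <cstdint>
#include <optional>
#include <unordered_map>

std::unordered_map<uint64_t, int> learnedScore;  // key -> learned adjustment

std::optional<int> probeLearned(uint64_t zobristKey) {
    auto it = learnedScore.find(zobristKey);
    if (it == learnedScore.end())
        return std::nullopt;                     // unseen position: no help
    return it->second;                           // exact match: learned value
}
```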
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Michael Sherwin wrote:In Romi's RL, for the search to learn from the stored data, that data is loaded into the hash table before the search. Then the search learns from it and produces a different result. If that does not happen, there is no learning.
We seem to use different terminology. For me, 'search' never learns. Search just 'finds'. Learning is altering of stored information.

""No self learning was employed". So what learning was employed, then? How did you fill the learn file, who played the games from which the WDL statistics were taken?"
Romi started with a completely empty learn file against Rybka and played from a themed starting position. So the match games themselves supplied the WDL and RL value. Remember WDL is not used in RL, only the RL value is.
So you learned how to beat Rybka from playing against Rybka. That makes it completely different from what Alpha Zero does. Alpha Zero learned to beat Stockfish by playing against itself.
I have said numerous times that WDL is not used in RL. There is no book used in RL. It just so happens that a tree structure can be used for a book and RL at the same time. You can't see past the book part.
So what is the 'RL value'? It is derived from the WDL statistics in some way, is it not? And why do you think that makes any difference anyway? A book is a book, no matter what you store in it. Nothing requires a book to have WDL information. Anything that stores information on a per-position basis that affects move selection is by definition a book.
User avatar
Rebel
Posts: 7306
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:
Rebel wrote:
hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.
Well, it should have been if they started from non-standard positions. The 10 games they published from that match all started from the standard position.
Nope.

Game-4 AZ-SF 1. d4 e6
Game-5 AZ-SF 1. d4 Nf6
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:
Michael Sherwin wrote:In Romi's RL, for the search to learn from the stored data, that data is loaded into the hash table before the search. Then the search learns from it and produces a different result. If that does not happen, there is no learning.
We seem to use different terminology. For me, 'search' never learns. Search just 'finds'. Learning is altering of stored information.

""No self learning was employed". So what learning was employed, then? How did you fill the learn file, who played the games from which the WDL statistics were taken?"
Romi started with a completely empty learn file against Rybka and played from a themed starting position. So the match games themselves supplied the WDL and RL value. Remember WDL is not used in RL, only the RL value is.

So you learned how to beat Rybka from playing against Rybka. That makes it completely different from what Alpha Zero does. Alpha Zero learned to beat Stockfish by playing against itself.
I have said numerous times that WDL is not used in RL. There is no book used in RL. It just so happens that a tree structure can be used for a book and RL at the same time. You can't see past the book part.
So what is the 'RL value'? It is derived from the WDL statistics in some way, is it not? And why do you think that makes any difference anyway? A book is a book, no matter what you store in it. Nothing requires a book to have WDL information. Anything that stores information on a per-position basis that affects move selection is by definition a book.
The WDL values are not directly used to compute the RL value. A bonus is applied to the winning side's moves and a penalty to the losing side's moves. The bonus/penalty is larger towards the leaves and very small at the root, so it is not directly linked to WDL. The idea is to gently nudge the search into better lines. Wins and losses far from the root position affect the search far less than wins and losses close to the root move. It is simply a lot more dynamic than WDL.
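A minimal sketch of that bonus/penalty pass as I read the description above; the scaling formula, the cap, and the names are my assumptions, not Romi's actual code.

```cpp
// After a game, every position of the winning side gets a bonus and every
// position of the losing side a penalty, tiny at the root and larger
// towards the end of the game.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct GamePosition {
    uint64_t key;           // Zobrist key of the position reached in the game
    bool     winnerToMove;  // was the eventual winner on move here?
};

std::unordered_map<uint64_t, int> rlValue;  // learned adjustment per position

void updateAfterGame(const std::vector<GamePosition>& game) {
    const int maxAdjust = 30;               // assumed cap in centipawns
    for (size_t ply = 0; ply < game.size(); ++ply) {
        // Small near the root (ply 0), larger towards the leaves.
        int magnitude = 1 + static_cast<int>(maxAdjust * ply / game.size());
        int delta = game[ply].winnerToMove ? +magnitude : -magnitude;
        rlValue[game[ply].key] += delta;    // nudge, don't overwrite
    }
}
```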

Is a persistent hash a book? Romi's RL is like that but much smarter.
Last edited by Michael Sherwin on Mon Dec 18, 2017 11:55 pm, edited 1 time in total.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: I can't believe that so many people don't get it!

Post by Guenther »

Rebel wrote:
hgm wrote:
Rebel wrote:
hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.
Well, it should have been if they started from non-standard positions. The 10 games they published from that match all started from the standard position.
Nope.

Game-4 AZ-SF 1. d4 e6
Game-5 AZ-SF 1. d4 Nf6
And what?? Of course the MP search randomizes at 64 threads, especially if the search is always terminated after a fixed time...
https://rwbc-chess.de

[Trolls don't exist...]
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Ras wrote:
Michael Sherwin wrote:Objection your honor, leading question.
I'd say the question hit the nail on the head.
If some engine plays 1.a3 against Romi and Romi has never seen that before, then no, there is no help for Romi from the learn file in that game.
This is not learning at the level of Alpha0, then. Not even remotely.
However, if Romi has seen 1.e4 numerous times and Romi does better with 1. ... c5 than with the 1. ... e5 that the evaluation-only search would return, then the learned reinforcement values will guide the search to choose 1. ... c5 instead of 1. ... e5.
This is a fancy sort of book learning. Obviously it also works after the "actual" book, but it's still a sort of book learning. This answer is also obvious because you do it via the hash tables, which by definition only apply to exact position matches and not to patterns. That's how books work, however.
"MCTS may be viewed as a self-play algorithm that, given neural network parameters θ and
a root position s, computes a vector of search probabilities recommending moves to play, π =
αθ(s), proportional to the exponentiated visit count for each move, πa ∝ N(s, a)
1/τ , where τ is a
temperature parameter."

In other words, "proportional to the exponentiated visit count for each move" means a tree structure in which the probability value is stored. The NN guides the search using these values stored in the tree. Nowhere does it state that the tree structure is ever deleted. The temperature value probably has to do with distance to the leaves. They hide the details in careful scientific-ese so the average Joe has no chance of understanding it. The paper is truly a sculptured work.
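For what it's worth, here is a small sketch of what the quoted formula computes, using made-up visit counts; raise each move's visit count to the power 1/τ and normalise, giving a probability distribution over the root moves.

```cpp
// Compute pi_a proportional to N(s,a)^(1/tau) on fictitious visit counts.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> visits = { 800, 150, 50 };    // fictitious N(s, a)
    double tau = 1.0;                                 // temperature parameter
    std::vector<double> pi(visits.size());
    double sum = 0.0;
    for (size_t a = 0; a < visits.size(); ++a) {
        pi[a] = std::pow(visits[a], 1.0 / tau);       // N(s,a)^(1/tau)
        sum += pi[a];
    }
    for (size_t a = 0; a < pi.size(); ++a)
        std::printf("move %zu: pi = %.3f\n", a, pi[a] / sum);
    // tau -> 0 concentrates all probability on the most-visited move;
    // tau = 1 simply normalises the raw visit counts.
}
```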
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Rebel wrote: Nope.

Game-4 AZ-SF 1. d4 e6
Game-5 AZ-SF 1. d4 Nf6
What 'nope'? These lines are legal moves from the standard opening position, are they not? So what is your point?