First, thanks for supplying more details. My speculations were based on what you had reported before.

Michael Sherwin wrote:
Mark Lefler was kind enough to ask me in a PM to error-check the above!

mjlef wrote:
Of course we listen to requests. But I do not think the Romi learning is anything like the learning Google/DeepMind did. They used 5000 special Tensor Processing Units (TPUs), each costing thousands of dollars. Right now, this is way beyond our resources. Romi learning is most likely simply saving important positions in a persistent hash. In future games, these are reloaded into the main hash table, so the new game, if it encounters one of these positions, already has deep search information for them. The thing is, this helps not at all if a different line is played. My old program NOW had this feature, but I did it not so much to make it stronger, but instead to avoid losing lines during tournaments. You can think of it as a self-correcting book, which benefits if the same line is tried on Komodo.

shrapnel wrote:
So Reinforcement Learning will be introduced by the Komodo team (if they are capable of it) only on REQUEST.

Jesse Gersenson wrote:
If people want learning in Komodo, let your voice be heard; Mark and Larry are open to feature requests, especially those requested by a lot of people.
Looks like the thrashing Stockfish received still hasn't convinced them.
*** Unbelievable.
Larry and I often discuss Monte Carlo Tree Search, and we are interested in trying it. We have also discussed uses for neural networks. Small NNs could be useful on present PCs, but the massive NN used in AlphaGo Zero is currently beyond what we, and most chess engine users, can afford.
We listen, and we try to add what we think people want. But we do not have endless resources; we can afford to buy roughly one new server each year. It is not a matter of being "convinced". We simply cannot afford it at the moment. Hopefully GPUs in graphics cards will keep getting faster, and perhaps NN hardware will be added to future PCs or at least become much less expensive.
mjlef wrote:
But I do not think the Romi learning is anything like the learning Google/DeepMind did.

There is circumstantial evidence that it is similar in nature. Reinforcement learning involves accumulating rewards (and penalties) for every position in a database. It could be a persistent hash, as suggested.
mjlef wrote:
Romi learning is most likely simply saving important positions in a persistent hash.

Or the way Romi actually does it: in a tree of all played games. This is superior to a persistent hash, because only the subtree below the current position is loaded into the game hash. This has the advantage that only useful information ends up in the game hash.
mjlef wrote:
The thing is, this helps not at all if a different line is played.

Not true. Rewards and penalties are greater near the leaves and over time are backpropagated to the root. Since every node is the root of a subtree, every move benefits from backpropagation. This produces a meaningful differentiation, a gradual determination of which move gives better results. So, for example, at the actual root Romi will settle on, say, 1.e4 or 1.d4 as best and will always play that opening move, just like AlphaZero always played 1.d4.
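To make that concrete, here is a minimal sketch of the two mechanisms described above: seeding the hash from only the current subtree, and backpropagating rewards from the leaves toward the root. The node layout, reward size, and halving decay are my own illustrative assumptions, not Romi's actual code.

#include <cstdint>
#include <functional>
#include <vector>

struct Node {
    std::uint64_t key;            // Zobrist key of the position
    int reward = 0;               // accumulated reward/penalty
    std::vector<Node*> children;  // moves played from this position
};

// Seed only the subtree below the current position into the game hash,
// so nothing irrelevant gets loaded. 'store' stands in for the engine's
// hash-store routine (a hypothetical hook, not a real API).
void seedHash(const Node* current,
              const std::function<void(std::uint64_t, int)>& store) {
    store(current->key, current->reward);
    for (const Node* child : current->children)
        seedHash(child, store);
}

// After a game, walk the played line from the leaf back toward the root,
// adding a reward that shrinks at each step. 'line' holds the game's
// nodes, root first; 'result' is +1 for a White win, -1 for a Black win.
void backpropagate(std::vector<Node*>& line, int result) {
    int bonus = result * 64;      // illustrative base reward, largest at the leaf
    for (auto it = line.rbegin(); it != line.rend(); ++it) {
        (*it)->reward += bonus;
        if (bonus == 0) break;    // nothing left to propagate
        bonus /= 2;               // decay toward the root
    }
}

At move selection the accumulated reward could then be folded into each candidate move's stored score, so repeated games gradually steer the engine toward openings that have scored well.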
mjlef wrote:
You can think of it as a self-correcting book, which benefits if the same line is tried on Komodo.

Philosophically, I would argue that what RomiChess does has nothing to do with a book. In a book an engine can usually choose randomly among acceptable moves; in RomiChess only the learned best move is played in its Monkey See Monkey Do "book". That is separate from Romi's reinforcement learning. It just so happens that the stored tree of all Romi's games can handle both a "book" and the rewards/penalties for reinforcement learning.
Thanks Mark for inviting me to share some details!
So this is a type of persistent hash, although instead of loading many positions, it keeps a database and loads positions based on the current board position. You are right that any position leading to the positions in the database would benefit, but in practice the game diverges quickly, except for some very narrow forced lines.
This type of learning is very different from a neural network, which finds generalized patterns based on many positions and is not tied to a specific node. Of course, the same reinforcement learning would also train the program to prefer some opening lines over others, and to learn which endings are wins, losses, or draws, like a slightly inaccurate tablebase.
I simply do not understand this: "Since every node is a root to a subtree, every move benefits from backpropagation." Only positions encountered in the current search tree that were already in the database can benefit. Once something changes (a piece is captured), no moves past that point can benefit unless they transpose back into a position in the database.
You give no details on how it "learns". The simplest thing is to save as many positions as possible. Statistics on how winning or losing a position is could be useful in deciding which positions to keep (also depth, how popular a position is, and how much the score changed compared with its siblings).
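One hypothetical way to combine those factors into a single retention score, purely for illustration (the weights are made up and would need tuning):

struct EntryStats {
    int depth;       // search depth of the stored result
    int visits;      // how often the position has been reached
    int scoreSwing;  // |new score - old score| in centipawns
};

// Higher priority means the position is more worth keeping.
int retentionPriority(const EntryStats& e) {
    return 4 * e.depth + 2 * e.visits + e.scoreSwing;
}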
In NOW my system was quite simple. It had a relatively small, fixed-size table saved to disk, containing hash positions (keys, scores, bounds...). It kept them sorted by hash key for fast updating. Before each search, the positions were placed at the appropriate points in the main hash table. After each completed search, hash positions were selected to add to the table, or to update a table entry if they had better information (better bounds, more depth...). Positions that changed the score, especially at the root, were given priority in replacement. It added positions based on the PV. Given that this was a 32-bit program, long ago, having a large database was not practical. It did change future play, which was the point, and when fed the same opening moves it did gain strength. Overall, transpositions to other positions were pretty rare; the branching factor in chess is just so large. But now that we have bigger memory, maybe it would help more.
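A rough sketch of that kind of sorted, fixed-size persistent table; the entry fields, table size, and replacement rule are my assumptions for illustration, not NOW's actual code.

#include <algorithm>
#include <cstddef>
#include <cstdint>

struct PersistEntry {
    std::uint64_t key;    // Zobrist key of the position
    std::int16_t  score;  // stored score
    std::int8_t   depth;  // search depth
    std::int8_t   bound;  // exact / lower / upper bound flag
};

constexpr std::size_t kTableSize = 4096;  // small, fixed-size table
PersistEntry table[kTableSize];           // kept sorted by key on disk

// Keeping the table sorted by key makes lookups a binary search.
PersistEntry* find(std::uint64_t key) {
    auto* it = std::lower_bound(table, table + kTableSize, key,
        [](const PersistEntry& e, std::uint64_t k) { return e.key < k; });
    return (it != table + kTableSize && it->key == key) ? it : nullptr;
}

// After a completed search, keep whichever entry carries better
// information. Only depth is compared here; a fuller rule would also
// weigh bounds and score changes at the root, as described above.
void maybeUpdate(const PersistEntry& fresh) {
    if (PersistEntry* old = find(fresh.key)) {
        if (fresh.depth > old->depth) *old = fresh;
    }
    // Inserting brand-new keys (evicting a low-priority entry) is
    // omitted for brevity.
}

Before each search, the stored entries would be copied into the main hash table at their usual slots, as described above.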