Has any chess engine employed RT RL yet?
Moderators: hgm, Rebel, chrisw
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Has any chess engine employed RT RL yet?
If not, then why not? When I released the first version of RomiChess with after-game RL in January of 2006, my goal was to next have Romi do RL in real time, but life circumstances prevented me from trying. Real-time RL is the next logical step in its evolution, and I can't imagine that no one has thought of it! If an engine has one minute to make a move, then use 30 seconds, give or take, to play many games or game segments at a few ply less search depth, doing RL on the position. The information returned to the root from far higher in the tree would be extremely valuable. It would result in more accurate scores and better move ordering, and the main search could even reach deeper depths although it would only have roughly half the time. So I ask again: if not, then why not?
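Roughly, the idea might look like this toy sketch (entirely illustrative, not Romi's code; `shallow_eval` stands in for a reduced-depth a/b game played out from a root move, returning a score from that move's point of view):

```python
import random

def realtime_rl_ordering(candidate_moves, shallow_eval, n_games=200, seed=1):
    # Learning phase: spend part of the move budget playing many cheap
    # reduced-depth "learning games" from the current position.
    rng = random.Random(seed)
    bonus = {m: 0 for m in candidate_moves}
    for _ in range(n_games):
        m = rng.choice(candidate_moves)       # pick a root move to explore
        outcome = shallow_eval(m, rng)        # reduced-depth game result
        bonus[m] += 1 if outcome > 0 else -1  # reinforcement update
    # Hand the learned bonuses to the main search as move-ordering hints.
    return sorted(candidate_moves, key=lambda m: bonus[m], reverse=True)
```

With a noisy evaluation where one move is genuinely best, the accumulated bonuses rank it first, which is the kind of ordering hint the main search would then start from.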
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
Does anyone remember the test against Glaurung 2 I posted? Starting with an empty learn file, Romi played 10 matches using the ten Nunn positions. In the 1st match Romi only scored 5%. In the 10th match Romi scored 95%! I don't know how many reduced-depth RL games can be played in 30 seconds, but I bet it is a lot more than ten.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Has any chess engine employed RT RL yet?
Could you explain what "RL" is?
"Reactionary Learning", or "Reinforced Learning"? Put the engine in an unknown situation, let it try 50,000 things, and then tell it which of those things were good and bad? That's (very simplistically stated) the way you train a neural network.
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
mvanthoor wrote: ↑Fri Oct 02, 2020 1:40 am
Could you explain what "RL" is?
"Reactionary Learning", or "Reinforced Learning"? Put the engine in an unknown situation, let it try 50,000 things, and then tell it which of those things were good and bad? That's (very simplistically stated) the way you train a neural network.
Reinforcement learning. It is how a pure alpha-beta engine can be trained, but done in real time, which I believe is not done by any existing engine.
-
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Has any chess engine employed RT RL yet?
Mike Sherwin wrote: ↑Thu Oct 01, 2020 10:50 pm
If not then why not? When I released the first version of RomiChess with after game RL in January of 2006 it was my goal to next have Romi do RL in real time. But life circumstances prevented me from trying. Realtime RL is the next logical step in its evolution. I can't imagine that no one has thought of it! If an engine has one minute to make a move then use 30 seconds, give or take, to play many games or game segments at a few ply less search depth to do RL on the position. The information returned to the root from far higher in the tree would be extremely valuable. It would result in more accurate scores, better move ordering and even the main search could reach deeper depths although it would only have roughly half the time. So I ask again, if not then why not?
What you described is not reinforcement learning but Monte-Carlo Tree Search (MCTS). Some engines, like Komodo MCTS or Leela, use it. As Komodo MCTS is weaker than Komodo with alpha-beta, I am not sure MCTS is a valuable approach.
Richard Delorme
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Has any chess engine employed RT RL yet?
Maybe not for chess, at this time; it -is- a good approach for Go though. It revolutionized computer Go. Alpha-beta for Go isn't that great; programs can't search deep enough to play beyond about 4-5 kyu strength. (The reason is that in Go, a move can often cause an entire string of forced moves, ending in a bad position, which a human can immediately see without calculating anything. Think about starting a ladder while a ladder breaker is already in place: if you start the ladder, you'll lose the game.)
https://en.wikipedia.org/wiki/Ladder_(Go)
In the example, white started a ladder. Without the black stone, the ladder would run all the way to the edge of the board (forced moves), and white would capture all the black stones. Because of the extra black stone, black can 'break' the ladder at that point, and white will not be able to capture the black stones within the ladder. A broken ladder construct is very weak, so white will lose the game (against any opponent who is not a pure beginner).
Alpha/Beta cannot solve something like that; MCTS with its hundreds of thousands, and sometimes millions of playouts, can.
Such extreme series of moves are quite common in Go; they aren't in chess. That is the reason why in Go "thinking ahead" is called "reading", and in chess it's called "calculating".
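To illustrate what playouts buy you, here is the simplest possible "MCTS without a tree" (a toy flat Monte-Carlo search of my own construction, for the game "take 1 or 2 flags; whoever takes the last flag wins", not anything from a real Go program): score each root move by the win rate of random playouts and pick the best.

```python
import random

def random_playout(n, rng):
    # Returns True if the side to move at entry wins the game
    # "take 1 or 2 flags, last flag wins", with both sides random.
    entry_side_wins, mover_is_entry_side = False, True
    while n > 0:
        take = rng.choice([1, 2]) if n >= 2 else 1
        n -= take
        if n == 0:
            entry_side_wins = mover_is_entry_side
        mover_is_entry_side = not mover_is_entry_side
    return entry_side_wins

def flat_monte_carlo(n, playouts=500, seed=7):
    # Score each legal root move by the win rate of random playouts
    # and return the move with the best rate.
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for take in (1, 2):
        if take > n:
            continue
        # After we take, the opponent is to move; we win when they lose.
        wins = sum(not random_playout(n - take, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = take, rate
    return best_move
```

Even fully random playouts separate the forcing line from the losing one here, which is the same effect that makes playouts work on Go ladders.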
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
abulmo2 wrote: ↑Sat Oct 03, 2020 12:22 am
What you described is not reinforcement learning but Monte-Carlo Tree Search (MCTS). Some engines like Komodo MCTS or Leela use it. As Komodo MCTS is weaker than Komodo with alphabeta, I am not sure MCTS is a valuable approach.
It is not MCTS though. It is a series of less deeply searched a/b games, intelligently directed, each game building upon the RL of all the previous games for the allotted time. The number of games needed to strongly and positively affect the main search is orders of magnitude less than MCTS requires.
-
- Posts: 9
- Joined: Tue Oct 06, 2015 5:00 pm
Re: Has any chess engine employed RT RL yet?
Mike Sherwin wrote: ↑Sat Oct 03, 2020 1:04 am
It is less deeply searched a/b games intelligently directed one game building upon all the previous games RL for the allotted time. The number of games needed to strongly and positively affect the main search is magnitudes less than required by MCTS.
"Searching the given position to a shallower depth and using that information to inform the current search" sounds like iterative deepening to me. What is the "output" of the RL going to be (other than adding relevant entries to the hash table)? How are we going to use it to inform the "real" search?
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
jwilson82 wrote: ↑Sun Oct 04, 2020 6:44 pm
"Searching the given position to a shallower depth and using that information to inform the current search" sounds like iterative deepening to me. What is the "output" of the RL going to be (other than adding relevant entries to the hash table)? How are we going to use it to inform the "real" search?
Using shallower searches to play game segments, or even complete games, to a game depth greater than what the main search reaches. So if the iteration depth of the main search is 40 ply, then game segments may be played to a minimum depth (game depth, not search depth) of 60 ply, saving the game moves in a reinforcement learning structure, like the tree used in RomiChess (Romi's is after-game RL), or possibly in a ginormous hash table. Each real-time learning game's result would then modify the learning, because that is how reinforcement learning works. The games will be played more and more intelligently as more games are added to the RL structure, thus seeing strategy and tactics from 50% or more deeper in the tree than the main search will reach when the learning time is exhausted. This information will result in better move ordering, more accurate evaluations, and a faster main search.
Edit: I did not say this exactly right. I did not mean to imply that RL is done between every iteration of the main search. I meant that if the last iteration has been averaging 40 ply then play pre main search RL games to at least 60 ply. Please someone let me know if I'm being understood.
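The learning structure might look something like this (a guess at the shape of the idea based on the description above, not Romi's actual code; every name here is made up): game moves stored in a tree, with the winner's moves rewarded and the loser's punished at every ply, and the accumulated scores read back as move-ordering bonuses.

```python
class LearnNode:
    # One move in a RomiChess-style learning tree.
    def __init__(self):
        self.score = 0      # accumulated reinforcement bonus
        self.children = {}  # move -> LearnNode

def reinforce(root, game_moves, white_won, reward=1):
    # Walk the played line through the tree, creating nodes as needed,
    # rewarding the winner's moves and punishing the loser's.
    node = root
    for ply, move in enumerate(game_moves):
        node = node.children.setdefault(move, LearnNode())
        white_to_move = (ply % 2 == 0)
        node.score += reward if white_to_move == white_won else -reward

def ordering_bonus(root, line_so_far, move):
    # Bonus the main search could add to its move-ordering score.
    node = root
    for m in line_so_far:
        node = node.children.get(m)
        if node is None:
            return 0
    child = node.children.get(move)
    return child.score if child else 0
```

After each learning game, `reinforce` updates the tree, so later games (and finally the main search) see scores shaped by everything played before them.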
-
- Posts: 9
- Joined: Tue Oct 06, 2015 5:00 pm
Re: Has any chess engine employed RT RL yet?
"Using shallower searches to play game segments or even complete games to a depth greater than what the main search calls for"
I'm lost already. You are given a position to search. You are given one minute to do the search. Let's say that you can (somehow) decide that you want to search that position to a depth of 10 using half (a fraction taken from an earlier post) of the allotted time, 30 seconds. How are you supposed to do a deeper search in the first 30 seconds (the RL phase) than in the second 30 seconds (the search phase)?
"saving the game moves in a reinforcement learning structure like a tree as done in RomiChess (Romi's is after game RL) or possibly in a ginormous hash"
A book-like structure I can wrap my head around, but a hash table is iterative deepening.
"Therefore each of the real time learning games results would modify the learning because that is how reinforcement learning works"
I'm still lost as to what, exactly, the tangible product of the learning phase is.
"This information will result in better move ordering, more accurate evaluations and a faster main search."
Better move ordering than hash table hits alone?
How does the learning phase impact the evaluation function?
How is the search faster other than hash table cutoffs?
I recognize that my questions may come off as sharp and abrupt, especially coming from a new poster to this forum. I ask questions to better my own understanding, and have found that direct questions tend to result in direct answers. Please take them as an indication of my own ignorance and desire to understand, rather than as an attack on your statements themselves.