Has any chess engine employed RT RL yet?
Moderators: hgm, Rebel, chrisw
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Has any chess engine employed RT RL yet?
If not, then why not? When I released the first version of RomiChess with after-game RL in January of 2006, my goal was to next have Romi do RL in real time, but life circumstances prevented me from trying. Real-time RL is the next logical step in its evolution, and I can't imagine that no one has thought of it! If an engine has one minute to make a move, then use 30 seconds, give or take, to play many games or game segments at a few ply less search depth, doing RL on the position. The information returned to the root from far higher in the tree would be extremely valuable. It would result in more accurate scores and better move ordering, and the main search could even reach deeper depths although it would only have roughly half the time. So I ask again: if not, then why not?
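Roughly, the idea might look like this toy sketch (entirely illustrative, not Romi's code; `shallow_eval` stands in for a reduced-depth a/b game played out from a root move, returning a score from that move's point of view):

```python
import random

def realtime_rl_ordering(candidate_moves, shallow_eval, n_games=200, seed=1):
    # Learning phase: spend part of the move budget playing many cheap
    # reduced-depth "learning games" from the current position.
    rng = random.Random(seed)
    bonus = {m: 0 for m in candidate_moves}
    for _ in range(n_games):
        m = rng.choice(candidate_moves)       # pick a root move to explore
        outcome = shallow_eval(m, rng)        # reduced-depth game result
        bonus[m] += 1 if outcome > 0 else -1  # reinforcement update
    # Hand the learned bonuses to the main search as move-ordering hints.
    return sorted(candidate_moves, key=lambda m: bonus[m], reverse=True)
```

With a noisy evaluation where one move is genuinely best, the accumulated bonuses rank it first, which is the kind of ordering hint the main search would then start from.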
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
Does anyone remember the test against Glaurung 2 I posted? Starting with an empty learn file, Romi played 10 matches using the ten Nunn positions. In the 1st match Romi only scored 5%. In the 10th match Romi scored 95%! I don't know how many reduced-depth RL games can be played in 30 seconds, but I bet it is a lot more than ten.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Has any chess engine employed RT RL yet?
Could you explain what "RL" is?
"Reactionary Learning", or "Reinforced Learning"? Put the engine in an unknown situation, let it try 50,000 things, and then tell it which of those things were good and bad? That's (very simplistically stated) the way you train a neural network.
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
mvanthoor wrote: ↑Fri Oct 02, 2020 1:40 am
Could you explain what "RL" is?
"Reactionary Learning", or "Reinforced Learning"? Put the engine in an unknown situation, let it try 50,000 things, and then tell it which of those things were good and bad? That's (very simplistically stated) the way you train a neural network.
Reinforcement learning. It is how a pure alpha-beta engine can be trained, but done in real time, which I believe is not done by any existing engine.
-
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Has any chess engine employed RT RL yet?
Mike Sherwin wrote: ↑Thu Oct 01, 2020 10:50 pm
If not then why not? When I released the first version of RomiChess with after game RL in January of 2006 it was my goal to next have Romi do RL in real time. But life circumstances prevented me from trying. Realtime RL is the next logical step in its evolution. I can't imagine that no one has thought of it! If an engine has one minute to make a move then use 30 seconds, give or take, to play many games or game segments at a few ply less search depth to do RL on the position. The information returned to the root from far higher in the tree would be extremely valuable. It would result in more accurate scores, better move ordering and even the main search could reach deeper depths although it would only have roughly half the time. So I ask again, if not then why not?
What you described is not reinforcement learning but Monte-Carlo Tree Search (MCTS). Some engines, like Komodo MCTS or Leela, use it. As Komodo MCTS is weaker than Komodo with alpha-beta, I am not sure MCTS is a valuable approach.
Richard Delorme
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Has any chess engine employed RT RL yet?
Maybe not for chess, at this time; it -is- a good approach for Go though. It revolutionized computer Go. Alpha-beta for Go isn't that great; programs can't search deep enough to play beyond about 4-5 kyu strength. (The reason is that in Go, a move can often cause an entire string of forced moves, ending in a bad position, which a human can immediately see without calculating anything. Think about starting a ladder while a ladder breaker is already in place: if you start the ladder, you'll lose the game.)
https://en.wikipedia.org/wiki/Ladder_(Go)
In the example, white started a ladder. Without the black stone, the ladder would run all the way to the edge of the board (forced moves), and white would capture all the black stones. Because of the extra black stone, black can 'break' the ladder at that point, and white will not be able to capture the black stones within the ladder. A broken ladder construct is very weak, so white will lose the game (against any opponent who is not a pure beginner).
Alpha/Beta cannot solve something like that; MCTS with its hundreds of thousands, and sometimes millions of playouts, can.
Such extreme series of moves are quite common in Go; they aren't in chess. That is the reason why in Go "thinking ahead" is called "reading", and in chess it's called "calculating".
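To illustrate what playouts buy you, here is the simplest possible "MCTS without a tree" (a toy flat Monte-Carlo search of my own construction, for the game "take 1 or 2 flags; whoever takes the last flag wins", not anything from a real Go program): score each root move by the win rate of random playouts and pick the best.

```python
import random

def random_playout(n, rng):
    # Returns True if the side to move at entry wins the game
    # "take 1 or 2 flags, last flag wins", with both sides random.
    entry_side_wins, mover_is_entry_side = False, True
    while n > 0:
        take = rng.choice([1, 2]) if n >= 2 else 1
        n -= take
        if n == 0:
            entry_side_wins = mover_is_entry_side
        mover_is_entry_side = not mover_is_entry_side
    return entry_side_wins

def flat_monte_carlo(n, playouts=500, seed=7):
    # Score each legal root move by the win rate of random playouts
    # and return the move with the best rate.
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for take in (1, 2):
        if take > n:
            continue
        # After we take, the opponent is to move; we win when they lose.
        wins = sum(not random_playout(n - take, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = take, rate
    return best_move
```

Even fully random playouts separate the forcing line from the losing one here, which is the same effect that makes playouts work on Go ladders.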
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
abulmo2 wrote: ↑Sat Oct 03, 2020 12:22 am
What you described is not reinforcement learning but Monte-Carlo Tree Search (MCTS). Some engines like Komodo MCTS or Leela use it. As Komodo MCTS is weaker than Komodo with alphabeta, I am not sure MCTS is a valuable approach.
It is not MCTS though. It is a series of less deeply searched a/b games, intelligently directed, each game building upon the RL of all the previous games for the allotted time. The number of games needed to strongly and positively affect the main search is orders of magnitude less than MCTS requires.
-
- Posts: 9
- Joined: Tue Oct 06, 2015 5:00 pm
Re: Has any chess engine employed RT RL yet?
Mike Sherwin wrote: ↑Sat Oct 03, 2020 1:04 am
It is less deeply searched a/b games intelligently directed one game building upon all the previous games RL for the allotted time. The number of games needed to strongly and positively affect the main search is magnitudes less than required by MCTS.
"Searching the given position to a shallower depth and using that information to inform the current search" sounds like iterative deepening to me. What is the "output" of the RL going to be (other than adding relevant entries to the hash table)? How are we going to use it to inform the "real" search?
-
- Posts: 867
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Has any chess engine employed RT RL yet?
jwilson82 wrote: ↑Sun Oct 04, 2020 6:44 pm
"Searching the given position to a shallower depth and using that information to inform the current search" sounds like iterative deepening to me. What is the "output" of the RL going to be (other than adding relevant entries to the hash table)? How are we going to use it to inform the "real" search?
Using shallower searches to play game segments, or even complete games, to a game depth greater than what the main search reaches. So if the iteration depth of the main search is 40 ply, then game segments may be played to a minimum depth (game depth, not search depth) of 60 ply, saving the game moves in a reinforcement learning structure, like the tree used in RomiChess (Romi's is after-game RL), or possibly in a ginormous hash table. Each real-time learning game's result would then modify the learning, because that is how reinforcement learning works. The games will be played more and more intelligently as more games are added to the RL structure, thus seeing strategy and tactics from 50% or more deeper in the tree than the main search will reach when the learning time is exhausted. This information will result in better move ordering, more accurate evaluations, and a faster main search.
Edit: I did not say this exactly right. I did not mean to imply that RL is done between every iteration of the main search. I meant that if the last iteration has been averaging 40 ply then play pre main search RL games to at least 60 ply. Please someone let me know if I'm being understood.
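The learning structure might look something like this (a guess at the shape of the idea based on the description above, not Romi's actual code; every name here is made up): game moves stored in a tree, with the winner's moves rewarded and the loser's punished at every ply, and the accumulated scores read back as move-ordering bonuses.

```python
class LearnNode:
    # One move in a RomiChess-style learning tree.
    def __init__(self):
        self.score = 0      # accumulated reinforcement bonus
        self.children = {}  # move -> LearnNode

def reinforce(root, game_moves, white_won, reward=1):
    # Walk the played line through the tree, creating nodes as needed,
    # rewarding the winner's moves and punishing the loser's.
    node = root
    for ply, move in enumerate(game_moves):
        node = node.children.setdefault(move, LearnNode())
        white_to_move = (ply % 2 == 0)
        node.score += reward if white_to_move == white_won else -reward

def ordering_bonus(root, line_so_far, move):
    # Bonus the main search could add to its move-ordering score.
    node = root
    for m in line_so_far:
        node = node.children.get(m)
        if node is None:
            return 0
    child = node.children.get(move)
    return child.score if child else 0
```

After each learning game, `reinforce` updates the tree, so later games (and finally the main search) see scores shaped by everything played before them.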
-
- Posts: 9
- Joined: Tue Oct 06, 2015 5:00 pm
Re: Has any chess engine employed RT RL yet?
"Using shallower searches to play game segments or even complete games to a depth greater than what the main search calls for"
I'm lost already. You are given a position to search. You are given one minute to do the search. Let's say that you can (somehow) decide that you want to search that position to a depth of 10 using half (a fraction taken from an earlier post) of the allotted time, 30 seconds. How are you supposed to do a deeper search in the first 30 seconds (the RL phase) than in the second 30 seconds (the search phase)?
"saving the game moves in a reinforcement learning structure like a tree as done in RomiChess (Romi's is after game RL) or possibly in a ginormous hash"
A book-like structure I can wrap my head around, but a hash table is iterative deepening.
"Therefore each of the real time learning games results would modify the learning because that is how reinforcement learning works"
I'm still lost as to what, exactly, the tangible product of the learning phase is.
"This information will result in better move ordering, more accurate evaluations and a faster main search."
Better move ordering than hash table hits alone?
How does the learning phase impact the evaluation function?
How is the search faster other than hash table cutoffs?
I recognize that my questions may come off as sharp and abrupt, especially coming from a new poster to this forum. I ask questions to better my own understanding, and have found that direct questions tend to result in direct answers. Please take them as an indication of my own ignorance and desire to understand, rather than as an attack on your statements themselves.