Homura is almost ready to share!

AAce3 · Post by **AAce3** » Tue Mar 14, 2023 1:37 am

RedBedHed wrote: ↑Mon Mar 13, 2023 9:36 pm
lithander wrote: ↑Mon Mar 13, 2023 5:45 pm Originally the MC in MCTS stands for "Monte Carlo" which means random. So the search uses super fast but random playouts, right? You wouldn't even need an evaluation function for your positions because they all get played until the game ends in a draw or mate. Then you propagate a win or loss back through the nodes of the tree. This is what you tried first?
Hi! Yes, the classical MCTS uses random playouts and UCT. Each playout sends a probe out into the search space. For chess, this is impractical, as some playouts are long, and-- given the tactical, sudden-death nature of the game-- to accurately estimate the value of the node you would need to do a lot of playouts.

I did initially try the random playouts and found that they were extremely weak given a time constraint of 20 seconds per move. I even introduced Tree Parallelization so that the search could use each of my cores, and it still didn't help.

lithander wrote: ↑Mon Mar 13, 2023 5:45 pm Then, when you say A-B simulations you mean that instead of playing random moves on each node you play the best move determined by a low-depth A-B search? So you get to have a lot less nodes in the tree and now you also need a static evaluation for a position to get the A-B search to work well. But the benefit is that whether a win or loss is propagate back is now much less random and actually reflects the quality of the position better.
Some researchers tried this, but found it to be weaker than other methods. By A-B simulations I mean just a shallow alpha-beta search. You just run a shallow search, normalize the score, and backpropagate.

lithander wrote: ↑Mon Mar 13, 2023 5:45 pm So the P in PUCT stands for policy and it's meant to guide the search to focus on promising nodes?
How is a trained policy network different from a network used to evaluate a position as used in NNUE? And if it's similar could you use a static evaluation (e.g. PeSTO) as a policy, too? For example you compute the evaluation of the position before and after a move as a poor-man's policy to tell you how interesting the move is to explore? Did you try that?
The P stands for "predictor" technically, but yes! That's the gist! From what I understand, the policy network is trained on grandmaster games so that the search has a sort of human intuition. I did try generating a policy from the A-B simulations. The search depth nearly doubled, but convergence was still too slow.

lithander wrote: ↑Mon Mar 13, 2023 5:45 pm I'm not sure how to phrase this... I certainly don't mean to be mean but the list of classical features sound like they would make a pretty strong classical engine, already. Maybe 2400+ with just PeSTO evaluation? I don't want to imply that bringing MCTS into the mix came at the cost of a few hundred Elo but I also don't understand how exactly MCTC and classical work together in your engine and where the benefits are. Maybe you can help me understand whether Homura has managed to incoporate MCTS and is *still* play decent chess (also cool!) or if the addition of MCTS is providing some value over a purely classical engine already?
Currently, the MCTS + A-B Rollouts is stronger than my ID A-B search. I think you are right. After hearing about Zangdar, I think I maybe didn't implement the classical A-B features correctly. I am going to revisit the A-B part of my search before working on the evaluation. Hopefully, SPRT will help me.

There was an interview with the Komodo guy, who basically said that at short time controls with low numbers of threads, they relied mostly on the alpha-beta search, in other words, very expensive but accurate rollouts. With enough resources, they could depend more on the MCTS side on it to perform and converge onto the best move, although at super long time controls MCTS seems to start to get tunnel vision and get fixated on certain moves (Leela does this sometimes). The komodo website actually states just how "slow" their MCTS implementation is -- "A rough estimate on current (2019) processors is each core can expand about 30 Monte Carlo nodes a second, but can vary a lot when the tree has many draws or mates and climb higher to perhaps 100." In other words, they're expanding nodes very very slowly but the quality of those nodes are very good.

RedBedHed · Post by **RedBedHed** » Tue Mar 14, 2023 2:31 am

Ah, okay... So like better simulations and a greedier tree policy than UCT or PUCT? Hmm...

RedBedHed · Post by **RedBedHed** » Tue Mar 14, 2023 2:33 am

I also wonder what their final selection policy is... And how they got around the averaging issue...

RedBedHed · Post by **RedBedHed** » Tue Mar 14, 2023 3:51 am

Aaaah! OMG. You guys. I just figured out that I had a fundamental misunderstanding of how the transposition table worked. I used the ply (and extra iteration information) to access entries rather than the remaining depth. I just fixed it, and now Homura might be playing closer to 2400. Here is a tournament that I ran against Zeta Dva:

Code: Select all

Score of Homura vs zetadva: 22 - 3 - 0  [0.880] 25
...      Homura playing White: 12 - 1 - 0  [0.923] 13
...      Homura playing Black: 10 - 2 - 0  [0.833] 12
...      White vs Black: 14 - 11 - 0  [0.560] 25
Elo difference: 346.1 +/- nan, LOS: 100.0 %, DrawRatio: 0.0 %

algerbrex · Post by **algerbrex** » Tue Mar 14, 2023 5:35 am

RedBedHed wrote: ↑Tue Mar 14, 2023 3:51 am Aaaah! OMG. You guys. I just figured out that I had a fundamental misunderstanding of how the transposition table worked. I used the ply (and extra iteration information) to access entries rather than the remaining depth. I just fixed it, and now Homura might be playing closer to 2400. Here is a tournament that I ran against Zeta Dva:
Code: Select all
Score of Homura vs zetadva: 22 - 3 - 0  [0.880] 25
...      Homura playing White: 12 - 1 - 0  [0.923] 13
...      Homura playing Black: 10 - 2 - 0  [0.833] 12
...      White vs Black: 14 - 11 - 0  [0.560] 25
Elo difference: 346.1 +/- nan, LOS: 100.0 %, DrawRatio: 0.0 %

Nice, the transposition table is often a tricky beast to get working correctly and even harder to get working optimally. Now might be a good time to go back and double-check your other areas of the search to make sure you're not missing out on Elo.

AAce3 · Post by **AAce3** » Tue Mar 14, 2023 4:48 pm

RedBedHed wrote: ↑Tue Mar 14, 2023 2:33 am I also wonder what their final selection policy is... And how they got around the averaging issue...

Scores are significantly more consistent with strong evaluations, I think.

I know that Ceres(https://github.com/dje-dev/Ceres), which is a Lc0-type experimental engine, found significant success with using value of the node as the final selection policy. They've been doing a lot of super cool things with MCTS, so I'd recommend you check them out.

hgm · Post by **hgm** » Wed Mar 15, 2023 9:47 am

RedBedHed wrote: ↑Mon Mar 13, 2023 9:36 pmHi! Yes, the classical MCTS uses random playouts and UCT. Each playout sends a probe out into the search space. For chess, this is impractical, as some playouts are long, and-- given the tactical, sudden-death nature of the game-- to accurately estimate the value of the node you would need to do a lot of playouts.

Purely random playouts for chess are almost meaningless. It is not so expensive, however, to cover tactical situations (where typically a single move scores much better than all others) through a static analysis, a bit similar to how the N.E.G. engine does it: count the number of attackers and protectors of each square, and keep track of the attacker and protector with the lowest value. And then play the move that appears to make the best capture, or rescues your own piece from the best capture of the opponent.

Such a simple analysis can also be used as a poor-man's substitute for a 'policy network' in the MCTS part.

RedBedHed · Post by **RedBedHed** » Sat Mar 18, 2023 11:36 pm

algerbrex wrote: ↑Tue Mar 14, 2023 5:35 am Nice, the transposition table is often a tricky beast to get working correctly and even harder to get working optimally. Now might be a good time to go back and double-check your other areas of the search to make sure you're not missing out on Elo.

You were 100% right! I just found out that PeSTO used a different board layout than Homura. I flipped the PSQ tables and the engine gained a significant amount of ELO!

RedBedHed · Post by **RedBedHed** » Sat Mar 18, 2023 11:36 pm

AAce3 wrote: ↑Tue Mar 14, 2023 4:48 pm
RedBedHed wrote: ↑Tue Mar 14, 2023 2:33 am I also wonder what their final selection policy is... And how they got around the averaging issue...
Scores are significantly more consistent with strong evaluations, I think.

I know that Ceres(https://github.com/dje-dev/Ceres), which is a Lc0-type experimental engine, found significant success with using value of the node as the final selection policy. They've been doing a lot of super cool things with MCTS, so I'd recommend you check them out.

Woah, this is a fascinating project! I'll start reading it

RedBedHed · Post by **RedBedHed** » Sat Mar 18, 2023 11:39 pm

hgm wrote: ↑Wed Mar 15, 2023 9:47 am Purely random playouts for chess are almost meaningless.

I think you are totally right. The random playouts are pretty much just noise.

hgm wrote: ↑Wed Mar 15, 2023 9:47 am It is not so expensive, however, to cover tactical situations (where typically a single move scores much better than all others) through a static analysis, a bit similar to how the N.E.G. engine does it: count the number of attackers and protectors of each square, and keep track of the attacker and protector with the lowest value. And then play the move that appears to make the best capture, or rescues your own piece from the best capture of the opponent.

Such a simple analysis can also be used as a poor-man's substitute for a 'policy network' in the MCTS part.

This is a really interesting idea! I bet it would work great, honestly... I'm just not very good at chess tactics myself, so I wouldn't know where to start lol

Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!

Re: Homura is almost ready to share!