As I near the end of my undergraduate thesis, I am almost ready to post my code for my very first Chess engine! It is kind of a mess right now... So I might need a few months to clean it up... But I do plan to post it.
My engine, Homura (named after one of my favorite anime characters), uses a rollout/backtracking hybrid search based on Algorithm 4 from the full "Pruning Game Tree by Rollouts" paper, the Chess Programming Wiki, and many different engines. After running tournaments against Zeta Dva and Blunder, I believe the latest version is roughly 100 Elo stronger than 2000, so somewhere around 2100. It is single-threaded and uses the common PeSTO evaluation from the Chess Programming Wiki. Right now, it only supports a small subset of UCI, including "go infinite" and "go movetime <x>."
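In case it helps anyone picture that subset, the two supported "go" forms can be parsed with something like the sketch below. The function and variable names are mine for illustration, not Homura's actual code.

```
// Minimal sketch of handling "go infinite" and "go movetime <x>".
// handleGo and its variable names are illustrative, not Homura's API.
#include <sstream>
#include <string>

void handleGo(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    in >> token;                              // consume "go"
    bool infinite = false;
    long long moveTimeMs = 0;
    while (in >> token) {
        if (token == "infinite")      infinite = true;
        else if (token == "movetime") in >> moveTimeMs;
    }
    if (infinite)            { /* search until a "stop" command arrives */ }
    else if (moveTimeMs > 0) { /* search for moveTimeMs milliseconds, then print bestmove */ }
}
```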
I think that the evaluation is lacking, as it was tuned for an engine that searches much deeper than Homura. As the worst Chess player on this forum, I am unsure how to improve the evaluation. I tried adding king safety and mobility, but that only hurt Homura's strength. Does anyone have any suggestions?
~ also ~
As some of you may remember, I started this project with a Monte Carlo Alpha Beta search and a dream that I could make it half as strong as Komodo MCTS. Sadly, this didn't work out for me, and I discovered that writing a competitive MCTS engine without powerful hardware and neural networks is extremely difficult. I want to share what I found:
Classical MCTS needs a lot of help to play Chess. On its own, it struggles with tactics.
1) It is very selective in expanding the tree and searches some lines much deeper than others. If a line appears bad initially, MCTS doesn't stick around long enough to see if it can improve. It uses the evaluation of its simulations as a crutch, and if its simulations aren't tactically strong, it is prone to underestimating or overestimating the value of a node. A programmer might use Alpha-Beta simulations to remedy this. However, A-B simulations deeper than two plies are costly.
2) Even with PUCT, convergence to the principal variation is too slow. UCT explores without any prior belief, while PUCT biases exploration toward moves it already believes are good, just like a human would. So I bet it can work wonders with a trained policy network, but I think the policy PUCT uses must be strong from the moment a node is expanded. Without a NN, it is hard to compute a reliable policy up-front.
3) MCTS' backpropagation scheme averages the rewards below a node. An imminent win on the fringe, with a win probability of 1 and a visit count of 1, can drown in the averages accumulated near the root, making the search extremely unfit for sudden-death games like Chess. A programmer can remedy this with "MCTS Solver"-style terminal backpropagation. However, that can cause values near the root to drop or rise suddenly, even when no mate is imminent under optimal play. And even with Secure Child as the final selection policy, it seems to panic and give away its position in the midgame.
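To make points 2) and 3) concrete, here is a small sketch of PUCT selection plus solver-style proof backpropagation. The node layout, the exploration constant, and the proof encoding are all illustrative; this is not Homura's code.

```
// Sketch of PUCT selection and "MCTS Solver"-style proof values.
#include <cmath>
#include <limits>
#include <vector>

struct Node {
    double prior  = 0.0;   // policy prior P(s,a); PUCT needs this to be good up-front
    double reward = 0.0;   // sum of simulation rewards, from this node's perspective
    int    visits = 0;
    int    proof  = 0;     // +1 proven win / -1 proven loss for the side to move here
    std::vector<Node*> children;
};

// PUCT: exploit the average reward, but bias exploration toward children
// with a high prior, so promising moves are tried early even with few visits.
Node* selectPuct(const Node& parent, double cPuct = 1.5) {
    Node* best = nullptr;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (Node* c : parent.children) {
        double q = c->visits ? (1.0 - c->reward / c->visits) : 0.5;  // parent's view
        double u = cPuct * c->prior * std::sqrt(double(parent.visits)) / (1.0 + c->visits);
        if (q + u > bestScore) { bestScore = q + u; best = c; }
    }
    return best;
}

// Solver-style proof backup (draws ignored for brevity): a child that is a
// proven loss for the side to move there is a proven win here; if every child
// is a proven win for the opponent, this node is a proven loss. Proven values
// override the running average, so an imminent mate can't be averaged away.
void backupProof(Node& node) {
    bool allChildrenWon = !node.children.empty();
    for (Node* c : node.children) {
        if (c->proof == -1) { node.proof = +1; return; }
        if (c->proof != +1) allChildrenWon = false;
    }
    if (allChildrenWon) node.proof = -1;
}
```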
On the other hand, I found that Alpha Beta doesn't need much help to play Chess well (I know, right? What a shock! lol).
1) An iterative deepening Alpha-Beta search prunes away what it doesn’t need to see and visits the rest at some depth. It sees many lines, even those that MCTS would abandon. Because of this, ID Alpha-Beta is much better equipped to pick up on subtle tactics and traps. It is much harder to force the search into a bad position, and it is much easier for the search to fight its way into a good position.
2) The score backpropagated to the root by Alpha-Beta is the heuristic score of the board after the principal variation is played. It can naturally prove whether one of the root's children is a win or a loss, and, with mate scores that account for distance from the root, it prefers wins closer to the root. So it is well-equipped to pick up on imminent wins and losses within its search tree.
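For example, the usual way to make the search prefer the shorter mate is to score checkmates relative to the ply at which they occur. A tiny sketch (the constant is made up, not Homura's):

```
// Mate scores encoded relative to the root: the deeper the mate, the smaller
// its magnitude, so negamax naturally prefers wins closer to the root.
constexpr int MATE_VALUE = 32000;   // illustrative constant

// Score when the side to move is checkmated `ply` half-moves from the root.
constexpr int matedIn(int ply) { return -MATE_VALUE + ply; }

// Score seen by the side delivering mate `ply` half-moves from the root.
constexpr int mateIn(int ply)  { return  MATE_VALUE - ply; }

// mateIn(3) == 31997 beats mateIn(7) == 31993, so the faster mate wins out.
```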
So I decided to try an iterative deepening MCTS with Alpha-Beta rollouts instead. The search calls the regular backtracking routine for null-window and internal ID searches, and it builds the important parts of the tree in memory as it performs rollouts. It uses greedy selection at non-PV nodes, where the move ordering scheme may not work well (this seems to boost Elo, and it makes me wonder whether the null-window searches wouldn't be better implemented with rollouts too). With each iteration, it passes the root pointer to my garbage collector and moves to a new root. Reusing the tree doesn't help with my current search, as the in-memory portion changes from iteration to iteration. In my next engine, I will experiment with reusing the tree and see if I can improve move ordering; I simply ran out of time with Homura.
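If anyone wants a feel for the rollout side of this, here is a tiny self-contained toy of the basic bounds-and-rollouts idea from the paper (closer to the simple version than to Algorithm 4, and nothing like Homura's real data structures): each node keeps lower and upper bounds on its minimax value, a rollout walks to a leaf through children whose bounds still overlap the window, and the bounds are tightened on the way back up until the root's bounds meet.

```
// Toy alpha-beta-by-rollouts on a hand-built 2-ply tree. Each rollout visits
// one new leaf; the loop stops when the root's bounds meet at the minimax value.
#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

constexpr int INF = std::numeric_limits<int>::max() / 2;

struct Node {
    bool maxNode   = true;
    int  lo        = -INF;   // lower bound on the minimax value
    int  hi        = +INF;   // upper bound on the minimax value
    int  leafValue = 0;      // static value, used only if there are no children
    std::vector<Node> children;
};

void rollout(Node& s, int alpha, int beta) {
    if (s.children.empty()) { s.lo = s.hi = s.leafValue; return; }
    alpha = std::max(alpha, s.lo);
    beta  = std::min(beta,  s.hi);
    // Descend into one child whose bounds still overlap the window.
    for (Node& c : s.children) {
        int a = std::max(alpha, c.lo), b = std::min(beta, c.hi);
        if (a < b) { rollout(c, a, b); break; }
    }
    // Back the tightened bounds up to this node.
    if (s.maxNode) {
        int lo = -INF, hi = -INF;
        for (Node& c : s.children) { lo = std::max(lo, c.lo); hi = std::max(hi, c.hi); }
        s.lo = lo; s.hi = hi;
    } else {
        int lo = +INF, hi = +INF;
        for (Node& c : s.children) { lo = std::min(lo, c.lo); hi = std::min(hi, c.hi); }
        s.lo = lo; s.hi = hi;
    }
}

int main() {
    Node root;                                     // MAX node
    root.children = { Node{false}, Node{false} };  // two MIN children
    root.children[0].children = { Node{true, -INF, +INF, 3}, Node{true, -INF, +INF, 5} };
    root.children[1].children = { Node{true, -INF, +INF, 2}, Node{true, -INF, +INF, 9} };
    while (root.lo < root.hi) rollout(root, -INF, +INF);
    std::printf("minimax value = %d\n", root.lo);  // prints 3; the 9-leaf is pruned
}
```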
Classical techniques implemented include:
- Transposition Table
- Dynamic Move Ordering: PV Move -> MVV-LVA Attacks -> Killers -> History Quiets (see the sketch after this list)
- Principal Variation Search
- Fail Soft
- Internal Iterative Deepening
- Recursive Null Move Pruning
- Futility Pruning
- Razoring
- Late Move Reductions
- Late Move Pruning
- Quiescence Search inspired heavily by Leorik
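For what it's worth, the move ordering above boils down to scoring moves into bands, something like the sketch below. The score bands and values are illustrative, not Homura's actual code, and history scores are assumed to stay below the killer band.

```
// Rough sketch of the move-ordering bands: PV move first, then MVV-LVA
// captures, then killers, then quiets ordered by history.
#include <cstdint>

enum Band : int32_t {          // higher scores are searched first
    HISTORY_BASE = 0,          // quiet moves: raw history score
    KILLER_BONUS = 1 << 20,    // killer moves sit above every quiet
    CAPTURE_BASE = 1 << 21,    // captures above killers, ordered by MVV-LVA
    PV_BONUS     = 1 << 30     // the PV / hash move is always tried first
};

// victim/attacker are piece values (e.g. P=1 ... Q=9); history is the
// side-to-move history counter for this quiet move.
int32_t scoreMove(bool isPvMove, bool isCapture, int victim, int attacker,
                  bool isKiller, int32_t history) {
    if (isPvMove)  return PV_BONUS;
    if (isCapture) return CAPTURE_BASE + victim * 16 - attacker;  // MVV-LVA
    if (isKiller)  return KILLER_BONUS;
    return HISTORY_BASE + history;
}
```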

(Sorry for any errors... I typed this pretty fast)