As I near the end of my undergraduate thesis, I am almost ready to post my code for my very first Chess engine! It is kind of a mess right now... So I might need a few months to clean it up... But I do plan to post it.
My engine, Homura (named after one of my favorite anime characters), uses a rollout/backtracking hybrid search based on Algorithm 4 from the full "Pruning Game Tree by Rollouts" paper, the Chess Programming Wiki, and many different engines. After running tournaments against Zeta Dva and Blunder, I believe the latest version is roughly 100 Elo stronger than 2000, so somewhere around 2100. It is single-threaded and uses the common PeSTO evaluation from the Chess Programming Wiki. Right now, it only supports a small subset of UCI, including "go infinite" and "go movetime <x>."
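In case it helps anyone picture that subset, the two supported "go" forms can be parsed with something like the sketch below. The function and variable names are mine for illustration, not Homura's actual code.

```
// Minimal sketch of handling "go infinite" and "go movetime <x>".
// handleGo and its variable names are illustrative, not Homura's API.
#include <sstream>
#include <string>

void handleGo(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    in >> token;                              // consume "go"
    bool infinite = false;
    long long moveTimeMs = 0;
    while (in >> token) {
        if (token == "infinite")      infinite = true;
        else if (token == "movetime") in >> moveTimeMs;
    }
    if (infinite)            { /* search until a "stop" command arrives */ }
    else if (moveTimeMs > 0) { /* search for moveTimeMs milliseconds, then print bestmove */ }
}
```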
I think that the evaluation is lacking, as it was tuned for an engine that searches much deeper than Homura. As the worst Chess player on this forum, I am unsure how to improve the evaluation. I tried adding king safety and mobility, but that only hurt Homura's strength. Does anyone have any suggestions?
~ also ~
As some of you may remember, I started this project with a Monte Carlo Alpha Beta search and a dream that I could make it half as strong as Komodo MCTS. Sadly, this didn't work out for me, and I discovered that writing a competitive MCTS engine without powerful hardware and neural networks is extremely difficult. I want to share what I found:
Classical MCTS needs a lot of help to play Chess. On its own, it struggles with tactics.
1) It is very selective in expanding the tree and searches some lines much deeper than others. If a line appears bad initially, MCTS doesn't stick around long enough to see if it can improve. It uses the evaluation of its simulations as a crutch, and if its simulations aren't tactically strong, it is prone to underestimating or overestimating the value of a node. A programmer might use Alpha-Beta simulations to remedy this. However, A-B simulations deeper than two plies are costly.
2) Even with PUCT, convergence to the principal variation is too slow. UCT explores without any prior belief, while PUCT biases exploration toward moves it already believes are good, just like a human would. So I bet it can work wonders with a trained policy network, but I think the policy PUCT uses must be strong from the moment a node is expanded. Without a NN, it is hard to compute a reliable policy up-front.
3) MCTS' backpropagation scheme averages the rewards below a node. An imminent win on the fringe, with a win probability of 1 and a visit count of 1, can drown in the averages accumulated near the root, making the search extremely unfit for sudden-death games like Chess. A programmer can remedy this with "MCTS Solver"-style terminal backpropagation. However, that can cause values near the root to drop or rise suddenly, even when no mate is imminent under optimal play. And even with Secure Child as the final selection policy, it seems to panic and give away its position in the midgame.
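To make points 2) and 3) concrete, here is a small sketch of PUCT selection plus solver-style proof backpropagation. The node layout, the exploration constant, and the proof encoding are all illustrative; this is not Homura's code.

```
// Sketch of PUCT selection and "MCTS Solver"-style proof values.
#include <cmath>
#include <limits>
#include <vector>

struct Node {
    double prior  = 0.0;   // policy prior P(s,a); PUCT needs this to be good up-front
    double reward = 0.0;   // sum of simulation rewards, from this node's perspective
    int    visits = 0;
    int    proof  = 0;     // +1 proven win / -1 proven loss for the side to move here
    std::vector<Node*> children;
};

// PUCT: exploit the average reward, but bias exploration toward children
// with a high prior, so promising moves are tried early even with few visits.
Node* selectPuct(const Node& parent, double cPuct = 1.5) {
    Node* best = nullptr;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (Node* c : parent.children) {
        double q = c->visits ? (1.0 - c->reward / c->visits) : 0.5;  // parent's view
        double u = cPuct * c->prior * std::sqrt(double(parent.visits)) / (1.0 + c->visits);
        if (q + u > bestScore) { bestScore = q + u; best = c; }
    }
    return best;
}

// Solver-style proof backup (draws ignored for brevity): a child that is a
// proven loss for the side to move there is a proven win here; if every child
// is a proven win for the opponent, this node is a proven loss. Proven values
// override the running average, so an imminent mate can't be averaged away.
void backupProof(Node& node) {
    bool allChildrenWon = !node.children.empty();
    for (Node* c : node.children) {
        if (c->proof == -1) { node.proof = +1; return; }
        if (c->proof != +1) allChildrenWon = false;
    }
    if (allChildrenWon) node.proof = -1;
}
```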
On the other hand, I found that Alpha Beta doesn't need much help to play Chess well (I know, right? What a shock! lol).
1) An iterative deepening Alpha-Beta search prunes away what it doesn’t need to see and visits the rest at some depth. It sees many lines, even those that MCTS would abandon. Because of this, ID Alpha-Beta is much better equipped to pick up on subtle tactics and traps. It is much harder to force the search into a bad position, and it is much easier for the search to fight its way into a good position.
2) The score backpropagated to the root by Alpha-Beta is the heuristic score of the board after the principal variation is played. It can naturally prove whether one of the root's children is a win or a loss, and, with mate scores that account for distance from the root, it prefers wins closer to the root. So it is well-equipped to pick up on imminent wins and losses within its search tree.
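For example, the usual way to make the search prefer the shorter mate is to score checkmates relative to the ply at which they occur. A tiny sketch (the constant is made up, not Homura's):

```
// Mate scores encoded relative to the root: the deeper the mate, the smaller
// its magnitude, so negamax naturally prefers wins closer to the root.
constexpr int MATE_VALUE = 32000;   // illustrative constant

// Score when the side to move is checkmated `ply` half-moves from the root.
constexpr int matedIn(int ply) { return -MATE_VALUE + ply; }

// Score seen by the side delivering mate `ply` half-moves from the root.
constexpr int mateIn(int ply)  { return  MATE_VALUE - ply; }

// mateIn(3) == 31997 beats mateIn(7) == 31993, so the faster mate wins out.
```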
So I decided to try an iterative deepening MCTS with Alpha-Beta rollouts instead. The search calls the regular backtracking routine for null-window and internal ID searches, and it builds the important parts of the tree in memory as it performs rollouts. It uses greedy selection at non-PV nodes, where the move ordering scheme may not work well (this seems to boost Elo, and it makes me wonder whether the null-window searches wouldn't be better implemented with rollouts too). With each iteration, it passes the root pointer to my garbage collector and moves to a new root. Reusing the tree doesn't help with my current search, as the in-memory portion changes from iteration to iteration. In my next engine, I will experiment with reusing the tree and see if I can improve move ordering; I simply ran out of time with Homura.
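If anyone wants a feel for the rollout side of this, here is a tiny self-contained toy of the basic bounds-and-rollouts idea from the paper (closer to the simple version than to Algorithm 4, and nothing like Homura's real data structures): each node keeps lower and upper bounds on its minimax value, a rollout walks to a leaf through children whose bounds still overlap the window, and the bounds are tightened on the way back up until the root's bounds meet.

```
// Toy alpha-beta-by-rollouts on a hand-built 2-ply tree. Each rollout visits
// one new leaf; the loop stops when the root's bounds meet at the minimax value.
#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

constexpr int INF = std::numeric_limits<int>::max() / 2;

struct Node {
    bool maxNode   = true;
    int  lo        = -INF;   // lower bound on the minimax value
    int  hi        = +INF;   // upper bound on the minimax value
    int  leafValue = 0;      // static value, used only if there are no children
    std::vector<Node> children;
};

void rollout(Node& s, int alpha, int beta) {
    if (s.children.empty()) { s.lo = s.hi = s.leafValue; return; }
    alpha = std::max(alpha, s.lo);
    beta  = std::min(beta,  s.hi);
    // Descend into one child whose bounds still overlap the window.
    for (Node& c : s.children) {
        int a = std::max(alpha, c.lo), b = std::min(beta, c.hi);
        if (a < b) { rollout(c, a, b); break; }
    }
    // Back the tightened bounds up to this node.
    if (s.maxNode) {
        int lo = -INF, hi = -INF;
        for (Node& c : s.children) { lo = std::max(lo, c.lo); hi = std::max(hi, c.hi); }
        s.lo = lo; s.hi = hi;
    } else {
        int lo = +INF, hi = +INF;
        for (Node& c : s.children) { lo = std::min(lo, c.lo); hi = std::min(hi, c.hi); }
        s.lo = lo; s.hi = hi;
    }
}

int main() {
    Node root;                                     // MAX node
    root.children = { Node{false}, Node{false} };  // two MIN children
    root.children[0].children = { Node{true, -INF, +INF, 3}, Node{true, -INF, +INF, 5} };
    root.children[1].children = { Node{true, -INF, +INF, 2}, Node{true, -INF, +INF, 9} };
    while (root.lo < root.hi) rollout(root, -INF, +INF);
    std::printf("minimax value = %d\n", root.lo);  // prints 3; the 9-leaf is pruned
}
```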
Classical techniques implemented include:
- Transposition Table
- Dynamic Move Ordering: PV Move -> MVV-LVA Attacks -> Killers -> History Quiets (see the sketch after this list)
- Principal Variation Search
- Fail Soft
- Internal Iterative Deepening
- Recursive Null Move Pruning
- Futility Pruning
- Razoring
- Late Move Reductions
- Late Move Pruning
- Quiescence Search inspired heavily by Leorik
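For what it's worth, the move ordering above boils down to scoring moves into bands, something like the sketch below. The score bands and values are illustrative, not Homura's actual code, and history scores are assumed to stay below the killer band.

```
// Rough sketch of the move-ordering bands: PV move first, then MVV-LVA
// captures, then killers, then quiets ordered by history.
#include <cstdint>

enum Band : int32_t {          // higher scores are searched first
    HISTORY_BASE = 0,          // quiet moves: raw history score
    KILLER_BONUS = 1 << 20,    // killer moves sit above every quiet
    CAPTURE_BASE = 1 << 21,    // captures above killers, ordered by MVV-LVA
    PV_BONUS     = 1 << 30     // the PV / hash move is always tried first
};

// victim/attacker are piece values (e.g. P=1 ... Q=9); history is the
// side-to-move history counter for this quiet move.
int32_t scoreMove(bool isPvMove, bool isCapture, int victim, int attacker,
                  bool isKiller, int32_t history) {
    if (isPvMove)  return PV_BONUS;
    if (isCapture) return CAPTURE_BASE + victim * 16 - attacker;  // MVV-LVA
    if (isKiller)  return KILLER_BONUS;
    return HISTORY_BASE + history;
}
```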

(Sorry for any errors... I typed this pretty fast)