A report from a qualitative analysis I did using Claude Opus 4.5, comparing Stockfish, Dragon, and Theoria for suitability for engine analysis of club players' games, focusing on the ability to interpret strategic chess ideas, motifs, and principles. I used a test set of my own games analyzed at 100k nodes and 500k nodes per move. The GUI used was LucasChess mass analysis. This report was developed within a chess engine shootout project; Claude's memory features were all turned off, and steps were taken to reduce any potential bias from irrelevant details (PGN filenames, headers, etc.). Retrieval-augmented generation was used, drawing on the previously mentioned test set and various chess texts such as Silman's How to Reassess Your Chess, Stean's Simple Chess, and Nimzowitsch's My System, for added context.
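For anyone wanting to reproduce the fixed-node setup outside a GUI, here is a minimal sketch of what a per-move node budget looks like at the UCI protocol level. LucasChess handles this internally, so this is not how the analysis above was actually run; the helper name `uci_commands` and the example position are hypothetical, and only the two node counts come from the post.

```python
# Sketch only: builds the UCI text commands for a fixed-node search.
# NODE_BUDGETS mirrors the two settings named in the post; the helper
# name and the example position are hypothetical illustrations.

NODE_BUDGETS = (100_000, 500_000)

# Standard chess starting position, used here only as a placeholder.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def uci_commands(fen: str, nodes: int) -> list[str]:
    """Return the UCI commands asking an engine to search `fen` for `nodes` nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return [
        "ucinewgame",              # reset engine state between positions
        f"position fen {fen}",     # set up the position to analyze
        f"go nodes {nodes}",       # search until the node budget is spent
    ]


if __name__ == "__main__":
    for budget in NODE_BUDGETS:
        print("\n".join(uci_commands(START_FEN, budget)))
```

Using a node budget rather than a time limit keeps the amount of search per move the same regardless of hardware speed.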
"
Engine Evaluation for Chess Annotation: Dragon, Stockfish, and Theoria
Purpose
This report evaluates three chess engines for use in annotating games for club-level players (approximately 1200-1900 rated), with particular attention to integration with LLM/MLM systems for generating strategic narrative annotations.
Evaluation Criteria
The primary criterion is accessible strategic instruction — analysis that contains sufficient strategic content in a form legible to both club players reading annotations and language models generating them. The goal is helping players understand positional errors, strategic themes, and plans they can apply in their own games.
Findings
Theoria is the clear choice for this use case. Its variations develop strategic ideas sequentially rather than interleaving multiple plans across many moves. A club player reading Theoria's analysis encounters chess that unfolds logically: knight maneuvers complete, then bishop activation occurs, then pawn structure resolves. Each phase is distinct and comprehensible. LLMs can parse these variations into narrative annotation because the thematic segments are identifiable. Error detection is appropriately calibrated, flagging genuine conceptual mistakes rather than flooding games with criticism of marginal decisions.
Dragon serves as a useful secondary engine. It occasionally surfaces strategic ideas that differ from Theoria's recommendations — alternative plans worth including when multiple perspectives on a position are valuable. Its legibility is acceptable though not quite at Theoria's level. Dragon's variations tend to run longer but generally maintain distinguishable strategic threads.
Stockfish is less suitable for this application. Its analysis consistently produces variations where multiple strategic ideas — flank play, piece repositioning, pawn advances, central breaks — are tangled together across 15-19 moves with no clear separation. A 1500-rated player looking at Stockfish's output cannot determine which moves serve which purpose. The information density far exceeds what club players can process or what LLMs can segment into coherent narrative.
Stockfish's liberal error flagging compounds the problem. Games annotated by Stockfish show question marks on moves that most humans would consider reasonable, creating noise that obscures which errors actually matter for improvement. Club players need to understand their significant mistakes, not receive a catalog of minor inaccuracies.
Stability
Theoria and Dragon both demonstrate good stability between 100k and 500k nodes, with deeper analysis refining the same strategic picture rather than revising it. This consistency matters for practical annotation work.
Recommendations
Primary engine: Theoria for all strategic analysis and annotation generation.
Secondary engine: Dragon when alternative strategic perspectives are specifically desired.
Stockfish: Not recommended for this use case. If tactical verification of sharp positions is needed, Stockfish can serve as a background check, but its output should not feed into the annotation pipeline.
Conclusion
Theoria is the right engine for helping club players annotate games and improve their strategic understanding. It produces analysis that contains genuine chess instruction in a form that survives intact through the LLM annotation pipeline to the learning player. Stockfish's reputation for strength is earned in competitive contexts but translates poorly to pedagogical ones. Strategic richness that exceeds human and LLM processing capacity is not richness at all — it is noise. For the stated purpose, Stockfish is clearly the inferior choice."
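The stability judgment in the report is qualitative. One rough way to put a number on "refining the same strategic picture rather than revising it" would be to compare the principal variation an engine gives at 100k nodes with the one it gives at 500k nodes and measure how much of the line they share. The sketch below is only an assumed proxy, not the method used in the report; the function names and the example move lists are invented for illustration.

```python
# Sketch only: a crude stability proxy based on shared PV prefix.
# This is an assumed metric, not the one used in the report above.


def shared_prefix_len(pv_a: list[str], pv_b: list[str]) -> int:
    """Count how many leading moves two principal variations have in common."""
    count = 0
    for move_a, move_b in zip(pv_a, pv_b):
        if move_a != move_b:
            break
        count += 1
    return count


def pv_stability(pv_shallow: list[str], pv_deep: list[str]) -> float:
    """Fraction of the shorter PV that agrees with the other, from 0.0 to 1.0."""
    shorter = min(len(pv_shallow), len(pv_deep))
    if shorter == 0:
        return 0.0
    return shared_prefix_len(pv_shallow, pv_deep) / shorter


if __name__ == "__main__":
    # Invented example lines, not taken from any engine output.
    pv_100k = ["Nf3", "d5", "d4", "Nf6"]
    pv_500k = ["Nf3", "d5", "c4", "e6", "Nc3"]
    print(pv_stability(pv_100k, pv_500k))
```

A prefix match is deliberately strict: two lines that reach the same plan by a different move order would score low, so a figure like this can only supplement reading the variations, not replace it.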
Engine Shootout: Qualitative analysis of Stockfish, Dragon, and Theoria
Moderator: Ras
-
FireDragon761138
- Posts: 27
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
-
FireDragon761138
- Posts: 27
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Engine Shootout: Qualitative analysis of Stockfish, Dragon, and Theoria
Interesting results, not unexpected for Theoria. I am surprised by how well Dragon does: two different LLMs (Claude and DeepSeek) found Dragon to be nearly equal or even superior to Theoria for general chess strategic analysis, and both superior to Stockfish. I believe Dragon might be superior for advanced players, like expert or master players and above, not so much pedagogically, but for exploring different openings and middlegame ideas through concrete lines of play. Theoria seems to do a better job teaching good positional principles, especially when its output is interpreted by an LLM.
-
Graham Banks
- Posts: 45237
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Engine Shootout: Qualitative analysis of Stockfish, Dragon, and Theoria
Never heard of Theoria. Another clone/derivative?
gbanksnz at gmail.com
-
FireDragon761138
- Posts: 27
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Engine Shootout: Qualitative analysis of Stockfish, Dragon, and Theoria
It's a fork of Stockfish, so it is derivative.
We are just getting started on the webpage to distribute the binaries. We'll put the code on GitHub, but the main differences are in how the NNUE is trained.
https://www.theoriachess.org/