Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

syzygy
Posts: 5897
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

FireDragon761138 wrote: Wed Jan 28, 2026 10:29 pm
syzygy wrote: Wed Jan 28, 2026 10:05 pm
FireDragon761138 wrote: Wed Jan 28, 2026 9:38 pm Taste has nothing to do with morality, it's merely preference. Actual community involves relational responsibility and recognition, not merely assertions of normativity or legality. And I have received none of that from the Stockfish community. So I owe them nothing. I'm not obligated to show fealty to my abusers.
Asking yourself WHY you are not receiving your desired recognition seems to be beyond you.

I still think your persona is merely an elaborate joke.
You probably aren't a Christian, so you may not understand this, but I happen to think people deserve the benefit of the doubt as a right, until proven otherwise. Treating people outside the community or the consensus as a pariah to be the butt of jokes is a violation of a person's dignity and is fundamentally immoral.
Ask yourself why you are being called out like this. Not just by me and not just here.
sscg13
Posts: 24
Joined: Mon Apr 08, 2024 8:57 am
Full name: Chris Bao

Post by sscg13 »

The "Stockfish community" is not as monolithic as many people (including you) think it is. Criticism of your work has been from individual people (not all of whom are even part of the "Stockfish community").

Now let me try to explain how I feel from what I can make of your views:

0) What is meant by "improving Stockfish"? To Stockfish developers, improving Stockfish means making it stronger, for which winning more games is the corresponding metric. The general public also tends to agree that stronger, faster analysis is better (See: responses to your initial post on Reddit).

1) You claim that Stockfish is "fundamentally flawed." That might be true (For instance, HCE was "fundamentally flawed" compared to NNUE). The Stockfish community interprets this as a claim of "Stockfish can be improved significantly [in strength]." Correspondingly, if you do not come up with a significant strength improvement to Stockfish, then your claim is interpreted as being arrogant.

2) You want to be able to learn from engines. But in my opinion the fundamental problem with this is not evaluation, but search. To put this bluntly, no matter what kind of evaluation you plug into Stockfish, as long as it is a sensible evaluation, the search will still be far superhuman. In my opinion the truly interesting question here is how one can make a computer "think" (i.e. search) like a human.

3) Evaluation stability is a strange metric. Let us recall Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure." The current target widely used by the community is to win more games. This is a "good target" in the sense that it is nearly impossible to game: an engine typically wins more games by truly playing better chess. In contrast, there are many ways to "game" evaluation stability, the most blatant of which is to not even print a real evaluation. Besides, this has already been pointed out to you many times, but I will repeat it here: if you are looking for a "stable" engine, then Lc0, which uses averaging, is by default more "stable" than alpha-beta engines.

4) I would posit that most of the initial engagement was in good faith. But people have a limit to their patience. So when many people think that you are engaging in bad faith and refusing to learn, they change how they engage with you. Speaking of learning, I strongly recommend that you learn about sycophancy in LLMs, if you haven't already.

5) It is true that Glaurung and Stockfish utilize many fundamental ideas that date to the 20th century. But this is comparable to saying that AMD's Zen 5 depends on fundamental ideas about CPU design from the 20th century, or that TSMC's "2 nm" process depends on fundamental ideas about transistor design. In all of these cases, the current state of the art is far better than in the past. And I think saying "Stockfish developers didn't help me" is strange when your project builds on Stockfish. Do you think we should stop giving credit to the originators of ideas in engine programming, because they did not help Stockfish or any other modern engine?
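
To make point 3 concrete, here is a minimal Python sketch of how evaluation stability could be gamed. The metric (mean absolute change between consecutive depths) and both score lists are assumptions made up for illustration; they are not taken from Theoria, Stockfish, or any published benchmark.

```python
from statistics import mean


def stability(evals):
    """Mean absolute change between consecutive per-depth evaluations.

    Lower means more "stable". This definition is an assumption chosen
    for illustration, not a metric defined anywhere in this thread.
    """
    return mean(abs(b - a) for a, b in zip(evals, evals[1:]))


# Hypothetical centipawn scores reported at depths 1..8 for one position.
honest_engine = [35, 60, 20, 48, 41, 55, 44, 47]

# An "engine" that ignores the position and always prints the same number.
constant_engine = [0] * 8

print("honest  :", stability(honest_engine))
print("constant:", stability(constant_engine))
```

The constant "engine" scores a perfect 0 on this metric while carrying no information about the position, which is the sense in which the measure can be gamed. Game results are much harder to fake in this way.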
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

sscg13 wrote: Thu Jan 29, 2026 5:46 am
2) You want to be able to learn from engines. But in my opinion the fundamental problem with this is not evaluation, but search. To put this bluntly, no matter what kind of evaluation you plug into Stockfish, as long as it is a sensible evaluation, the search will still be far superhuman. In my opinion the truly interesting question here is how one can make a computer "think" (i.e. search) like a human.
"Human-aligned" and "thinking like a human" aren't necessarily the same thing.

The latest study I did showed that we can achieve several orders of magnitude more computational efficiency in the search, simply by changing the evaluation training regimen. That means it's more parsimonious and energy efficient. You don't get 22+ moves to achieve engine evaluation stability, and the resulting evaluation is still above what is needed for the typical human chess player to analyze their games.
3) Evaluation stability is a strange metric. Let us recall Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure." The current target widely used by the community is to win more games. This is a "good target" in the sense that it is nearly impossible to game: an engine typically wins more games by truly playing better chess. In contrast, there are many ways to "game" evaluation stability, the most blatant of which is to not even print a real evaluation. Besides, this has already been pointed out to you many times, but I will repeat it here: if you are looking for a "stable" engine, then Lc0, which uses averaging, is by default more "stable" than alpha-beta engines.
Evaluation stability was judged a good target by the Dragon development team, for the purposes of analysis. It's also self-evidently logical, since an evaluation that's more stable represents a settled plan.

Lc0 may be useful for research, but it's not computationally efficient. My goal with Theoria was to make a computationally efficient engine that approximated Lc0, with the predictable kind of outcomes you'd expect out of an alpha-beta search engine.
sscg13
Posts: 24
Joined: Mon Apr 08, 2024 8:57 am
Full name: Chris Bao

Post by sscg13 »

FireDragon761138 wrote: Thu Jan 29, 2026 5:57 am "Human-aligned" and "thinking like a human" aren't necessarily the same thing.

The latest study I did showed that we can achieve several orders of magnitude more computational efficiency in the search, simply by changing the evaluation training regimen. That means it's more parsimonious and energy efficient. You don't get 22+ moves to achieve engine evaluation stability, and the resulting evaluation is still above what is needed for the typical human chess player to analyze their games.
The thing is, the longer you let an engine search, the closer it will get to the "truth" of chess. Unless you are presuming the engine is already "perfect", it is bound to change its mind with more time.

(In fact, I will use this to argue on the contrary, the closer an engine is to "perfect", the more stable its output is, since its initial evaluation is more likely to be correct all along.)
FireDragon761138 wrote: Thu Jan 29, 2026 5:57 am Lc0 may be useful for research, but it's not computationally efficient.
There are CPU-based MCTS engines as well.

Would you care to respond to my other points?
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

sscg13 wrote: Thu Jan 29, 2026 6:11 am
FireDragon761138 wrote: Thu Jan 29, 2026 5:57 am "Human-aligned" and "thinking like a human" aren't necessarily the same thing.

The latest study I did showed that we can achieve several orders of magnitude more computational efficiency in the search, simply by changing the evaluation training regimen. That means it's more parsimonious and energy efficient. You don't get 22+ moves to achieve engine evaluation stability, and the resulting evaluation is still above what is needed for the typical human chess player to analyze their games.
The thing is, the longer you let an engine search, the closer it will get to the "truth" of chess. Unless you are presuming the engine is already "perfect", it is bound to change its mind with more time.
I don't accept that as true axiomatically.

Also, Stockfish's search might possibly be more mathematically correct, but by the point Stockfish stabilizes (depth 22-26), there's far less that's strategically interpretable in its PV. It's more like noise from the standpoint of being strategically coherent and comprehensible, especially towards the end of the principal variations. It might have more in common with AI hallucinations than what seems obvious at first glance, and that would make sense given the highly fractured nature of the evaluation's representations.

If you think all this is trivial, just consider there's an emerging consensus that grinding chess puzzles doesn't produce a corresponding increase in player Elo. Compared to proven methods such as studying annotated master-level games, tactics-oriented chess puzzles have a much weaker evidence base. That has relevance to what we are discussing here. Humans need to understand chess not just in terms of forcing tactical sequences, but through recognition of more holistic patterns and strategic ideas, to say nothing of game management skills in general.

There are CPU-based MCTS engines as well.
Which are no more efficient. Monte Carlo tree search itself seems to be relatively inefficient by nature compared to alpha-beta.
Would you care to respond to my other points?
Not really. Most of your other questions would involve debating philosophical first principles and assumptions, or they feed into what's essentially moral panic and sensationalistic cultural memes about large language models. My initial research began not with an LLM, but with Pytorch-NNUE and Cute Chess. I've used a variety of approaches to verify the research.
sscg13
Posts: 24
Joined: Mon Apr 08, 2024 8:57 am
Full name: Chris Bao

Post by sscg13 »

FireDragon761138 wrote: Thu Jan 29, 2026 6:56 am I don't accept that as true axiomatically.
Which part do you not accept? That there is a "truth" to chess, that engines get closer to the "truth" as they search for longer, or that the "truth" will be different from an engine's initial analysis as long as the engine is not perfect?
FireDragon761138 wrote: Thu Jan 29, 2026 6:56 am Humans need to understand chess not just in terms of forcing tactical sequences, but recognition of more holistic patterns and strategic ideas, to say nothing of game management skills in general.
Any alpha-beta engine has "tactical sequences" built into its search.

Usually when an engine switches between multiple moves in its PV, it is because they are all very similar in evaluation. For instance, in the case of a transposition, whichever is preferred between (move A, move B) and (move B, move A) comes down to noise. Part of the "efficiency" of alpha-beta is that it is able to instantly update when a new, better move is found. Perhaps the "inefficiency" of MCTS is a natural price you must pay for that level of stability.
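
A toy Python sketch of that trade-off, assuming two heavily simplified backup rules: a max over the child scores seen so far (standing in for alpha-beta) and a plain running mean (standing in for MCTS-style averaging). The function names and the sample scores are hypothetical, and real engines add pruning, visit weighting, and selection policies that are left out here.

```python
def minimax_backup(child_scores):
    """Alpha-beta style (simplified): value is the best child score seen so far."""
    return max(child_scores)


def average_backup(child_scores):
    """MCTS style (simplified): value is the mean of all sampled scores."""
    return sum(child_scores) / len(child_scores)


def trace(backup, scores):
    """Reported value after each new score arrives."""
    return [backup(scores[:i]) for i in range(1, len(scores) + 1)]


def largest_jump(values):
    """Biggest change between two consecutive reported values."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical stream of scores; the fifth one is a newly found strong move.
samples = [0.10, 0.12, 0.11, 0.09, 0.60, 0.10, 0.11]

minimax_trace = trace(minimax_backup, samples)
average_trace = trace(average_backup, samples)

print("max backup :", [round(v, 3) for v in minimax_trace])
print("mean backup:", [round(v, 3) for v in average_trace])
print("largest jump (max) :", round(largest_jump(minimax_trace), 3))
print("largest jump (mean):", round(largest_jump(average_trace), 3))
```

Under the max rule the reported value jumps the moment the strong move appears, while the mean moves only part of the way and then drifts back. That is the stability-for-responsiveness trade described above, in miniature.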
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

sscg13 wrote: Thu Jan 29, 2026 7:20 am
FireDragon761138 wrote: Thu Jan 29, 2026 6:56 am I don't accept that as true axiomatically.
Which part do you not accept? That there is a "truth" to chess, that engines get closer to the "truth" as they search for longer, or that the "truth" will be different from an engine's initial analysis as long as the engine is not perfect?
I operate from a different epistemology and metaphysics. Truth is nuanced, dialectical, and contextual, and more like the ancient Greek concept of alitheia or the Hebrew emet, which implies reliability, faithfulness, not just mathematical correctness.
FireDragon761138 wrote: Thu Jan 29, 2026 6:56 am Humans need to understand chess not just in terms of forcing tactical sequences, but recognition of more holistic patterns and strategic ideas, to say nothing of game management skills in general.
Any alpha-beta engine has "tactical sequences" built into its search.

Alpha beta is also capable of finding quiet, strategic moves... with the right evaluation function.
Usually when an engine switches between multiple moves in its PV, it is because they are all very similar. For instance, in the case of a transposition, whichever is preferred between (move A, move B) and (move B, move A) comes down to noise. Part of the "efficiency" of alpha-beta is that it is able to instantly update when a new, better move is found. Perhaps the "inefficiency" of MCTS is a natural price you must pay for that level of stability.
I think that's where the optimism algorithm can be helpful. Even without optimism, though, Theoria is still more stable than Stockfish, in terms of evaluation.

I'm hypothesizing that some of the manifold of chess information may not be reducible to finite mathematics without fractured representations. Just a hunch or intuition I have. Maybe I'll explore it in the future. Maybe optimism helps smooth over some kind of roughness or gaps in the representation.
sscg13
Posts: 24
Joined: Mon Apr 08, 2024 8:57 am
Full name: Chris Bao

Post by sscg13 »

FireDragon761138 wrote: Thu Jan 29, 2026 7:43 am I operate from a different epistemology and metaphysics. Truth is nuanced, dialectical, and contextual, and more like the ancient Greek concept of alitheia or the Hebrew emet, which implies reliability, faithfulness, not just mathematical correctness.
This is fine, just don't expect others to arrive at the same conclusions as you if they start from different premises.

From my perspective, one plays quiet strategic moves when they check that there are no tactics that threaten their position, which implies being good tactically. Also, if you watch grandmasters analyze, there are two types of "nonsense engine line", the first is a tactical sequence, and the second is (for lack of a better term) the engine having an "understanding" of the position that humans don't understand.

I think https://lichess.org/@/jk_182/blog/conce ... o/z5y4GSS3 is an interesting read, where the author concludes that human masters understand the same patterns as AlphaZero, just that AlphaZero is also much more tactically precise at forcing the exact patterns it likes.
I think that humans often know a concept and spot it in a position, but might not play the move because they miss a critical move somewhere in the variations. Also weighing up different ideas is often more difficult than finding the ideas in the first place. So the reason why the grandmasters missed a certain move might have more to do with their lower calculation ability compared to AlphaZero rather than the knowledge of the concepts.
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

sscg13 wrote: Thu Jan 29, 2026 7:54 am
FireDragon761138 wrote: Thu Jan 29, 2026 7:43 am I operate from a different epistemology and metaphysics. Truth is nuanced, dialectical, and contextual, and more like the ancient Greek concept of alitheia or the Hebrew emet, which implies reliability, faithfulness, not just mathematical correctness.
This is fine, just don't expect others to arrive at the same conclusions as you if they start from different premises.

From my perspective, one plays quiet strategic moves when they check that there are no tactics that threaten their position, which implies being good tactically. Also, if you watch grandmasters analyze, there are two types of "nonsense engine line", the first is a tactical sequence, and the second is (for lack of a better term) the engine having an "understanding" of the position that humans don't understand.


Tactics are a big part of chess, obviously, and an engine would have no utility at all if it were weak at tactics. But the tactics should flow from positional advantages, to be humanly relevant in a generalisable way. The reason positions are often superior in chess is that they have latent potential to give rise to tactics. Think of knights in the center vs. the edge of the board, for instance. That latent potential can play into long-term advantages through prophylaxis and initiative far more than focusing on merely having complex concrete lines.

The focus on positional truth is about conceiving of chess in terms of narrative flow, which used to be an important part of chess pedagogy. Narrative is an important part of how humans acquire knowledge that integrates, and it's more powerful for learning than simply teaching chess as an optimization problem.
I think https://lichess.org/@/jk_182/blog/conce ... o/z5y4GSS3 is an interesting read, where the author concludes that human masters understand the same patterns as AlphaZero, just that AlphaZero is also much more tactically precise at forcing the exact patterns it likes.
Of course, that doesn't surprise me at all and fits in with my overall philosophical understanding of the world. Chess has an inner logic that participates in a more universal logic or intelligibility. It's not a coincidence that engines like Lc0 identify some of the same patterns in play that humans do (like the Evans Gambit: Lc0 discovered that through self-play all on its own against Stockfish several years ago).
Last edited by FireDragon761138 on Thu Jan 29, 2026 8:12 am, edited 1 time in total.
sscg13
Posts: 24
Joined: Mon Apr 08, 2024 8:57 am
Full name: Chris Bao

Post by sscg13 »

FireDragon761138 wrote: Thu Jan 29, 2026 8:06 am Tactics are a big part of chess, obviously, and an engine would have no utility at all if it were weak at tactics. But the tactics should flow from positional advantages, to be humanly relevant in a generalisable way. The reason positions are often superior in chess is that they have latent potential to give rise to tactics. Think of knights in the center vs. the edge of the board, for instance. That latent potential can play into long-term advantages through prophylaxis and initiative far more than focusing on merely having complex concrete lines.

The focus on positional truth is about conceiving of chess in terms of narrative flow, which used to be an important part of chess pedagogy. Narrative is an important part of how humans acquire knowledge that integrates, and it's more powerful for learning than simply teaching chess as an optimization problem.
Then I still think MCTS better represents your philosophy. MCTS operates more like the "expectation/potential" you speak of, while AB requires concrete lines to "prove" a position is good.

An engine might superficially appear to understand that a pawn structure is weak when in reality it has simply calculated it wins a pawn in 10 moves, for instance. We aren't able to distinguish this internally. It is similar to set theory in mathematics where one must be careful not to confuse their intuition (of what certain "set" constructions are) with the mathematical formalism.
Last edited by sscg13 on Thu Jan 29, 2026 8:20 am, edited 2 times in total.