Threat inputs and engine eval stability
Moderator: Ras
-
FireDragon761138
- Posts: 71
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise and probably adds little, if anything, to raw Elo performance (which is probably why it wasn't included in the release) without seriously altering how the underlying Stockfish search works.
We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.
The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
Based on our research, we'll drop threat inputs from the Theoria project. They just seem to be tactical noise, somewhat analogous to a crude "sharpness" control on an old TV set (it brings out fine surface details but doesn't really enhance the fidelity of the underlying image), and they aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.
https://www.theoriachess.org/research/
-
syzygy
- Posts: 5868
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threat inputs and engine eval stability
So all you changed was:
Code: Select all
--- ../../Stockfish-sf_16.1/src/evaluate.cpp 2024-02-24 18:15:04.000000000 +0100
+++ evaluate.cpp 2026-01-14 16:45:45.000000000 +0100
@@ -193,7 +193,7 @@
assert(!pos.checkers());
int simpleEval = simple_eval(pos, pos.side_to_move());
- bool smallNet = std::abs(simpleEval) > 1050;
+ bool smallNet = false; // MODIFIED: Always use big net
int nnueComplexity;
Code: Select all
--- ../../Stockfish-sf_16.1/src/misc.cpp 2024-02-24 18:15:04.000000000 +0100
+++ misc.cpp 2026-01-14 16:45:52.000000000 +0100
@@ -75,7 +75,7 @@
namespace {
// Version number or dev.
-constexpr std::string_view version = "16.1";
+constexpr std::string_view version = "0.1";
// Our fancy logging facility. The trick here is to replace cin.rdbuf() and
// cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -159,7 +159,7 @@
// Stockfish version
std::string engine_info(bool to_uci) {
std::stringstream ss;
- ss << "Stockfish " << version << std::setfill('0');
+ ss << "Theoria " << version << std::setfill('0');
if constexpr (version == "dev")
{
@@ -185,7 +185,7 @@
#endif
}
- ss << (to_uci ? "\nid author " : " by ") << "the Stockfish developers (see AUTHORS file)";
+ ss << (to_uci ? "\nid author " : " ") << "Theoria Chess Project (A fork of Stockfish 16.1) - www.theoriachess.org";
return ss.str();
}
So all you did was disable the small net.
Maybe it's time to put an end to this joke?
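For readers who do not read diffs every day, the evaluate.cpp hunk above comes down to the following. This is a standalone sketch, not Stockfish code: the function names and the constant name are made up for illustration, and only the threshold 1050 and the `false` replacement are taken from the diff.

```cpp
#include <cassert>
#include <cstdlib>

// Threshold copied from the removed line in the evaluate.cpp diff.
constexpr int SmallNetThreshold = 1050;

// Behaviour of the removed line: positions whose simple eval is large
// in magnitude are sent to the small net.
// (Hypothetical helper name, for illustration only.)
bool use_small_net_stock(int simpleEval) {
    return std::abs(simpleEval) > SmallNetThreshold;
}

// Behaviour of the added line: the small net is never selected,
// so every position is evaluated with the big net.
// (Hypothetical helper name, for illustration only.)
bool use_small_net_patched(int /*simpleEval*/) {
    return false;
}
```

With these helpers, `use_small_net_stock(1051)` is true and `use_small_net_stock(1050)` is false, while `use_small_net_patched` returns false for any input.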
-
Jjaw
- Posts: 92
- Joined: Thu Jul 29, 2021 4:48 pm
- Full name: Joe Louvier
Re: Threat inputs and engine eval stability
Yeah, it's just another SF clone without any original ideas. There have been dozens of them over the years.
-
Rebel
- Posts: 7477
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Threat inputs and engine eval stability
FireDragon761138 wrote: ↑Sun Jan 25, 2026 10:51 pm
> I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise and probably adds little, if anything, to raw Elo performance (which is probably why it wasn't included in the release) without seriously altering how the underlying Stockfish search works.
> We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.
> The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
> Based on our research, we'll drop threat inputs from the Theoria project. They just seem to be tactical noise, somewhat analogous to a crude "sharpness" control on an old TV set (it brings out fine surface details but doesn't really enhance the fidelity of the underlying image), and they aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.
> https://www.theoriachess.org/research/
Can you explain what you are trying to achieve?
Code: Select all
No. Name            Win   Draw   Loss  Unf.  Score  Games  %
----------------------------------------------------------
1   sf161           +280  =1148  -72   *0    854.0  1500   56.9%
2   theoria-6-0.1   +72   =1148  -280  *0    646.0  1500   43.1%
Total Games: 1500
White Wins: 278 (18.5%)
Black Wins: 74 (4.9%)
Draws: 1148 (76.5%)
90% of coding is debugging, the other 10% is writing bugs.
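The score line in the table can be turned into an Elo gap. The sketch below assumes the usual logistic Elo model, which the thread itself does not mention, and the function names are illustrative. It ignores error bars entirely.

```cpp
#include <cassert>
#include <cmath>

// Fraction of available points scored: a win counts 1, a draw 0.5.
double score_fraction(int wins, int draws, int losses) {
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Elo difference implied by a score fraction s under the logistic model
//   s = 1 / (1 + 10^(-d / 400))   =>   d = -400 * log10(1 / s - 1).
// Only defined for 0 < s < 1.
double elo_diff(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Calling `elo_diff(score_fraction(280, 1148, 72))` with the sf161 row gives a gap of a little under 50 Elo in favour of sf161 over theoria-6-0.1; swapping wins and losses gives the same number with the opposite sign.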
-
FireDragon761138
- Posts: 71
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Threat inputs and engine eval stability
Rebel wrote: ↑Tue Jan 27, 2026 3:32 pm
> FireDragon761138 wrote: ↑Sun Jan 25, 2026 10:51 pm
> > I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise and probably adds little, if anything, to raw Elo performance (which is probably why it wasn't included in the release) without seriously altering how the underlying Stockfish search works.
> > We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.
> > The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
> > Based on our research, we'll drop threat inputs from the Theoria project. They just seem to be tactical noise, somewhat analogous to a crude "sharpness" control on an old TV set (it brings out fine surface details but doesn't really enhance the fidelity of the underlying image), and they aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.
> > https://www.theoriachess.org/research/
> Can you explain what you are trying to achieve?
> Code: Select all
> No. Name            Win   Draw   Loss  Unf.  Score  Games  %
> ----------------------------------------------------------
> 1   sf161           +280  =1148  -72   *0    854.0  1500   56.9%
> 2   theoria-6-0.1   +72   =1148  -280  *0    646.0  1500   43.1%
> Total Games: 1500
> White Wins: 278 (18.5%)
> Black Wins: 74 (4.9%)
> Draws: 1148 (76.5%)
In simple terms, an engine with the highest evaluation stability and strategic coherence in its principal variations, for human-aligned analysis.
We've done some more analyses and will have them on our website in a few days. Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That was the point of this thread: just to report on our experience with threat inputs.
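The thread never states which statistic is meant by "evaluation stability" or "scale-invariant". Purely as an illustration of what such a measure could look like, the sketch below divides the mean absolute change between consecutive evals by the standard deviation of the whole series, so multiplying every eval by a constant leaves the result unchanged. The metric and the name `relative_eval_volatility` are assumptions made here, not the method used in the Theoria report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stability measure (NOT taken from the Theoria report):
// mean absolute change between consecutive evals, divided by the
// standard deviation of the series. Lower values mean a steadier eval.
double relative_eval_volatility(const std::vector<double>& evals) {
    assert(evals.size() >= 2);
    const double n = static_cast<double>(evals.size());

    // Mean of the series.
    double mean = 0.0;
    for (double e : evals) mean += e;
    mean /= n;

    // Population standard deviation, used as the scale of the series.
    double var = 0.0;
    for (double e : evals) var += (e - mean) * (e - mean);
    const double sd = std::sqrt(var / n);
    if (sd == 0.0) return 0.0;  // flat series: the eval never moves

    // Mean absolute jump between consecutive evaluations.
    double jump = 0.0;
    for (std::size_t i = 1; i < evals.size(); ++i)
        jump += std::fabs(evals[i] - evals[i - 1]);
    jump /= (n - 1.0);

    return jump / sd;
}
```

Because both numerator and denominator scale linearly with the evals, a series in centipawns and the same series in pawns give the same value, which is the property "scale-invariant" would have to mean for two engines whose eval units differ.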
-
syzygy
- Posts: 5868
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threat inputs and engine eval stability
FireDragon761138 wrote: ↑Tue Jan 27, 2026 11:08 pm
> Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That was the point of this thread: just to report on our experience with threat inputs.
Stop calling it "our" engine already.
And stop telling us nonsense. You took SF16.1, which does not have threat inputs to begin with.
Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
-
FireDragon761138
- Posts: 71
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Threat inputs and engine eval stability
syzygy wrote: ↑Wed Jan 28, 2026 12:23 am
> FireDragon761138 wrote: ↑Tue Jan 27, 2026 11:08 pm
> > Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That was the point of this thread: just to report on our experience with threat inputs.
> Stop calling it "our" engine already.
It is our project, released under GPL. We had no technical assistance from the Stockfish community on this project. In fact I was ridiculed. So my obligations are to comply only with the legal requirements.
> And stop telling us nonsense. You took SF16.1, which does not have threat inputs to begin with.
I implemented it on Stockfish 17.
> Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
It's not a joke.
-
syzygy
- Posts: 5868
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threat inputs and engine eval stability
FireDragon761138 wrote: ↑Wed Jan 28, 2026 12:59 am
> syzygy wrote: ↑Wed Jan 28, 2026 12:23 am
> > FireDragon761138 wrote: ↑Tue Jan 27, 2026 11:08 pm
> > > Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That was the point of this thread: just to report on our experience with threat inputs.
> > Stop calling it "our" engine already.
> It is our project, released under GPL. We had no technical assistance from the Stockfish community on this project. In fact I was ridiculed. So my obligations are to comply only with the legal requirements.
You changed ONE line in a totally obvious way. This does not give you any right to claim authorship or any other kind of credit.
It seems you are unable to recognise this. You have never seen code before?
Or did I overlook something? Please explain.
> > And stop telling us nonsense. You took SF16.1, which does not have threat inputs to begin with.
> I implemented it on Stockfish 17.
Implemented what?
-
syzygy
- Posts: 5868
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threat inputs and engine eval stability
https://www.theoriachess.org/research/c ... -analysis/
> Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.
What is the grift exactly? Are you cheating to get a PhD?
-
FireDragon761138
- Posts: 71
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Threat inputs and engine eval stability
syzygy wrote: ↑Wed Jan 28, 2026 1:17 am
> https://www.theoriachess.org/research/c ... -analysis/
> > Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
> All you did was take Stockfish-16.1 and call it Theoria 0.1.
> Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.
> What is the grift exactly? Are you cheating to get a PhD?
No, we trained a new NNUE, disabled the secondary network, and disabled the code that auto-downloads the network. We also experimented with turning various algorithms in the engine on and off.