Threat inputs and engine eval stability

Discussion of chess software programming and technical issues.

Moderator: Ras

FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Threat inputs and engine eval stability

Post by FireDragon761138 »

I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
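
(Purely as an illustration of what a scale-invariant stability measure might look like, not the exact statistic from the report: one such measure is the coefficient of variation of the scores an engine reports at successive search depths, since rescaling every score leaves it unchanged. The sketch below uses made-up centipawn values.)

Code: Select all

#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative only: coefficient of variation (stddev / |mean|) of the
// evaluations reported at successive depths for one position. Multiplying
// every score by a constant leaves it unchanged, i.e. it is scale-invariant.
double coefficient_of_variation(const std::vector<double>& evals) {
    double mean = 0.0;
    for (double e : evals) mean += e;
    mean /= evals.size();
    double var = 0.0;
    for (double e : evals) var += (e - mean) * (e - mean);
    var /= evals.size();
    // Positions with a near-zero mean score would need a different normalisation.
    return std::sqrt(var) / std::fabs(mean);
}

int main() {
    // Hypothetical centipawn scores for one position at depths 10..14.
    std::vector<double> evals = {35, 42, 38, 51, 40};
    std::printf("CV = %.3f\n", coefficient_of_variation(evals));
    return 0;
}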

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

So all you changed was:

Code: Select all

--- ../../Stockfish-sf_16.1/src/evaluate.cpp	2024-02-24 18:15:04.000000000 +0100
+++ evaluate.cpp	2026-01-14 16:45:45.000000000 +0100
@@ -193,7 +193,7 @@
     assert(!pos.checkers());
 
     int  simpleEval = simple_eval(pos, pos.side_to_move());
-    bool smallNet   = std::abs(simpleEval) > 1050;
+    bool smallNet   = false;  // MODIFIED: Always use big net
 
     int nnueComplexity;
and your "team" managed to figure out how to rename it:

Code: Select all

--- ../../Stockfish-sf_16.1/src/misc.cpp	2024-02-24 18:15:04.000000000 +0100
+++ misc.cpp	2026-01-14 16:45:52.000000000 +0100
@@ -75,7 +75,7 @@
 namespace {
 
 // Version number or dev.
-constexpr std::string_view version = "16.1";
+constexpr std::string_view version = "0.1";
 
 // Our fancy logging facility. The trick here is to replace cin.rdbuf() and
 // cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -159,7 +159,7 @@
 // Stockfish version
 std::string engine_info(bool to_uci) {
     std::stringstream ss;
-    ss << "Stockfish " << version << std::setfill('0');
+    ss << "Theoria " << version << std::setfill('0');
 
     if constexpr (version == "dev")
     {
@@ -185,7 +185,7 @@
 #endif
     }
 
-    ss << (to_uci ? "\nid author " : " by ") << "the Stockfish developers (see AUTHORS file)";
+    ss << (to_uci ? "\nid author " : " ") << "Theoria Chess Project (A fork of Stockfish 16.1) - www.theoriachess.org";
 
     return ss.str();
 }
You seem to have messed up the Makefile, with "make build" no longer downloading the net. But "make net" still downloads SF16.1's big net.

So all you did was disable the small net.
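
For context, SF 16.1 evaluates clearly lopsided positions (large simple material eval) with a smaller, faster network and everything else with the big one; the one-line change above just hard-codes that gate to false, so every position goes to the big net. A minimal sketch of the gate being removed, with the threshold copied from the diff:

Code: Select all

#include <cstdio>
#include <cstdlib>

// Sketch of the gate the diff above removes. simpleEval is Stockfish's cheap
// material-based score from the side to move's point of view.
bool use_small_net(int simpleEval) {
    // Clearly lopsided positions go to the smaller, faster net; balanced
    // positions use the big net. Hard-coding false disables the small net.
    return std::abs(simpleEval) > 1050;
}

int main() {
    std::printf("%d %d\n", use_small_net(2000), use_small_net(30));  // prints "1 0"
    return 0;
}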

Maybe it's time to put an end to this joke?
Jjaw
Posts: 92
Joined: Thu Jul 29, 2021 4:48 pm
Full name: Joe Louvier

Re: Threat inputs and engine eval stability

Post by Jjaw »

Yeah, it's just another SF clone without any original ideas. There have been dozens of them over the years.
User avatar
Rebel
Posts: 7477
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Threat inputs and engine eval stability

Post by Rebel »

FireDragon761138 wrote: Sun Jan 25, 2026 10:51 pm I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/

Code: Select all

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 sf161         +280 =1148  -72   *0  854.0  1500   56.9%
  2 theoria-6-0.1  +72 =1148 -280   *0  646.0  1500   43.1%

Total Games:    1500
White Wins:      278 (18.5%)
Black Wins:       74 (4.9%)
Draws:          1148 (76.5%)
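
For reference, that 43.1% score corresponds to roughly a 48 Elo deficit under the usual logistic model; a minimal conversion sketch:

Code: Select all

#include <cmath>
#include <cstdio>

// Approximate Elo difference implied by a score fraction, via the standard
// logistic model: expected score = 1 / (1 + 10^(-diff / 400)).
double elo_diff(double score) { return 400.0 * std::log10(score / (1.0 - score)); }

int main() {
    double score = 646.0 / 1500.0;  // theoria-6-0.1 scored 43.1% above
    std::printf("implied Elo difference: %+.1f\n", elo_diff(score));  // about -48
    return 0;
}
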
Can you explain what you are trying to achieve?
90% of coding is debugging, the other 10% is writing bugs.
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

Rebel wrote: Tue Jan 27, 2026 3:32 pm
FireDragon761138 wrote: Sun Jan 25, 2026 10:51 pm I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/

Code: Select all

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 sf161         +280 =1148  -72   *0  854.0  1500   56.9%
  2 theoria-6-0.1  +72 =1148 -280   *0  646.0  1500   43.1%

Total Games:    1500
White Wins:      278 (18.5%)
Black Wins:       74 (4.9%)
Draws:          1148 (76.5%)
Can you explain what you are trying to achieve?
In simple terms, an engine with the highest evaluation stability and strategic coherence in its principal variations, for human-aligned analysis.

We've done some more analyses and will have them on our website in a few days. Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.

And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.

Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

syzygy wrote: Wed Jan 28, 2026 12:23 am
FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.
It is our project, released under the GPL. We had no technical assistance from the Stockfish community on this project; in fact, I was ridiculed. So my obligations are to comply only with the legal requirements.

And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.

I implemented it on Stockfish 17.
Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
It's not a joke.
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

FireDragon761138 wrote: Wed Jan 28, 2026 12:59 am
syzygy wrote: Wed Jan 28, 2026 12:23 am
FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.
It is our project, released under the GPL. We had no technical assistance from the Stockfish community on this project; in fact, I was ridiculed. So my obligations are to comply only with the legal requirements.
You changed ONE line in a totally obvious way. This does not give you any right to claim authorship or any other kind of credit.
It seems you are unable to recognise this. Have you never seen code before?

Or did I overlook something? Please explain.
And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.
I implemented it on Stockfish 17.
Implemented what?
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

https://www.theoriachess.org/research/c ... -analysis/
Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.

What is the grift exactly? Are you cheating to get a PhD?
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

syzygy wrote: Wed Jan 28, 2026 1:17 am https://www.theoriachess.org/research/c ... -analysis/
Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.

What is the grift exactly? Are you cheating to get a PhD?
No, we trained a new NNUE, disabled the secondary network, and disabled the code that auto-downloads the network. We also experimented with turning various algorithms in the engine on and off.
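
For what it's worth, anyone can check which network a build reports: during the "uci" handshake, a Stockfish-derived engine prints its name via "id name" and its default network filename as the EvalFile option default. A rough sketch, assuming a POSIX system and a ./theoria binary in the working directory (placeholder path, not a claim about the actual release layout):

Code: Select all

#include <cstdio>
#include <iostream>
#include <string>

int main() {
    // Pipe the "uci" handshake into the engine and read its reply.
    // "./theoria" is a placeholder; adjust for the binary being checked.
    FILE* pipe = popen("printf 'uci\\nquit\\n' | ./theoria", "r");
    if (!pipe)
        return 1;

    char buf[512];
    while (fgets(buf, sizeof(buf), pipe)) {
        std::string line(buf);
        // Keep only the engine identification and the default network filename.
        if (line.find("id name") != std::string::npos
            || line.find("EvalFile") != std::string::npos)
            std::cout << line;
    }
    pclose(pipe);
    return 0;
}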