Threat inputs and engine eval stability

Discussion of chess software programming and technical issues.

Moderator: Ras

FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Threat inputs and engine eval stability

Post by FireDragon761138 »

I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
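
(Purely as an illustration of what a scale-invariant stability measure might look like, not the exact statistic from the report: one such measure is the coefficient of variation of the scores an engine reports at successive search depths, since rescaling every score leaves it unchanged. The sketch below uses made-up centipawn values.)

Code: Select all

#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative only: coefficient of variation (stddev / |mean|) of the
// evaluations reported at successive depths for one position. Multiplying
// every score by a constant leaves it unchanged, i.e. it is scale-invariant.
double coefficient_of_variation(const std::vector<double>& evals) {
    double mean = 0.0;
    for (double e : evals) mean += e;
    mean /= evals.size();
    double var = 0.0;
    for (double e : evals) var += (e - mean) * (e - mean);
    var /= evals.size();
    // Positions with a near-zero mean score would need a different normalisation.
    return std::sqrt(var) / std::fabs(mean);
}

int main() {
    // Hypothetical centipawn scores for one position at depths 10..14.
    std::vector<double> evals = {35, 42, 38, 51, 40};
    std::printf("CV = %.3f\n", coefficient_of_variation(evals));
    return 0;
}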

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

So all you changed was:

Code: Select all

--- ../../Stockfish-sf_16.1/src/evaluate.cpp	2024-02-24 18:15:04.000000000 +0100
+++ evaluate.cpp	2026-01-14 16:45:45.000000000 +0100
@@ -193,7 +193,7 @@
     assert(!pos.checkers());
 
     int  simpleEval = simple_eval(pos, pos.side_to_move());
-    bool smallNet   = std::abs(simpleEval) > 1050;
+    bool smallNet   = false;  // MODIFIED: Always use big net
 
     int nnueComplexity;
and your "team" managed to figure out how to rename it:

Code: Select all

--- ../../Stockfish-sf_16.1/src/misc.cpp	2024-02-24 18:15:04.000000000 +0100
+++ misc.cpp	2026-01-14 16:45:52.000000000 +0100
@@ -75,7 +75,7 @@
 namespace {
 
 // Version number or dev.
-constexpr std::string_view version = "16.1";
+constexpr std::string_view version = "0.1";
 
 // Our fancy logging facility. The trick here is to replace cin.rdbuf() and
 // cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -159,7 +159,7 @@
 // Stockfish version
 std::string engine_info(bool to_uci) {
     std::stringstream ss;
-    ss << "Stockfish " << version << std::setfill('0');
+    ss << "Theoria " << version << std::setfill('0');
 
     if constexpr (version == "dev")
     {
@@ -185,7 +185,7 @@
 #endif
     }
 
-    ss << (to_uci ? "\nid author " : " by ") << "the Stockfish developers (see AUTHORS file)";
+    ss << (to_uci ? "\nid author " : " ") << "Theoria Chess Project (A fork of Stockfish 16.1) - www.theoriachess.org";
 
     return ss.str();
 }
You seem to have messed up the Makefile, with "make build" no longer downloading the net. But "make net" still downloads SF16.1's big net.

So all you did was disable the small net.
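
For context, SF 16.1 evaluates clearly lopsided positions (large simple material eval) with a smaller, faster network and everything else with the big one; the one-line change above just hard-codes that gate to false, so every position goes to the big net. A minimal sketch of the gate being removed, with the threshold copied from the diff:

Code: Select all

#include <cstdio>
#include <cstdlib>

// Sketch of the gate the diff above removes. simpleEval is Stockfish's cheap
// material-based score from the side to move's point of view.
bool use_small_net(int simpleEval) {
    // Clearly lopsided positions go to the smaller, faster net; balanced
    // positions use the big net. Hard-coding false disables the small net.
    return std::abs(simpleEval) > 1050;
}

int main() {
    std::printf("%d %d\n", use_small_net(2000), use_small_net(30));  // prints "1 0"
    return 0;
}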

Maybe it's time to put an end to this joke?
Jjaw
Posts: 92
Joined: Thu Jul 29, 2021 4:48 pm
Full name: Joe Louvier

Re: Threat inputs and engine eval stability

Post by Jjaw »

Yeah, it's just another SF clone without any original ideas. There have been dozens of them over the years.
User avatar
Rebel
Posts: 7477
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Threat inputs and engine eval stability

Post by Rebel »

FireDragon761138 wrote: Sun Jan 25, 2026 10:51 pm I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/

Code: Select all

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 sf161         +280 =1148  -72   *0  854.0  1500   56.9%
  2 theoria-6-0.1  +72 =1148 -280   *0  646.0  1500   43.1%

Total Games:    1500
White Wins:      278 (18.5%)
Black Wins:       74 (4.9%)
Draws:          1148 (76.5%)
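
For reference, that 43.1% score corresponds to roughly a 48 Elo deficit under the usual logistic model; a minimal conversion sketch:

Code: Select all

#include <cmath>
#include <cstdio>

// Approximate Elo difference implied by a score fraction, via the standard
// logistic model: expected score = 1 / (1 + 10^(-diff / 400)).
double elo_diff(double score) { return 400.0 * std::log10(score / (1.0 - score)); }

int main() {
    double score = 646.0 / 1500.0;  // theoria-6-0.1 scored 43.1% above
    std::printf("implied Elo difference: %+.1f\n", elo_diff(score));  // about -48
    return 0;
}
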
Can you explain what you are trying to achieve?
90% of coding is debugging, the other 10% is writing bugs.
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

Rebel wrote: Tue Jan 27, 2026 3:32 pm
FireDragon761138 wrote: Sun Jan 25, 2026 10:51 pm I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise, and probably doesn't add much, if anything, to raw Elo performance (which is probably why it wasn't included in the release), at least not without seriously altering how the underlying Stockfish search works.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.

Based on our research, we'll drop threat inputs from the Theoria project, as they just seem to be tactical noise, somewhat analogous in effect to a crude "sharpness" control on an old TV set (it brings out fine surface detail but doesn't really enhance the fidelity of the underlying image), and aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/

Code: Select all

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 sf161         +280 =1148  -72   *0  854.0  1500   56.9%
  2 theoria-6-0.1  +72 =1148 -280   *0  646.0  1500   43.1%

Total Games:    1500
White Wins:      278 (18.5%)
Black Wins:       74 (4.9%)
Draws:          1148 (76.5%)
Can you explain what you are trying to achieve?
In simple terms, an engine with the highest evaluation stability and strategic coherence in its principal variations, for human-aligned analysis.

We've done some more analyses and will have them on our website in a few days. Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.

And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.

Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

syzygy wrote: Wed Jan 28, 2026 12:23 am
FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.
It is our project, released under the GPL. We had no technical assistance from the Stockfish community on this project; in fact, I was ridiculed. So my obligations are to comply only with the legal requirements.

And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.

I implemented it on Stockfish 17.
Maybe just tell us what the joke is about. Are you testing an LLM to see how you can bluff your way past experts? (It seems you can't.)
It's not a joke.
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

FireDragon761138 wrote: Wed Jan 28, 2026 12:59 am
syzygy wrote: Wed Jan 28, 2026 12:23 am
FireDragon761138 wrote: Tue Jan 27, 2026 11:08 pm Threat inputs add nothing positive to our engine, either in evaluation stability or Elo. That's what this thread was about: just reporting on our experience with threat inputs.
Stop calling it "our" engine already.
It is our project, released under the GPL. We had no technical assistance from the Stockfish community on this project; in fact, I was ridiculed. So my obligations are to comply only with the legal requirements.
You changed ONE line in a totally obvious way. This does not give you any right to claim authorship or any other kind of credit.
It seems you are unable to recognise this. Have you never seen code before?

Or did I overlook something? Please explain.
And stop telling us nonsense. You took SF16.1 which does not have threat inputs to begin with.
I implemented it on Stockfish 17.
Implemented what?
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

https://www.theoriachess.org/research/c ... -analysis/
Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.

What is the grift exactly? Are you cheating to get a PhD?
FireDragon761138
Posts: 71
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

syzygy wrote: Wed Jan 28, 2026 1:17 am https://www.theoriachess.org/research/c ... -analysis/
Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.

What is the grift exactly? Are you cheating to get a PhD?
No, we trained a new NNUE, disabled the secondary network, and disabled the code that auto-downloads the network. We also experimented with turning various algorithms in the engine on and off.
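
For what it's worth, anyone can check which network a build reports: during the "uci" handshake, a Stockfish-derived engine prints its name via "id name" and its default network filename as the EvalFile option default. A rough sketch, assuming a POSIX system and a ./theoria binary in the working directory (placeholder path, not a claim about the actual release layout):

Code: Select all

#include <cstdio>
#include <iostream>
#include <string>

int main() {
    // Pipe the "uci" handshake into the engine and read its reply.
    // "./theoria" is a placeholder; adjust for the binary being checked.
    FILE* pipe = popen("printf 'uci\\nquit\\n' | ./theoria", "r");
    if (!pipe)
        return 1;

    char buf[512];
    while (fgets(buf, sizeof(buf), pipe)) {
        std::string line(buf);
        // Keep only the engine identification and the default network filename.
        if (line.find("id name") != std::string::npos
            || line.find("EvalFile") != std::string::npos)
            std::cout << line;
    }
    pclose(pipe);
    return 0;
}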