Threat inputs and engine eval stability

FireDragon761138
Posts: 50
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise. It probably doesn't add much, if anything, to raw Elo performance without seriously altering how the underlying Stockfish search works, which is probably why it wasn't included in the release.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
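The post does not say which statistic was used, so the following is only a minimal sketch of what a scale-invariant stability measure could look like. It assumes a hypothetical metric: the mean absolute ply-to-ply change in eval, divided by the mean absolute eval. The function name `normalized_volatility` and the metric itself are assumptions for illustration and are not taken from the Theoria report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stability metric (an assumption, not the report's method):
// mean absolute change between consecutive evals, divided by the mean
// absolute eval. Multiplying every score by the same constant scales the
// numerator and the denominator equally, so the ratio does not change.
double normalized_volatility(const std::vector<double>& evals) {
    assert(evals.size() >= 2);  // need at least one ply-to-ply change

    double sumAbsDelta = 0.0;
    double sumAbsEval  = 0.0;
    for (std::size_t i = 0; i < evals.size(); ++i) {
        sumAbsEval += std::abs(evals[i]);
        if (i > 0)
            sumAbsDelta += std::abs(evals[i] - evals[i - 1]);
    }

    const double meanAbsDelta = sumAbsDelta / static_cast<double>(evals.size() - 1);
    const double meanAbsEval  = sumAbsEval / static_cast<double>(evals.size());

    // All-zero evals imply all-zero changes: report a perfectly flat series.
    if (meanAbsEval == 0.0)
        return 0.0;

    return meanAbsDelta / meanAbsEval;
}
```

A ratio of this kind matters when comparing engines whose networks output evals on different scales, because raw centipawn variance would then favor whichever engine reports smaller numbers. A lower value means a flatter eval trace relative to its own magnitude.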

Based on our research, we'll drop threat inputs from the Theoria project. They just seem to be tactical noise, somewhat analogous to a crude "sharpness" control on an old TV set (it brings out fine surface details but doesn't really enhance the fidelity of the underlying image), and they aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/
syzygy
Posts: 5846
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

So all you changed was:

Code:

--- ../../Stockfish-sf_16.1/src/evaluate.cpp	2024-02-24 18:15:04.000000000 +0100
+++ evaluate.cpp	2026-01-14 16:45:45.000000000 +0100
@@ -193,7 +193,7 @@
     assert(!pos.checkers());
 
     int  simpleEval = simple_eval(pos, pos.side_to_move());
-    bool smallNet   = std::abs(simpleEval) > 1050;
+    bool smallNet   = false;  // MODIFIED: Always use big net
 
     int nnueComplexity;
and your "team" managed to figure out how to rename it:

Code:

--- ../../Stockfish-sf_16.1/src/misc.cpp	2024-02-24 18:15:04.000000000 +0100
+++ misc.cpp	2026-01-14 16:45:52.000000000 +0100
@@ -75,7 +75,7 @@
 namespace {
 
 // Version number or dev.
-constexpr std::string_view version = "16.1";
+constexpr std::string_view version = "0.1";
 
 // Our fancy logging facility. The trick here is to replace cin.rdbuf() and
 // cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -159,7 +159,7 @@
 // Stockfish version
 std::string engine_info(bool to_uci) {
     std::stringstream ss;
-    ss << "Stockfish " << version << std::setfill('0');
+    ss << "Theoria " << version << std::setfill('0');
 
     if constexpr (version == "dev")
     {
@@ -185,7 +185,7 @@
 #endif
     }
 
-    ss << (to_uci ? "\nid author " : " by ") << "the Stockfish developers (see AUTHORS file)";
+    ss << (to_uci ? "\nid author " : " ") << "Theoria Chess Project (A fork of Stockfish 16.1) - www.theoriachess.org";
 
     return ss.str();
 }
You seem to have messed up the Makefile, with "make build" no longer downloading the net. But "make net" still downloads SF16.1's big net.

So all you did was disable the small net.
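To make the effect of the first diff concrete, here is a small sketch that isolates the net-selection rule. The threshold 1050 and the `std::abs(simpleEval)` test come from the diff above. The function names `use_small_net_stock` and `use_small_net_modified` and the constant name `SmallNetThreshold` are hypothetical and do not appear in either source tree.

```cpp
#include <cstdlib>

// Threshold taken from the diff above (Stockfish 16.1, evaluate.cpp).
// The constant name is made up for this sketch.
constexpr int SmallNetThreshold = 1050;

// Stock 16.1 rule: pick the small net when the simple eval is lopsided
// in either direction.
bool use_small_net_stock(int simpleEval) {
    return std::abs(simpleEval) > SmallNetThreshold;
}

// Rule after the modification in the diff: the small net is never picked,
// so every position is evaluated with the big net.
bool use_small_net_modified(int /*simpleEval*/) {
    return false;
}
```

The two rules only disagree when `std::abs(simpleEval)` exceeds 1050, that is, in positions the simple eval already rates as heavily one-sided.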

Maybe it's time to put an end to this joke?
Jjaw
Posts: 92
Joined: Thu Jul 29, 2021 4:48 pm
Full name: Joe Louvier

Re: Threat inputs and engine eval stability

Post by Jjaw »

Yeah, it's just another SF clone without any original ideas. There have been dozens of them over the years.
Rebel
Posts: 7477
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Threat inputs and engine eval stability

Post by Rebel »

FireDragon761138 wrote: Sun Jan 25, 2026 10:51 pm
I ran a test set of 100 Lichess games at around 1500 Elo. Threat input in Theoria actually degrades eval stability; it's effectively tactical noise. It probably doesn't add much, if anything, to raw Elo performance without seriously altering how the underlying Stockfish search works, which is probably why it wasn't included in the release.

We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.

The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in terms of evaluation score when using scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish's. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.

Based on our research, we'll drop threat inputs from the Theoria project. They just seem to be tactical noise, somewhat analogous to a crude "sharpness" control on an old TV set (it brings out fine surface details but doesn't really enhance the fidelity of the underlying image), and they aren't really compatible with our design goals, which focus on coherent representations of positional chess knowledge.

https://www.theoriachess.org/research/

Code:

No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 sf161         +280 =1148  -72   *0  854.0  1500   56.9%
  2 theoria-6-0.1  +72 =1148 -280   *0  646.0  1500   43.1%

Total Games:    1500
White Wins:      278 (18.5%)
Black Wins:       74 (4.9%)
Draws:          1148 (76.5%)
Can you explain what you are trying to achieve?
90% of coding is debugging, the other 10% is writing bugs.
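
For readers who want to turn the match table above into an Elo estimate, the sketch below applies the standard logistic Elo formula to a win/draw/loss record. The function names `score_fraction` and `elo_diff` are hypothetical, and the calculation gives a point estimate only, with no error bars.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from a win/draw/loss record; a draw counts as half a point.
double score_fraction(int wins, int draws, int losses) {
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Elo difference implied by a score fraction under the logistic model:
// diff = -400 * log10(1 / score - 1). Undefined for scores of 0 or 1.
double elo_diff(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

With the sf161 row, `score_fraction(280, 1148, 72)` gives 854/1500, matching the 56.9% in the table, and `elo_diff` of that value comes out at roughly 48 Elo in favor of sf161 over theoria-6-0.1.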