I ran a test set of 100 Lichess games at around 1500 Elo. Threat inputs in Theoria actually degrade eval stability: they are effectively tactical noise, and unless the underlying Stockfish search is seriously altered, they probably add little, if anything, to raw Elo performance (which is probably why they weren't included in the release).
We'll have the report up on our website soon under Research. We'll also try developing the Theoria17 fork with HalfKAv2_hm instead of full threats, before testing it against the SF 16.1 fork.
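For readers who haven't looked inside the NNUE code: HalfKAv2_hm indexes each (own king square, piece, square) triple from one side's perspective. It folds the 64 king squares into 32 buckets by mirroring the board horizontally whenever the king is on files e-h (the "hm"), and it puts both kings on one shared piece plane. That gives 32 × 11 × 64 = 22528 features per perspective. The C++ below is a simplified sketch of that idea only. The square numbering, plane order, bucket layout and all names (`feature_index`, `relative_square`, the `k*` constants) are hypothetical and do not reproduce Stockfish's actual index tables.

```cpp
#include <cassert>

// Simplified, hypothetical sketch of HalfKAv2_hm-style feature indexing.
// Squares: 0..63 with a1 = 0, h1 = 7, a8 = 56 (file = sq % 8, rank = sq / 8).
constexpr int kSquares = 64;
constexpr int kPiecePlanes = 11;   // 5 own + 5 enemy non-king types, plus 1 shared king plane
constexpr int kKingBuckets = 32;   // 64 king squares folded by horizontal mirroring
constexpr int kFeatures = kKingBuckets * kPiecePlanes * kSquares;  // 22528

enum PieceType { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

// Flip ranks for Black so each side sees the board from its own back rank.
int relative_square(bool blackPerspective, int sq) {
    return blackPerspective ? sq ^ 56 : sq;
}

// Index of one (own king square, piece, square) feature for one perspective.
int feature_index(bool blackPerspective, int ownKingSq, PieceType pt, bool ownPiece, int sq) {
    assert(ownKingSq >= 0 && ownKingSq < kSquares && sq >= 0 && sq < kSquares);
    int ksq = relative_square(blackPerspective, ownKingSq);
    int s = relative_square(blackPerspective, sq);

    // Horizontal mirroring: if our king is on files e-h, mirror the whole
    // board so the king always ends up on files a-d.
    if (ksq % 8 >= 4) {
        ksq ^= 7;
        s ^= 7;
    }

    const int bucket = (ksq / 8) * 4 + (ksq % 8);                  // 0..31
    const int plane = (pt == KING) ? 10 : 2 * pt + (ownPiece ? 0 : 1);
    const int index = (bucket * kPiecePlanes + plane) * kSquares + s;
    assert(index >= 0 && index < kFeatures);
    return index;
}
```

With this layout, a king on e1 with a pawn on h2 produces the same feature as a king on d1 with a pawn on a2, which is the weight sharing the mirroring is meant to buy.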
The tests we ran on the Lichess dataset have confirmed that Theoria (built on the Stockfish 16.1 engine) is indeed more stable than Stockfish in evaluation score when measured with scale-invariant statistical analysis, which is highly suggestive of a clearer representation of positional chess knowledge than Stockfish has. And when sharp tactical positions are excluded, Theoria with full threats is still significantly more stable than Stockfish 17.1.
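The post doesn't say which scale-invariant statistic was used. Scale invariance matters here because two engines may report centipawns on different scales, so comparing raw eval variance would mix up "noisier" with "bigger numbers". One standard scale-invariant measure of how much a series jumps from step to step is the von Neumann ratio: the mean squared successive difference divided by the variance. The sketch below assumes each game yields a per-move series of evals from one engine; `von_neumann_ratio` is our name, not something from the Theoria report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Von Neumann ratio of an evaluation series: mean squared successive
// difference divided by the population variance of the series.
//   about 2 : consecutive evals behave like uncorrelated noise
//   below 2 : evals drift smoothly (more stable)
//   above 2 : evals oscillate from move to move (less stable)
// Multiplying every eval by a positive constant leaves the ratio unchanged.
double von_neumann_ratio(const std::vector<double>& evals) {
    const std::size_t n = evals.size();
    assert(n >= 2 && "need at least two evaluations");

    double mean = 0.0;
    for (double e : evals) mean += e;
    mean /= static_cast<double>(n);

    double variance = 0.0;
    for (double e : evals) variance += (e - mean) * (e - mean);
    variance /= static_cast<double>(n);

    double mssd = 0.0;  // mean squared successive difference
    for (std::size_t i = 1; i < n; ++i) {
        const double d = evals[i] - evals[i - 1];
        mssd += d * d;
    }
    mssd /= static_cast<double>(n - 1);

    // A constant series has zero variance; treat it as perfectly stable
    // rather than dividing by zero.
    return variance > 0.0 ? mssd / variance : 0.0;
}
```

A comparison along these lines would compute the ratio per game for each engine on the same positions and compare the two distributions. Excluding sharp positions would be a filtering step before this, and the post doesn't specify the criterion, so it is left out here.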
Based on our research, we'll drop threat inputs from the Theoria project. They seem to be tactical noise, somewhat like the crude "sharpness" control on an old TV set (it brings out fine surface detail without really improving the fidelity of the underlying image), and they aren't really compatible with our design goal of coherent representations of positional chess knowledge.
https://www.theoriachess.org/research/
Threat inputs and engine eval stability
-
FireDragon761138
- Posts: 50
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
-
syzygy
- Posts: 5846
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threat inputs and engine eval stability
So all you changed was:
```diff
--- ../../Stockfish-sf_16.1/src/evaluate.cpp 2024-02-24 18:15:04.000000000 +0100
+++ evaluate.cpp 2026-01-14 16:45:45.000000000 +0100
@@ -193,7 +193,7 @@
assert(!pos.checkers());
int simpleEval = simple_eval(pos, pos.side_to_move());
- bool smallNet = std::abs(simpleEval) > 1050;
+ bool smallNet = false; // MODIFIED: Always use big net
int nnueComplexity;
```
```diff
--- ../../Stockfish-sf_16.1/src/misc.cpp 2024-02-24 18:15:04.000000000 +0100
+++ misc.cpp 2026-01-14 16:45:52.000000000 +0100
@@ -75,7 +75,7 @@
namespace {
// Version number or dev.
-constexpr std::string_view version = "16.1";
+constexpr std::string_view version = "0.1";
// Our fancy logging facility. The trick here is to replace cin.rdbuf() and
// cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -159,7 +159,7 @@
// Stockfish version
std::string engine_info(bool to_uci) {
std::stringstream ss;
- ss << "Stockfish " << version << std::setfill('0');
+ ss << "Theoria " << version << std::setfill('0');
if constexpr (version == "dev")
{
@@ -185,7 +185,7 @@
#endif
}
- ss << (to_uci ? "\nid author " : " by ") << "the Stockfish developers (see AUTHORS file)";
+ ss << (to_uci ? "\nid author " : " ") << "Theoria Chess Project (A fork of Stockfish 16.1) - www.theoriachess.org";
return ss.str();
}
```
So all you did was disable the small net.
Maybe it's time to put an end to this joke?
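To make the first diff concrete: in SF 16.1 the small net is selected whenever the material-only `simple_eval` exceeds 1050 in absolute value, and the patch hardwires that flag to `false`. The sketch below assumes piece values matching Stockfish 16.1's; `MaterialCount`, this stand-alone `simple_eval`, and the two `use_small_net_*` functions are hypothetical helpers for illustration, not Stockfish code.

```cpp
#include <cassert>
#include <cstdlib>

// Piece values assumed to match Stockfish 16.1; illustrative only.
constexpr int PawnValue = 208, KnightValue = 781, BishopValue = 825,
              RookValue = 1276, QueenValue = 2538;

struct MaterialCount { int pawns, knights, bishops, rooks, queens; };

// Hypothetical stand-in for simple_eval(): material balance from the side
// to move's point of view (pawns plus non-pawn material).
int simple_eval(const MaterialCount& us, const MaterialCount& them) {
    assert(us.pawns >= 0 && them.pawns >= 0);  // counts must be non-negative
    auto npm = [](const MaterialCount& m) {
        return m.knights * KnightValue + m.bishops * BishopValue
             + m.rooks * RookValue + m.queens * QueenValue;
    };
    return PawnValue * (us.pawns - them.pawns) + (npm(us) - npm(them));
}

// SF 16.1 rule from the diff: small net for large material imbalances.
bool use_small_net_sf161(int simpleEval) { return std::abs(simpleEval) > 1050; }

// Patched rule from the diff: always use the big net.
bool use_small_net_theoria(int /*simpleEval*/) { return false; }
```

With these values, being a rook up (1276) would have sent the position to the small net in SF 16.1, while being a knight up (781) would not; with the patch, both go to the big net.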
-
Jjaw
- Posts: 92
- Joined: Thu Jul 29, 2021 4:48 pm
- Full name: Joe Louvier
Re: Threat inputs and engine eval stability
Yeah, it's just another SF clone without any original ideas. There have been dozens of them over the years.
-
Rebel
- Posts: 7477
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Threat inputs and engine eval stability
FireDragon761138 wrote (Sun Jan 25, 2026 10:51 pm): I ran a test set of 100 Lichess games at around 1500 Elo. Threat inputs in Theoria actually degrade eval stability: they are effectively tactical noise, and unless the underlying Stockfish search is seriously altered, they probably add little, if anything, to raw Elo performance (which is probably why they weren't included in the release).
```text
No.  Name            Win   Draw   Loss  Unf.  Score  Games      %
-----------------------------------------------------------------
  1  sf161          +280  =1148    -72    *0  854.0   1500  56.9%
  2  theoria-6-0.1   +72  =1148   -280    *0  646.0   1500  43.1%

Total Games: 1500
White Wins:   278 (18.5%)
Black Wins:    74 (4.9%)
Draws:       1148 (76.5%)
```
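For reference, the Score column follows from counting a win as 1 and a draw as 1/2, and the usual logistic Elo model turns a score fraction p into a rating difference D = -400 · log10(1/p - 1). A sketch in C++; `MatchResult` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct MatchResult { int wins, draws, losses; };

// Points scored: a win counts 1, a draw counts 1/2.
double points(const MatchResult& r) { return r.wins + 0.5 * r.draws; }

// Fraction of the available points that were scored.
double score_fraction(const MatchResult& r) {
    const int games = r.wins + r.draws + r.losses;
    assert(games > 0);
    return points(r) / games;
}

// Logistic Elo model: expected score p = 1 / (1 + 10^(-D/400)), solved for D.
// Undefined at p = 0 or p = 1 (a perfect score gives an unbounded difference).
double elo_difference(double p) {
    assert(p > 0.0 && p < 1.0);
    return -400.0 * std::log10(1.0 / p - 1.0);
}
```

For the sf161 row (+280 =1148 -72) this gives 854 points, about 56.9%, or roughly +48 Elo for sf161 over theoria-6-0.1 in this match. That is a point estimate only; no error margin is computed here.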