Modern chess engines employ fundamentally different approaches to position analysis. Stockfish-17.1 represents the state-of-the-art in brute-force calculation, evaluating positions through deep alpha-beta pruning and neural network evaluation. Theoria 0.1 incorporates conceptual frameworks that annotate chess motifs and themes. This research investigates which approach yields more interpretable strategic analysis for human learners.
All you did was take Stockfish-16.1 and call it Theoria 0.1.
Stockfish-16.1 does not "incorporate conceptual frameworks that annotate chess motifs and themes" any less than Stockfish-17.1.
What is the grift exactly? Are you cheating to get a PhD?
No, we trained a new NNUE, disabled the secondary network, and disabled the code that auto-downloads the network. We also experimented with turning various algorithms in the engine on and off.
Disabling the secondary network was that one line.
It seems you have now added an NNUE net to the download page. It wasn't there before.
So what you did was use the standard tools in the standard way on standard data to create a standard network.
In no way does replacing the NNUE result in "conceptual frameworks that annotate chess motifs and themes" that were not there already in SF-16.1.
And replacing the NNUE obviously does not turn SF-16.1 into a different engine that you could call "yours".
There is not even copyright on an NNUE.
FireDragon761138 wrote: ↑Tue Jan 27, 2026 11:08 pm
Threat inputs add nothing positive to our engine, either in evaluation stability or elo.
I do not believe you are capable of properly testing this.
> Stockfish-17.1 represents the state-of-the-art in brute-force calculation
Brute force? A modern engine?
You'll probably reply "yes". My (fairly painful) interactions with you on the SF server convinced me that you entirely lack self-awareness and literacy, so I'll spell it out for you: I am ridiculing the idea that any modern engine can be remotely accurately described as "brute force".
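For anyone following along, here is a toy sketch of why "brute force" is the wrong word. This is a hypothetical random game tree, not engine code: alpha-beta returns the exact same root value as plain minimax while provably skipping subtrees that cannot affect the result.

```python
import random

def make_tree(depth, branching, rng):
    """Build a random game tree; leaves are evaluations in [-1, 1]."""
    if depth == 0:
        return rng.uniform(-1.0, 1.0)
    return [make_tree(depth - 1, branching, rng) for _ in range(branching)]

def minimax(node, counter):
    """Plain negamax: visits every node (actual brute force)."""
    counter[0] += 1
    if not isinstance(node, list):
        return node
    return max(-minimax(child, counter) for child in node)

def alphabeta(node, alpha, beta, counter):
    """Negamax with alpha-beta cutoffs: same root value, fewer nodes."""
    counter[0] += 1
    if not isinstance(node, list):
        return node
    for child in node:
        alpha = max(alpha, -alphabeta(child, -beta, -alpha, counter))
        if alpha >= beta:
            break  # cutoff: remaining siblings cannot change the result
    return alpha

rng = random.Random(1)
tree = make_tree(4, 4, rng)
mm, ab = [0], [0]
v1 = minimax(tree, mm)
v2 = alphabeta(tree, float("-inf"), float("inf"), ab)
# identical root value; alpha-beta visits a fraction of the 341 nodes
```

And that is before move ordering, transposition tables, null-move and all the other selectivity a real engine layers on top.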
SPRT tells you whether a certain change is sufficiently good or not. It does not tell you if you have implemented the change properly.
I strongly suspect your implementation of threat inputs is not correct but I cannot say anything further without the exact code difference.
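To spell out the SPRT point: the test only ever sees wins, draws and losses, so nothing in it can distinguish "feature implemented correctly" from "bug that happens to gain elo". A minimal sketch of the normal-approximation LLR used in fishtest-style frameworks (hypothetical game counts, simplified trinomial model):

```python
import math

def elo_to_score(elo):
    """Expected score for an Elo advantage under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) vs H0 (elo0)
    from a trinomial game sample (normal approximation)."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n             # observed score
    var = (wins + 0.25 * draws) / n - s * s  # per-game score variance
    if var <= 0.0:
        return 0.0
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)

def sprt_state(llr, alpha=0.05, beta=0.05):
    """Wald bounds: accept H1 (change is good) or H0 (it is not)."""
    if llr >= math.log((1.0 - beta) / alpha):
        return "accept H1"
    if llr <= math.log(beta / (1.0 - alpha)):
        return "accept H0"
    return "continue"
```

Note that the inputs are game results and nothing else; a pass tells you the patch gained strength, not that the code does what you think it does.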
You have also been told multiple times that "averaging" used in MCTS by default is good for "stability" which is part of the reason why most people consider "stability" a nonsense metric.
Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
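The fixed-overhead argument can be put in rough numbers. Assuming (hypothetically) a constant per-node cost for computing threat features and some Elo-per-speed-doubling constant, the relative slowdown, and hence the Elo tax, shrinks as the network evaluation itself gets more expensive:

```python
import math

# Assumed constant, engine- and time-control-dependent; not a measured value.
ELO_PER_DOUBLING = 60.0

def speed_elo_tax(fixed_cost_ns, eval_cost_ns):
    """Elo lost to a fixed per-node overhead on top of a given eval cost."""
    ratio = eval_cost_ns / (eval_cost_ns + fixed_cost_ns)  # relative speed
    return ELO_PER_DOUBLING * math.log2(ratio)             # negative

# Hypothetical numbers: 40 ns of threat-feature work per node.
small_net = speed_elo_tax(40.0, 200.0)  # cheap eval: large relative tax
large_net = speed_elo_tax(40.0, 800.0)  # expensive eval: small relative tax
```

With these made-up numbers the same 40 ns costs roughly 16 elo against a cheap net but only about 4 against an expensive one, which is the sense in which threat inputs only become better with large NNUE.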
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am
Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.
The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am
Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.
The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there is more than one worker. It is unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am
Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.
I am frankly shocked by this, since our scaling data suggests a 300 MB NNUE (L1=3072?) would be around 100 elo worse compared to 100 MB (L1=1024?), not to mention undertraining.