SF16 vs. SF17 architecture differences and engine evaluation stability

FireDragon761138 · Post by **FireDragon761138** » Mon Feb 02, 2026 10:06 pm

We've made two separate attempts to train networks for Theoria using the SF 17.1 code, and each time the resulting engine has a less stable and less coherent evaluation than Theoria built on SF 16.1 code. The systematicity (coherence) in particular takes a big hit. Theoria built on SF 16.1 code converges with Lc0 far more than the version built on SF 17.1 does.

What changed between SF 16.1 and SF 17.1 in terms of the search code? The training data for both engines is identical, just leela96.bin in both cases.

I asked Claude about this, and this is what it said, "Between SF16.1 and SF17.1, Stockfish moved toward deeper integration of NNUE evaluation with search pruning decisions. The key difference is that SF17 allows the neural network evaluation to more directly influence forward pruning thresholds and late move reductions. While this creates stronger tactical play at very high node counts, it introduces evaluation volatility when the engine is operating under constraints—exactly what you're seeing with node-limited searches. The engine becomes more "jumpy" because small changes in position can trigger cascading effects through the pruning decisions."

The trend seems to mirror what we've observed so far: above 3600 Elo, alpha-beta search engines all become progressively less coherent and less stable in their evaluation scores. We're going to run some tests on Dragon to measure its evaluation stability, since it sits right around the gap between these two engines (T-16.1 vs. T-17.1), and see how much this trend reflects some underlying reality.
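
For anyone who wants to put a number on "evaluation stability", here is one possible way to do it, as a minimal sketch. Everything in it is an assumption made for illustration: the function name `eval_stability`, the choice of metrics, and the sample scores are hypothetical, and this is not the measure used in the tests described above. It takes the centipawn scores an engine reports for a single position at successive node limits and summarizes how much they move.

```python
from statistics import mean, pstdev


def eval_stability(scores_cp):
    """Summarize how much a sequence of evaluations (in centipawns) fluctuates.

    scores_cp: scores reported for ONE position at successive node limits.
    Hypothetical helper for illustration only; not taken from any engine.
    """
    if len(scores_cp) < 2:
        raise ValueError("need at least two evaluations")
    # Absolute change between each pair of consecutive evaluations
    deltas = [abs(b - a) for a, b in zip(scores_cp, scores_cp[1:])]
    return {
        "mean_abs_change": mean(deltas),  # average jump between searches
        "max_abs_change": max(deltas),    # worst single jump
        "std_dev": pstdev(scores_cp),     # overall spread of the scores
    }


# Made-up scores, NOT measured data: a steady engine vs. a "jumpy" one
steady = [30, 32, 31, 33, 32]
jumpy = [30, 75, 10, 60, 20]

print(eval_stability(steady))
print(eval_stability(jumpy))
```

Lower values mean the score changes less from one search to the next. This only captures volatility within a single engine; comparing two engines (for example, convergence with Lc0) would need a separate measure.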

syzygy · Post by **syzygy** » Tue Feb 03, 2026 1:19 am

What does it mean for an evaluation to be "coherent"?

Claude wrote: exactly what you're seeing with node-limited searches

So you asked it to confirm your bias and Claude complied.

Sopel · Post by **Sopel** » Tue Feb 03, 2026 3:16 pm

We'd all benefit if you actually stuck to conversing with Claude to the fullest extent.
