SF16 vs. SF17 architecture differences and engine evaluation stability



FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn


Post by FireDragon761138

We've made two separate attempts at training networks for Theoria on the SF 17.1 code, and each time the resulting engine has a less stable and less coherent evaluation than Theoria built on the SF 16.1 code. Systematicity (coherence) in particular takes a big hit. Theoria built on the SF 16.1 code converges far more closely with Lc0 than the SF 17.1 build does.
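The post doesn't spell out how "convergence" with Lc0 is measured. As a hypothetical illustration only, one simple way to quantify it is to score the same set of positions with both engines and compare the centipawn evaluations: a Pearson correlation for whether the engines rank positions the same way, and a mean absolute difference for how far apart the numbers are. The function names and the ±1000 cp clamp below are assumptions, not anything taken from Theoria, Stockfish or Lc0.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("one engine's scores have zero variance")
    return cov / math.sqrt(var_x * var_y)


def agreement(scores_a, scores_b, clamp_cp=1000):
    """Compare two engines' evals (centipawns, same point of view) on the same positions.

    Scores are clamped to +/- clamp_cp (an arbitrary, hypothetical choice) so that a
    handful of already-decided positions cannot dominate either statistic.
    """
    a = [max(-clamp_cp, min(clamp_cp, s)) for s in scores_a]
    b = [max(-clamp_cp, min(clamp_cp, s)) for s in scores_b]
    r = pearson(a, b)  # validates lengths before we divide by len(a)
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return {"pearson": r, "mean_abs_diff_cp": mean_abs_diff}


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not measured data.
    lc0 = [35, -120, 410, 0, 90]
    theoria = [40, -100, 380, 10, 85]
    print(agreement(lc0, theoria))
```

If only the ordering of positions matters, a rank correlation (Spearman) would be a reasonable drop-in alternative to Pearson.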

What changed in the search code between SF 16.1 and SF 17.1? The training data for both engines is identical; we just use leela96.bin.

I asked Claude about this, and this is what it said: "Between SF16.1 and SF17.1, Stockfish moved toward deeper integration of NNUE evaluation with search pruning decisions. The key difference is that SF17 allows the neural network evaluation to more directly influence forward pruning thresholds and late move reductions. While this creates stronger tactical play at very high node counts, it introduces evaluation volatility when the engine is operating under constraints—exactly what you're seeing with node-limited searches. The engine becomes more 'jumpy' because small changes in position can trigger cascading effects through the pruning decisions."

This seems to mirror what we've observed so far: above 3600 Elo, alpha-beta search engines become progressively less coherent and less stable in their evaluation scores. We're going to run some tests on Dragon to measure its evaluation stability, since it sits right in the gap between these two engines (T-16.1 vs. T-17.1), and see whether this trend reflects some underlying reality.
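A test like the one planned for Dragon needs a concrete definition of evaluation stability, which the thread doesn't give. One hypothetical definition, assumed here purely for illustration: evaluate each test position at a ladder of increasing node limits and measure how much the score moves between consecutive limits and how often it changes sign. `stability_profile`, `summarize` and the `flip_margin_cp` parameter are illustrative names, not part of any engine or library.

```python
def stability_profile(scores, flip_margin_cp=0):
    """Stability of one position's eval across increasing node limits.

    `scores` are centipawn evals from one consistent point of view (e.g. White's),
    ordered by node limit. A sign flip is counted only when the score moves from
    above +flip_margin_cp to below -flip_margin_cp, or the reverse.
    """
    if len(scores) < 2:
        raise ValueError("need scores from at least two node limits")
    steps = list(zip(scores, scores[1:]))
    deltas = [abs(b - a) for a, b in steps]
    flips = sum(
        1
        for a, b in steps
        if (a > flip_margin_cp and b < -flip_margin_cp)
        or (a < -flip_margin_cp and b > flip_margin_cp)
    )
    return {
        "mean_step_cp": sum(deltas) / len(deltas),
        "max_step_cp": max(deltas),
        "sign_flips": flips,
    }


def summarize(per_position_scores, flip_margin_cp=0):
    """Aggregate stability_profile over a whole test suite."""
    profiles = [stability_profile(s, flip_margin_cp) for s in per_position_scores]
    if not profiles:
        raise ValueError("need at least one position")
    n = len(profiles)
    return {
        "avg_mean_step_cp": sum(p["mean_step_cp"] for p in profiles) / n,
        "positions_with_flip": sum(1 for p in profiles if p["sign_flips"] > 0),
    }


if __name__ == "__main__":
    # Made-up evals at, say, 1k/2k/4k/8k nodes for two positions; not real data.
    suite = [[20, 35, -15, 10], [0, 10, 20, 25]]
    print(summarize(suite))
```

To fill `scores`, python-chess can run a UCI engine at a fixed node budget, e.g. `engine.analyse(board, chess.engine.Limit(nodes=n))["score"].white().score(mate_score=100000)`, which also converts mate scores into large centipawn values. Running each engine with a single search thread is probably advisable, so that the run-to-run nondeterminism of multithreaded search isn't counted as instability.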
syzygy
Posts: 5896
Joined: Tue Feb 28, 2012 11:56 pm

Re: SF16 vs. SF17 architecture differences and engine evaluation stability

Post by syzygy

What does it mean for an evaluation to be "coherent"?
Claude wrote: "exactly what you're seeing with node-limited searches"
So you asked it to confirm your bias and Claude complied.
Sopel
Posts: 397
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: SF16 vs. SF17 architecture differences and engine evaluation stability

Post by Sopel

We'd all benefit if you actually stuck to conversing with Claude to the fullest extent.
dangi12012 wrote: "No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll."

Maybe you copied your stockfish commits from someone else too?
I will look into that.