Threat inputs and engine eval stability


FireDragon761138
Posts: 73
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

sscg13 wrote: Thu Jan 29, 2026 3:08 pm
FireDragon761138 wrote: Thu Jan 29, 2026 1:27 pm
sscg13 wrote: Thu Jan 29, 2026 8:32 am Even if you implemented threat inputs correctly, they still need to be well optimized. There is also a fixed speed overhead, meaning that they only pay off with a large NNUE. Correspondingly, a larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in SPRT testing.
I am frankly shocked by this, since our scaling data suggests a 300 MB NNUE (L1=3072?) would be around 100 Elo worse than a 100 MB one (L1=1024?), not to mention undertrained.
Maybe that has something to do with different training data. Stockfish probably benefits a lot more from speed if it's being trained on heterogeneous types of data optimized for finding forcing moves, instead of evaluating relatively quiet positions.
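
For readers unfamiliar with the SPRT results being compared above, here is a minimal sketch of how a win/draw/loss record is usually turned into an SPRT decision. It uses the common normal approximation of the log-likelihood ratio. The function names, the Elo hypotheses (0 and 5) and the error rates (alpha = beta = 0.05) are illustrative assumptions, not the settings used by anyone in this thread.

```python
import math

def elo_to_score(elo):
    """Expected score (0..1) for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0).

    Normal approximation: uses the sample mean and the per-game
    variance of the score. Hypothetical helper, not any tool's API.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    if var == 0.0:
        return 0.0  # all games had the same result; no usable estimate
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Compare the LLR against Wald's stopping bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

if __name__ == "__main__":
    # Made-up record, for illustration only.
    llr = sprt_llr(wins=1000, draws=2000, losses=1000, elo0=0.0, elo1=5.0)
    print(round(llr, 3), sprt_decision(llr))
```

A "marginally better" net in this framing is one whose LLR drifts slowly toward the upper bound, which is why small gains need many games before the test stops.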
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

chrisw wrote: Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: Thu Jan 29, 2026 1:27 pm
sscg13 wrote: Thu Jan 29, 2026 8:32 am Even if you implemented threat inputs correctly, they still need to be well optimized. There is also a fixed speed overhead, meaning that they only pay off with a large NNUE. Correspondingly, a larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It is unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.
Royal we.
mar
Posts: 2673
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Threat inputs and engine eval stability

Post by mar »

syzygy wrote: Thu Jan 29, 2026 11:28 pm Royal we.
Pluralis maiestatis vs. pluralis modestiae: both use 'we', but they imply completely different things. It depends on the point of view, I guess.
syzygy
Posts: 5868
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threat inputs and engine eval stability

Post by syzygy »

mar wrote: Thu Jan 29, 2026 11:49 pm
syzygy wrote: Thu Jan 29, 2026 11:28 pm Royal we.
Pluralis maiestatis vs. pluralis modestiae: both use 'we', but they imply completely different things. It depends on the point of view, I guess.
I don't think it is humility that led to the use of "we" here.
FireDragon761138
Posts: 73
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Threat inputs and engine eval stability

Post by FireDragon761138 »

chrisw wrote: Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: Thu Jan 29, 2026 1:27 pm
sscg13 wrote: Thu Jan 29, 2026 8:32 am Even if you implemented threat inputs correctly, they still need to be well optimized. There is also a fixed speed overhead, meaning that they only pay off with a large NNUE. Correspondingly, a larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It is unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.
That assertion is untrue. There are two humans working on this project.
chrisw
Posts: 4788
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Threat inputs and engine eval stability

Post by chrisw »

FireDragon761138 wrote: Fri Jan 30, 2026 12:34 am
chrisw wrote: Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: Thu Jan 29, 2026 1:27 pm
sscg13 wrote: Thu Jan 29, 2026 8:32 am Even if you implemented threat inputs correctly, they still need to be well optimized. There is also a fixed speed overhead, meaning that they only pay off with a large NNUE. Correspondingly, a larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It is unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.
That assertion is untrue. There are two humans working on this project.
One teenager using ChatGPT and a constructed persona as a mask.