Threat inputs and engine eval stability

FireDragon761138 · Post by **FireDragon761138** » Thu Jan 29, 2026 8:02 pm

sscg13 wrote: ↑Thu Jan 29, 2026 3:08 pm
FireDragon761138 wrote: ↑Thu Jan 29, 2026 1:27 pm
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.
I am frankly shocked by this, since our scaling data suggests a 300 MB NNUE (L1=3072?) would be around 100 Elo worse than a 100 MB one (L1=1024?), not to mention undertraining.

Maybe that has something to do with different training data. Stockfish probably benefits a lot more from speed if it's being trained on heterogeneous data optimized for finding forcing moves, instead of evaluating relatively quiet positions.

syzygy · Post by **syzygy** » Thu Jan 29, 2026 11:28 pm

chrisw wrote: ↑Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: ↑Thu Jan 29, 2026 1:27 pm
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It's unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.

Royal we.

mar · Post by **mar** » Thu Jan 29, 2026 11:49 pm

syzygy wrote: ↑Thu Jan 29, 2026 11:28 pm Royal we.

Pluralis maiestatis vs. pluralis modestiae: both use 'we', yet they imply completely different things. It depends on the point of view, I guess.

syzygy · Post by **syzygy** » Fri Jan 30, 2026 12:11 am

mar wrote: ↑Thu Jan 29, 2026 11:49 pm
syzygy wrote: ↑Thu Jan 29, 2026 11:28 pm Royal we.
Pluralis maiestatis vs. pluralis modestiae: both use 'we', yet they imply completely different things. It depends on the point of view, I guess.

I don't think it is humility that led to the use of "we" here.

FireDragon761138 · Post by **FireDragon761138** » Fri Jan 30, 2026 12:34 am

chrisw wrote: ↑Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: ↑Thu Jan 29, 2026 1:27 pm
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It's unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.

That assertion is untrue. There are two humans working on this project.

chrisw · Post by **chrisw** » Fri Jan 30, 2026 12:33 pm

FireDragon761138 wrote: ↑Fri Jan 30, 2026 12:34 am
chrisw wrote: ↑Thu Jan 29, 2026 1:41 pm
FireDragon761138 wrote: ↑Thu Jan 29, 2026 1:27 pm
sscg13 wrote: ↑Thu Jan 29, 2026 8:32 am Even if you correctly implemented threat inputs, threat inputs need to be optimized well. There is also a fixed overhead to speed, meaning that it will only become better with large NNUE. Correspondingly, larger NNUE also requires longer training time.
We tested a 300 MB NNUE with threat inputs. The 300 MB NNUE was only marginally better than the 100 MB NNUE with threat inputs in terms of SPRT testing.

The main problem we had with threat inputs was the loss of engine evaluation stability. It was fairly dramatic.
"We" is normally used when there’s more than one worker. It's unlikely any other worker would stick around for more than five minutes with the degree of narcissism on display here. "We tested" is therefore oxymoronic.
That assertion is untrue. There are two humans working on this project.

One teenager using ChatGPT and a constructed persona as a mask.
