
Stockfish NNUE and testsuites
Moderator: Ras
-
- Posts: 3611
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Stockfish NNUE and testsuites
This Stockfish was a wonderful surprise! Lczero's speed on my GPU was 10-20 positions/s, and suddenly you have an NN engine running at 5 Mn/s without a hardware upgrade. It's clearly better than SF dev in play, but what about testsuites? After doing a lot of tests with different nets, I think it's equal to SF dev there. With the same search, can't it be better? Yes, it solves some classic positions very fast, but there seem to be more easy ones that remain unsolved. One example: the Arasan test suite at 3 minutes/4 cores. SF dev got 185/200 (80 min) and NNUE 178/200 (95 min). NNUE is a good reminder that testsuites are useless: they can't even detect 60 Elo.

Jouni
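Whether a 7-position gap on a 200-position suite says anything at all can be checked roughly with a two-proportion z-test. A minimal Python sketch, assuming the two scores are independent binomial samples. They aren't quite, since both engines ran the same positions, so a paired test such as McNemar's would be more appropriate if per-position results were available. The function name is hypothetical.

```python
import math


def two_proportion_z(solved_a: int, solved_b: int, n: int) -> float:
    """Rough z-score for the difference between two testsuite scores.

    Assumes both results are independent binomial samples of size n.
    This ignores the pairing (same positions for both engines), so it is
    only a rough sketch, not a proper paired test.
    """
    p_a = solved_a / n
    p_b = solved_b / n
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    return (p_a - p_b) / se


if __name__ == "__main__":
    # SF dev 185/200 vs NNUE 178/200 on Arasan, as reported in the post.
    z = two_proportion_z(185, 178, 200)
    print(f"z = {z:.2f}")
```

With the Arasan numbers from the post this gives z of about 1.21, below the usual 1.96 cutoff for 95% significance, so the gap alone does not establish a difference in either direction.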
-
- Posts: 2071
- Joined: Thu May 04, 2006 3:40 am
- Location: Dune
Re: Stockfish NNUE and testsuites
Or perhaps that's an indication that the testsuite is flawed.
-
- Posts: 5284
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Stockfish NNUE and testsuites
As I pointed out here: http://talkchess.com/forum3/viewtopic.p ... 14#p853414
All positions are here: http://talkchess.com/forum3/viewtopic.p ... 35#p827135
Stockfish-NNUE is faster than all the A/B engines on these 14 positions of the Hard-Talkchess-2020 set:
85) h5-h6 : 3 seconds
97) .. Qh5-f5 : 12 seconds
100) a5-a6 : 1 seconds
133) Bh5-f3 : 38 seconds
139) .. Ne8-c7 : 6 seconds
146) Rd1-d8 : 0 seconds
155) Bd3-g6 : 7 seconds
160) Ne4-g5 : 0 seconds
184) .. Bf5-g4 : 123 seconds
185) .. Kg8-g7 : 1 seconds
186) Rf3-f6 : 4 seconds
189) Qf3xf4 : 4 seconds
193) .. c3xb2 : 0 seconds
197) .. Qb2-c2 : 25 seconds
-
- Posts: 3611
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Stockfish NNUE and testsuites
But there is one exception: TTT1 at http://dorszcz.blogspot.com/p/ttt1.html. In my test SF dev solved 34/100, but SF NNUE solved 63/100.


Jouni
-
- Posts: 3611
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Stockfish NNUE and testsuites
After testing a lot of different nets, my conclusion: SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use! In some suites, like Arasan, SF NNUE is worse than handcrafted SF. Two possible reasons: 1) testsuites are mostly useless, or 2) SF NNUE is satisfied with finding a winning move even if it's not the best move?
Jouni
-
- Posts: 1632
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Stockfish NNUE and testsuites
In reply to Jouni, Mon Aug 10, 2020 8:33 pm:
You should try some of the other nets: Toga III, Frosty, LizardFish, Night Nurse. They, NiNu in particular, may give you a different opinion, especially if you avoid the hybrid mod.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
-
- Posts: 3361
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: Stockfish NNUE and testsuites
In reply to Jouni, Mon Aug 10, 2020 8:33 pm:
How many games, against which opponent pool, do you need to get your 100 Elo difference, Jouni?
Or do you (as is customary at the moment) simply take self-play Elo for granted?
To me, matches against SF 11 or against SF dev before NNUE are somewhat like "advanced self-play", aren't they?
You can combine several testsuites if a single one doesn't give you enough statistical significance, as Ferdinand Mosca does, for example:
http://talkchess.com/forum3/viewtopic.p ... 57#p854457
Add HTC, Eret and maybe STS too; I guess you'll still get statistically significant differences faster than with ordinary rating-list Elo at standard hardware and TC, with openings and a mixed pool of opponents, of course including some Lc0-like engines too.
The mistake is always to see a need to convert differences in testsuites into "standard" Elo, whatever that term might mean nowadays. Who compares the Elo of human players to engine-vs-engine Elo or to correspondence-chess Elo?
Are you sure you can reproduce and compare your 100 Elo at TCEC, with an error bar smaller than the performance difference? Or get within a 95% confidence interval of any standard rating-list Elo with your estimated 100?
Testsuite differences are measurements in their own right, just as self-play Elo, TCEC Elo and rating-list Elo are. One thing is true for all of these measurements: playing strength is always position-dependent. Engine-vs-engine matches are nothing but played-out testsuites, too: testsuites of (opening) test positions, just as other testsuites are.
If you play out games from very short openings (which make sense for normal chess) or bookless from the starting position itself, the draw death of modern computer chess will kill your 100 Elo, even in advanced self-play, on modern hardware at any TC from, say, 30'+5" upwards. So what?
Peter.
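Peter's question about error bars can be made concrete. One common way to turn a win/draw/loss result into an Elo estimate with a 95% confidence interval is the logistic score-to-Elo formula combined with a normal approximation of the mean score. A minimal Python sketch; the function names and the 1000-game result below are hypothetical, and it ignores correlations from paired openings or self-play.

```python
import math


def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_ci(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (lower, estimate, upper) Elo for a W/D/L match result."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games")
    s = (wins + 0.5 * draws) / n
    if s <= 0.0 or s >= 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo estimate")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    eps = 1e-9  # keep the bounds strictly inside (0, 1) so the log stays finite
    lo = max(s - z * se, eps)
    hi = min(s + z * se, 1.0 - eps)
    return score_to_elo(lo), score_to_elo(s), score_to_elo(hi)


if __name__ == "__main__":
    # Hypothetical 1000-game match, not data from this thread.
    lo, est, hi = elo_with_ci(300, 500, 200)
    print(f"{est:+.1f} Elo, 95% CI [{lo:+.1f}, {hi:+.1f}]")
```

For the hypothetical 300/500/200 result this gives roughly +35 Elo with an interval of about +20 to +50. The interval narrows only with the square root of the number of games, which is why separating two engines reliably takes many games.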
-
- Posts: 12768
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Stockfish NNUE and testsuites
In reply to Jouni, Mon Aug 10, 2020 8:33 pm:
For most test suites, the new SF will play exactly like the old SF.
That is because as soon as the score becomes unbalanced (usually right off the bat for a test suite), the eval used is SF's classical (handcrafted) eval, not NNUE.
I guess if you run the special NNUE-only SF builds, they will clobber most of the test suites, because they use NNUE all the time.
https://github.com/joergoster/Stockfish-NNUE
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
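The hybrid switch described above can be sketched as a simple dispatch: use the handcrafted eval when the position is clearly unbalanced, otherwise NNUE. This is a minimal Python illustration, not Stockfish's code. It assumes the imbalance is measured as a material balance against a fixed threshold; the threshold value, the `Position` type and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold in centipawns; NOT Stockfish's actual constant.
MATERIAL_IMBALANCE_THRESHOLD = 500


@dataclass
class Position:
    # Simplified stand-in for an engine position: only what the switch needs.
    material_balance: int  # white minus black, in centipawns (hypothetical)


def hybrid_eval(pos: Position,
                classical_eval: Callable[[Position], int],
                nnue_eval: Callable[[Position], int]) -> int:
    """Handcrafted eval for clearly unbalanced positions, NNUE otherwise."""
    if abs(pos.material_balance) > MATERIAL_IMBALANCE_THRESHOLD:
        return classical_eval(pos)
    return nnue_eval(pos)


def nnue_only_eval(pos: Position,
                   nnue_eval: Callable[[Position], int]) -> int:
    """What an NNUE-only build amounts to: no switch at all."""
    return nnue_eval(pos)
```

If most test positions start out unbalanced, the first branch dominates and a hybrid build searches them with essentially the old eval, which is the effect described in the post.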
-
- Posts: 41
- Joined: Tue Oct 29, 2019 8:33 pm
- Location: French Polynesia
- Full name: Roger C.
Re: Stockfish NNUE and testsuites
Hi,
Five positions with best moves for testing NNUE nets:
q7/4P3/8/6pk/1Q1Bn1b1/8/2r3PK/3R4 w - - 0 1 bm Qb7
2q1k3/1Npp2K1/1pP2P2/3Pp3/8/8/3P1P2/8 w - - 0 1 bm Nd6+
rn1qrnk1/p4pp1/1p1pp3/6P1/2Pp1PN1/2PQ4/P5P1/2KR3R w - - 0 1 bm Nh6+
4k3/4Pp2/1P1p1P1P/pPpPpK2/pr2pbP1/7r/3RP3/NN5b w - - 0 1 bm Rb2
7q/6pk/1R6/5K1p/3B2Pp/7P/3P4/8 w - - 0 1 bm Rh6+
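Each line above is a FEN followed by a "bm" (best move) operation, in the style of EPD. A minimal Python sketch for reading such lines and checking an engine's answer, assuming both the "bm" moves and the engine's move are given in SAN. It does not validate the FEN or check move legality (that needs a full chess library), and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestPosition:
    fen: str               # 4 EPD fields, plus move counters if present
    best_moves: List[str]  # SAN moves after "bm", check/mate marks stripped


def _strip_marks(san: str) -> str:
    """Drop trailing check/mate/annotation marks so 'Nd6+' matches 'Nd6'."""
    return san.rstrip("+#!?")


def parse_line(line: str) -> TestPosition:
    tokens = line.split()
    fen_fields = tokens[:4]
    rest = tokens[4:]
    # These lines carry full FEN move counters ("0 1"); plain EPD omits them.
    while rest and rest[0].isdigit():
        fen_fields.append(rest.pop(0))
    if not rest or rest[0] != "bm":
        raise ValueError("expected a 'bm' operation")
    moves = []
    for tok in rest[1:]:
        ends_op = tok.endswith(";")  # ';' terminates an EPD operation
        tok = tok.rstrip(";")
        if tok:
            moves.append(_strip_marks(tok))
        if ends_op:
            break
    return TestPosition(" ".join(fen_fields), moves)


def is_solved(pos: TestPosition, engine_move_san: str) -> bool:
    """True if the engine's SAN move is one of the listed best moves."""
    return _strip_marks(engine_move_san) in pos.best_moves
```

Usage: `parse_line("7q/6pk/1R6/5K1p/3B2Pp/7P/3P4/8 w - - 0 1 bm Rh6+")` yields `best_moves == ["Rh6"]`, and `is_solved` then accepts the engine's "Rh6+". Counting solved positions over a suite at a fixed time per move gives the kind of score reported for Arasan and TTT1 earlier in the thread.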
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Stockfish NNUE and testsuites

You need to test SF NNUE in real games. It is not even close to 100 Elo better, unless you play on 1 core at ultra-fast time controls. SF NNUE does not scale "very well".
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.