Stockfish NNUE and testsuites

Jouni
Posts: 2432
Joined: Wed Mar 08, 2006 7:15 pm

Stockfish NNUE and testsuites

Post by Jouni » Wed Jul 29, 2020 2:28 pm

This Stockfish was a wonderful surprise! Lczero speed on my GPU was 10-20 positions/s, and suddenly you have an NN engine running at 5 Mn/s without a hardware upgrade :D . It is clearly better than SF dev in play, but how about testsuites? After doing a lot of tests with different nets, I think it's equal to SF dev. With the same search it can't be better, can it? Yes, it solves some classic positions very fast, but there seem to be more easy ones that remain unsolved. One example: the Arasan test suite at 3 minutes/4 cores. SF dev got 185/200 (80 min) and NNUE 178/200 (95 min). NNUE is a good reminder that testsuites are useless: they can't even detect a 60 Elo difference.
Jouni

Leto
Posts: 2052
Joined: Thu May 04, 2006 1:40 am
Location: Dune

Re: Stockfish NNUE and testsuites

Post by Leto » Wed Jul 29, 2020 3:07 pm

Or perhaps that's an indication that the testsuite is flawed.

Vinvin
Posts: 4949
Joined: Thu Mar 09, 2006 8:40 am
Full name: Vincent Lejeune

Re: Stockfish NNUE and testsuites

Post by Vinvin » Wed Jul 29, 2020 3:46 pm

As I pointed out here: viewtopic.php?p=853414#p853414
Stockfish-NNUE is faster than all the A/B engines on 14 positions of the Hard-Talkchess-2020 set:

Code: Select all

 85) h5-h6       : 3   seconds
 97) .. Qh5-f5   : 12  seconds
100) a5-a6       : 1   seconds
133) Bh5-f3      : 38  seconds
139) .. Ne8-c7   : 6   seconds
146) Rd1-d8      : 0   seconds
155) Bd3-g6      : 7   seconds
160) Ne4-g5      : 0   seconds
184) .. Bf5-g4   : 123 seconds
185) .. Kg8-g7   : 1   seconds
186) Rf3-f6      : 4   seconds
189) Qf3xf4      : 4   seconds
193) .. c3xb2    : 0   seconds
197) .. Qb2-c2   : 25  seconds 
All positions are here : viewtopic.php?p=827135#p827135

Jouni
Posts: 2432
Joined: Wed Mar 08, 2006 7:15 pm

Re: Stockfish NNUE and testsuites

Post by Jouni » Fri Jul 31, 2020 1:41 pm

But there is one exception: TTT1 at http://dorszcz.blogspot.com/p/ttt1.html. In my test SF dev solved 34/100, but SF NNUE solved 63/100 :!: :) .
Jouni
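
Whether score gaps like these are larger than sampling noise can be estimated with a two-proportion z-test. This is only a rough sketch, not anyone's actual test procedure. It assumes the two runs are independent samples, which is only approximately true because both engines saw the same positions (a paired test would need the per-position results, which are not given in this thread). The function name `two_proportion_z` is made up for the illustration.

```python
import math

def two_proportion_z(solved_a, solved_b, total):
    """z statistic for the gap between two solve counts on `total` positions each.

    Assumption: both runs are treated as independent samples of equal size.
    Raises ZeroDivisionError if both engines solve all or none of the positions.
    """
    p_a = solved_a / total
    p_b = solved_b / total
    # Pooled solve rate under the null hypothesis "both engines are equally good".
    pooled = (solved_a + solved_b) / (2 * total)
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / total)
    return (p_a - p_b) / standard_error

# Scores quoted in the posts above.
arasan_z = two_proportion_z(185, 178, 200)  # SF dev vs SF NNUE, Arasan
ttt1_z = two_proportion_z(63, 34, 100)      # SF NNUE vs SF dev, TTT1

print(f"Arasan: z = {arasan_z:.2f}")
print(f"TTT1:   z = {ttt1_z:.2f}")
```

Under these assumptions |z| below 1.96 means the gap is not significant at the 95% level. The Arasan gap (185 vs 178) stays below that line, while the TTT1 gap (63 vs 34) is well above it.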

Jouni
Posts: 2432
Joined: Wed Mar 08, 2006 7:15 pm

Re: Stockfish NNUE and testsuites

Post by Jouni » Mon Aug 10, 2020 6:33 pm

After testing a lot of different nets, my conclusion: SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use! In some suites, like Arasan, SF NNUE is worse than handcrafted SF. Two possible reasons: 1) testsuites are mostly useless, or 2) SF NNUE is satisfied with finding a winning move even if it's not the best move?
Jouni

dkappe
Posts: 901
Joined: Tue Aug 21, 2018 5:52 pm
Full name: Dietrich Kappe

Re: Stockfish NNUE and testsuites

Post by dkappe » Mon Aug 10, 2020 6:51 pm

Jouni wrote:
Mon Aug 10, 2020 6:33 pm
After testing a lot of different nets, my conclusion: SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use! In some suites, like Arasan, SF NNUE is worse than handcrafted SF. Two possible reasons: 1) testsuites are mostly useless, or 2) SF NNUE is satisfied with finding a winning move even if it's not the best move?
You should try some of the other nets: Toga III, Frosty, LizardFish, Night Nurse. They, especially NiNu, may give you a different opinion, especially when avoiding the hybrid mod.

peter
Posts: 2269
Joined: Sat Feb 16, 2008 6:38 am
Full name: Peter Martan

Re: Stockfish NNUE and testsuites

Post by peter » Mon Aug 10, 2020 6:59 pm

Jouni wrote:
Mon Aug 10, 2020 6:33 pm
After testing a lot of different nets, my conclusion: SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use! In some suites, like Arasan, SF NNUE is worse than handcrafted SF. Two possible reasons: 1) testsuites are mostly useless, or 2) SF NNUE is satisfied with finding a winning move even if it's not the best move?
How many games, against which pool of opponents, do you need to get your 100 Elo difference, Jouni?
Or do you (as is the custom at the moment) simply take selfplay Elo for granted?
Matches against SF 11 or against SF dev from before NNUE are to me somewhat like "advanced selfplay", aren't they?

You can combine several test suites if a single one doesn't give you enough statistical significance, as e.g. Ferdinand Mosca does:

viewtopic.php?p=854457#p854457

Add HTC, Eret and maybe STS too; I guess you'll still be faster at showing statistically significant differences than with ordinary rating-list Elo, with its standard hardware TC, openings and mixed pool of opponents, of course including some LC0-like engines too.

The error is always in seeing a need to convert differences in test suites to "standard" Elo, whatever this term might mean nowadays. Who compares the Elo of human players to engine-engine Elo or to correspondence-chess Elo?
Are you sure you can reproduce and compare your 100 Elo at TCEC? With an error bar smaller than the performance difference? Or get within a 95% confidence interval of any standard rating-list Elo with your estimated 100?

Testsuite differences are measurements of their own, just as selfplay Elo, TCEC Elo or rating-list Elo are. One thing is true for all of these measurements: playing strength is always position-dependent. Engine-engine matches are nothing but played-out testsuites either: testsuites of (opening) test positions, just like other testsuites.
If you play out games from very short openings (for whatever reason, in normal chess) or bookless from the starting position itself, the draw death of modern computer chess will kill your 100 Elo at any hardware TC on modern hardware from, let's say, 30'+5" upwards, even in advanced selfplay, so what?
Peter.
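
The question about error bars can be made concrete with a small calculation. The sketch below converts a match result into an Elo difference with the usual logistic formula and attaches an approximate 95% interval from a normal approximation of the score. The match result in the example (40 wins, 50 draws, 10 losses) is invented for illustration and is not taken from any test in this thread, and the function names are made up as well.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result; z=1.96 gives about 95%."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, which is 1, 0.5 or 0 points.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    margin = z * math.sqrt(variance / games)

    def clamp(value, eps=1e-6):
        # Keep the score strictly inside (0, 1) so the logarithm stays defined.
        return min(max(value, eps), 1.0 - eps)

    return (elo_from_score(clamp(score)),
            elo_from_score(clamp(score - margin)),
            elo_from_score(clamp(score + margin)))

# Hypothetical 100-game match: 40 wins, 50 draws, 10 losses.
elo, low, high = elo_with_interval(40, 50, 10)
print(f"Elo difference: {elo:.0f} (approx. 95% interval {low:.0f} to {high:.0f})")
```

Even with a 65% score, a 100-game match leaves an interval of several tens of Elo on either side of the estimate, so the number of games matters a lot before a claimed difference can be compared with another list.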

Dann Corbit
Posts: 12168
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Stockfish NNUE and testsuites

Post by Dann Corbit » Tue Aug 11, 2020 12:11 am

Jouni wrote:
Mon Aug 10, 2020 6:33 pm
After testing a lot of different nets, my conclusion: SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use! In some suites, like Arasan, SF NNUE is worse than handcrafted SF. Two possible reasons: 1) testsuites are mostly useless, or 2) SF NNUE is satisfied with finding a winning move even if it's not the best move?
For most test suites, the new SF will play exactly like old SF.
That is because as soon as the score becomes unbalanced (usually right off the bat for a test suite) the eval used is the classical handcrafted SF eval rather than NNUE.
I guess if you run the special nnue-only SF builds, it will clobber most of the test suites because it uses NNUE all the time.
https://github.com/joergoster/Stockfish-NNUE
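
The switching behaviour described above can be sketched roughly as follows. This is not Stockfish's code: the threshold value, the function names and the idea of passing the two evaluators as callables are all assumptions made for the illustration, and the real condition in the engine source may look different.

```python
# Hypothetical cutoff in centipawns; the real engine's value and condition may differ.
HYBRID_THRESHOLD_CP = 500

def hybrid_evaluate(imbalance_cp, nnue_eval, classical_eval):
    """Pick an evaluator based on a cheap estimate of how unbalanced the position is.

    imbalance_cp   -- rough score imbalance in centipawns (assumed to be cheap to get)
    nnue_eval      -- callable returning the network evaluation
    classical_eval -- callable returning the handcrafted evaluation
    Returns a (name, score) pair so the caller can see which evaluator ran.
    """
    if abs(imbalance_cp) >= HYBRID_THRESHOLD_CP:
        # Clearly unbalanced position: fall back to the handcrafted eval.
        return "classical", classical_eval()
    # Roughly balanced position: use the network.
    return "nnue", nnue_eval()

# Stub evaluators standing in for the real ones.
print(hybrid_evaluate(30, lambda: 25, lambda: 40))     # balanced position
print(hybrid_evaluate(-900, lambda: -850, lambda: -910))  # e.g. a piece down
```

Since many test positions are tactically decided, a switch of this kind would hand most of them to the handcrafted eval, which fits the observation that the suite scores hardly change.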
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

RogerC
Posts: 35
Joined: Tue Oct 29, 2019 7:33 pm
Location: French Polynesia
Full name: Roger C.

Re: Stockfish NNUE and testsuites

Post by RogerC » Tue Aug 11, 2020 12:57 am

Hi,

5 positions and best moves for testing NNUE nets :

q7/4P3/8/6pk/1Q1Bn1b1/8/2r3PK/3R4 w - - 0 1 bm Qb7
2q1k3/1Npp2K1/1pP2P2/3Pp3/8/8/3P1P2/8 w - - 0 1 bm Nd6+
rn1qrnk1/p4pp1/1p1pp3/6P1/2Pp1PN1/2PQ4/P5P1/2KR3R w - - 0 1 bm Nh6+
4k3/4Pp2/1P1p1P1P/pPpPpK2/pr2pbP1/7r/3RP3/NN5b w - - 0 1 bm Rb2
7q/6pk/1R6/5K1p/3B2Pp/7P/3P4/8 w - - 0 1 bm Rh6+
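
For anyone who wants to script this, here is a minimal sketch of reading the lines above and checking an answer. It uses only plain string handling and assumes the engine's move is already available in SAN (engines normally answer in UCI notation such as b4b7, so a chess library would be needed for the conversion, which is not shown). The helper names are made up for the example.

```python
SUITE = [
    "q7/4P3/8/6pk/1Q1Bn1b1/8/2r3PK/3R4 w - - 0 1 bm Qb7",
    "2q1k3/1Npp2K1/1pP2P2/3Pp3/8/8/3P1P2/8 w - - 0 1 bm Nd6+",
    "rn1qrnk1/p4pp1/1p1pp3/6P1/2Pp1PN1/2PQ4/P5P1/2KR3R w - - 0 1 bm Nh6+",
    "4k3/4Pp2/1P1p1P1P/pPpPpK2/pr2pbP1/7r/3RP3/NN5b w - - 0 1 bm Rb2",
    "7q/6pk/1R6/5K1p/3B2Pp/7P/3P4/8 w - - 0 1 bm Rh6+",
]

def parse_position(line):
    """Split 'FEN ... bm MOVE [MOVE ...]' into (fen, [best moves])."""
    fen, separator, moves = line.partition(" bm ")
    if not separator:
        raise ValueError(f"no 'bm' field in: {line!r}")
    # Tolerate an EPD-style trailing semicolon after the move list.
    best_moves = moves.strip().rstrip(";").split()
    return fen.strip(), best_moves

def normalize(san):
    """Drop check, mate and annotation suffixes so 'Nd6' matches 'Nd6+'."""
    return san.rstrip("+#!?")

def is_solved(engine_move_san, best_moves):
    """True if the engine's SAN move is one of the listed best moves."""
    return normalize(engine_move_san) in {normalize(m) for m in best_moves}

for line in SUITE:
    fen, best_moves = parse_position(line)
    print(best_moves, "<-", fen)
```

Comparing normalized SAN strings keeps the check independent of whether the engine output marks checks.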

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 8:00 pm

Re: Stockfish NNUE and testsuites

Post by mwyoung » Tue Aug 11, 2020 4:09 am

Jouni wrote:
Mon Aug 10, 2020 6:33 pm
SF NNUE is about 100 Elo better than SF11, but this is not visible in any "standard" testsuite I use!
:lol:

You need to test SF NNUE in real games. It is not even close to 100 Elo better! Unless you play 1 core, and ultra fast time controls. SF NNUE does not scale "very well".
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
