New sf+nnue play-only compiles

kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

New sf+nnue play-only compiles

Post by kranium »

Hi all-
I released new PO (play-only) compiles
which seem to be significantly faster
...more than 10% faster (total bench time) on my system

Intel® Core™ i9-9900K Processor

Bench was run just once, not averaged over several runs, so these are just estimates:

PO
-------------------------
sf+nnue-po.270720.halfkp_256x2-32-32.x64.bmi2.exe
Total time (ms) : 2289
Nodes searched  : 3355738
Nodes/second    : 1466027

sf+nnue-po.270720.halfkp_256x2-32-32.x64.avx2.exe
Total time (ms) : 2305
Nodes searched  : 3355738
Nodes/second    : 1452071

sf+nnue-po.270720.halfkp_256x2-32-32.x64.popc.exe
Total time (ms) : 2684
Nodes searched  : 3355738
Nodes/second    : 1247022
-------------------------

NODCHIP
stockfish.bmi2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 2608
Nodes searched  : 4049933
Nodes/second    : 1552888

stockfish.avx2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 2609
Nodes searched  : 4049933
Nodes/second    : 1552293

stockfish.sse42.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 3041
Nodes searched  : 4049933
Nodes/second    : 1331776
-------------------------

AIO
sf+nnue-aio.270720.halfkp_256x2-32-32.x64.bmi2.exe
Total time (ms) : 3086
Nodes searched  : 4049933
Nodes/second    : 1312356

sf+nnue-aio.270720.halfkp_256x2-32-32.x64.avx2.exe
Total time (ms) : 3124
Nodes searched  : 4049933
Nodes/second    : 1296393

sf+nnue-aio.270720.halfkp_256x2-32-32.x64.popc.exe
Total time (ms) : 3661
Nodes searched  : 4049933
Nodes/second    : 1106236
https://github.com/FireFather/sf-nnue/releases

I hope they prove fast on other systems as well...
PS: they have not been widely tested.
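To put the "more than 10%" figure in context, this short Python sketch recomputes the relative differences from the BMI2 lines of the single-run bench listing above (the numbers are copied verbatim; the dictionary keys and the `speedup` helper are illustrative names only). On these figures the PO build finishes bench about 14% sooner than the Nodchip build while reporting roughly 5.6% lower nps, which is consistent with its bench searching a different number of nodes (3,355,738 vs 4,049,933).

```python
# Bench figures copied from the BMI2 runs in the listing above (single run each).
runs = {
    "PO bmi2":      {"time_ms": 2289, "nodes": 3355738, "nps": 1466027},
    "NODCHIP bmi2": {"time_ms": 2608, "nodes": 4049933, "nps": 1552888},
    "AIO bmi2":     {"time_ms": 3086, "nodes": 4049933, "nps": 1312356},
}


def speedup(base: str, other: str, key: str) -> float:
    """Percent by which `other` beats `base` on `key`.

    For bench time lower is better, so the ratio is base/other;
    for nps higher is better, so the ratio is other/base.
    """
    b, o = runs[base][key], runs[other][key]
    ratio = b / o if key == "time_ms" else o / b
    return (ratio - 1.0) * 100.0


for base in ("NODCHIP bmi2", "AIO bmi2"):
    print(f"PO vs {base}: bench time {speedup(base, 'PO bmi2', 'time_ms'):+.1f}%, "
          f"nps {speedup(base, 'PO bmi2', 'nps'):+.1f}%")
```

Because the PO bench reports a different node count, its time gain should not be read as a raw nps gain; comparing several averaged runs would give firmer numbers.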
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New sf+nnue play-only compiles

Post by Laskos »

kranium wrote: Mon Jul 27, 2020 5:57 pm Hi all-
I released new PO (play-only) compiles
which seem to be significantly faster
...more than 10% on my system

Wow, fast indeed: 15% or so faster than the Nodchip compiles with BMI2. Now, at short time control, SV nets with this compile are about 90 Elo points stronger than SF_dev.

Thanks!
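For readers comparing the Elo figures in this thread, the usual logistic Elo model converts an Elo difference d into an expected score E = 1 / (1 + 10^(-d/400)). A minimal Python sketch of that conversion and its inverse (the function names are illustrative, not from any particular tool):

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score; only valid for 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


for d in (90, 39, 20):  # Elo differences reported in this thread
    print(f"+{d} Elo -> expected score {expected_score(d):.3f}")
```

So a +90 Elo edge corresponds to scoring roughly 63% against SF_dev, while +20 Elo is only about 53%.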
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: New sf+nnue play-only compiles

Post by Modern Times »

Laskos wrote: Mon Jul 27, 2020 9:47 pm Wow, fast indeed, 15% or so faster than Nodchip compiles with BMI2. Now at short time control, SV nets with this compile are about 90 Elo points stronger than SF_dev.

Thanks!
In standard chess, probably yes, but according to some limited tests I ran, these NNUE versions are substantially weaker than standard Stockfish at Chess960. It would be useful if someone else could confirm that.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New sf+nnue play-only compiles

Post by Laskos »

Modern Times wrote: Mon Jul 27, 2020 10:07 pm
Laskos wrote: Mon Jul 27, 2020 9:47 pm Wow, fast indeed, 15% or so faster than Nodchip compiles with BMI2. Now at short time control, SV nets with this compile are about 90 Elo points stronger than SF_dev.

Thanks!
At standard chess yes probably, but these NNUE versions are substantially weaker than standard Stockfish at chess960 according to some limited tests I ran. Would be useful if someone else could confirm that.
In the chess variants I checked, NNUE was always weaker than standard SF_dev, so I am not surprised it's weaker in Chess960. But one can train a new network for each variant, and it can be significantly stronger than SF_dev, since SF_dev was not specifically designed for that variant.
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: New sf+nnue play-only compiles

Post by Werner »

Hi, thanks for the compiles. I just compared them in a command window from the start position:
sf+nnue-po.270720.halfkp_256x2-32-32.x64.bmi2.exe
info depth 25 seldepth 39 multipv 1 score cp 15 nodes 9858990 nps 1144663 hashfull 1000 tbhits 0 time 8613 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 c4d5 e6d5 b1c3 f8e7 c1g5 c7c6 d1c2 g7g6 e2e3 c8f5 f1d3 f5d3 c2d3 a7a5 e1g1 e8g8 h2h3 b8d7 d3c2 f8e8 f1e1 b7b5 g5f4 d7b6 a2a3

and stockfish.bmi2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
info depth 25 seldepth 34 multipv 1 score cp 25 nodes 5863103 nps 1137583 hashfull 997 tbhits 0 time 5154 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 c1g5 h7h6 g5f6 d8f6 e2e3 f8b4 b1c3 e8g8 a1c1 c7c6 d1b3 b4d6 f1d3 b8d7 e1g1 f8d8 c4d5 e6d5 e3e4 d5e4 c3e4 f6f4

Any idea what's the reason for these differences?
Werner
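One way to see where the difference comes from is to put the fields of the two UCI `info` lines side by side. The Python sketch below is a minimal parser that handles only the fields appearing in these two lines (it is not a general UCI parser, and `parse_info` is a hypothetical helper); the PVs are shortened to their first three moves. In this one position both builds run at almost the same nps, so the longer time to depth 25 for the PO build comes from it searching about 68% more nodes, i.e. from a different search tree rather than a slower evaluation.

```python
def parse_info(line: str) -> dict:
    """Parse a UCI 'info' line (only the fields used in this thread)."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not a UCI info line")
    out = {}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key == "pv":
            out["pv"] = tokens[i + 1:]  # principal variation runs to end of line
            break
        if key == "score":
            out["score"] = (tokens[i + 1], int(tokens[i + 2]))  # e.g. ("cp", 15)
            i += 3
            continue
        out[key] = int(tokens[i + 1])  # depth, seldepth, nodes, nps, time, ...
        i += 2
    return out


po = parse_info("info depth 25 seldepth 39 multipv 1 score cp 15 nodes 9858990 "
                "nps 1144663 hashfull 1000 tbhits 0 time 8613 pv d2d4 g8f6 c2c4")
nod = parse_info("info depth 25 seldepth 34 multipv 1 score cp 25 nodes 5863103 "
                 "nps 1137583 hashfull 997 tbhits 0 time 5154 pv d2d4 g8f6 c2c4")

for key in ("nodes", "nps", "time"):
    print(f"{key:>5}: PO / Nodchip = {po[key] / nod[key]:.2f}")
```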
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: New sf+nnue play-only compiles

Post by kranium »

Werner wrote: Mon Jul 27, 2020 10:44 pm Any idea what´s the reason for these differences?
I tried to document many of the incremental updates here:
https://github.com/FireFather/sf-nnue/releases

Also, see the original nodchip repository (issues and PRs).

PS: nodchip's compiles are not created via PGO (profile-guided optimization)... using it gives a nice boost.

I think the binary can be even faster...
I expect the first official SF release to scream; those guys are really good at getting every tiny bit of speed out.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New sf+nnue play-only compiles

Post by lkaufman »

Laskos wrote: Mon Jul 27, 2020 9:47 pm Wow, fast indeed, 15% or so faster than Nodchip compiles with BMI2. Now at short time control, SV nets with this compile are about 90 Elo points stronger than SF_dev.

Thanks!
I got +39 Elo over a recent SF (July 6) on 7 threads at 0.5' + 0.5" after 170 games. Maybe the gains are smaller with more threads?
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: New sf+nnue play-only compiles

Post by kranium »

I think this is an important change (not yet merged into nodchip's repository):
https://github.com/nodchip/Stockfish/issues/68

I suspect it would improve training... and lead to better nets.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New sf+nnue play-only compiles

Post by Laskos »

lkaufman wrote: Mon Jul 27, 2020 11:39 pm I got +39 elo over recent SF (July 6) on 7 threads at 0.5' + 0.5" after 170 games. Maybe gains are less with more threads?
What is the draw rate in your case? Apart from the 1-thread ultra-fast TC, I am also using unbalanced openings, so my draw rate is about 35%. The discrepancy comes from these very different conditions and from just 170 games.
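The sample-size point can be made concrete with a rough normal-approximation sketch that treats games as independent (a simple win/draw/loss model; real match statistics that pair openings are more involved). The W/D/L counts used here are hypothetical, chosen only so that 170 games give about +39 Elo with a draw rate near 35%; they are not anyone's actual results from this thread, and `elo_interval` is an illustrative helper.

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic Elo difference for a score fraction 0 < s < 1."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Point estimate and approximate 95% interval (in Elo) from W/D/L counts."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean score.
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_from_score(s), elo_from_score(s - z * se), elo_from_score(s + z * se)


# Hypothetical counts: 65 wins, 59 draws, 46 losses (170 games, ~35% draws).
est, lo, hi = elo_interval(65, 59, 46)
print(f"{est:+.0f} Elo, 95% interval {lo:+.0f} .. {hi:+.0f}")
```

With these made-up counts the interval runs from roughly -3 to +82 Elo. At the same score, a higher draw rate lowers the per-game variance and narrows the interval, which is one reason results gathered under such different conditions are hard to compare after only 170 games.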
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: New sf+nnue play-only compiles

Post by mwyoung »

lkaufman wrote: Mon Jul 27, 2020 11:39 pm I got +39 elo over recent SF (July 6) on 7 threads at 0.5' + 0.5" after 170 games. Maybe gains are less with more threads?
I only got +20 Elo playing 32 threads at 3m+2s on a 2950X at 4.1 GHz. Still outstanding.