New sf+nnue play-only compiles

lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New sf+nnue play-only compiles

Post by lkaufman »

Laskos wrote: Tue Jul 28, 2020 12:14 am
lkaufman wrote: Mon Jul 27, 2020 11:39 pm
Laskos wrote: Mon Jul 27, 2020 9:47 pm
kranium wrote: Mon Jul 27, 2020 5:57 pm Hi all-
I released new PO (play-only) compiles
which seem to be significantly faster
...more than 10% on my system

Intel® Core™ i9-9900K Processor

Bench was run just once, not averaged after several runs, so these are just estimates:

PO
-------------------------
sf+nnue-po.270720.halfkp_256x2-32-32.x64.bmi2.exe
Total time (ms) : 2289
Nodes searched  : 3355738
Nodes/second    : 1466027

sf+nnue-po.270720.halfkp_256x2-32-32.x64.avx2.exe
Total time (ms) : 2305
Nodes searched  : 3355738
Nodes/second    : 1452071

sf+nnue-po.270720.halfkp_256x2-32-32.x64.popc.exe
Total time (ms) : 2684
Nodes searched  : 3355738
Nodes/second    : 1247022
-------------------------

NODCHIP
stockfish.bmi2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 2608
Nodes searched  : 4049933
Nodes/second    : 1552888

stockfish.avx2.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 2609
Nodes searched  : 4049933
Nodes/second    : 1552293

stockfish.sse42.halfkp_256x2-32-32.profile-nnue.2020-07-19.exe
Total time (ms) : 3041
Nodes searched  : 4049933
Nodes/second    : 1331776
-------------------------

AIO
sf+nnue-aio.270720.halfkp_256x2-32-32.x64.bmi2.exe
Total time (ms) : 3086
Nodes searched  : 4049933
Nodes/second    : 1312356

sf+nnue-aio.270720.halfkp_256x2-32-32.x64.avx2.exe
Total time (ms) : 3124
Nodes searched  : 4049933
Nodes/second    : 1296393

sf+nnue-aio.270720.halfkp_256x2-32-32.x64.popc.exe
Total time (ms) : 3661
Nodes searched  : 4049933
Nodes/second    : 1106236
https://github.com/FireFather/sf-nnue/releases

I hope they prove fast on other systems as well...
PS they have not been widely tested

Wow, fast indeed, 15% or so faster than Nodchip compiles with BMI2. Now at short time control, SV nets with this compile are about 90 Elo points stronger than SF_dev.


Thanks!
I got +39 elo over recent SF (July 6) on 7 threads at 0.5' + 0.5" after 170 games. Maybe gains are less with more threads?
What is the draw rate in your case? Aside from the 1-thread ultra-fast TC, I am using unbalanced openings too, so my draw rate is about 35%. The discrepancy comes from these very different conditions and from just 170 games.
Yes, despite using Hert low-draw book and fast tc, I got 70% draws on 7 fast threads (final result +42 elo /176 games). I switched to same test on one thread and have about 56% draws, + 58 elo after 273 games. I suppose this means that the win to loss ratio is fairly constant, but the draws get out of hand quite quickly.
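The win-to-loss remark can be checked roughly from the figures quoted in this post. This is a minimal Python sketch, assuming the usual logistic Elo model and taking the approximate Elo and draw rates at face value; the function name is invented for illustration.

```python
def win_loss_from_elo(elo, draw_rate):
    """Return (win, loss) fractions implied by an Elo gap and a draw rate.

    Assumes the logistic model score = 1 / (1 + 10 ** (-elo / 400))
    together with score = win + draw / 2.
    """
    score = 1.0 / (1.0 + 10.0 ** (-elo / 400.0))
    win = score - draw_rate / 2.0
    loss = 1.0 - draw_rate - win
    return win, loss


# Approximate figures from the post: (+42 Elo, 70% draws) and (+58 Elo, 56% draws).
for label, elo, draws in [("7 threads", 42, 0.70), ("1 thread", 58, 0.56)]:
    win, loss = win_loss_from_elo(elo, draws)
    print(f"{label}: W={win:.3f} L={loss:.3f} W/L={win / loss:.2f}")
```

Under these assumptions both runs give a win-to-loss ratio a little above 2, which fits the observation that the ratio stays fairly constant while the draw rate changes.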
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New sf+nnue play-only compiles

Post by lkaufman »

lkaufman wrote: Tue Jul 28, 2020 1:15 am
Yes, despite using Hert low-draw book and fast tc, I got 70% draws on 7 fast threads (final result +42 elo /176 games). I switched to same test on one thread and have about 56% draws, + 58 elo after 273 games. I suppose this means that the win to loss ratio is fairly constant, but the draws get out of hand quite quickly.
Final result on one thread dropped to +44 elo after 500 games, 55% draws.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New sf+nnue play-only compiles

Post by Laskos »

lkaufman wrote: Tue Jul 28, 2020 3:01 am
Final result on one thread dropped to +44 elo after 500 games, 55% draws.
Not sure our discrepancies can be explained by TC and number of threads. Not sure about Hert book, I am using 2-move and 3-move unbalanced openings. With the fastest "kranium" compile and best SV net (one of this morning) I am getting about 100 Elo points difference compared to SF_dev in ultra-fast games. Quite a difference compared to your result.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: New sf+nnue play-only compiles

Post by Zenmastur »

kranium wrote: Mon Jul 27, 2020 5:57 pm Hi all-
I released new PO (play-only) compiles
which seem to be significantly faster
...more than 10% on my system

On Threadripper I got a HUGE speed regression with either the POPC or AVX2 version listed here compared with stockfish.avx2.no-nnue.nnue-gen-sfen-from-original-eval.2020-07-19.exe. I'm getting 900K to 1.2M nps versus 2M nps on a single core.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: New sf+nnue play-only compiles

Post by Raphexon »

Zenmastur wrote: Tue Jul 28, 2020 9:39 am
On threadripper I got a HUGE regression in speed when using either the POPC or AVX2 version listed here vice using stockfish.avx2.no-nnue.nnue-gen-sfen-from-original-eval.2020-07-19.exe. I'm getting 900K to 1.2Mnps vice 2Mnps on a single core.

Regards,

Zenmastur
That's because you are comparing regular SF's speed with NNUE's.

no-nnue means that there is no NNUE present.
nnue-gen-sfen-from-original-eval means it can generate training data and that it will generate it from SF's original eval.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: New sf+nnue play-only compiles

Post by Rebel »

kranium wrote: Mon Jul 27, 2020 5:57 pm Hi all-
I released new PO (play-only) compiles
which seem to be significantly faster
...more than 10% on my system
Tested it, the usual 2000 games at 40m/20s, got the best result so far.
Sergio-2344 57.2% with the old executable
Sergio-2344 59.3% with Norman's compile

Will test this net on CCRL blitz level, curious how it scales, 1000 games.
90% of coding is debugging, the other 10% is writing bugs.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: New sf+nnue play-only compiles

Post by kranium »

Just to clarify any confusion in this thread...

We're seeing different Elo results presented.
Those are of course the results of 'nnue' vs sf-dev tests, and they depend heavily on exactly which nn.bin is being tested.
In the case of Larry and Mark, this detail is not indicated.

This has much less to do with the quality of the compile (except for Ed's test of course, in which it's perfectly clear what's being tested).

My recommendation for a simple method of testing which compile is fastest on a particular system is:
1. make sure both binaries being compared load the same NNUE eval file (normally nn.bin)
2. type 'ucinewgame' and verify the nn.bin is found and loaded
(not needed for my recent compiles in which the nn.bin is loaded at startup)
3. run 'bench' at least twice for each compile, and average the result

(I know most here already know this, please don't be offended...I'm including it for clarity, and in case it helps someone).

Also, as Henk pointed out...to compare to a nodchip compile, make sure to select a 'nnue' version and make sure to type 'ucinewgame' to load the nn.bin before running 'bench'.

I believe Laskos 'nnue' vs sf-dev results with the newest SV net are significant...
I've been using ultra-fast for many years and have great confidence that ultra-fast results scale down in a meaningful way as TC increases.
At this point, I believe it's safe to say that nnue is pushing +60 elo or more? (Ed's results have it at +65)
which is fantastic considering it was at -50 to sf-dev in the very beginning, and just +30 just a short time ago
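The averaging in step 3 of the recipe above can be sketched in a few lines of Python. This assumes the bench output has been captured as text and contains a 'Nodes/second' line in the format shown in the listing earlier in the thread; the helper names are invented, and the second run's figures are made up purely for illustration.

```python
import re
from statistics import mean

# Matches the summary line printed at the end of a bench run.
NPS_RE = re.compile(r"Nodes/second\s*:\s*(\d+)")


def nps_from_bench(output):
    """Extract the 'Nodes/second' figure from one bench run's text output."""
    match = NPS_RE.search(output)
    if match is None:
        raise ValueError("no 'Nodes/second' line found")
    return int(match.group(1))


def average_nps(outputs):
    """Average nodes/second over several bench runs of the same binary."""
    return mean(nps_from_bench(text) for text in outputs)


# run1 is copied from the listing in this thread; run2 is a made-up second run.
run1 = "Total time (ms) : 2289\nNodes searched  : 3355738\nNodes/second    : 1466027\n"
run2 = "Total time (ms) : 2301\nNodes searched  : 3355738\nNodes/second    : 1458400\n"
print(average_nps([run1, run2]))
```

Comparing averaged Nodes/second rather than total time also sidesteps the fact that the binaries in the listing report different node counts.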
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New sf+nnue play-only compiles

Post by lkaufman »

Laskos wrote: Tue Jul 28, 2020 9:26 am
Not sure our discrepancies can be explained by TC and number of threads. Not sure about Hert book, I am using 2-move and 3-move unbalanced openings. With the fastest "kranium" compile and best SV net (one of this morning) I am getting about 100 Elo points difference compared to SF_dev in ultra-fast games. Quite a difference compared to your result.
The Hert book prunes drawish lines, but is not "unbalanced", so that's one difference. I used the net that came with the download in this thread. Also I believe your tc is faster, so maybe all three factors together explain it.
Komodo rules!
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New sf+nnue play-only compiles

Post by lkaufman »

kranium wrote: Tue Jul 28, 2020 2:10 pm Just to clarify any confusion in this thread...

I believe Laskos 'nnue' vs sf-dev results with the newest SV net are significant...
I've been using ultra-fast for many years and have great confidence that ultra-fast results scale down in a meaningful way as TC increases.
At this point, I believe it's safe to say that nnue is pushing +60 elo or more? (Ed's results have it at +65)
which is fantastic considering it was at -50 to sf-dev in the very beginning, and just +30 just a short time ago
With the latest net (1817) vs. latest SF (july 17) at same 0.5' + 0.5", one thread, Hert lowdraw book, I got 60.5 out of 100, +74 elo, so far.
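For reference, the score-to-Elo conversion behind a figure like this can be sketched as follows. This is a minimal Python example assuming the standard logistic model; the win/draw/loss split used for the error margin is hypothetical (the post gives only the total score), and the function names are invented.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_error_95(wins, draws, losses):
    """Rough 95% margin in Elo from the per-game score variance (normal approximation)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half = 1.96 * math.sqrt(var / n)
    return (elo_from_score(score + half) - elo_from_score(score - half)) / 2.0


print(round(elo_from_score(0.605)))  # 60.5 points out of 100 games → 74

# Hypothetical split that adds up to 60.5/100; the real W/D/L was not posted.
print(round(elo_error_95(wins=38, draws=45, losses=17)))
```

With only 100 games the margin comes out at several tens of Elo under these assumptions, so the "so far" is well placed. The same formula gives about +65 for the 59.3% score quoted earlier in the thread.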
Komodo rules!
Thomas Lagershausen
Posts: 328
Joined: Mon Jun 11, 2007 6:59 pm

Re: New sf+nnue play-only compiles

Post by Thomas Lagershausen »

@Kranium

Do you know someone that can make a pgo compile for arm-8 on android?

The lto compile of petero2 is solid, but it is missing the performance and fire of the compiles for x86 CPUs.

Thx for your attention.
TL