Progress of Stockfish in 6 days

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Progress of Stockfish in 6 days

Post by Laskos »

Another +13 Elo in 2 days. After 8 days of gradual SF development, the total gain is +48 Elo.

Code:

Games Completed = 1000 of 1000 (Avg game length = 23.583 sec)
Settings = Gauntlet/128MB/6000ms+100ms/M 700cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_80_100.epd(1749)
Time = 6044 sec elapsed, 0 sec remaining
 1.  SF NNUE 14 Aug              	569.0/1000	312-174-514  	(L: m=0 t=0 i=0 a=174)	(D: r=317 i=103 f=34 s=7 a=53)	(tpm=180.8 d=17.68 nps=1327473)
 2.  SF NNUE 06 Aug             	431.0/1000	174-312-514  	(L: m=2 t=0 i=0 a=310)	(D: r=317 i=103 f=34 s=7 a=53)	(tpm=182.7 d=17.34 nps=1215253)
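The +48 figure matches the 569.0/1000 score if it is converted with the usual logistic Elo model (LittleBlitzer's summary above reports only points). A minimal Python sketch of that conversion, assuming the standard 400-point logistic curve; `elo_from_score` is a hypothetical helper, not part of any tool used in this thread:

```python
import math


def elo_from_score(score: float) -> float:
    """Rating difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Gauntlet result: SF NNUE 14 Aug scored 569.0 points in 1000 games vs SF NNUE 06 Aug.
points, games = 569.0, 1000
gain = elo_from_score(points / games)
print(f"SF NNUE 14 Aug vs 06 Aug: {gain:+.1f} Elo")

# Inverting the formula maps the Elo difference back to the expected score.
expected = 1.0 / (1.0 + 10 ** (-gain / 400.0))
print(f"expected score at that Elo difference: {expected:.3f}")
```

It prints a gain of about +48 Elo, and inverting the formula recovers the 56.9% score.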
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Progress of Stockfish in 6 days

Post by lkaufman »

mwyoung wrote: Fri Aug 14, 2020 3:48 pm
marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, the draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and this only gets worse for Stockfish+NNUE with more time and threads. Some want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. final pre-NNUE SF (July 31) on four threads at 30" +.5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and at least based on one thread the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this; if not, I can.
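How much can a match of about 110 games show? If "56 to 54" and "53 to 51" are point totals (an assumption: 110 and 104 games), a rough 95% interval can be bounded even without the win/draw/loss split, because a single game result in [0, 1] has a variance of at most 0.25. A minimal sketch under those assumptions, with hypothetical helper names:

```python
import math


def elo_from_score(score: float) -> float:
    """Rating difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def worst_case_interval(points: float, games: int, z: float = 1.96):
    """Approximate 95% Elo interval when only the total score is known.

    Each game result lies in [0, 1], so its variance is at most 0.25 (reached
    only with no draws). Draws shrink the true variance, so this interval is
    conservative: the real one is narrower.
    """
    score = points / games
    half_width = z * math.sqrt(0.25 / games)
    eps = 1e-9  # keep the bounds strictly inside (0, 1) for the log
    lo = max(score - half_width, eps)
    hi = min(score + half_width, 1.0 - eps)
    return elo_from_score(lo), elo_from_score(score), elo_from_score(hi)


# Assumed reading of the posted results: points for NNUE out of the total games.
results = {"30s+0.5s": (56, 110), "2m+2s": (53, 104)}
intervals = {}
for tc, (points, games) in results.items():
    intervals[tc] = worst_case_interval(points, games)
    lo, mid, hi = intervals[tc]
    print(f"{tc}: {mid:+.0f} Elo, 95% bound [{lo:+.0f}, {hi:+.0f}]")
```

Both intervals span from about -60 to about +70 Elo, so on their own these matches are consistent with a small falloff, no falloff, or a small gain; with the real draw counts the intervals would be narrower.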
Komodo rules!
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

Laskos wrote: Fri Aug 14, 2020 10:36 am -mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone; I never follow his tests. He used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly, crazy things like testing chess engines at much longer than 1 minute per game, and testing chess engines using more than 1 thread. Then I get even crazier by posting my settings and live-streaming the engine test.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung

Re: Progress of Stockfish in 6 days

Post by mwyoung »

lkaufman wrote: Fri Aug 14, 2020 4:32 pm
mwyoung wrote: Fri Aug 14, 2020 3:48 pm
marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, the draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and this only gets worse for Stockfish+NNUE with more time and threads. Some want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. final pre-NNUE SF (July 31) on four threads at 30" +.5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and at least based on one thread the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this; if not, I can.
That's why we need to test with more than one thread and at longer time controls. And yes, I have been running these tests and will continue to test NNUE. Unlike some, I do not average 23 seconds per game, test at 1 thread, call it good, and say +100 Elo. The truth is more down to earth....


DESKTOP-CORSAIR, Rapid 30.0min+30.0sec 0

1 SF+NNUE PO 290720 x64 popc +17 +7/=95/-2 52.40% 54.5/104
2 Stockfish 170720 64 POPCNT -17 +2/=95/-7 47.60% 49.5/104
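The +17 in the 30min+30s crosstable is what 54.5/104 (52.40%) converts to under the logistic Elo model, but 95 of the 104 games were draws. A minimal sketch of a normal-approximation error bar computed from the W/D/L record; this is a common textbook approach and not necessarily what the GUI that produced the crosstable uses, and `wdl_summary` is a hypothetical name:

```python
import math


def elo_from_score(score: float) -> float:
    """Rating difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def wdl_summary(wins: int, draws: int, losses: int, z: float = 1.96) -> dict:
    """Elo estimate, approximate 95% interval and draw rate from a W/D/L record."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game's result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half_width = z * math.sqrt(var / n)
    eps = 1e-9  # keep the bounds strictly inside (0, 1) for the log
    lo = max(score - half_width, eps)
    hi = min(score + half_width, 1.0 - eps)
    return {
        "score": score,
        "elo": elo_from_score(score),
        "elo_lo": elo_from_score(lo),
        "elo_hi": elo_from_score(hi),
        "draw_rate": draws / n,
    }


# SF+NNUE PO 290720 vs Stockfish 170720 at 30min+30s: +7 =95 -2.
s = wdl_summary(7, 95, 2)
print(f"score {s['score']:.2%}, {s['elo']:+.0f} Elo "
      f"[{s['elo_lo']:+.0f}, {s['elo_hi']:+.0f}], draw rate {s['draw_rate']:.0%}")
```

For +7 =95 -2 it gives about +17 Elo with a 95% interval of roughly -3 to +36 Elo and a draw rate of about 91%, so the interval still includes zero.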
mwyoung

Re: Progress of Stockfish in 6 days

Post by mwyoung »

lkaufman wrote: Fri Aug 14, 2020 4:32 pm
mwyoung wrote: Fri Aug 14, 2020 3:48 pm
marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, the draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and this only gets worse for Stockfish+NNUE with more time and threads. Some want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
Perhaps you'd like to run this; if not, I can.
You should run your own tests. But just a reminder: if you are running 1 thread vs. 2, 3, 4... threads, or some other combination with a 4x core ratio, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will run much faster, since CPUs boost higher when fewer cores are loaded. Have you done this in your testing so far?

This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
lkaufman

Re: Progress of Stockfish in 6 days

Post by lkaufman »

mwyoung wrote: Fri Aug 14, 2020 6:25 pm
lkaufman wrote: Fri Aug 14, 2020 4:32 pm
mwyoung wrote: Fri Aug 14, 2020 3:48 pm
marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, the draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and this only gets worse for Stockfish+NNUE with more time and threads. Some want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
Perhaps you'd like to run this; if not, I can.
You should run your own tests. But just a reminder: if you are running 1 thread vs. 2, 3, 4... threads, or some other combination with a 4x core ratio, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will run much faster, since CPUs boost higher when fewer cores are loaded. Have you done this in your testing so far?

This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
No, I have never done that. I do test with hyperthreading off, which probably reduces the problem, but I suppose it is still an issue. I got an 89 to 81 score for NNUE on 2 threads vs. final SF on 8 threads at 30" + .5", better than my four vs. one thread results, but I'll leave it to you to follow up with more threads if you wish, in view of the throttling issue. Maybe it will turn out that quadruple CPU power is overstated; anyway, it wasn't my statement.
mwyoung

Re: Progress of Stockfish in 6 days

Post by mwyoung »

Jouni wrote: Wed Aug 12, 2020 9:36 pm Yes, SF NNUE is equal to quadrupling your CPU cores for free. Incredible!!
Update 1....


DESKTOP-CORSAIR, Rapid 15.0min+15.0sec 0


1 Stockfish 140820+NNUE +0/=21/-0 50.00% 10.5/21 110.25
2 Stockfish 140820 +0/=21/-0 50.00% 10.5/21 110.25

Live Stream:
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Progress of Stockfish in 6 days

Post by Milos »

mwyoung wrote: Fri Aug 14, 2020 4:44 pm
Laskos wrote: Fri Aug 14, 2020 10:36 am -mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone; I never follow his tests. He used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly, crazy things like testing chess engines at much longer than 1 minute per game, and testing chess engines using more than 1 thread. Then I get even crazier by posting my settings and live-streaming the engine test.
The only thing I noticed with your "tests" is that you get a higher draw rate than a typical correspondence chess match of today. ;)
That really makes them super uninteresting for anything.
mwyoung

Re: Progress of Stockfish in 6 days

Post by mwyoung »

Milos wrote: Mon Aug 17, 2020 12:41 am
mwyoung wrote: Fri Aug 14, 2020 4:44 pm
Laskos wrote: Fri Aug 14, 2020 10:36 am -mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone; I never follow his tests. He used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly, crazy things like testing chess engines at much longer than 1 minute per game, and testing chess engines using more than 1 thread. Then I get even crazier by posting my settings and live-streaming the engine test.
The only thing I noticed with your "tests" is that you get a higher draw rate than a typical correspondence chess match of today. ;)
That really makes them super uninteresting for anything.
Testing very strong chess engines is like watching paint dry, unless you love it. I suggest you stick to TCEC.
mwyoung

Re: Progress of Stockfish in 6 days

Post by mwyoung »

lkaufman wrote: Thu Aug 13, 2020 7:22 am
Jouni wrote: Wed Aug 12, 2020 9:36 pm Yes, SF NNUE is equal to quadrupling your CPU cores for free. Incredible!!
I actually got a result where SFNNUE (from a couple of days ago) on one thread beat Stockfish 11 on seven threads, at 2' + 1", by 90 to 80! So you may be understating it!
Update 2...

DESKTOP-CORSAIR, Rapid 15.0min+15.0sec 0


1 Stockfish 140820+NNUE +12 +1/=28/-0 51.72% 15.0/29
2 Stockfish 140820 -12 +0/=28/-1 48.28% 14.0/29
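With no decisive games in Update 1 and a single one in Update 2, the +12 rests on one win. A widely used way to express this is the likelihood of superiority (LOS), which depends only on wins and losses; the sketch below uses its normal-approximation form, and `likelihood_of_superiority` is a hypothetical helper:

```python
import math
from typing import Optional


def likelihood_of_superiority(wins: int, losses: int) -> Optional[float]:
    """Probability that the first engine is stronger (normal approximation).

    Draws do not enter the formula; with no decisive games it is undefined.
    """
    decisive = wins + losses
    if decisive == 0:
        return None
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


updates = {
    "Update 1 (+0 =21 -0)": (0, 0),
    "Update 2 (+1 =28 -0)": (1, 0),
    "30min+30s match (+7 =95 -2)": (7, 2),
}
for label, (w, l) in updates.items():
    los = likelihood_of_superiority(w, l)
    print(label, "LOS undefined" if los is None else f"LOS {los:.1%}")
```

Update 1 gives no LOS at all, Update 2 about 84%, and the earlier +7 =95 -2 match about 95%.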

Live Stream: