Stockfish search

Werewolf
Posts: 1983
Joined: Thu Sep 18, 2008 10:24 pm

Stockfish search

Post by Werewolf »

I posted this in the main section, but realised it may be better to post it here where the experts hang out.

In all instances I'm referring to Stockfish's implementation of Lazy SMP.
Please could you have a go at the following "multiple choice" ? :D

1. Giving an engine more CPU cores ALWAYS increases search speed, provided that there's no loss in clock speed. This applies even if the new cores are slower than the old ones.
a) TRUE
b) FALSE
c) UNCLEAR

The newer CPUs with "efficiency" cores are in mind here.

2. Stockfish runs on two different PCs. In each case it searches at 5 MN/s. On the first PC it's running on one core, on the second on 4 cores.
a) Stockfish's search is roughly as fast on both machines
b) Stockfish's search is much faster on the first PC

3. Jack has a 16-core PC. He runs Stockfish on 8 cores and then, later that day, on all 16 cores. He notices an improvement in both NPS and time to depth. Jack is trying to decide which one is more indicative of the speed-up.
a) Time to depth
b) NPS
c) They both need to be considered.

4. (Related to Q3) As thread count increases Stockfish's search "thickens".
a) TRUE
b) FALSE
AndrewGrant
Posts: 1952
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Stockfish search

Post by AndrewGrant »

I assume your term "Search Speed" refers to Strength? Assuming so...

[1]: Adding cores will always increase the NPS in a good engine on a good CPU. There is no reason why additional cores would _lose_ Elo. In the past, it has been shown that even going from 192 cores to 384 threads gains Elo.

[2]: You would rather have a CPU that is 4x as fast than a CPU with 4x as many cores. The PC running at 5 MN/s on one core will be significantly stronger than the PC running at 5 MN/s on a collective 4 cores.

[3]: Hyperthreads appear to gain Elo. This has been shown in a large number of tests in the pre-NNUE era. I don't think there is much reason to assume this has changed. At least, for a time, top engines gained Elo. This might ultimately be an engine-specific thing, but I would argue that if an engine fails to gain Elo on hyperthreads, then that is a shortcoming of the engine.

[4]: Lazy SMP seems not to change the time to depth with additional cores. This means that you are either searching the exact same nodes, or "thickening" the search, aka "widening". Widening is indeed the case, to my understanding.
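
To make the difference between the two metrics concrete, here is a minimal sketch that compares NPS gain and time-to-depth gain for a pair of runs. The `Run` type and the figures in it are hypothetical, chosen only to illustrate a case where NPS nearly doubles while time to depth barely moves; they are not measurements of Stockfish.

```python
from dataclasses import dataclass


@dataclass
class Run:
    threads: int
    nodes: int       # total nodes searched when the target depth was reached
    seconds: float   # wall-clock time needed to reach the target depth

    @property
    def nps(self) -> float:
        return self.nodes / self.seconds


def compare(base: Run, more: Run) -> tuple[float, float]:
    """Return (NPS gain, time-to-depth gain) of `more` relative to `base`."""
    nps_gain = more.nps / base.nps
    ttd_gain = base.seconds / more.seconds
    return nps_gain, ttd_gain


# Hypothetical figures, for illustration only.
base = Run(threads=8, nodes=400_000_000, seconds=40.0)
more = Run(threads=16, nodes=720_000_000, seconds=38.0)

nps_gain, ttd_gain = compare(base, more)
print(f"NPS x{nps_gain:.2f}, time to depth x{ttd_gain:.2f}")
```

In a case like this, the extra nodes went into a wider search at the same depth rather than into reaching the depth sooner, so neither number alone describes the speed-up.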
Werewolf
Posts: 1983
Joined: Thu Sep 18, 2008 10:24 pm

Re: Stockfish search

Post by Werewolf »

AndrewGrant wrote: Sat Mar 26, 2022 4:17 pm I assume your term "Search Speed" refers to Strength? Assuming so...

[1]: Adding cores will always increase the NPS in a good engine on a good CPU. There is no reason why additional cores would _lose_ Elo. In the past, it has been shown that even going from 192 cores to 384 threads gains Elo.

[2]: You would rather have a CPU that is 4x as fast than a CPU with 4x as many cores. The PC running at 5 MN/s on one core will be significantly stronger than the PC running at 5 MN/s on a collective 4 cores.

[3]: Hyperthreads appear to gain Elo. This has been shown in a large number of tests in the pre-NNUE era. I don't think there is much reason to assume this has changed. At least, for a time, top engines gained Elo. This might ultimately be an engine-specific thing, but I would argue that if an engine fails to gain Elo on hyperthreads, then that is a shortcoming of the engine.

[4]: Lazy SMP seems not to change the time to depth with additional cores. This means that you are either searching the exact same nodes, or "thickening" the search, aka "widening". Widening is indeed the case, to my understanding.

Thanks for taking the time to reply Andrew!

1) Great.
2) Do you have any idea how much faster the 1-core machine is? In the days of Rybka they used the formula N^0.76 (N = core count) to estimate speedup. But people have been saying very recently that adding cores doesn't add much Elo. However, I was wondering if that is just because Elo is hard to come by at Stockfish's level, as opposed to a fundamental search scaling problem.
3) OK. On my 64-core Threadripper, running on 60 threads is faster than running on 120 threads. But that may be because the clock speed drops on 120 threads.
4) I see. So on massive hardware a depth of 34 would count for more than a depth of 34 on a few cores?
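
For reference, the N^0.76 rule of thumb mentioned in 2) works out as in the sketch below. The exponent is the Rybka-era figure quoted above, not a measured value for Stockfish's Lazy SMP, and the function name is just for illustration.

```python
def effective_speedup(cores: int, exponent: float = 0.76) -> float:
    """Estimated speedup over one core under the N**exponent rule of thumb."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores ** exponent


for n in (1, 2, 4, 8, 16, 64):
    s = effective_speedup(n)
    # s / n is the fraction of each core's raw speed that counts as "useful".
    print(f"{n:>3} cores: ~{s:.2f}x one core, {s / n:.0%} efficiency per core")
```

If that exponent held, 4 cores would be worth about 2.87 single cores, so 5 MN/s spread over 4 cores would correspond to roughly 3.6 MN/s on one core.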

Thanks again.
Modern Times
Posts: 3693
Joined: Thu Jun 07, 2012 11:02 pm

Re: Stockfish search

Post by Modern Times »

In terms of engines gaining or losing Elo from hyperthreading - does the SMP methodology make a difference? Is the answer different for an engine that uses YBWC compared to one that uses Lazy SMP?
Werewolf
Posts: 1983
Joined: Thu Sep 18, 2008 10:24 pm

Re: Stockfish search

Post by Werewolf »

Modern Times wrote: Sat Mar 26, 2022 6:34 pm In terms of engines gaining or losing Elo from hyperthreading - does the SMP methodology make a difference? Is the answer different for an engine that uses YBWC compared to one that uses Lazy SMP?
I'm also interested in this. I clearly remember Rybka and (I think) the early versions of Houdini coming with the instruction to switch off HT. Both of those were YBWC. Maybe Houdini 5/6 were Lazy SMP.
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Stockfish search

Post by JohnWoe »

Multi-threaded search schemes such as YBW (Young Brothers Wait) etc. are copied from Crafty.
In minimax engines like SF, multithreading works by launching threads at different depths to populate the TT.

The minimax algorithm is hopelessly serial. All kinds of ad hoc solutions exist to "parallelize" it.

In SF, set the TT to 0 MB and threads to 32, and you will only get a high 32x NPS and 0 Elo.
Werewolf
Posts: 1983
Joined: Thu Sep 18, 2008 10:24 pm

Re: Stockfish search

Post by Werewolf »

This is up to date and from the Komodo website:

For best performance it is very important to correctly set the Threads value. The default is 1. You should usually set Threads to the number of "real" cores on your machine, except as noted below. Consult your computer manufacturer to determine how many cpu cores your machine has (not to be confused with the number of cpu threads your machine has). We recommend running Dragon with Hyperthreading turned off on your computer, although this is debatable and may depend on your hardware; it is pretty clear that Hyperthreading should be off on machines with many cores, but for machines with no more than four or perhaps six cores it’s unclear.
Jouni
Posts: 3611
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Stockfish search

Post by Jouni »

Has there ever been a REAL hyperthreading test? You need 2 identical PCs, one with HT=on and another with HT=off! On a single PC you cannot test HT on/off at the same time. Right?
Jouni
Werewolf
Posts: 1983
Joined: Thu Sep 18, 2008 10:24 pm

Re: Stockfish search

Post by Werewolf »

Jouni wrote: Tue Mar 29, 2022 1:13 pm Has there ever been a REAL hyperthreading test? You need 2 identical PCs, one with HT=on and another with HT=off! On a single PC you cannot test HT on/off at the same time. Right?
Yes, agreed. It is alarming that this still isn't clearly known, though.

If one is testing for Elo, 2 PCs are essential. If "lesser tests" are admissible, namely solving a tactical test suite, NPS, or time to depth, I guess one PC is enough, but no one seems to know how much weight we can put on these tests.
abulmo2
Posts: 460
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Stockfish search

Post by abulmo2 »

Jouni wrote: Tue Mar 29, 2022 1:13 pm Has there ever been a REAL hyperthreading test? You need 2 identical PCs, one with HT=on and another with HT=off! On a single PC you cannot test HT on/off at the same time. Right?
On a single PC you can select the cores you are running an engine on. I can do this with Amoeba.
Here is the CPU monitoring of Amoeba running on the 8 real cores:

    | Mperf              || Idle_Stats
 CPU| C0   | Cx   | Freq  || POLL | C1   | C2
   0| 99,26|  0,74|  3798||  0,00|  0,00|  0,00
   8|  3,06| 96,94|  3797||  0,00| 11,54| 85,65
   1| 99,27|  0,73|  3747||  0,00|  0,00|  0,00
   9|  1,69| 98,31|  3662||  0,02| 50,50| 48,35
   2| 99,27|  0,73|  3795||  0,00|  0,00|  0,00
  10|  6,18| 93,82|  3789||  0,00|  4,00| 89,85
   3| 99,27|  0,73|  3795||  0,00|  0,00|  0,00
  11|  1,80| 98,20|  3780||  0,03| 53,91| 44,90
   4| 99,27|  0,73|  3792||  0,00|  0,00|  0,00
  12|  2,13| 97,87|  3789||  0,00|  1,67| 96,23
   5| 99,27|  0,73|  3798||  0,00|  0,00|  0,00
  13|  1,51| 98,49|  3798||  0,00|  4,37| 94,17
   6| 99,27|  0,73|  3796||  0,00|  0,00|  0,00
  14|  1,03| 98,97|  3791||  0,00|  1,17| 97,83
   7| 99,27|  0,73|  3716||  0,00|  0,00|  0,00
  15|  0,94| 99,06|  3606||  0,00|  6,64| 92,50
And here on all 16 logical CPUs (8 real cores + 8 virtual):

    | Mperf              || Idle_Stats
 CPU| C0   | Cx   | Freq  || POLL | C1   | C2
   0| 99,93|  0,07|  3798||  0,00|  0,00|  0,00
   8| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
   1| 99,93|  0,07|  3796||  0,00|  0,00|  0,00
   9| 99,94|  0,06|  3796||  0,00|  0,00|  0,00
   2| 99,94|  0,06|  3797||  0,00|  0,00|  0,00
  10| 99,94|  0,06|  3797||  0,00|  0,00|  0,00
   3| 99,93|  0,07|  3798||  0,00|  0,00|  0,00
  11| 99,92|  0,08|  3798||  0,00|  0,00|  0,00
   4| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
  12| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
   5| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
  13| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
   6| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
  14| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
   7| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
  15| 99,94|  0,06|  3798||  0,00|  0,00|  0,00
I do not know if the result is comparable to using 2 computers with HT disabled or enabled at the BIOS level, but I hope it is quite close.
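
For anyone who wants to try the same kind of core selection, here is a minimal sketch of how it could be done on Linux from Python. It is not how Amoeba does it. It assumes the usual sysfs layout for sibling lists, and the example topology (CPU n and CPU n+8 sharing a core) is only inferred from the pairing in the tables above; the helper names are made up for illustration.

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as '0,8' or '0-1' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered logical CPU of every sibling group."""
    return sorted({parse_cpu_list(s)[0] for s in siblings.values()})


def read_siblings() -> dict[int, str]:
    """Read each CPU's sibling list from sysfs (Linux only, assumed layout)."""
    base = Path("/sys/devices/system/cpu")
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    return {int(p.parts[-3][3:]): p.read_text() for p in base.glob(pattern)}


def pin_current_process(cpus: list[int]) -> None:
    """Restrict this process (and children started later) to the given CPUs."""
    os.sched_setaffinity(0, set(cpus))  # available on Linux only


# Assumed topology: logical CPU n and n+8 are the two threads of core n.
example = {n: f"{n % 8},{n % 8 + 8}" for n in range(16)}
physical = one_cpu_per_core(example)
print(physical)
```

On a real machine one would call `pin_current_process(one_cpu_per_core(read_siblings()))` and then start the engine as a child process, since the affinity mask is inherited. From a shell, `taskset -c 0-7 <engine>` should have a similar effect if the first 8 logical CPUs are the real cores.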

I am running a 3'+1" tournament right now to see if I can see something for Amoeba. The result won't be generalisable to Stockfish or other stronger engines, though.
Richard Delorme