Conditions:
Hardware: Dual AMD Opteron 6376, 32 x 2.3 GHz (Turbo Core off)
OS: Windows 7 Pro 64-Bit
GUI: no
Settings: all engines default settings
Large Tables: no
Position: starting position
Time: 20 seconds
UCI commands:
setoption name threads value 1 (to 32)
go movetime 20000
The tests were run in console mode.
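The procedure above could be scripted roughly as follows. This is only a sketch, not the script used for these tests. Only `setoption name threads value N` and `go movetime 20000` come from the list above; the `uci`, `isready`, `position startpos` and `quit` commands are standard UCI commands assumed here, and the engine path is a placeholder.

```python
import subprocess


def uci_commands(threads, movetime_ms=20000):
    """Commands for one fixed-time search from the starting position."""
    return [
        "uci",                                       # assumed: standard UCI handshake
        f"setoption name threads value {threads}",   # as listed above
        "isready",                                   # assumed: wait until options are applied
        "position startpos",                         # assumed: starting position
        f"go movetime {movetime_ms}",                # as listed above
    ]


def run_search(engine_path, threads, movetime_ms=20000):
    """Run one search without a GUI and return the engine output up to 'bestmove'."""
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    # Send the whole command sequence; the engine reads it line by line.
    for cmd in uci_commands(threads, movetime_ms):
        proc.stdin.write(cmd + "\n")
    proc.stdin.flush()

    lines = []
    for line in proc.stdout:
        lines.append(line.rstrip())
        if line.startswith("bestmove"):
            break

    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return lines
```

Calling `run_search("engine.exe", 16)` (with a hypothetical engine path) would return the console output of one 20-second search with 16 threads; looping over 1 to 32 threads and over several runs per setting gives the raw data for a table like the one discussed here.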
Here are the values for 1 to 32 threads, from the starting position, with 20 seconds of computing time.
nps = nodes per second
Komodo, Zappa and Houdini are almost equal up to 16 threads (factors 11.94, 11.37 and 11.34).
Stockfish DD and the latest Stockfish version lie somewhat behind (factors 8.01 and 9.79).
Komodo still scales excellently beyond 16 threads. Zappa also shows a very good SMP implementation.
Beyond 16 threads, Houdini and Stockfish DD benefit much less than the other tested engines.
Increase in factor from 16 to 32 threads:
Komodo TCECr: 11.94 to 20.60 (+73%)
Zappa Mexico II: 11.37 to 16.46 (+45%)
Stockfish 140513: 9.79 to 14.21 (+45%)
Stockfish DD: 8.01 to 10.18 (+27%)
Houdini 4 Pro: 11.34 to 13.48 (+19%)
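The percentages in this list can be reproduced with a few lines of Python. This is a small sketch: the factor values are copied from the list, and it assumes that "factor" means nps with N threads divided by nps with 1 thread. The efficiency column (factor divided by thread count) is an extra illustration, not something reported in the original post.

```python
# Scaling factors at 16 and 32 threads, copied from the list above.
factors = {
    "Komodo TCECr": (11.94, 20.60),
    "Zappa Mexico II": (11.37, 16.46),
    "Stockfish 140513": (9.79, 14.21),
    "Stockfish DD": (8.01, 10.18),
    "Houdini 4 Pro": (11.34, 13.48),
}


def percent_increase(f16, f32):
    """Relative gain in the factor when going from 16 to 32 threads."""
    return (f32 / f16 - 1) * 100


def efficiency(factor, threads):
    """Fraction of ideal linear scaling (assumed definition: factor / threads)."""
    return factor / threads


for engine, (f16, f32) in factors.items():
    print(f"{engine:17s} +{percent_increase(f16, f32):.0f}%  "
          f"efficiency at 32 threads: {efficiency(f32, 32):.2f}")
```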
Threads factor: Komodo, Houdini, Stockfish and Zappa
-
- Posts: 818
- Joined: Mon Aug 19, 2013 6:57 pm
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Your Stockfish data seem remarkably monotone as a function of the number of threads. The factor increases each time you increment the number of threads (except once, when the number of threads goes from 21 to 22). Are you recording an average of multiple runs for each number of threads? If so, how many runs with each threads setting?
I ask because these are fixed-time searches, so essentially you are recording the total number of nodes searched in the twenty seconds. I'd imagine that this would vary quite a lot from run to run, even with no change in the threads setting. The trees searched might differ significantly, with even the best move changing from search to search. For example, here are two consecutive Stockfish searches, each done with 16 threads. Look at how much they differ:
Code:
info depth 24 seldepth 36 score cp 22 nodes 151240061 nps 7561246 time 20002 multipv 1 pv e2e4 c7c5 b1c3 d7d6 g1f3 e7e5 f1c4 f8e7 a2a3 g8f6 e1g1 e8g8 b2b4 b8d7 d2d3 a7a6 c1d2 b7b5 c4d5 f6d5 c3d5 c8b7 b4c5 d7c5 d2a5 d8d7 d5b6 d7d8
info nodes 151240061 time 20002
bestmove e2e4 ponder c7c5
Code:
info depth 25 seldepth 33 score cp 29 nodes 182162118 nps 9106739 time 20003 multipv 1 pv d2d4 d7d5 c2c4 e7e6 g1f3 d5c4 b1c3 b8c6 e2e4
info nodes 182162118 time 20003
bestmove d2d4 ponder d7d5
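To put a number on how much the two searches differ, the node counts can be pulled out of the `info` lines and compared. This is a minimal sketch: `parse_info` and `relative_difference` are hypothetical helper names, and the pv is shortened in the sample strings since only the integer fields are used.

```python
def parse_info(line):
    """Extract the integer fields depth, nodes, nps and time from a UCI 'info' line."""
    tokens = line.split()
    fields = {}
    for key in ("depth", "nodes", "nps", "time"):
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def relative_difference(a, b):
    """How much larger b is than a, in percent."""
    return (b / a - 1) * 100


# The two 16-thread searches quoted above (pv shortened).
run_a = parse_info(
    "info depth 24 seldepth 36 score cp 22 nodes 151240061 nps 7561246 "
    "time 20002 multipv 1 pv e2e4 c7c5"
)
run_b = parse_info(
    "info depth 25 seldepth 33 score cp 29 nodes 182162118 nps 9106739 "
    "time 20003 multipv 1 pv d2d4 d7d5"
)

diff = relative_difference(run_a["nodes"], run_b["nodes"])
print(f"second run searched {diff:.1f}% more nodes in the same time")
```

With these two lines the second search comes out at roughly 20% more nodes, which is the kind of run-to-run variation that averaging over several runs per thread setting is meant to smooth out.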
-
- Posts: 818
- Joined: Mon Aug 19, 2013 6:57 pm
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
"Are you recording an average of multiple runs for each number of threads?"
Yes.
"If so, how many runs with each threads setting?"
5 runs per thread setting, 800 runs overall!
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Thanks Andreas. Your posts are always interesting, especially this one!
* Impressive scaling by Komodo
* Great improvement thanks to Joona's "late join" patch
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 10296
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Interesting information but the target of chess programs is not to search more nodes but to earn playing strength.
Nodes are not proportional to playing strength and I guess that for the same engine,
the same number of nodes with 1 thread is better than the same number of nodes with many threads.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
These are NPS figures. It is hard to tell what they mean strength-wise, or in terms of effective speed-up. Time to depth (TTD) won't help much either, as even SF with Joona's patch widens a bit, not to mention Komodo.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Uri Blass wrote: "Interesting information but the target of chess programs is not to search more nodes but to earn playing strength. Nodes are not proportional to playing strength and I guess that for the same engine, the same number of nodes with 1 thread is better than the same number of nodes with many threads."
Good point. TTD would be a better measure than NPS. The ideal measure is Elo, but it's extremely costly to calculate with good enough precision.
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Uri Blass wrote: "Interesting information but the target of chess programs is not to search more nodes but to earn playing strength. Nodes are not proportional to playing strength and I guess that for the same engine, the same number of nodes with 1 thread is better than the same number of nodes with many threads."
This is of course true, but it does show that SF and H4 quite likely still have room for improvement here.
An interesting question is whether Komodo's SMP implementation is comparable at all with that of Zappa, SF and H4 (which are all YBWC tree splitters with some further refinements). As Richard Vida mentioned on the fishcooking forum, it might be that Komodo uses a "lazy smp"-like approach:
http://talkchess.com/forum/viewtopic.php?t=46858
http://talkchess.com/forum/viewtopic.ph ... 350#504350
-
- Posts: 265
- Joined: Sat Feb 22, 2014 8:37 pm
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
I think it would be interesting to repeat the exact same test with a different FEN, particularly an endgame FEN.
I expect Komodo to gain a lot: in TCEC I have seen it reach 56 Mnps in the endgame, compared to 16 Mnps in the early game, so surely more cores = better. SF DD should show a totally different graph: it drops from 16 Mnps in the early game to 7 Mnps in the endgame. A quad core is faster than 16 cores, so more cores = worse performance.
It would be interesting to see how the newer SF dev versions are doing compared to SF DD.
As a side note, Critter had Pentium 4 performance in the endgame running on 16 cores: 750 kN/s.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Threads factor: Komodo, Houdini, Stockfish and Zappa
Uri Blass wrote: "Interesting information but the target of chess programs is not to search more nodes but to earn playing strength. Nodes are not proportional to playing strength and I guess that for the same engine, the same number of nodes with 1 thread is better than the same number of nodes with many threads."
But that is not the point of the experiment. This tells us about the upper limit of scalability, which is useful to know. In addition, it tells us how that upper limit suffers from the addition of cores. For instance, Houdini starts to have problems after exactly 16 cores. Before that, it is among the best.
Miguel