The threads test has been extended with the latest versions of Stockfish and Komodo.
In Stockfish 5 we see an improvement in the SMP implementation at 8 threads compared to the previous version.
The patch responsible for this improvement, added after Stockfish DD, is called "late-join".
Reference: https://github.com/mcostalba/Stockfish/ ... d80168c705
Under these test conditions the doubling of threads from 8 to 16 shows no improvement, not even for Stockfish 5.
Komodo is different: it shows a measurable, continuous increase up to 16 threads.
Moreover, the SMP implementation in Komodo 8 has been improved again.
The test data, the graphs, and the test conditions are at http://www.fastgm.de/threads2.html.
Threads test incl. Stockfish 5 and Komodo 8
-
- Posts: 3291
- Joined: Wed Mar 08, 2006 8:15 pm
Re: Threads test incl. Stockfish 5 and Komodo 8
Very interesting. So Komodo is the clear favorite in TCEC with 16 threads, if there are no improvements to the SMP implementation in the latest SF!
Jouni
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Threads test incl. Stockfish 5 and Komodo 8
fastgm wrote: Under these test conditions the doubling of threads from 8 to 16 shows no improvement, not even for Stockfish 5.
I wonder if increasing Min Split Depth from its default value (of 7) when using 16 threads would result in some improvement.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Threads test incl. Stockfish 5 and Komodo 8
zullil wrote: I wonder if increasing Min Split Depth from its default value (of 7) when using 16 threads would result in some improvement.
Do they use the crafty-like "minimum thread group" or something similar to limit how many threads work at one split point? That will make a difference on machines with more cores.
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Threads test incl. Stockfish 5 and Komodo 8
bob wrote: Do they use the crafty-like "minimum thread group" or something similar to limit how many threads work at one split point? That will make a difference on machines with more cores.
Once there was a UCI option:
Code:
option name Max Threads per Split Point type spin default 5 min 4 max 8
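For anyone who wants to try such an option from a script, here is a minimal sketch that parses a UCI `option` line like the one above and builds the matching `setoption` command. The helper names `parse_spin_option` and `setoption_command` are made up for illustration, and whether a given Stockfish build still exposes this option is an assumption to check against the engine's own `uci` output.

```python
import re

def parse_spin_option(line):
    """Parse a UCI 'option ... type spin' line into a dict (illustrative helper)."""
    m = re.match(
        r"option name (?P<name>.+?) type spin "
        r"default (?P<default>-?\d+) min (?P<min>-?\d+) max (?P<max>-?\d+)\s*$",
        line.strip(),
    )
    if m is None:
        raise ValueError("not a spin option line: %r" % line)
    return {
        "name": m.group("name"),
        "default": int(m.group("default")),
        "min": int(m.group("min")),
        "max": int(m.group("max")),
    }

def setoption_command(option, value):
    """Build a UCI 'setoption' command, clamping value to the declared range."""
    clamped = max(option["min"], min(option["max"], value))
    return "setoption name %s value %d" % (option["name"], clamped)

# The option line is the one quoted in the post; the requested value is arbitrary.
opt = parse_spin_option(
    "option name Max Threads per Split Point type spin default 5 min 4 max 8"
)
print(setoption_command(opt, 16))  # the request is clamped to the declared max
```

Clamping to the declared range is a design choice for the sketch: an engine is free to reject or ignore out-of-range values, so a script should not rely on either behaviour.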
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Threads test incl. Stockfish 5 and Komodo 8
Thank you, Andreas, very important tests. I imagine that testing engines on 16 threads is time consuming as hell.
You bust to pieces two of Bob Hyatt's loud claims:
1) That Komodo's implementation of SMP is "buggy", "quick and dirty", and so on. It seems one of the best up to 16 threads.
2) That the formula for SMP improvement with the number of cores is linear. He gave "his" mastermind formula (N-1)*0.7 + 1, IIRC. Nothing is linear here, not for a single engine.
When I reply to Bob, I will quote your results.
Thanks again.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Threads test incl. Stockfish 5 and Komodo 8
Laskos wrote: Thank you, Andreas, very important tests. I imagine that testing engines on 16 threads is time consuming as hell. You bust to pieces two of Bob Hyatt's loud claims: 1) That Komodo's implementation of SMP is "buggy", "quick and dirty", and so on. It seems one of the best up to 16 threads. 2) That the formula for SMP improvement with the number of cores is linear. He gave "his" mastermind formula (N-1)*0.7 + 1, IIRC. Nothing is linear here, not for a single engine. When I reply to Bob, I will quote your results. Thanks again.
Didn't bust a THING I said. I said "if a search 'widens the tree', which is YOUR term, then it has a bug that can be fixed." I said nothing more and nothing less. If a program plays stronger using N CPUs to a fixed depth than it plays using one CPU, it has a bug that can be fixed to improve the performance of the single-thread version. That YOU don't understand that simple statement doesn't mean you can "shoot it to pieces". It just shows YOU do not understand the issues of parallel search. This ranks right up there with the super-linear speedup nonsense that comes up on occasion. It does NOT happen unless the sequential program has a problem that can be fixed. Period.
And will you please stop misquoting what I said about that speedup formula. I did NOT say it was a highly accurate fit to the observed data. I said it was a fairly accurate estimate that is quite easy for anyone to compute. Nothing more, nothing less. And it is pretty accurate through 16 cores for sure, and even beyond but with less testing data to support it. When you grow up and learn to read, you might understand the term "linear approximation" or "simple approximation" etc.
If you look back through old CCC archives, you can see ANOTHER discussion about this formula. Martin Fierz took a bunch of 1/2/4/8 core test data I ran for him and compared it to my formula. His discovery was that my formula was too pessimistic. But I didn't develop it to be optimistic or pessimistic. Just something that approximates the speedup for a rough estimate.
please...
And for the record, my approximation had NOTHING to do with predicting Elo. Just raw SMP speedup measured time to depth.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Threads test incl. Stockfish 5 and Komodo 8
bob wrote: Didn't bust a THING I said. I said "if a search 'widens the tree', which is YOUR term, then it has a bug that can be fixed." I said nothing more and nothing less. If a program plays stronger using N CPUs to a fixed depth than it plays using one CPU, it has a bug that can be fixed to improve the performance of the single-thread version. That YOU don't understand that simple statement doesn't mean you can "shoot it to pieces". It just shows YOU do not understand the issues of parallel search. This ranks right up there with the super-linear speedup nonsense that comes up on occasion. It does NOT happen unless the sequential program has a problem that can be fixed. Period. And will you please stop misquoting what I said about that speedup formula. I did NOT say it was a highly accurate fit to the observed data. I said it was a fairly accurate estimate that is quite easy for anyone to compute. Nothing more, nothing less. And it is pretty accurate through 16 cores for sure, and even beyond but with less testing data to support it. When you grow up and learn to read, you might understand the term "linear approximation" or "simple approximation" etc. If you look back through old CCC archives, you can see ANOTHER discussion about this formula. Martin Fierz took a bunch of 1/2/4/8 core test data I ran for him and compared it to my formula. His discovery was that my formula was too pessimistic. But I didn't develop it to be optimistic or pessimistic. Just something that approximates the speedup for a rough estimate. please... And for the record, my approximation had NOTHING to do with predicting Elo. Just raw SMP speedup measured time to depth.
Sure, besides your ad hominem, then:
Imagine how strong single-core Komodo 8 would be "fixing the bug".
That you fitted your linear approximation of effective speed-up to 3 data points shows that overfitting is all you learnt in your PhD.
In the future, please don't come with the (N-1)*0.7 + 1 crap to teach others how SMP behaves.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Threads test incl. Stockfish 5 and Komodo 8
Laskos wrote: Imagine how strong single-core Komodo 8 would be "fixing the bug". That you fitted your linear approximation of effective speed-up to 3 data points shows that overfitting is all you learnt in your PhD. In the future, please don't come with the (N-1)*0.7 + 1 crap to teach others how SMP behaves.
Sorry, I fitted my simple linear approximation to MUCH more than just three data points: 1, 2, 3, 4, 5, 6, 7, 8 CPUs, plus, since then, ditto for 1-12.
For 1-8 that linear approximation is somewhat LOW. I gave you a hint on finding the data that someone ELSE used to compute actual speedup numbers. My formula suggests 3.1 for 4 cpus. The actual data showed 3.4, for example.
Not crap at all. Just a linear approximation to something that is not quite linear.
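To make the numbers in this exchange easy to check, here is a small sketch that evaluates the quoted approximation, speedup(N) = (N-1)*0.7 + 1, for a few CPU counts. The function name `estimated_speedup` and the chosen CPU counts are illustrative only; the single measured value used (3.4 for 4 CPUs) is the one cited in the post above, and no other measured data is assumed.

```python
def estimated_speedup(n_cpus, per_cpu_gain=0.7):
    """Linear approximation quoted in the thread: (N - 1) * 0.7 + 1."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return (n_cpus - 1) * per_cpu_gain + 1

# Evaluate the approximation at the CPU counts discussed in the thread.
for n in (1, 2, 4, 8, 16):
    print("%2d cpus -> estimated speedup %.1f" % (n, estimated_speedup(n)))

# Compare with the one measured time-to-depth figure cited in the post above.
measured_4 = 3.4
print("formula is low by %.1f at 4 cpus" % (measured_4 - estimated_speedup(4)))
```

Note that this is an estimate of time-to-depth speedup, not of Elo gain, which is the distinction drawn earlier in the thread.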
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Threads test incl. Stockfish 5 and Komodo 8
BTW if you don't want me to reply to YOUR posts, politely stop mentioning my name. You invited me to comment because of your idiotic comment.
While you are looking around, look up the definition of "approximation" and "exact" and discover the differences...