(Why) Is hyperthreading bad for chess engines?
-
- Posts: 646
- Joined: Wed Jun 18, 2014 2:30 pm
- Full name: Fahad Syed
(Why) Is hyperthreading bad for chess engines?
Is it? If so, why?
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: (Why) Is hyperthreading bad for chess engines?
vittyvirus wrote: Is it? If so, why?
Because some guys say so. With a 30% NPS speed-up going from 4 physical cores to 8 logical cores with HT, it will bring benefits. On the other hand, the achievable overclock frequency on 8 logical cores is lower. If you are an overclocker, use only the physical cores and get the highest stable frequency. If not, use all logical cores with HT.
HT is also great for running 4 threads of heavy chess work (on a machine with 4 physical cores) while leaving the rest for internet and other such low-CPU crap. Your tests will be fine, although some overly cautious guys say otherwise.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: (Why) Is hyperthreading bad for chess engines?
Laskos wrote: Because some guys say so. [...]
It is not "just because some guys say so." It is "because many guys have actually measured this carefully."
An SMP search absolutely introduces overhead; there is no way around it. Hyper-threading will improve NPS most of the time (but not all of the time; a very small program might show zero benefit). So the question is which is bigger: the SMP search overhead or the hyper-threading NPS speed-up. The general answer is that the SMP overhead is larger, which means the overhead outweighs the gain. If someone can figure out a way to solve the roughly 30% overhead for each thread added, then this might have a chance. But overhead is a direct result of move ordering, and pushing the rate of fail-highs on the first move beyond the 90% range is not exactly easy. By far the best comparison might be to try 4 threads at speed N, and 8 threads at speed N/2. It becomes pretty obvious which is better on average.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: (Why) Is hyperthreading bad for chess engines?
bob wrote: It is not "just because some guys say so." It is "because many guys have actually measured this carefully." [...]
I tested Houdini 3 _extensively_ on my system (fried by now). On _strength_. Leaving the overclocking issue aside, my system _benefited_ from HT, with a 30-32% speed-up in NPS. R. Houdart also said that the overhead from 4 to 8 threads is about 20%, so a 30% NPS speed-up is theoretically beneficial.
I knew Bob would come and say something in his usual way.
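The estimate in this post (a 30% NPS gain against roughly 20% search overhead) can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the simple model that time to a given depth equals nodes searched divided by NPS; the function name `net_speedup` is made up for illustration and is not from any engine or tool.

```python
def net_speedup(nps_gain, tree_growth):
    """Time-to-depth speed-up from doubling the thread count with HT.

    nps_gain:    fractional NPS increase (0.30 means +30%).
    tree_growth: fractional increase in nodes needed to reach the same depth.

    Assumed model: time to depth = nodes / NPS, so the speed-up is the
    ratio of the two growth factors. Values above 1.0 favour HT.
    """
    return (1.0 + nps_gain) / (1.0 + tree_growth)


if __name__ == "__main__":
    # Figures quoted in the post: +30% NPS, about 20% overhead.
    print(f"30% NPS vs 20% overhead: {net_speedup(0.30, 0.20):.3f}")
    # With a 30% overhead instead, the same NPS gain only breaks even.
    print(f"30% NPS vs 30% overhead: {net_speedup(0.30, 0.30):.3f}")
```

Under this model HT pays off only when the NPS gain exceeds the tree growth, which is exactly the point the two posters disagree on.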
-
- Posts: 5563
- Joined: Tue Feb 28, 2012 11:56 pm
Re: (Why) Is hyperthreading bad for chess engines?
vittyvirus wrote: Is it? If so, why?
If so, it is because the benefit in terms of higher NPS does not outweigh the loss in search efficiency due to a higher number of threads (i.e. more nodes needed to search to the same depth).
In some cases the benefit may well outweigh the cost. The benefit will depend on hardware, engine, maybe even hash size and type of position. The cost will depend on the engine and maybe hash size and type of position. The OS might make a difference as well.
Kai wrote: On the other hand, the achievable overclock frequency on 8 logical cores is lower.
Then again, the benefit from HT increases as the ratio of CPU speed to memory speed increases. So there are lots of variables to consider.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: (Why) Is hyperthreading bad for chess engines?
syzygy wrote: If so, it is because the benefit in terms of higher NPS does not outweigh the loss in search efficiency due to a higher number of threads (i.e. more nodes needed to search to the same depth). [...]
One more thing: time control. 8 threads kick in more slowly than 4 threads, so testing at ultra-fast TC is useless. Luckily, TTD (time to depth) is valid for Houdini, so it's not very hard to see which is better (take 1,000 positions at one minute each, which gives adequate depth, and in one day you are done with testing). A strength SPRT test at reasonable to long TC will take much longer; the benefit of HT on my system with H3 and 1 GB hash was only about 10-15 Elo points.
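A time-to-depth comparison over a set of positions could be summarised along the following lines. This is only a sketch: the helper names are invented here, the use of a geometric mean is an assumption about how one might average per-position ratios (the post does not say how the results were aggregated), and the sample times in the usage lines are made up purely to show the call.

```python
import math


def geometric_mean_ratio(times_a, times_b):
    """Geometric mean of the per-position ratios times_a[i] / times_b[i].

    Each list holds the seconds needed to reach the same target depth on
    the same positions. A result above 1.0 means configuration B was
    faster to depth on average.
    """
    if len(times_a) != len(times_b) or not times_a:
        raise ValueError("need two equally long, non-empty lists")
    log_sum = sum(math.log(a / b) for a, b in zip(times_a, times_b))
    return math.exp(log_sum / len(times_a))


def suite_hours(positions, seconds_per_position):
    """Wall-clock hours for one pass over the suite with one configuration."""
    return positions * seconds_per_position / 3600.0


if __name__ == "__main__":
    # Hypothetical times in seconds, for illustration only.
    four_threads = [10.0, 20.0, 40.0]
    eight_threads = [8.0, 25.0, 40.0]
    print(geometric_mean_ratio(four_threads, eight_threads))
    # 1,000 positions at one minute each, as suggested in the post.
    print(f"{suite_hours(1000, 60):.1f} hours per configuration")
```

A geometric mean treats a position that is 25% faster and one that is 25% slower symmetrically, which an arithmetic mean of ratios would not.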
-
- Posts: 5563
- Joined: Tue Feb 28, 2012 11:56 pm
Re: (Why) Is hyperthreading bad for chess engines?
Laskos wrote: One more thing: time control. 8 threads kick in more slowly than 4 threads, so testing at ultra-fast TC is useless.
Yes, good point.
-
- Posts: 3018
- Joined: Thu Mar 09, 2006 11:58 am
- Location: Antalya/Turkey
Re: (Why) Is hyperthreading bad for chess engines?
vittyvirus wrote: Is it? If so, why?
Hello Syed,
A long time ago I opened a similar thread about Hyper-Threading. For more details:
http://www.talkchess.com/forum/viewtopi ... ight=sedat
By the way, I have tested with HT off and HT on, and I noticed that the two configurations are almost equal in strength:
Rank  Name                       Elo    +    -  games  score  oppo.  draws
   1  Houdini 2.0c Pro x64 6c   3423   19   18   1008    70%   3261    42%
   2  Houdini 2.0c Pro x64 12t  3421   16   16   1399    70%   3281    38%
https://sites.google.com/site/computers ... ct-auto232
Best,
Sedat
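To put the two rating lines above in perspective, here is a rough sketch that checks whether the reported error bars overlap and what a 2 Elo gap means in expected score. It assumes the standard logistic Elo formula and treats the +/- columns as simple interval bounds; rating-list error bars are usually relative to the whole pool, so this is a sanity check and not a proper significance test. The function names are made up for illustration.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def intervals_overlap(elo_a, plus_a, minus_a, elo_b, plus_b, minus_b):
    """True if [elo - minus, elo + plus] of A and of B share any point."""
    low_a, high_a = elo_a - minus_a, elo_a + plus_a
    low_b, high_b = elo_b - minus_b, elo_b + plus_b
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # Values copied from the table: 6 cores (HT off) vs 12 threads (HT on).
    print(intervals_overlap(3423, 19, 18, 3421, 16, 16))
    print(f"{expected_score(3423 - 3421):.4f}")
```

With these numbers the intervals overlap almost completely and the expected score for the 2 Elo difference is barely above 50%, which matches the "almost equal" reading.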
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: (Why) Is hyperthreading bad for chess engines?
Laskos wrote: I tested Houdini 3 _extensively_ on my system (fried by now). On _strength_. Leaving the overclocking issue aside, my system _benefited_ from HT, with a 30-32% speed-up in NPS. R. Houdart also said that the overhead from 4 to 8 threads is about 20%, so a 30% NPS speed-up is theoretically beneficial. I knew Bob would come and say something in his usual way.
How many games have you played? I've played MILLIONS testing hyper-threading, since this comes up so often. The best I have ever seen is a break-even, UNLESS a program is very poorly optimized and makes excessive memory accesses.
I just ran a quick test on my iMac, a quad-core i7. I picked a couple of positions and ran them several times with 4 threads and then with 8.
NPS numbers were 23.5M with 4 threads and 28.0M with 8 threads, averaged over several runs and a couple of positions. That is an improvement of 4.5M/23.5M = 19%. The size of the trees, all searched to the same depth, was 1.01B nodes for 4 threads and 1.26B nodes for 8 threads. Tree size increased by 0.25B/1.01B = 25%, so the net result is a slight loss.
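The same comparison can be made directly from the raw figures above. The sketch assumes that time to depth is simply total nodes divided by NPS; `time_to_depth` is a name chosen here for illustration.

```python
def time_to_depth(nodes, nps):
    """Seconds to finish the search, assuming time = nodes / NPS."""
    return nodes / nps


# Figures reported in the post for the quad-core i7.
t4 = time_to_depth(1.01e9, 23.5e6)  # 4 threads
t8 = time_to_depth(1.26e9, 28.0e6)  # 8 threads (HT)

if __name__ == "__main__":
    print(f"4 threads: {t4:.1f} s")
    print(f"8 threads: {t8:.1f} s")
    print(f"8 threads take {100.0 * (t8 / t4 - 1.0):.1f}% longer to the same depth")
```

On these numbers the 8-thread search needs roughly 45 seconds against about 43 seconds for 4 threads, so the extra NPS does not quite pay for the larger tree.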
Rather than taking Houdart's number, why don't you get your own? All you have to do is start up Houdini and have it search the same position using N real cores and then 2N hyper-threaded logical cores. See what kind of REAL speed-up you see in terms of NPS, and then in terms of tree growth. If the growth in the tree is smaller than the NPS improvement, I would agree that it looks pretty good for that program.
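One way to run such a measurement is to drive the engine over the UCI protocol and read the node count and NPS from its `info` lines. The sketch below is a minimal, untuned example: `engine_path` is a placeholder for wherever the engine binary lives, the option name `Threads` is an assumption (check the engine's `uci` output for the exact name), and it sends all commands up front without waiting for `uciok` or `readyok`, which most engines tolerate but which is not guaranteed.

```python
import re
import subprocess


def uci_script(threads, depth, fen=None):
    """Build the UCI commands for one fixed-depth search."""
    position = f"position fen {fen}" if fen else "position startpos"
    return [
        "uci",
        f"setoption name Threads value {threads}",  # option name is assumed
        "isready",
        position,
        f"go depth {depth}",
    ]


def parse_info(line):
    """Extract the integer 'nodes' and 'nps' fields from a UCI info line."""
    found = re.findall(r"\b(nodes|nps) (\d+)", line)
    return {key: int(value) for key, value in found}


def run_fixed_depth(engine_path, threads, depth, fen=None):
    """Run one search and return the last reported nodes and nps values."""
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write("\n".join(uci_script(threads, depth, fen)) + "\n")
    proc.stdin.flush()
    last = {}
    for line in proc.stdout:
        if line.startswith("info"):
            last.update(parse_info(line))
        elif line.startswith("bestmove"):
            break  # search finished
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return last
```

Calling `run_fixed_depth` once with N threads and once with 2N threads on the same position and depth gives the two pairs of numbers needed: the ratio of `nps` values is the raw speed-up, and the ratio of `nodes` values is the tree growth. Because parallel searches are non-deterministic, several runs per setting are needed before the averages mean much.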
However, here is another point from experience. If the NPS gain and the search overhead are identical, I would ALWAYS choose the N-thread option over the 2N-thread option, because 2N threads have a much higher variance in time and tree size, which is not a particularly good thing.
And please don't start on the "widening" red herring argument.
All it takes to get a better gain from hyper-threading is to be sloppy with memory accesses: don't group variables by temporal locality, don't strive for sequential accesses when possible, etc. Then one thread within a core stalls waiting on cache misses while the other thread does useful work, and the NPS climbs. I have spent a lot of time working on locality issues, and on a new i7 with 4 cores I don't see that 30% number. I'd be concerned if I did, because the bigger the HT gain, the more stall issues a program has in a single thread, something that is always bad and which can usually at least be mitigated.
When you test using hyper-threading and only measure speed, you can't claim that hyper-threading is better in the general sense. You might have proved that the SMP overhead is smaller in Houdini than in most engines. Or you might have proved that its memory access locality is not done very well and hyper-threading is correcting some of that. If you allow just one degree of freedom at a time, you can measure it. A fixed-depth search shows the tree growth exactly. NPS shows how the actual program scales, ignoring SMP tree growth. Then you can come much closer to figuring out whether HT helps or hurts, and why.