Diminishing returns and hyperthreading

Laskos
Posts: 9476
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Diminishing returns and hyperthreading

Post by Laskos » Tue Dec 27, 2016 3:21 pm

From this thread by Andreas Strangmüller on Stockfish 8 parallelization:
http://www.talkchess.com/forum/viewtopic.php?t=62146

Amdahl's law, with a parallel fraction of 0.955, describes the results very accurately:
Effective speed-up = 1 / ((1 - 0.955) + 0.955 / n_cores)

From this, each doubling of physical cores is worth the following effective speed-up:

1 ---> 2 cores: 1.91
2 ---> 4 cores: 1.84
4 ---> 8 cores: 1.73
8 ---> 16 cores: 1.57
16 ---> 32 cores: 1.40
32 ---> 64 cores: 1.25
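For readers who want to reproduce these numbers, here is a minimal Python sketch of the per-doubling arithmetic. It assumes nothing beyond the Amdahl formula above with a parallel fraction p = 0.955; the names `amdahl_speedup` and `gain_per_doubling` are illustrative only, not taken from any engine or tool.

```python
def amdahl_speedup(n_cores: int, p: float = 0.955) -> float:
    """Effective speed-up on n_cores under Amdahl's law with parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n_cores)


def gain_per_doubling(n_cores: int, p: float = 0.955) -> float:
    """Ratio of effective speed-ups when going from n_cores to 2 * n_cores."""
    return amdahl_speedup(2 * n_cores, p) / amdahl_speedup(n_cores, p)


if __name__ == "__main__":
    n = 1
    while n <= 32:
        # Prints one line per doubling, in the same format as the list above.
        print(f"{n} ---> {2 * n} cores: {gain_per_doubling(n):.2f}")
        n *= 2
```

Run as a script, it prints the six ratios listed above (1.91 down to 1.25).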


Hyperthreading on an i7-4790 with 4 physical and 8 logical cores (affinities set each time), 150 positions:

NPS 1 core 1 thread:
n/s: 1.708.948
NPS 1 core 2 threads:
n/s: 2.132.251
NPS speed-up: 1.25

NPS 4 cores 4 threads:
n/s: 6.828.883
NPS 4 cores 8 threads:
n/s: 8.563.755
NPS speed-up: 1.25

So, we can assume that hyperthreading gives a 25% boost in NPS (log2(1.25) ≈ 0.32 of a doubling), pretty much independently of the number of cores.
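The conversion from an NPS ratio to "doublings" is just a base-2 logarithm. A small Python sketch using the n/s figures above (thousands separators removed); the dictionary layout and function names are illustrative choices. With the raw figures the two ratios come out at about 0.32 and 0.33 doublings; the table below rounds both to log2(1.25) ≈ 0.32.

```python
import math

# n/s figures from the measurements above, keyed by (physical cores, threads)
nps = {
    (1, 1): 1_708_948,  # 1 core, 1 thread
    (1, 2): 2_132_251,  # 1 core, 2 threads (hyperthreading)
    (4, 4): 6_828_883,  # 4 cores, 4 threads
    (4, 8): 8_563_755,  # 4 cores, 8 threads (hyperthreading)
}


def ht_speedup(cores: int) -> float:
    """NPS ratio from running a second thread on each of `cores` physical cores."""
    return nps[(cores, 2 * cores)] / nps[(cores, cores)]


def in_doublings(ratio: float) -> float:
    """Express a speed-up ratio as a fraction of one doubling: log2(ratio)."""
    return math.log2(ratio)


for cores in (1, 4):
    s = ht_speedup(cores)
    print(f"{cores} core(s): NPS speed-up {s:.2f} = {in_doublings(s):.2f} doublings")
```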
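Since each doubling of physical cores gives diminishing returns, doubling the logical cores (hyperthreading) gives diminishing and eventually negative results. All values below are in doublings: the loss column is log2(effective speed-up per doubling) - 1, and the net benefit is the 0.32 NPS gain plus that (negative) loss: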

Threads    Loss from diminishing    HT NPS gain    Net HT benefit
           returns (doublings)      (doublings)    (doublings)

1 to 2          -0.066                 0.32            0.25
2 to 4          -0.12                  0.32            0.20
4 to 8          -0.21                  0.32            0.11
8 to 16         -0.35                  0.32           -0.03
16 to 32        -0.51                  0.32           -0.19
32 to 64        -0.68                  0.32           -0.36
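The whole table can be regenerated from the Amdahl fit and the 25% NPS gain. A minimal Python sketch, assuming p = 0.955 and a hyperthreading NPS gain of exactly 1.25; the function names are illustrative. Because it skips the intermediate rounding of the per-doubling ratios (1.91 etc.), the first row comes out at about 0.26 rather than 0.25; the other rows match to two decimals.

```python
import math

P = 0.955                    # parallel fraction from the Amdahl fit above
HT_GAIN = math.log2(1.25)    # ~0.32 doublings of NPS from hyperthreading


def amdahl_speedup(n: int, p: float = P) -> float:
    return 1.0 / ((1.0 - p) + p / n)


def doubling_loss(n: int, p: float = P) -> float:
    """Negative number: how far the n -> 2n speed-up falls short of a full doubling."""
    return math.log2(amdahl_speedup(2 * n, p) / amdahl_speedup(n, p)) - 1.0


def net_ht_benefit(n: int, p: float = P) -> float:
    """Net gain (in doublings) of running 2n threads on n physical cores."""
    return HT_GAIN + doubling_loss(n, p)


for k in range(6):
    n = 2 ** k
    print(f"{n} to {2 * n}: loss {doubling_loss(n):+.3f}, net {net_ht_benefit(n):+.2f}")
```

The sign change between the "4 to 8" and "8 to 16" rows is what puts the break-even point at 8 physical cores.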
So, with 8 or more physical cores, hyperthreading is already useless strength-wise, and with many cores it becomes very harmful. As for 8 physical cores, using 10 or 12 threads (not the full 16) might help a bit. The prediction I can test on my i7-4790 is that the gain from 1 --> 2 threads on the same physical core (0.25 doublings in the table) is worth more than double the gain from 4 --> 8 threads on 4 physical cores (0.11 doublings):

Time control: 10'' + 0.1''

Score of SF8 2 threads vs SF8 1 thread: 1073 - 871 - 3056  [0.520] 5000
ELO difference: 14.04 +/- 5.99
Finished match

Score of SF8 8 threads vs SF8 4 threads: 851 - 751 - 3398  [0.510] 5000
ELO difference: 6.95 +/- 5.44
Finished match
The test confirms the above for a low number of cores: 14.0 Elo for 1 --> 2 threads vs 7.0 Elo for 4 --> 8 threads.
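The Elo figures in the match summaries can be checked from the win-loss-draw counts with the standard logistic Elo formula. A minimal Python sketch; it reproduces only the point estimates, not the +/- error margins, and the function names are illustrative.

```python
import math


def score(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, with a draw counting as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_difference(s: float) -> float:
    """Logistic Elo model: the rating gap that predicts an expected score s."""
    return -400.0 * math.log10(1.0 / s - 1.0)


matches = {
    "2 threads vs 1 thread": (1073, 871, 3056),
    "8 threads vs 4 threads": (851, 751, 3398),
}
for label, (w, l, d) in matches.items():
    s = score(w, l, d)
    print(f"{label}: score {s:.3f}, Elo {elo_difference(s):.2f}")
```

The ratio of the two point estimates is about 2.0, which is the comparison the prediction is about. With error margins of roughly +/- 6 Elo on each, it is only a rough confirmation.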

lkaufman
Posts: 3734
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Diminishing returns and hyperthreading

Post by lkaufman » Tue Dec 27, 2016 3:47 pm

Laskos wrote: [...]
The test confirms the above on low number of cores.
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
Komodo rules!

Laskos
Posts: 9476
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Diminishing returns and hyperthreading

Post by Laskos » Tue Dec 27, 2016 3:58 pm

lkaufman wrote: [...]
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
6-7, but the benefit compared to 8 threads is not outside the error margins (still, it seems plausible).

Milos
Posts: 3387
Joined: Wed Nov 25, 2009 12:47 am

Re: Diminishing returns and hyperthreading

Post by Milos » Tue Dec 27, 2016 4:49 pm

Laskos wrote: [...]
The test confirms the above on low number of cores.
What you write is very true. Just 2 corrections. When using HT, the amount of cache per thread is effectively halved, which makes hash access slower and increases inter-thread overhead compared to the same number of real cores. This reduces efficiency with HT, so the 95.5% efficiency might actually be an overestimate.
Second, regarding your test: when HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores with HT off. So the test results have merit, but only relatively, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really show that 8 HT threads are stronger than 4 real cores. If you tested HT vs. no-HT on 2 separate machines, that 7 Elo advantage might very well turn out negative.
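The first point can be explored with the same Amdahl model. This sketch treats the extra cache pressure and inter-thread overhead as a lower effective parallel fraction p. That is a simplifying assumption, not something measured in this thread, and the values 0.94 and 0.92 are hypothetical. The 25% NPS gain is kept fixed. Lowering p increases the loss per doubling, so the net hyperthreading benefit shrinks at every core count.

```python
import math


def amdahl_speedup(n: int, p: float) -> float:
    return 1.0 / ((1.0 - p) + p / n)


def net_ht_benefit(n: int, p: float, ht_nps_gain: float = 1.25) -> float:
    """Net benefit (in doublings) of 2n threads on n physical cores, for parallel fraction p."""
    loss = math.log2(amdahl_speedup(2 * n, p) / amdahl_speedup(n, p)) - 1.0
    return math.log2(ht_nps_gain) + loss


# 0.955 is the fitted value; 0.94 and 0.92 are hypothetical lower efficiencies.
for p in (0.955, 0.94, 0.92):
    row = ", ".join(f"{n}->{2 * n}: {net_ht_benefit(n, p):+.2f}" for n in (1, 2, 4, 8))
    print(f"p = {p}: {row}")
```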

lkaufman
Posts: 3734
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Diminishing returns and hyperthreading

Post by lkaufman » Tue Dec 27, 2016 5:23 pm

Laskos wrote:
6-7, but the benefit compared to 8 threads is not outside error margins (still, it seems plausible).
Thanks. I've been using six threads on my quad for chess analysis. Even if it's not stronger than four threads with hyperthreading off, it at least avoids the need to turn hyperthreading on or off depending on intended use.

Laskos
Posts: 9476
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Diminishing returns and hyperthreading

Post by Laskos » Tue Dec 27, 2016 10:16 pm

Milos wrote: [...]
When HT is on, performance of 4 real cores is slightly lower compared to performance on 4 real cores when HT is off. [...] If you tested on 2 separate machines HT vs. no-HT, that 7 Elo advantage might very well turn out negative.
I assumed here that if I set affinities to logical cores correctly, I get the same result. I did set them; I don't know how to do such tests otherwise. 2 identical machines are needed if there is a difference.

Laskos
Posts: 9476
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Diminishing returns and hyperthreading

Post by Laskos » Thu Dec 29, 2016 1:37 pm

Laskos wrote:
I assumed here that if I set affinities to logical cores correctly, I get the same result. [...] 2 identical machines are needed if there is a difference.
I checked NPS with HT turned OFF in the BIOS on the same 150 positions. The values are just slightly higher, by 0.5-1%, which is within statistical error of the HT ON results with affinities set. The results, especially the "theoretical" ones in the table of gains per doubling, seem to stand as they are, maybe 1% off.
