Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 4:21 pm
by Laskos
From this thread of Andreas Strangmüller on Stockfish 8 parallelization:
http://www.talkchess.com/forum/viewtopic.php?t=62146

Amdahl's law with a parallel fraction of 0.955:
Effective Speed-Up = 1 / (1 - 0.955 + 0.955/n_cores)
This describes the results very accurately.

From this, each doubling of physical cores is worth the following effective speed-up:

1 ---> 2 cores: 1.91
2 ---> 4 cores: 1.84
4 ---> 8 cores: 1.73
8 ---> 16 cores: 1.57
16 ---> 32 cores: 1.40
32 ---> 64 cores: 1.25
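
The per-doubling figures above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the single input is the 0.955 parallel fraction from the formula.

```python
def amdahl_speedup(n_cores, parallel_fraction=0.955):
    """Effective speed-up on n_cores according to Amdahl's law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def doubling_gain(n_cores, parallel_fraction=0.955):
    """Speed-up ratio obtained by going from n_cores to 2 * n_cores."""
    return (amdahl_speedup(2 * n_cores, parallel_fraction)
            / amdahl_speedup(n_cores, parallel_fraction))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n} ---> {2 * n} cores: {doubling_gain(n):.2f}")
```

Each printed value is the ratio of two Amdahl speed-ups, which is why it shrinks toward 1 as the serial 4.5% starts to dominate.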


Hyperthreading on an i7-4790 with 4 physical, 8 logical cores (affinities set each time), 150 positions:

NPS 1 core 1 thread:
n/s: 1.708.948
NPS 1 core 2 threads:
n/s: 2.132.251
NPS speed-up: 1.25

NPS 4 cores 4 threads:
n/s: 6.828.883
NPS 4 cores 8 threads:
n/s: 8.563.755
NPS speed-up: 1.25
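
The two speed-ups above can be checked directly from the measured node rates. The sketch below also expresses a ratio as a fraction of a doubling, i.e. log2 of the ratio; the helper names are illustrative assumptions, not anything from the engine or the test tool.

```python
import math

# Measured nodes per second from the i7-4790 runs above.
NPS_1_CORE_1_THREAD = 1_708_948
NPS_1_CORE_2_THREADS = 2_132_251
NPS_4_CORES_4_THREADS = 6_828_883
NPS_4_CORES_8_THREADS = 8_563_755


def ht_speedup(nps_without_ht, nps_with_ht):
    """NPS ratio gained by running two threads per physical core."""
    return nps_with_ht / nps_without_ht


def in_doublings(ratio):
    """Express a speed ratio as a fraction of a doubling (log base 2)."""
    return math.log2(ratio)


if __name__ == "__main__":
    one_core = ht_speedup(NPS_1_CORE_1_THREAD, NPS_1_CORE_2_THREADS)
    four_cores = ht_speedup(NPS_4_CORES_4_THREADS, NPS_4_CORES_8_THREADS)
    print(f"1 core:  {one_core:.2f}")
    print(f"4 cores: {four_cores:.2f}")
    print(f"1.25 in doublings: {in_doublings(1.25):.2f}")
```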

So, we can assume that hyperthreading gives a 25% boost in NPS (0.32 of a doubling, since log2(1.25) is about 0.32) pretty much independently of the number of cores.
Doubling physical cores already gives diminishing returns, so doubling logical cores via hyperthreading gives diminishing and eventually negative returns:

Code:

Threads    Diminished return    NPS boost with hyperthrd   Total hyperthrd benefit
             in doubling             in doubling             in doubling

1 to 2         -0.066                  0.32                      0.25
2 to 4         -0.12                   0.32                      0.20
4 to 8         -0.21                   0.32                      0.11
8 to 16        -0.35                   0.32                     -0.03 
16 to 32       -0.51                   0.32                     -0.19
32 to 64       -0.68                   0.32                     -0.36
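
The table can be rebuilt from the Amdahl fit, under the assumption (mine, inferred from the numbers) that the "diminished return" column is log2 of the per-doubling speed-up minus one full doubling, and that the total is that shortfall plus the 0.32 NPS boost. With unrounded ratios the values differ from the table by up to about 0.01, presumably because the table was computed from the two-decimal ratios.

```python
import math

PARALLEL_FRACTION = 0.955        # from the Amdahl fit quoted in the post
HT_NPS_GAIN = math.log2(1.25)    # 25% NPS boost, about 0.32 doublings


def amdahl_speedup(n_cores, p=PARALLEL_FRACTION):
    """Effective speed-up on n_cores according to Amdahl's law."""
    return 1.0 / ((1.0 - p) + p / n_cores)


def table_row(n_threads, p=PARALLEL_FRACTION):
    """Return (diminished return, total HT benefit), both in doublings."""
    ratio = amdahl_speedup(2 * n_threads, p) / amdahl_speedup(n_threads, p)
    diminished = math.log2(ratio) - 1.0   # shortfall versus a perfect doubling
    total = diminished + HT_NPS_GAIN      # add the hyperthreading NPS boost
    return diminished, total


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        diminished, total = table_row(n)
        print(f"{n:>2} to {2 * n:<2}  {diminished:+.3f}  "
              f"{HT_NPS_GAIN:.2f}  {total:+.2f}")
```

In this model the sign of the total flips between the "4 to 8" and "8 to 16" rows, which is where the conclusion about 8 physical cores comes from.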
So, hyperthreading with 8 or more physical cores is already useless strength-wise, and with many cores it becomes very harmful. One caveat: on 8 physical cores, using 10 or 12 threads (not the full 16) might still help a bit. The prediction I can test on my i7-4790 is that the gain from 1 --> 2 threads on the same physical core is worth more than double the gain from 4 --> 8 threads on 4 physical cores:

Time control: 10'' + 0.1''

Code:

Score of SF8 2 threads vs SF8 1 thread: 1073 - 871 - 3056  [0.520] 5000
ELO difference: 14.04 +/- 5.99
Finished match

Code:

Score of SF8 8 threads vs SF8 4 threads: 851 - 751 - 3398  [0.510] 5000
ELO difference: 6.95 +/- 5.44
Finished match
The test confirms the above for a low number of cores.
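
For reference, the Elo differences in the two match outputs follow from the win/loss/draw counts through the standard logistic formula. A minimal sketch (the function name is an illustrative assumption, and the error margins are not recomputed here):

```python
import math


def elo_difference(wins, losses, draws):
    """Elo difference implied by a match score, using the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    one_core_ht = elo_difference(1073, 871, 3056)    # 2 threads vs 1 thread
    four_cores_ht = elo_difference(851, 751, 3398)   # 8 threads vs 4 threads
    print(f"2 vs 1 threads: {one_core_ht:.2f}")
    print(f"8 vs 4 threads: {four_cores_ht:.2f}")
    print(f"ratio of gains: {one_core_ht / four_cores_ht:.2f}")
```

The ratio of the two gains comes out at roughly 2, in line with the prediction, although the error margins of both matches are wide compared with the difference.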

Re: Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 4:47 pm
by lkaufman
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?

Re: Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 4:58 pm
by Laskos
lkaufman wrote:
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
6-7, but the benefit compared to 8 threads is not outside error margins (still, it seems plausible).

Re: Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 5:49 pm
by Milos
What you write is very true. Just 2 corrections. First, when using HT the amount of cache per thread is effectively halved, which makes hash access slower and inter-thread overhead higher than with the equivalent number of real cores. This reduces efficiency under HT, so the 95.5% efficiency might actually be an overestimate.
The second thing is related to your test. When HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores when HT is off. So the test results have merit, but only relative merit, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really mean that 8 HT threads are stronger than 4 real cores. If you tested on 2 separate machines, HT vs. no-HT, that 7 Elo advantage might very well turn out negative.

Re: Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 6:23 pm
by lkaufman
Laskos wrote:
lkaufman wrote:
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
6-7, but the benefit compared to 8 threads is not outside error margins (still, it seems plausible).
Thanks. I've been using six threads on my quad for chess analysis. Even if it's not stronger than four threads with hyperthreading off, it at least avoids the need to turn it on or off depending on intended use.

Re: Diminishing returns and hyperthreading

Posted: Tue Dec 27, 2016 11:16 pm
by Laskos
Milos wrote:
What you write is very true. Just 2 corrections. First, when using HT the amount of cache per thread is effectively halved, which makes hash access slower and inter-thread overhead higher than with the equivalent number of real cores. This reduces efficiency under HT, so the 95.5% efficiency might actually be an overestimate.
The second thing is related to your test. When HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores when HT is off. So the test results have merit, but only relative merit, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really mean that 8 HT threads are stronger than 4 real cores. If you tested on 2 separate machines, HT vs. no-HT, that 7 Elo advantage might very well turn out negative.
I assumed here that if I set affinities to logical cores correctly, I get the same result. I did set them; I don't know how to do such tests otherwise. Two identical machines are needed if there is a difference.

Re: Diminishing returns and hyperthreading

Posted: Thu Dec 29, 2016 2:37 pm
by Laskos
Laskos wrote:
Milos wrote: What you write is very true. Just 2 corrections. First, when using HT the amount of cache per thread is effectively halved, which makes hash access slower and inter-thread overhead higher than with the equivalent number of real cores. This reduces efficiency under HT, so the 95.5% efficiency might actually be an overestimate.
The second thing is related to your test. When HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores when HT is off. So the test results have merit, but only relative merit, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really mean that 8 HT threads are stronger than 4 real cores. If you tested on 2 separate machines, HT vs. no-HT, that 7 Elo advantage might very well turn out negative.
I assumed here that if I set affinities to logical cores correctly, I get the same result. I did set them; I don't know how to do such tests otherwise. Two identical machines are needed if there is a difference.
I checked NPS with HT OFF in BIOS on those same 150 positions. The values are just a bit larger, by 0.5-1%, but within statistical error margins of those obtained by setting affinities with HT ON. The results, especially the "theoretical" ones from the table of doubling gains, seem to stand as they are, maybe 1% off.