Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Harvey Williamson, bob

Posts: 9408
Joined: Wed Jul 26, 2006 8:21 pm

From this thread of Andreas Strangmüller on Stockfish 8 parallelization:
http://www.talkchess.com/forum/viewtopic.php?t=62146

Amdahl's law:
Effective Speed-Up = 1 / (1 - 0.955 + 0.955/n_cores)
which describes the results very accurately.

From this, each doubling of physical cores is worth the following effective speed-up:

1 ---> 2 cores: 1.91
2 ---> 4 cores: 1.84
4 ---> 8 cores: 1.73
8 ---> 16 cores: 1.57
16 ---> 32 cores: 1.40
32 ---> 64 cores: 1.25
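
The per-doubling figures above follow directly from the formula. A minimal sketch, assuming the 0.955 parallel fraction quoted above (the function names are only illustrative):

```python
def amdahl_speedup(n_cores: int, parallel_fraction: float = 0.955) -> float:
    """Effective speed-up on n_cores according to Amdahl's law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def doubling_gain(n_cores: int, parallel_fraction: float = 0.955) -> float:
    """Ratio of effective speed-ups when going from n_cores to 2 * n_cores."""
    return (amdahl_speedup(2 * n_cores, parallel_fraction)
            / amdahl_speedup(n_cores, parallel_fraction))


if __name__ == "__main__":
    # Print one line per doubling, in the same layout as the list above.
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n} ---> {2 * n} cores: {doubling_gain(n):.2f}")
```

Changing `parallel_fraction` shows how sensitive the large-core-count rows are to the fitted value.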

Hyperthreading on an i7-4790 with 4 physical, 8 logical cores (affinities set each time), 150 positions:

n/s: 1.708.948
n/s: 2.132.251
NPS speed-up: 1.25

n/s: 6.828.883
n/s: 8.563.755
NPS speed-up: 1.25

So we can assume that hyperthreading gives a 25% boost in NPS (log2(1.25) ≈ 0.32 of a doubling in NPS), pretty much independently of the number of cores.
Because doubling physical cores already gives diminishing returns, doubling logical cores through hyperthreading gives returns that diminish and eventually turn negative:

```
Threads     Diminished return    NPS boost with hyperthrd    Total hyperthrd benefit
            in doubling          in doubling                 in doubling

1 to 2      -0.066               0.32                         0.25
2 to 4      -0.12                0.32                         0.20
4 to 8      -0.21                0.32                         0.11
8 to 16     -0.35                0.32                        -0.03
16 to 32    -0.51                0.32                        -0.19
32 to 64    -0.68                0.32                        -0.36
```
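
The table can be re-derived roughly as follows. This sketch assumes (my reading of the numbers, not something stated explicitly) that "diminished return" is log2 of the per-doubling speed-up minus 1, that the NPS boost is log2(1.25), and that the total is their sum. The inputs are the rounded per-doubling speed-ups listed earlier, so the results agree with the table only to within about 0.01.

```python
import math

# Per-doubling effective speed-ups as listed in the post (rounded values).
DOUBLING_SPEEDUP = {1: 1.91, 2: 1.84, 4: 1.73, 8: 1.57, 16: 1.40, 32: 1.25}

# Measured NPS speed-up from hyperthreading on the i7-4790.
HT_NPS_SPEEDUP = 1.25


def table_row(threads: int) -> tuple[float, float, float]:
    """Return (diminished return, NPS boost, total benefit), all in doublings."""
    diminished = math.log2(DOUBLING_SPEEDUP[threads]) - 1.0  # shortfall vs. a full doubling
    nps_boost = math.log2(HT_NPS_SPEEDUP)                    # about 0.32 of a doubling
    return diminished, nps_boost, diminished + nps_boost


if __name__ == "__main__":
    for threads in DOUBLING_SPEEDUP:
        diminished, nps_boost, total = table_row(threads)
        print(f"{threads} to {2 * threads}: {diminished:+.3f} {nps_boost:.2f} {total:+.2f}")
```

The sign of the last column flips between the "4 to 8" and "8 to 16" rows, which is where the conclusion about 8 or more physical cores comes from.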
So hyperthreading with 8 or more physical cores is already useless strength-wise, and with many cores it becomes very harmful. One caveat: on 8 physical cores, using 10 or 12 threads (not the full 16) might help a bit.

The prediction I can test on my i7-4790 is that the gain from 1 --> 2 threads on the same physical core is worth more than double the gain from 4 --> 8 threads on 4 physical cores:

Time control: 10'' + 0.1''

```
Score of SF8 2 threads vs SF8 1 thread: 1073 - 871 - 3056  [0.520] 5000
ELO difference: 14.04 +/- 5.99
Finished match
```

```
Score of SF8 8 threads vs SF8 4 threads: 851 - 751 - 3398  [0.510] 5000
ELO difference: 6.95 +/- 5.44
Finished match
```
The test confirms the prediction at low core counts.
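
The Elo figures in the match output can be checked from the win/loss/draw counts. A minimal sketch using the logistic Elo formula and a normal-approximation 95% margin; the margin calculation is an assumption on my part about how such tools typically compute it, and it lands within about 0.01 of the reported values:

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96) -> tuple[float, float]:
    """Return (Elo difference, approximate error margin) for a match result."""
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    mean = w + 0.5 * d
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = w * (1.0 - mean) ** 2 + l * mean ** 2 + d * (0.5 - mean) ** 2
    std_err = math.sqrt(variance / games)
    elo = elo_from_score(mean)
    margin = (elo_from_score(mean + z * std_err) - elo_from_score(mean - z * std_err)) / 2.0
    return elo, margin


if __name__ == "__main__":
    for label, result in (("2 vs 1 threads", (1073, 871, 3056)),
                          ("8 vs 4 threads", (851, 751, 3398))):
        elo, margin = elo_with_margin(*result)
        print(f"{label}: {elo:.2f} +/- {margin:.2f}")
```

Note that the two error margins are of the same order as the difference between the two Elo gains, so the "more than double" ratio is only loosely constrained by 5000 games each.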

lkaufman
Posts: 3671
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

### Re: Diminishing returns and hyperthreading

On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
Komodo rules!

Posts: 9408
Joined: Wed Jul 26, 2006 8:21 pm

### Re: Diminishing returns and hyperthreading

lkaufman wrote:
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
6-7, but the benefit compared to 8 threads is not outside error margins (still, it seems plausible).

Milos
Posts: 3387
Joined: Wed Nov 25, 2009 12:47 am

### Re: Diminishing returns and hyperthreading

What you write is very true. Just 2 corrections. First, when using HT, the amount of cache per thread is effectively halved, which makes hash access slower and inter-thread overhead higher than with an equivalent number of real cores. This reduces efficiency in the HT case, so 95.5% efficiency might actually be an overestimate.
Second, regarding your test: when HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores when HT is off. So the test results have merit, but only in relative terms, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really mean that 8 HT threads are stronger than 4 real cores. If you tested on 2 separate machines, HT vs. no-HT, that 7 Elo advantage might very well turn out negative.

lkaufman
Posts: 3671
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

### Re: Diminishing returns and hyperthreading

lkaufman wrote:
On a typical i7 with four cores and eight threads, would you expect that best results would be with six, seven, or eight threads?
6-7, but the benefit compared to 8 threads is not outside error margins (still, it seems plausible).
Thanks. I've been using six threads on my quad for chess analysis. Even if it's not stronger than four threads with hyperthreading off, it at least avoids the need to turn it on or off depending on intended use.

Posts: 9408
Joined: Wed Jul 26, 2006 8:21 pm

### Re: Diminishing returns and hyperthreading

Milos wrote:
What you write is very true. Just 2 corrections. First, when using HT, the amount of cache per thread is effectively halved, which makes hash access slower and inter-thread overhead higher than with an equivalent number of real cores. This reduces efficiency in the HT case, so 95.5% efficiency might actually be an overestimate.
Second, regarding your test: when HT is on, the performance of 4 real cores is slightly lower than the performance of 4 real cores when HT is off. So the test results have merit, but only in relative terms, as a demonstration of the diminishing benefit of HT with an increasing number of cores; they don't really mean that 8 HT threads are stronger than 4 real cores. If you tested on 2 separate machines, HT vs. no-HT, that 7 Elo advantage might very well turn out negative.
I assumed here that if I set affinities to logical cores correctly, I get the same result. I set them; I don't know how to do such tests otherwise. If there is a difference, 2 identical machines are needed.
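
For anyone wanting to repeat this kind of test, pinning a process to chosen logical cores can be sketched as below. This is a Linux-only illustration using Python's `os.sched_setaffinity`; the helper name is hypothetical, it is not how the matches above were set up, and which logical CPU numbers share a physical core depends on the machine and OS, so that mapping has to be checked separately.

```python
import os


def pin_current_process(logical_cpus):
    """Restrict the current process to the given logical CPU indices (Linux only).

    Returns the affinity set actually in effect afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    allowed = os.sched_getaffinity(0)         # CPUs this process may currently use
    requested = set(logical_cpus) & allowed   # ignore CPUs we are not allowed to use
    if not requested:
        raise ValueError("none of the requested logical CPUs are available")
    os.sched_setaffinity(0, requested)        # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Example: pin to the lowest-numbered CPU this process is allowed to use.
    print(pin_current_process([min(os.sched_getaffinity(0))]))
```

An engine started as a child of the pinned process inherits the affinity mask, which is one way to compare, say, 1 thread against 2 threads on the same physical core.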