
Some hyperthreading results

Posted: Mon Sep 12, 2016 8:41 pm
by Laskos
It seems Lazy SMP as implemented in Stockfish, Komodo (?) and Andscacs loves hyperthreading, at least on new i7 machines. On an i7-4790 (4 physical cores, 8 logical), the NPS improvement going from 4 to 8 threads is the following (10 seconds per position, 150 positions):

Lazy:
  • Stockfish dev: 1.52
  • Komodo 10.1: 1.55
  • Andscacs 0.872: 1.46
YBW:
  • Houdini 4: 1.29
  • Crafty 25.01: 1.39
Fritz Benchmark gives 1.50 factor improvement 4->8 threads.
Also, a direct match of Stockfish dev 8 threads versus 4 threads at 10''+0.1'' gives:
  • Score of SF 8 threads vs SF 4 threads: 315 - 232 - 453 [0.541] 1000
    ELO difference: 28.90 +/- 10.90
Hyperthreading gives a significant advantage. Similar results with Komodo. Little or no strength benefit from hyperthreading with the YBW engines Houdini and Crafty.
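
For anyone wanting to check the Elo figure, it follows from the match score under the usual logistic model. A minimal Python sketch, assuming that model; the function name `elo_from_wdl` is made up for illustration and is not part of cutechess-cli, and the error margin is not reproduced here:

```python
import math

def elo_from_wdl(wins, losses, draws):
    """Elo difference implied by a match result under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    return -400.0 * math.log10(1.0 / score - 1.0)

# SF 8 threads vs SF 4 threads: 315 - 232 - 453
print(round(elo_from_wdl(315, 232, 453), 1))  # 28.9
```

The same function can be applied to any win/loss/draw triple reported in the thread.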

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 9:36 pm
by bob
Laskos wrote:It seems Lazy SMP as implemented in Stockfish, Komodo (?) and Andscacs loves hyperthreading, at least on new i7 machines. On 4 physical cores, 8 logical of i7-4790 the NPS improvement 4 -> 8 threads is the following (10 seconds per position, 150 positions):

Lazy:
  • Stockfish dev: 1.52
    Komodo 10.1: 1.55
    Andscacs 0.872: 1.46
YBW:
  • Houdini 4: 1.29
    Crafty 25.01: 1.39
Fritz Benchmark gives 1.50 factor improvement 4->8 threads.
Also, a direct match of Stockfish dev 8 threads versus 4 threads at 10''+0.1'' gives:
  • Score of SF 8 threads vs SF 4 threads: 315 - 232 - 453 [0.541] 1000
    ELO difference: 28.90 +/- 10.90
Hyperthreading gives a significant advantage. Similar results with Komodo. No or very little benefit from hyperthreading strength-wise with YBW engines Houdini and Crafty.
I'm surprised at the 1.39 you gave. I've not experimented with HT on recent Intel boxes (although I did on the latest IBM PPC chip). I didn't see results that good.

That is just above the break-even point for Crafty, where an additional thread adds about .7x useful work and .3x overhead. Your .39 speedup just passes above that .30 break-even point. Probably not enough to change Elo as you said, but interesting that it doesn't seem to LOSE any Elo either, which represents a significant change from previous tests I have done. Of course, I have not really tested HT with the newer 25.0/25.1 versions, so the major changes there might have helped without my knowing.

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 10:05 pm
by mjlef
Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different nps with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 10:55 pm
by BBauer
Since Lazy SMP is used, you may use more threads than you have procs.
A kind of thread spamming.
So I am not surprised.
Kind regards
Bernhard

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 11:03 pm
by Laskos
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different nps with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of 01010101 (from core 7 to 0), i.e. logical processors 0, 2, 4, 6, one on each physical core.
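
As a quick check of the mask arithmetic, here is a small Python sketch that builds the hex value from a list of logical processor indices. It assumes, as the post does, that the two hyperthreads of each physical core are numbered adjacently (0/1, 2/3, ...); the helper name `affinity_mask` is hypothetical:

```python
def affinity_mask(logical_cpus):
    """Return the bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask

mask = affinity_mask([0, 2, 4, 6])
print(format(mask, "08b"))  # 01010101
print(format(mask, "X"))    # 55
```

On a machine with a different core count or numbering, the list of indices would need to be adjusted accordingly.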

For matches I do a slightly sloppy job: in Cutechess-Cli with the "restart=off" switch, I set the affinities by hand in Task Manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0,2,4,6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences using affinities to physical cores and switching HT off in the BIOS in Fritz Benchmark or NPS.

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 11:10 pm
by Laskos
bob wrote:
I'm surprised at the 1.39 you gave. I've not experimented with HT on recent Intel boxes (although I did on the latest IBM PPC chip). I didn't see results that good.

That is just above the break-even point for Crafty, where an additional thread adds about .7x useful work and .3x overhead. Your .39 speedup just passes above that .30 break-even point. Probably not enough to change Elo as you said, but interesting that it doesn't seem to LOSE any Elo either, which represents a significant change from previous tests I have done. Of course, I have not really tested HT with the newer 25.0/25.1 versions, so the major changes there might have helped without my knowing.
I tested Crafty 23.5, it came out with 1.32. With Crafty 25.01, HT seems indeed to not lose points, might even gain (inconclusive):
  • Score of Crafty 8 threads vs Crafty 4 threads: 343 - 303 - 354 [0.520] 1000
    ELO difference: 13.90 +/- 17.30

Re: Some hyperthreading results

Posted: Mon Sep 12, 2016 11:30 pm
by RJN
Laskos wrote:
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different nps with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of 01010101 (from core 7 to 0), i.e. physical cores 0, 2, 4, 6.

For matches I do a little sloppy job: in Cutechess-Cli with "restart=off" switch, I set the affinities by hand in task manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0,2,4,6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences using affinities to physical cores and switching HT off in the BIOS in Fritz Benchmark or NPS.
Another way is to use Process Lasso and assign different affinities to specific EXE names, so that StockFish1.EXE has a certain affinity, Stockfish2.EXE a different affinity, etc.

Just be sure to turn off all other "features" of Process Lasso, like ProBalance.

Re: Some hyperthreading results

Posted: Tue Sep 13, 2016 12:08 am
by bob
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different nps with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
You can (a) go into the BIOS and disable hyperthreading (on most machines, excluding Apple) or (b) just run with N/2 threads, and any recent/decent O/S will schedule each thread on a physical core.

For testing, you can easily play 4 threads vs 8 threads on one machine. 4 threads will run on 4 physical cores, 8 threads will use all 8 logical processors (2 per core). If you are paranoid, you can resort to thread affinity.

Re: Some hyperthreading results

Posted: Tue Sep 13, 2016 12:18 am
by lkaufman
What is your opinion about how best to test in single-thread mode on these machines that have hyperthreading: test with HT off, matching the physical core count (or minus one), or double the physical core count with hyperthreading on (or minus one or two)? We used to test with HT on and doubling the physical core count, but shortly before Don died we switched to testing with HT off and using the physical core count (minus 1), as almost everyone on this forum seemed convinced that HT should be off for single-thread testing. Clearly we can play more games per minute of equal quality with hyperthreading, but the suspicion is that they are less equal to each other in terms of available resources and hence more random. Of course this has nothing to do with Lazy SMP, as I'm talking about SP tests, but I could also ask the same question for four-thread testing (on machines with 8 or more physical cores), and perhaps your answer would be different.

Re: Some hyperthreading results

Posted: Tue Sep 13, 2016 1:32 am
by Dann Corbit
Laskos wrote:
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different nps with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of 01010101 (from core 7 to 0), i.e. physical cores 0, 2, 4, 6.

For matches I do a little sloppy job: in Cutechess-Cli with "restart=off" switch, I set the affinities by hand in task manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0,2,4,6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences using affinities to physical cores and switching HT off in the BIOS in Fritz Benchmark or NPS.
What happens when you exceed the hyperthread core count?
E.g. on a machine with 6 physical cores and 12 HT cores, what happens with 13 threads and above?