Some hyperthreading results

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Some hyperthreading results

Post by Laskos »

It seems Lazy SMP, as implemented in Stockfish, Komodo (?) and Andscacs, loves hyperthreading, at least on new i7 machines. On an i7-4790 (4 physical cores, 8 logical), the NPS improvement from 4 to 8 threads is the following (10 seconds per position, 150 positions):

Lazy:
  • Stockfish dev: 1.52
  • Komodo 10.1: 1.55
  • Andscacs 0.872: 1.46
YBW:
  • Houdini 4: 1.29
  • Crafty 25.01: 1.39
Fritz Benchmark gives a factor of 1.50 improvement from 4 to 8 threads.
Also, a direct match of Stockfish dev 8 threads versus 4 threads at 10''+0.1'' (10 seconds plus 0.1 seconds per move) gives:
  • Score of SF 8 threads vs SF 4 threads: 315 - 232 - 453 [0.541] 1000
    ELO difference: 28.90 +/- 10.90
Hyperthreading gives a significant advantage. Similar results with Komodo. Strength-wise, the YBW engines Houdini and Crafty show little or no benefit from hyperthreading.
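
As a cross-check, the Elo figure above follows from the match score under the usual logistic Elo model. A minimal Python sketch (the function names are illustrative and not taken from Cutechess-Cli):

```python
import math

def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored: a win counts 1, a draw 0.5."""
    return (wins + 0.5 * draws) / (wins + losses + draws)

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# SF 8 threads vs SF 4 threads: 315 wins, 232 losses, 453 draws.
score = match_score(315, 232, 453)
print(f"score = {score:.4f}, Elo difference = {elo_from_score(score):+.1f}")
```

The same two functions can be applied to any win/loss/draw triple reported by a match tool.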
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some hyperthreading results

Post by bob »

Laskos wrote:It seems Lazy SMP, as implemented in Stockfish, Komodo (?) and Andscacs, loves hyperthreading, at least on new i7 machines. On an i7-4790 (4 physical cores, 8 logical), the NPS improvement from 4 to 8 threads is the following (10 seconds per position, 150 positions):

Lazy:
  • Stockfish dev: 1.52
  • Komodo 10.1: 1.55
  • Andscacs 0.872: 1.46
YBW:
  • Houdini 4: 1.29
  • Crafty 25.01: 1.39
Fritz Benchmark gives a factor of 1.50 improvement from 4 to 8 threads.
Also, a direct match of Stockfish dev 8 threads versus 4 threads at 10''+0.1'' (10 seconds plus 0.1 seconds per move) gives:
  • Score of SF 8 threads vs SF 4 threads: 315 - 232 - 453 [0.541] 1000
    ELO difference: 28.90 +/- 10.90
Hyperthreading gives a significant advantage. Similar results with Komodo. Strength-wise, the YBW engines Houdini and Crafty show little or no benefit from hyperthreading.
I'm surprised at the 1.39 you gave. I've not experimented with HT on recent Intel boxes (although I did on the latest IBM PPC chip). I didn't see results that good.

That is just above the break-even point for Crafty, where an additional thread adds about .7x useful work and .3x overhead. Your .39 speedup just passes above that .30 break-even point. Probably not enough to change Elo as you said, but interesting that it doesn't seem to LOSE any Elo either, which represents a significant change from previous tests I have done. Of course, I have not really tested HT with the newer 25.0/25.1 versions, so the major changes there might have helped without my knowing.
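
One way to read the break-even argument, as a rough sketch: assume that going from 4 to 8 threads costs about 30% extra search effort (the .3x overhead figure above), so the NPS ratio has to exceed 1.30 before the extra threads pay off. The formula and names below are assumptions made for illustration, not Crafty's actual accounting:

```python
def net_speedup(nps_ratio: float, overhead: float = 0.30) -> float:
    """Effective speedup once the assumed extra search effort is paid for.

    nps_ratio: measured NPS with 8 threads divided by NPS with 4 threads.
    overhead:  assumed fraction of extra (wasted) search work, 0.30 here.
    """
    return nps_ratio / (1.0 + overhead)

# Crafty 25.01 figure from the first post: 1.39x NPS going 4 -> 8 threads.
gain = net_speedup(1.39)
verdict = "above" if gain > 1.0 else "at or below"
print(f"net speedup = {gain:.3f} ({verdict} break-even)")
```

Under this reading a ratio of exactly 1.30 gives a net speedup of 1.0, which is why 1.39 is described as only just above break-even.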
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Some hyperthreading results

Post by mjlef »

Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different NPS with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
BBauer
Posts: 658
Joined: Wed Mar 08, 2006 8:58 pm

Re: Some hyperthreading results

Post by BBauer »

Since Lazy SMP is used, you may use more threads than you have processors.
A kind of thread spamming.
So I am not surprised.
Kind regards
Bernhard
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some hyperthreading results

Post by Laskos »

mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different NPS with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of binary 01010101 (logical processors 7 down to 0), i.e. logical processors 0, 2, 4, 6, one on each physical core.

For matches I do a slightly sloppy job: in Cutechess-Cli with the "restart=off" switch, I set the affinities by hand in Task Manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0, 2, 4, 6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences between using affinities to physical cores and switching HT off in the BIOS, either in Fritz Benchmark or in NPS.
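
The mask arithmetic can be sketched in Python. This assumes, as in the post above, that logical processors 2k and 2k+1 share physical core k, which is worth verifying on a given machine; the helper names are hypothetical:

```python
def affinity_mask(cpus) -> int:
    """Build an affinity bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def cpus_from_mask(mask: int) -> list:
    """Inverse operation: list the logical processor indices set in a mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

# One logical processor per physical core on a 4-core, 8-thread CPU
# (assumed sibling layout: 0/1, 2/3, 4/5, 6/7).
physical_only = affinity_mask(range(0, 8, 2))
print(f"Start /affinity {physical_only:X} Komodo.exe")
print("logical processors in mask:", cpus_from_mask(physical_only))
```

Passing `range(8)` instead yields the mask FF, i.e. all 8 logical processors.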
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some hyperthreading results

Post by Laskos »

bob wrote:
I'm surprised at the 1.39 you gave. I've not experimented with HT on recent Intel boxes (although I did on the latest IBM PPC chip). I didn't see results that good.

That is just above the break-even point for Crafty, where an additional thread adds about .7x useful work and .3x overhead. Your .39 speedup just passes above that .30 break-even point. Probably not enough to change Elo as you said, but interesting that it doesn't seem to LOSE any Elo either, which represents a significant change from previous tests I have done. Of course, I have not really tested HT with the newer 25.0/25.1 versions, so the major changes there might have helped without my knowing.
I tested Crafty 23.5, it came out with 1.32. With Crafty 25.01, HT seems indeed to not lose points, might even gain (inconclusive):
  • Score of Crafty 8 threads vs Crafty 4 threads: 343 - 303 - 354 [0.520] 1000
    ELO difference: 13.90 +/- 17.30
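
Whether a result like this is conclusive can be checked by comparing the Elo difference with its error margin. A minimal sketch, assuming the logistic Elo model and a 95% normal-approximation interval built from the per-game score variance (the function names are illustrative, and a given match tool may compute its margin differently):

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (Elo difference, half-width of the z-sigma interval in Elo)."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Variance of a single game's result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(mu - z * se)
    high = elo_from_score(mu + z * se)
    return elo_from_score(mu), (high - low) / 2.0

# Crafty 8 threads vs Crafty 4 threads: 343 wins, 303 losses, 354 draws.
diff, margin = elo_with_margin(343, 303, 354)
print(f"Elo difference: {diff:.1f} +/- {margin:.1f}")
```

With the Crafty result above, the margin comes out larger than the difference, so the interval includes zero, which is consistent with calling the result inconclusive.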
RJN
Posts: 303
Joined: Fri Jun 21, 2013 5:18 am
Location: Orion Spiral Arm

Re: Some hyperthreading results

Post by RJN »

Laskos wrote:
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different NPS with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of binary 01010101 (logical processors 7 down to 0), i.e. logical processors 0, 2, 4, 6, one on each physical core.

For matches I do a slightly sloppy job: in Cutechess-Cli with the "restart=off" switch, I set the affinities by hand in Task Manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0, 2, 4, 6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences between using affinities to physical cores and switching HT off in the BIOS, either in Fritz Benchmark or in NPS.
Another way is to use Process Lasso and assign different affinities to specific EXE names: for example, StockFish1.EXE gets one affinity, Stockfish2.EXE a different one, and so on.

Just be sure to turn off all other "features" of Process Lasso, like ProBalance.
i7-5930K @4.5GHz, H100i Hydro Cooler, 64GB DDR4 Corsair Dominator Platinum @3000MHz, ASUS X99 Deluxe mboard, 1TB EVO 850 SSD
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Some hyperthreading results

Post by bob »

mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different NPS with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
You can (a) go into the BIOS and disable hyperthreading (on most machines, excluding Apple) or (b) just run with N/2 threads, and any recent/decent O/S will schedule each thread on a physical core.

For testing, you can easily play 4 threads vs 8 threads on one machine. 4 threads will run on 4 physical cores, 8 threads will use all 8 logical processors (2 per core). If you are paranoid, you can resort to thread affinity.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some hyperthreading results

Post by lkaufman »

What is your opinion about how best to test in single-thread mode on these machines that have hyperthreading: test with HT off, matching the physical core count (or minus one), or with HT on, doubling the physical core count (or minus one or two)? We used to test with HT on and double the physical core count, but shortly before Don died we switched to testing with HT off and using the physical core count (minus 1), as almost everyone on this forum seemed convinced that HT should be off for single-thread testing. Clearly we can play more games per minute of equal quality with hyperthreading, but the suspicion is that the games are less equal to each other in terms of available resources and hence more random. Of course this has nothing to do with Lazy SMP, as I'm talking about SP (single-thread) tests, but I could also ask the same question for four-thread testing (on machines with 8 or more physical cores), and perhaps your answer would be different.
Komodo rules!
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Some hyperthreading results

Post by Dann Corbit »

Laskos wrote:
mjlef wrote:Kai,

How can both hyperthreading on and off be tested? Was this on two identical machines or on one machine? I ask since I see a different NPS with hyperthreading off and using half the threads than with hyperthreading on and using half the threads. So I do not see how it can be tested on one machine.

If two identical machines could be set up and connected, one with hyperthreading on and one off, then that would rule out the issue. But nodes per second does not seem to be enough.

Mark
For NPS it's straightforward without switching HT off in BIOS. For example, for 4 cores, 8 threads, start a command prompt inside the folder of Komodo and type:

Start /affinity 55 Komodo.exe

55 is the hexadecimal representation of binary 01010101 (logical processors 7 down to 0), i.e. logical processors 0, 2, 4, 6, one on each physical core.

For matches I do a slightly sloppy job: in Cutechess-Cli with the "restart=off" switch, I set the affinities by hand in Task Manager at the beginning of the match. If I need 4 threads on 4 physical cores, I leave only 0, 2, 4, 6 checked. If I need 8 threads on 8 logical cores, I leave all checked. It can be done separately for each running engine.
I didn't notice significant differences between using affinities to physical cores and switching HT off in the BIOS, either in Fritz Benchmark or in NPS.
What happens when you exceed the hyperthread core count?
E.g. on a machine with 6 physical cores and 12 HT cores, what happens with 13 threads and above?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.