Some Notes about Hyper-Threading

bob · Post by **bob** » Sun Dec 11, 2011 4:49 pm

rodolfoleoni wrote:
bob wrote:
rodolfoleoni wrote:
bob wrote: Problem with HT on is that if you have 4 physical cores, and search X NPS, when you go to 8 cores (HT on) the tree will grow by 30%. If your NPS doesn't grow by MORE than 30%, you see a net loss.

NPS is NOT the way to measure parallel search performance. It provides completely bogus comparisons...
And here's my problem: I tried to disable HT but I didn't find any option in BIOS-Advanced. There's only an utility, "Easy Flash", and I should only use it to browse and find the BIOS file.... but I've no idea about where to search for it.

It's an Asus laptop, X53S series. Any guess?

Thanks in advance.
It is usually under "CPU information". But it is often called something bizarre like "logical processor on/off" or such nonsense. That is what our Dell boxes had the last time I fooled with this. I no longer disable HT, I just make sure to use 1 thread per physical core and let the O/S (Linux in my case) make certain that each thread runs on its own physical core...
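A minimal sketch of the "1 thread per physical core" rule, assuming the topology is available as a (package id, core id) pair per logical CPU. The function name and the sample 4-core layout are made up for illustration; this only counts cores, the actual placement is still left to the O/S as described above.

```python
def one_cpu_per_core(topology):
    """Pick the lowest-numbered logical CPU of every physical core.

    topology maps logical CPU number -> (package_id, core_id).
    Hyper-threaded siblings share the same (package_id, core_id) pair.
    """
    chosen = {}
    for cpu in sorted(topology):
        core = topology[cpu]
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(core, cpu)
    return sorted(chosen.values())


# Hypothetical 4-core CPU with HT on: CPUs 0-3 and 4-7 are sibling pairs.
sample = {cpu: (0, cpu % 4) for cpu in range(8)}
threads = one_cpu_per_core(sample)
print(len(threads), "search threads, one per physical core:", threads)
```

On Linux the pairs can be read from /sys/devices/system/cpu/cpuN/topology/physical_package_id and core_id. If you do want to pin threads yourself, os.sched_setaffinity is the usual call, but that goes beyond what is described here.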
Thanks!

As soon as the current test I'm running is complete I'll try to disable HT again.

BTW, Crafty 23.4 is one of the sparring partners I always use to tune The Baron. Crafty seems to gain something in depth with HT enabled and 8 cores (ponder off), but I want to try it with 4 cores and HT disabled too. I think it'll run better.

I have not found a box where HT on helps crafty. Might help on some positions, but it hurts on others, and the overall effect is a loss, although not a big one.

diep · Post by **diep** » Sun Dec 11, 2011 5:25 pm

Sedat Canbaz wrote:
diep wrote:
Speaking is silver, testing is gold.

i5 has less memory channels than i7. 2 versus 3, so i5 is a lot worse than i7.

For Diep HT works magnificently, and when overclocking a CPU to 4.5 GHz or so it works even better, just like the Hiarcs team also reported they turned on HT for Hiarcs on the overclocked 12-core (24 logical cores) box. 4.65 GHz or so overclocked during the tournament?

At those speeds HT gives for diep 30% or so. At 3.xGhz it's more like 20%.

And yes - it does search deeper.

By the way you can also see it in testresults from Lostcircuits how bad i5 is.

At 3.7Ghz (turboboost) it's 1.0M nps versus C2Q doing better there if you extrapolate its speed.

http://www.lostcircuits.com/mambo//inde ... itstart=16

Of course the gulftown and sandy-bridge 6-core cpu's total dominate, as they have 50% more cores and in case of sandy bridge 4 memory channels.

But that's another story...

Dear Vincent,

Honestly i am surprised ...
I did not know there was any chess engine whose chess speed performance is better with HT ON than with HT OFF?!

So...Diep with HT ON has better performance than Diep with HT OFF ?

At those overclocked high speeds - yes it does.
And from what i understood for hiarcs as well.
Realize you just test some illegal produced engine which a year or 10 ago would have stumbled into 50 courtcases which would've finished houdart & CIA partners.

Realize most of the HT experiences date back from the P4 which was a very bad chip. If you move from 4 cores to 8 cores, then i guess only Diep really profits, but with i7 world changed of course.

First of all it doesn't give a practical 10% better scaling

Can you confirm with HT data (testings,games,benchmarks...) please ?

I don't have an i7 at home, otherwise i could give you exact measurements, but the scaling increase is so huge, 30% increase in nps, that it's very easy to prove it using speedup numbers of Diep.

Most 'testers' and the 'tests' they perform are just pathetically amateurish.

If you ask for a screenshot you'll see they run 100 other applications and spyware in the background - no way to accurately test a chess engine then.

An automated test of Diep runs a few weeks or so and tests 213 positions fully automatically. The results published on Lostcircuits are just scaling measurements, which is much easier. That's done with 5 independent tests (fully automatic). With Excel you then calculate the standard error using 95% certainty etc.

Realize this is for speedup measurements only. Scaling is easy to see.
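A rough sketch of the "5 independent tests, standard error at 95% certainty" step, done in Python instead of Excel. The five ratios are made-up sample values, not measurements from this thread, and the function name is just for illustration. The constant is the standard two-sided Student-t value for 4 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 5 runs (4 degrees of freedom).
T_95_DF4 = 2.776


def mean_with_95_interval(samples):
    """Return (mean, half_width) of a 95% confidence interval for 5 runs."""
    if len(samples) != 5:
        raise ValueError("T_95_DF4 is only valid for exactly 5 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean = sample standard deviation / sqrt(n).
    std_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, T_95_DF4 * std_error


# Hypothetical nps ratios (HT on / HT off) from 5 independent runs.
runs = [1.21, 1.23, 1.20, 1.24, 1.22]
mean, half_width = mean_with_95_interval(runs)
print(f"scaling gain: {mean:.3f} +/- {half_width:.3f}")
```

If the whole interval sits above the break-even factor for your engine, the scaling gain is unlikely to be noise.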

In official non-overclocked benchmarks you can see Diep gains 22% by hyperthreading. Now that is a dangerous percentage, as a tad less than that and it won't break even. Above that it easily breaks even.

It has to do with speedup deterioration that all engines have when moving from n to 2n cores.

Most cloned engines seem to use a similar type of SMP algorithm, which actually is very professional work - i wonder who original implemented it - the rest seems to have typed it over. Someone got really paid to do this - there is no unpaid forces doing this. This is specialistic work for math guys who know something about CORRECTNESS.

Popular seems the alibaba type algorithm (as i call it) after the famous French researchers from the 90s, who wrote the impossible to read paper. That paper had several interpretations leading to algorithms with similar ideas. It has some crap form of YBW, which i wouldn't want to call YBW as it isn't aborting in case of fail high.

Best algorithm is YBW with all its enhancements. Diep, Crafty, Stockfish use it. In that order of efficiency.

Diep also combines it with non-centralized forms of administration, whereas all others use centralized YBW. So at some central spot it's locking entire datastructure for all splits, aborts, etc.

This centralization is a big problem when you want to scale to dozens of cores and therefore speedup deteriorates bigtime with those engines and HT won't break even.

Now scaling is different from speedup. Scaling is the increase in nps; speedup is the reduction in wall-clock time needed to finish a ply.

So from speedup achieved at n cores moving to 2n cores you can prove whether moving to 2n is gonna work for hyperthreading.

Realize many programs have scaling issues when getting above 8 cores, so HT won't break even soon then, whereas speedup suffers bigtime above 4 cores.

I'm not sure what Hiarcs uses. Very little information there has come my way.

It also seems that the Jonny guy is spreading disinformation through the wire about what he did do or didn't do.

Diep doesn't have those issues. Hiarcs i don't know, but i do know that a scaling increase of 30% will have many engines break even quite quickly.

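So for example if speedup is 7 out of 8 and 12 out of 16,
let's do some math for you there:

Then in order to let hyperthreading break even you need (h = factor by which hyperthreading multiplies your nps):

7 / 8 == 6h / 8     (since 12 / 16 == 6 / 8)

6h == 7

==> h = 7/6 = 1.17, so a gain of 17%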

So with Diep hyperthreading giving above 17% will break even.
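The break-even arithmetic above can be sketched in a few lines. This reads "7 out of 8 and 12 out of 16" as the speedup at 8 threads and at 16 threads; the function and parameter names are made up for illustration.

```python
def break_even_ht_gain(speedup_n, n, speedup_2n):
    """NPS gain HT must deliver so 2n logical cores match n physical cores.

    speedup_n  : speedup with n threads (e.g. 7 on 8)
    speedup_2n : speedup with 2n threads (e.g. 12 on 16)
    Returns the required gain as a fraction (0.17 means 17%).
    """
    efficiency_n = speedup_n / n
    efficiency_2n = speedup_2n / (2 * n)
    # Break even when efficiency_n == efficiency_2n * h, so h is the ratio.
    h = efficiency_n / efficiency_2n
    return h - 1.0


gain = break_even_ht_gain(speedup_n=7, n=8, speedup_2n=12)
print(f"HT must add more than {gain:.1%} nps to break even")
```

With the numbers from the post this gives about 16.7%, the 17% quoted above. A worse speedup at 2n threads pushes the required gain up quickly, e.g. 10 out of 16 instead of 12 would need 40%.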

The old P4 in practice hardly ever got that 17%. In fact it was more like 10% in most cases when we tested. Only one overclocked P4 with a heavily overclocked bus, from Ron Langeveld (correspondence player), managed to get around 20% improvement with Diep from hyperthreading. He used very expensive RAM in combination with the overclocked bus.

It's the only P4 other than some intel testmachines that achieved it.

With i7 this is all much simpler to achieve. Overclock it a tad and it'll go great. At default frequencies you'll get a 20-22% scaling increase by hyperthreading. Your nodes per second will benefit far over 20%.

When overclocking this goes up to 30% or so which really kicks butt.

Hiarcs team also must've noticed this as i was told they had it turned on. Harvey nods yes.

Now there might be a maximum to this. When moving from 16 cores to 32 logical cores, you'll have more scaling issues. So what i posted here works for a few cores, i didn't test it myself yet at 16 real cores becoming 32 logical cores, as i have no accurate testdata from 32 cores.

Ah yes as for houdini & clones - why run those beancounters on big hardware you know. It's beancounters designed for blitz. Waste of money to buy big hardware for those engines.

In my testing, it's quite clear that Houdini 2.0c with HT OFF performs much better than Houdini 2.0c with HT ON.
I'd just like to mention and confirm again that Houdini 2.0c with HT ON is much slower at solving mates than Houdini 2.0 with HT OFF.

I don't have the Diep chess engine and can't check this myself, so it would be great if you could tell us:

1) Have you tested both setups against each other in Auto232 mode (I mean Diep HT ON against Diep HT OFF)?

2) If you already have such an HT Auto232 test: what is the ELO difference between HT ON and HT OFF?

If you have no HT Auto232 test (you have not yet had them play against each other), then you cannot be sure!

Higher kns values don't mean that HT ON is faster or better.
In other words: what matters most is the Chess Speed-ELO Performance (not higher kns values).

So, in my opinion, the best way to measure which system is better for chess:
- HT OFF and HT ON should play against each other in Auto232 mode (on two identical, separate machines)
1) PC A (Hyper-Threading ON, enabled in the BIOS)
2) PC B (Hyper-Threading OFF, disabled in the BIOS)

Note: the Hyper-Threading test should use the same neutral book and the same chess engine.

One more thing: I see a lot of comments here, but unfortunately no useful data (except my HT ON / HT OFF mate benchmarks).

Come on, dear friends,

Does anybody have any serious data on the current HT issue?
But next time, please, no more comments; I prefer to see HT tests, HT games, HT benchmarks...

BTW,another notes by Robert Houdart about Hyper-Threading:

Houdini 2 will automatically limit the number of threads to the number of logical processors of your hardware.
If your computer supports hyper-threading it is recommended not using more threads than physical cores,
as the extra hyper-threads would usually degrade the performance of the engine.

Q: I'm running Houdini on a Core i7 CPU with hyper-threading. Would you recommend to use hyper-threading with Houdini?

The architecture of Houdini (and of chess engines in general) is not very well suited for hyper-threading;
using more threads than physical cores will usually degrade the performance of the engine.
Although the hyper-threads often produce a slightly higher node speed, the increased inefficiency
of the parallel alpha-beta search more than offsets the speed gain obtained with the additional hyper-threads.
To give a practical example, it's more efficient to use 4 threads running at 2,000 kN/s each than 8 threads
running at 1,100 kN/s each, although the latter situation produces a higher total node speed.

For this reason it's best to set the number of threads not higher than the number of physical cores of your hardware.
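The practical example above works out as follows. This is only the arithmetic on the two quoted per-thread speeds; the function name is made up, and the time-to-depth remark in the comment is a simplifying assumption, not something stated in the Houdini notes.

```python
def aggregate_nps(threads, knps_per_thread):
    """Total node speed in kN/s for a given thread count."""
    return threads * knps_per_thread


physical = aggregate_nps(4, 2000)   # 4 threads on 4 physical cores
hyper = aggregate_nps(8, 1100)      # 8 threads on 8 logical cores

nps_gain = hyper / physical - 1.0
print(f"{physical} kN/s vs {hyper} kN/s: {nps_gain:.0%} more nodes per second")
# Under a simple "time to depth = tree size / node speed" model (an assumption),
# the 8 threads only win if the searched tree grows by less than nps_gain.
```

So the hyper-threaded setup searches about 10% more nodes per second, and it loses as soon as the extra parallel search overhead costs more than that.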

Best,
Sedat

ernest · Post by **ernest** » Sun Dec 11, 2011 7:00 pm

Sedat Canbaz wrote:Of course its very normal and i am not surprised too that there will be a few ones who will not like/hate my work

Hi my paranoid friend,

Well, you see, Forums are made for giving arguments, pro and con.

I think that your HT study is at least partially flawed because you based it on a single multiprocessor test, and multiprocessor test results are not reproducible. So you need to average several tests to conclude.

Lots of machine power is good, but has no value when the tester is blind in the head...

Robert Flesher · Post by **Robert Flesher** » Sun Dec 11, 2011 8:00 pm

rodolfoleoni wrote:
bob wrote: Problem with HT on is that if you have 4 physical cores, and search X NPS, when you go to 8 cores (HT on) the tree will grow by 30%. If your NPS doesn't grow by MORE than 30%, you see a net loss.

NPS is NOT the way to measure parallel search performance. It provides completely bogus comparisons...
And here's my problem: I tried to disable HT but I didn't find any option in BIOS-Advanced. There's only an utility, "Easy Flash", and I should only use it to browse and find the BIOS file.... but I've no idea about where to search for it.

It's an Asus laptop, X53S series. Any guess?

Thanks in advance.

Heya Vincent, nice to see you add your thoughts. I have found that on my i7 920 (overclocked A LOT) HT is in fact better when left on. Even though many experts say otherwise, all tests on my machine show that the engine is better with it on. It solves mates faster, searches deeper faster, so obviously plays stronger. Nice to see that Diep (you) and the Hiarcs team are seeing the same results.

Cheers

diep · Post by **diep** » Sun Dec 11, 2011 8:12 pm

ernest wrote:
Sedat Canbaz wrote:Of course its very normal and i am not surprised too that there will be a few ones who will not like/hate my work
Hi my paranoid friend,

Well, you see, Forums are made for giving arguments, pro and con.

I think that your HT study is at least partially flawed because you based it on single multiprocessor test, and multiprocessor test results are not reproducible. So you need to average several tests to conclude.

Lots of machine power is good, but has no value when the tester is blind in the head...

For 95% certainty you need to do about 200 tests or so.

Note most doubt that number. Guys like Stefan Meyer-Kahlen they are not convinced at all with 200 tests, they require 1000+.

My automated tester uses 213 which, if your engine has a good SMP algorithm, like Diep has, usually gives a good speedup with 95% sureness.

It's quite possible that for the clones, among which Houdini, which use a more risky form of SMP search that counts on getting lucky with the hashtable to quickly stop CPUs that already failed high but didn't get aborted yet, you really do need those 1000+ positions for an accurate measurement.

You can run those tests fully automatically with some console tools. Then with a script you parse all the logfiles and average the results.
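A minimal sketch of such an averaging script. The log line format ("position N solved in X s") is purely hypothetical, since no real logfile is shown in this thread, so the pattern has to be adapted to whatever your engine actually writes. The function name is made up as well.

```python
import re
import statistics

# Hypothetical log line format: "position 17 solved in 12.50 s".
# Real engine logs will differ; adjust the pattern accordingly.
SOLVE_LINE = re.compile(r"position\s+(\d+)\s+solved in\s+([\d.]+)\s*s")


def average_solve_times(log_texts):
    """Average the solve time of each position over several log files."""
    times = {}
    for text in log_texts:
        for match in SOLVE_LINE.finditer(text):
            position = int(match.group(1))
            times.setdefault(position, []).append(float(match.group(2)))
    return {pos: statistics.mean(vals) for pos, vals in sorted(times.items())}


# Two made-up runs over the same two positions.
run_a = "position 1 solved in 10.0 s\nposition 2 solved in 4.0 s\n"
run_b = "position 1 solved in 14.0 s\nposition 2 solved in 6.0 s\n"
averages = average_solve_times([run_a, run_b])
print(averages)
```

In practice you would read each logfile from disk (for example with pathlib.Path.read_text) and pass the contents in as the list.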

This is one hell of a problem of course for accurate testing, but that's the problem with todays SMP searches, i can't make it simpler than it is.

Vincent

diep · Post by **diep** » Sun Dec 11, 2011 8:27 pm

Robert Flesher wrote:
rodolfoleoni wrote:
bob wrote: Problem with HT on is that if you have 4 physical cores, and search X NPS, when you go to 8 cores (HT on) the tree will grow by 30%. If your NPS doesn't grow by MORE than 30%, you see a net loss.

NPS is NOT the way to measure parallel search performance. It provides completely bogus comparisons...
And here's my problem: I tried to disable HT but I didn't find any option in BIOS-Advanced. There's only an utility, "Easy Flash", and I should only use it to browse and find the BIOS file.... but I've no idea about where to search for it.

It's an Asus laptop, X53S series. Any guess?

Thanks in advance.

Heya Vincent, nice to see you add your thoughts. I have found that on my i7 920 (overclocked A LOT) HT is in fact better when left on. Even though many experts say otherwise, all tests on my machine show that the engine is better with it on. It solves mates faster, searches deeper faster, so obviously plays stronger. Nice to see that Diep (you) and the Hiarcs team are seeing the same results.

Cheers

Hi it depends largely upon the total number of cores you have and which program you use, and how stupid you were to allow spyware to get installed at your box.

All this spyware and other updaters really influence your machines performance with HT.

I've also had a case where someone (Renze Steenhuisen) ran Diep in a CCT tournament, and then he used a remote login to the machine, causing basically 1 core to be busy fulltime with VNC; so Diep using 8 processes was then running at 7 cores during a CCT event.

If you do that to Stockfish it slows Stockfish down by a factor of 1000 or so. Diep "just" suffered a factor of 2 to 3 there. But a factor of 2 to 3 is a lot of course.

In short, very small percentages of system time, which you don't even notice, can have a huge impact on whether HT works or doesn't work. It can be positive or negative to both sides.

Most important however seems to be which program is tested.

Furthermore, my experience from some years ago is that another big factor is the tester himself and how he tests. They all use the Japanese Samurai way of thinking when testing. Some engine that is 'supposedly' number 1 always somehow gets the advantage.

I remember from a year or 12+ ago that i got Jan Louwman on the phone, who after a week of testing at 30 computers, phoned me.

"Diep has not won a SINGLE game" he reported. It was against a newer Rebel version at the time.

The next morning I stepped in the car and 1 hour later i was with Jan Louwman. Then i saw that Rebel had the following habit.

It adjudicated games after X moves, a well known bug of the auto232 players which Rebel exploited commercially.

Games under 5 pawns up, it showed aborted or 'draw'. Games with 5 pawns up, it declared a victory for itself.

However it stopped playing out games when a rook down, and being a rook down for Rebel it scored like +4.0, so that was under the 5.0 limit, which by accident meant all those losses from Rebel didn't get counted.

So diep lost 'seemingly' with big score to Rebel, thanks to the human error.

Another talk i had with a SSDF guy some years ago. He had 2 computers. One was lower clocked than the other. The higher clocked computer he gave less time per game than the lower clocked computer. In reality the lower clocked computer was by far fastest computer and also the faster time control he enforced wasn't objectively correct of course.

Another tester reported to me: "Diep lost everything, it's a crap version this version". I got the logfiles. It played against DeepFritz. To my amazement the nps of Diep in its logfiles wasn't even close to what i would expect on his computer. We checked it out and i found that DeepFritz by default used all cores, and when pondering still used all cores, totally lobotomizing Diep during its thinking, as Diep had to fight to get a core.

A simple click at taskmanager would've shown him this problem, yet again i had to figure it out without being at the site - just the logfile showed me.

Another weird thing happened when i got a report back from Ernst Walet: "diep didn't move so i had to force it to play a move". In reality he had clicked twice on the field, and already enforced the opponents move. Can call it a GUI bug, but he just didn't see he had made an extra move that in reality wasn't played on the board, so he would always have been allowed by the tournament director to take back that move.

The human error and the laziness of most dudes here is the biggest cause for problems in testing. It's always total silly and stupid things that dominate things over here.

diep · Post by **diep** » Sun Dec 11, 2011 8:35 pm

"my own tester uses 213 positions"

Note that it's pure coincidence of course, yet i'm also searching for Wagstaff primes, which we also call 213 as it is p = (2^n + 1) / 3

http://www.primenumbers.net/prptop/prptop.php

http://en.wikipedia.org/wiki/Wagstaff_prime

Yet with 213 positions i really mean 213 chess positions and where i know that it ain't enough for anyone else, for me it's a magic number

So all the testdata here which involves 1,2 or sometimes even 5 positions, proving something, that's just not serious you know.

The magic number is 213

Sedat Canbaz · Post by **Sedat Canbaz** » Mon Dec 12, 2011 3:03 am

Once more i'd like to confirm:
-In my HT bench testings,HT OFF is clearly the Winner (even with Hiarcs 13.2 engine too)

Hyper Threading Disabled:

Hyper Threading Enabled:

More Hyper-Threading details about i7 980X 4.33GHz:
-Only the best results have been published
-Each engine has been tested minimum 5-6 times
-Slightly different mate position has been used
-Before starting each bench,all engine's hashtables are cleaned
-The HT OFF engines solve the mate position faster than the HT ON engines
-It's true that HT ON has higher kns values than HT OFF, but the Chess Speed is in favor of HT OFF

Download all HT OFF/ON Benchmarks by Rybka,Houdini,Hiarcs:
http://www.sedatcanbaz.com/chess/games/ ... 433GHz.rar
Note:the used 'Mate in 11' position is in the folder

BTW,i have a big database (thousands of games) played with HT ON and HT OFF
And honestly,the best results and performance are in favor for HT OFF too

One thing more:(this is just my opinion)
-The most TOP Players of all over the World prefer Houdini and Hyper Threading OFF

Best Wishes,
Sedat

Sedat Canbaz · Post by **Sedat Canbaz** » Mon Dec 12, 2011 3:37 am

ernest wrote:
Sedat Canbaz wrote:Of course its very normal and i am not surprised too that there will be a few ones who will not like/hate my work
Hi my paranoid friend,

Well, you see, Forums are made for giving arguments, pro and con.

I think that your HT study is at least partially flawed because you based it on single multiprocessor test, and multiprocessor test results are not reproducible. So you need to average several tests to conclude.

Lots of machine power is good, but has no value when the tester is blind in the head...

Hello my dear Jealous friend

Sorry...that i can not provide you more useful HT data...
But maybe its your turn to do something for our great hobby

Hmm...yes,its very easy to attack people on the forums,but its no so easy to attack people in the real life

And you are really lucky man,otherwise-you will be stay far away from me

BTW,i have no patience to see something single useful thing by you,really i will applause you
Maybe even if you agree of course,i can donate you (but first working/efforts are needed)

Your Friend,
Sedat

Vinvin · Post by **Vinvin** » Mon Dec 12, 2011 9:46 am

One more very interesting thing to see is: HT ON but the engine only uses 6 threads. I heard Windows 7 manages this very well. Are the results similar to HT OFF? If not, Win7 probably puts 2 busy threads on a single physical core and that's bad ...

And 1 more thing : please run this position with "HT OFF and 6 threads" 10 times and post the 10 timings here (they shouldn't be constant) ...

Thx,
Vincent.

Sedat Canbaz wrote:Once more i'd like to confirm:
-In my HT bench testings,HT OFF is clearly the Winner (even with Hiarcs 13.2 engine too)

Hyper Threading Disabled:
http://www.sedatcanbaz.com/chess/pictur ... OFF_9s.gif

Hyper Threading Enabled:
http://www.sedatcanbaz.com/chess/pictur ... ON_51s.gif

More Hyper-Threading details about i7 980X 4.33GHz:
-Only the best results have been published
-Each engine has been tested minimum 5-6 times
-Slightly different mate position has been used
-Before starting each bench,all engine's hashtables are cleaned
-The current HT OFF engines solve the mate position better than HT ON Engines
-Its true that HT ON has higher kns values than HT OFF,but the Chess Speed is favor for HT OFF

Download all HT OFF/ON Benchmarks by Rybka,Houdini,Hiarcs:
http://www.sedatcanbaz.com/chess/games/ ... 433GHz.rar
Note:the used 'Mate in 11' position is in the folder

BTW,i have a big database (thousands of games) played with HT ON and HT OFF
And honestly,the best results and performance are in favor for HT OFF too

One thing more:(this is just my opinion)
-The most TOP Players of all over the World prefer Houdini and Hyper Threading OFF

Best Wishes,
Sedat
