Hyperthreading and Computer Chess: Intel i5-3210M

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by Mike S. »

I am only hoping you are not just running a couple of positions and drawing conclusions from that???
Unfortunately yes, I do. I now have sixteen data pairs from eight engines and four positions. I am aware that my test does not meet scientific requirements, as I do not take enough samples. - But I do not do science, I do a hobby. And if 12 of 16 single tests show a gain from hyperthreading, that is sufficient for me. Case closed.

(That is for the i5-3210M CPU only, and I cannot comment on other systems.)
Regards, Mike
User avatar
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by geots »

shrapnel wrote:



So by your theory, you would have to be saying that HT basically "dumbed down" the engine so it was not playing as strongly as it normally did before you used hyperthreading. If you enable HT and can now beat opponents you used to draw, then you did experience degradation of performance with the engine. Unless you are saying HT enabled made YOU play stronger. What you are doing is making a case for testers to NEVER USE HT in their tests. No harm meant - but you need to go back to the drawing board. Your post is chaotic at best.



george
Actually I'm saying that NOT using HT was dumbing down the Engine.
The statement in Bold is the only one you got correct.
So you think MY post is chaotic, eh?
Wonder what others think of YOUR post? No harm meant :roll:






"but what I DO know for a FACT is that enabling HT has made it possible for me to beat very strong opponents with whom I used to draw earlier !"


Once again - if "enabling" HT makes it possible for you to beat opponents that you used to draw - then those opponents evidently are not playing as well with HT enabled, because HT has nothing to do with raising or lowering YOUR playing strength. I'm struggling to see what part of this statement you made is hard for you to understand.


gts
User avatar
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by geots »

bob wrote:
syzygy wrote:
bnemias wrote: This subject comes up every so often, and it's hard to believe people still think searching a larger tree with a small increase in NPS is beneficial. I'm not completely convinced there's a relationship between time to depth and playing strength either. But lacking any data of my own, I tend to believe Bob.
I don't know why you doubt that there is a relationship between time to depth and playing strength. For sure these are strongly correlated. The same depth in less time, all else being equal, definitely results in stronger play.

It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.

So even if time-to-depth increases with HT on, it just might be the case that overall this is outweighed by the positive impact of the larger tree.

Bob's HT tests were probably limited to crafty. I also believe they were done on a system that was not overclocked. The higher the overclock, the better the performance of HT. This is because memory latency has a bigger impact at higher clock frequencies and HT shines at hiding these latencies.

From the tests I have done myself I could not conclude that my engine benefits from HT, but I cannot say anything for engines that I have not tested.
I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.

Sometimes a parallel search will find the answer much more quickly than expected, but this is generally a result of poor move ordering where the parallel search looks at the supposedly bad move sooner than the sequential search does, and quickly establishes a better bound that makes things go faster. But that is uncommon, not expected to happen regularly.

I'm likely the only person on the planet to have actually run 30K game matches with 1 cpu, then with 2, then with 4, and finally with 8. And I have found NO circumstance where time to depth suggests one thing, and actual games suggest another. That is, the speed of the parallel search is the thing that gains Elo, not some bizarre tree shape that happens regularly. Such is just "statistical noise" if the test is large enough...

The only exceptions I have seen are those where so few games are played, the statistical variance makes the results statistically insignificant.

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...





Dr. Bob, I have a question. I am running two Intel i5 4-core systems. Neither has hyperthreading. I have on the way a new Intel i7 6-core system, in which the HT can be disabled- which I plan to do. But in theory- if I did not disable HT on the i7, in testing would that create a variable I don't want- 2 systems with no HT, and 1 system with HT- and the results all being thrown into the same batch?



Best,

george
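
For reference on the time-to-depth argument quoted above, this is roughly how a parallel (or HT) speedup is usually estimated from time-to-depth samples and turned into an Elo figure. It is only a sketch: the position times are made up, and the ~70 Elo per doubling of speed is a commonly quoted rule of thumb, not a measured value.

Code:

import math

# Hypothetical times (seconds) to reach the same fixed depth on a few test
# positions, once with 1 thread and once with N threads (HT on or off, etc.).
times_1_thread  = [12.4, 30.1, 8.7, 55.0, 21.3]
times_n_threads = [ 7.9, 18.5, 6.1, 33.2, 14.0]

# The geometric mean of the per-position speedups is less sensitive to a
# single outlier position than the arithmetic mean.
speedups = [a / b for a, b in zip(times_1_thread, times_n_threads)]
geo_mean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

ELO_PER_DOUBLING = 70.0   # assumed rule-of-thumb value, not a measurement
elo_estimate = ELO_PER_DOUBLING * math.log2(geo_mean)

print("geometric-mean speedup: %.2fx" % geo_mean)
print("rough Elo estimate:    %+.0f" % elo_estimate)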
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

syzygy wrote:
bob wrote:I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.
Well, you don't have to read what I write, but I'm not sure why you still bother to answer?
I responded precisely to what you wrote...

quote:
It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.
It is most definitely "strange" to believe that a larger tree is better. Turn off alpha/beta and it will get a LOT larger. Zero better.

Second. If you believe that using extra CPUs to search broader rather than deeper is better, why not search broader rather than deeper with a single CPU?

As I said, this is a pretty ridiculous way of looking at a tree search issue, parallel or not.

So, in reality, I responded specifically to what you wrote. Why you thought otherwise I have no idea, unless you did not read what I wrote or give it any thought...
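
To put a number on the "turn off alpha/beta and the tree gets a LOT larger, zero better" point, here is a toy comparison on a random uniform tree. The branching factor, depth and leaf values are all invented for the example and this is not any engine's search; both routines return the identical root value, only the node counts differ.

Code:

import random

BRANCHING, DEPTH = 6, 6

def leaf_value(path):
    # Deterministic pseudo-random leaf value, so both searches see the same tree.
    return random.Random(hash(path)).uniform(-1.0, 1.0)

def minimax(path, depth, maximize, counter):
    counter[0] += 1
    if depth == 0:
        return leaf_value(path)
    best = -2.0 if maximize else 2.0
    for move in range(BRANCHING):
        v = minimax(path + (move,), depth - 1, not maximize, counter)
        best = max(best, v) if maximize else min(best, v)
    return best

def alphabeta(path, depth, alpha, beta, maximize, counter):
    counter[0] += 1
    if depth == 0:
        return leaf_value(path)
    best = -2.0 if maximize else 2.0
    for move in range(BRANCHING):
        v = alphabeta(path + (move,), depth - 1, alpha, beta, not maximize, counter)
        if maximize:
            best = max(best, v)
            alpha = max(alpha, v)
        else:
            best = min(best, v)
            beta = min(beta, v)
        if alpha >= beta:
            break   # cutoff: the remaining moves cannot change the result
    return best

mm_count, ab_count = [0], [0]
v1 = minimax((), DEPTH, True, mm_count)
v2 = alphabeta((), DEPTH, -2.0, 2.0, True, ab_count)
print("minimax   : value %+.3f, nodes %d" % (v1, mm_count[0]))
print("alpha-beta: value %+.3f, nodes %d" % (v2, ab_count[0]))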
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

geots wrote:
bob wrote:
syzygy wrote:
bnemias wrote: This subject comes up every so often, and it's hard to believe people still think searching a larger tree with a small increase in NPS is beneficial. I'm not completely convinced there's a relationship between time to depth and playing strength either. But lacking any data of my own, I tend to believe Bob.
I don't know why you doubt that there is a relationship between time to depth and playing strength. For sure these are strongly correlated. The same depth in less time, all else being equal, definitely results in stronger play.

It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.

So even if time-to-depth increases with HT on, it just might be the case that overall this is outweighed by the positive impact of the larger tree.

Bob's HT tests were probably limited to crafty. I also believe they were done on a system that was not overclocked. The higher the overclock, the better the performance of HT. This is because memory latency has a bigger impact at higher clock frequencies and HT shines at hiding these latencies.

From the tests I have done myself I could not conclude that my engine benefits from HT, but I cannot say anything for engines that I have not tested.
I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.

Sometimes a parallel search will find the answer much more quickly than expected, but this is generally a result of poor move ordering where the parallel search looks at the supposedly bad move sooner than the sequential search does, and quickly establishes a better bound that makes things go faster. But that is uncommon, not expected to happen regularly.

I'm likely the only person on the planet to have actually run 30K game matches with 1 cpu, then with 2, then with 4, and finally with 8. And I have found NO circumstance where time to depth suggests one thing, and actual games suggest another. That is, the speed of the parallel search is the thing that gains Elo, not some bizarre tree shape that happens regularly. Such is just "statistical noise" if the test is large enough...

The only exceptions I have seen are those where so few games are played, the statistical variance makes the results statistically insignificant.

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...





Dr. Bob, I have a question. I am running two Intel i5 4-core systems. Neither has hyperthreading. I have on the way a new Intel i7 6-core system, in which the HT can be disabled- which I plan to do. But in theory- if I did not disable HT on the i7, in testing would that create a variable I don't want- 2 systems with no HT, and 1 system with HT- and the results all being thrown into the same batch?



Best,

george
My advice is as before. If you have an x86 with 6 cores, and you never use more than 6 cores at once (whether you play 6 games with ponder=off, 3 games with ponder=on, or one game with an engine using 6 cores ponder=off), the results should be reliable.

If you go further and use more than 6 cores, some of those physical cores get split into what is effectively 2 pieces, each 1/2 as fast as the original. Some programs will be running on a CPU 1/2 as fast as normal, some won't, and that can certainly add noise.

Further, since all of these things use turbo-boost, it gets a little trickier yet, since just using one core will use overclocking, but when you use several, overclocking turns off completely or almost completely. Again introducing CPU speed variability. I turn this off on whatever boxes I can, Apple being the exception where there is no BIOS to fiddle with.
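
One practical way to follow this advice on Linux is to pin each engine process to its own physical core, so no engine ever lands on the hyperthreaded sibling of another. This is only a sketch: the engine binary name is hypothetical, and the assumption that logical CPUs 0-5 are six distinct physical cores must be checked against lscpu -e or /proc/cpuinfo, since many Intel boxes pair HT siblings as (0,6), (1,7), and so on.

Code:

import os
import subprocess

# Assumption for this sketch: logical CPUs 0..5 are six distinct physical
# cores (verify with `lscpu -e` before relying on this).
PHYSICAL_CPUS = [0, 1, 2, 3, 4, 5]
ENGINE_CMD = ["./my_engine"]        # hypothetical engine binary

procs = []
for cpu in PHYSICAL_CPUS:
    p = subprocess.Popen(ENGINE_CMD)
    os.sched_setaffinity(p.pid, {cpu})   # restrict this engine to one core
    procs.append(p)

for p in procs:
    p.wait()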
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

syzygy wrote:
bob wrote:This plays right into the hands of a parallel search that by its very nature tends to do better when move ordering is sub-optimal.
Isn't it interesting that YBW is "known" to have no overhead compared to a sequential search with optimal move ordering?

I have a suspicion that with a good implementation of parallel search most search overhead is due to missed transpositions.
You would be wrong. We typically see 90% of fail highs on the first move searched, which means 10% of the tree is NOT optimally ordered. In Crafty, I measure the "stops" that occur, which are caused by failing high on an ALL node, something that should not happen. Yet it does.

Optimal move ordering does not, and never will, exist; otherwise we would not need search in the first place. If you find a copy of the paper I wrote for the Journal of Parallel Computing, circa 1988, it very precisely analyzes this problem and predicts the tree size growth that it causes.

I'm sure some transpositions are missed, but this error analysis was done back in the days of 5 ply searches on the Kopec positions, where transpositions were anything but a major problem. And the overhead was the same then as today...
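
The "fail high on the first move" figure above is easy to instrument. Below is a toy sketch (an invented random tree, not Crafty or any other engine) that searches the same tree once with perfect move ordering and once with deliberately bad ordering, and reports the node count plus the fraction of fail highs that occur on the first move searched.

Code:

import random

BRANCHING, DEPTH = 5, 6

def leaf_value(path):
    # Deterministic pseudo-random leaf value for the toy tree.
    return random.Random(hash(path)).uniform(-1.0, 1.0)

def true_value(path, depth, memo={}):
    # Full negamax value, used only to construct "perfect" move ordering.
    if path not in memo:
        if depth == 0:
            memo[path] = leaf_value(path)
        else:
            memo[path] = max(-true_value(path + (m,), depth - 1)
                             for m in range(BRANCHING))
    return memo[path]

def alphabeta(path, depth, alpha, beta, best_first, stats):
    stats["nodes"] += 1
    if depth == 0:
        return leaf_value(path)
    moves = sorted(range(BRANCHING),
                   key=lambda m: -true_value(path + (m,), depth - 1),
                   reverse=best_first)
    best = -2.0
    for idx, m in enumerate(moves):
        v = -alphabeta(path + (m,), depth - 1, -beta, -alpha, best_first, stats)
        best = max(best, v)
        alpha = max(alpha, v)
        if alpha >= beta:                    # fail high (cutoff)
            stats["fail_highs"] += 1
            if idx == 0:
                stats["fh_first"] += 1
            break
    return best

for best_first in (True, False):
    stats = {"nodes": 0, "fail_highs": 0, "fh_first": 0}
    alphabeta((), DEPTH, -2.0, 2.0, best_first, stats)
    pct = 100.0 * stats["fh_first"] / max(stats["fail_highs"], 1)
    print("best-first ordering=%s: nodes=%6d, fail highs on first move=%3.0f%%"
          % (best_first, stats["nodes"], pct))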
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by bob »

Mike S. wrote:
I am only hoping you are not just running a couple of positions and drawing conclusions from that???
Unfortunately yes, I do. I now have sixteen data pairs from eight engines and four positions. I am aware that my test does not meet scientific requirements, as I do not take enough samples. - But I do not do science, I do a hobby. And if 12 of 16 single tests show a gain from hyperthreading, that is sufficient for me. Case closed.

(That is for the i5-3210M CPU only, and I cannot comment on other systems.)
Those are essentially random numbers. One has to use some level of statistical rigor to measure these things and make statements concerning whether the test shows something to be good or bad. With such a tiny sample, it shows nothing at all...

I've run such tests with 30K games per match, I have run problem sets of 300 positions run dozens of times and averaged, all to eliminate the statistical variance one expects with a non-deterministic parallel search.
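
To illustrate the kind of statistical rigor meant here, a back-of-the-envelope calculation of the 95% error bar on an Elo estimate for matches of different lengths. The 52% score and 40% draw rate are invented example numbers, and the game counts only echo figures mentioned in this thread.

Code:

import math

def elo_diff(score):
    # Standard logistic Elo model.
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(games, score=0.52, draw_ratio=0.40):
    # Per-game score variance under a win/draw/loss (1 / 0.5 / 0) model.
    p_win = score - draw_ratio / 2.0
    variance = p_win + 0.25 * draw_ratio - score * score
    se = math.sqrt(variance / games)
    lo, hi = score - 1.96 * se, score + 1.96 * se
    return elo_diff(lo), elo_diff(score), elo_diff(hi)

for n in (16, 1000, 30000):
    lo, mid, hi = elo_interval(n)
    print("%6d games: %+5.0f Elo, 95%% interval [%+.0f, %+.0f]" % (n, mid, lo, hi))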
User avatar
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by geots »

bob wrote:
geots wrote:
bob wrote:
syzygy wrote:
bnemias wrote: This subject comes up every so often, and it's hard to believe people still think searching a larger tree with a small increase in NPS is beneficial. I'm not completely convinced there's a relationship between time to depth and playing strength either. But lacking any data of my own, I tend to believe Bob.
I don't know why you doubt that there is a relationship between time to depth and playing strength. For sure these are strongly correlated. The same depth in less time, all else being equal, definitely results in stronger play.

It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.

So even if time-to-depth increases with HT on, it just might be the case that overall this is outweighed by the positive impact of the larger tree.

Bob's HT tests were probably limited to crafty. I also believe they were done on a system that was not overclocked. The higher the overclock, the better the performance of HT. This is because memory latency has a bigger impact at higher clock frequencies and HT shines at hiding these latencies.

From the tests I have done myself I could not conclude that my engine benefits from HT, but I cannot say anything for engines that I have not tested.
I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.

Sometimes a parallel search will find the answer much more quickly than expected, but this is generally a result of poor move ordering where the parallel search looks at the supposedly bad move sooner than the sequential search does, and quickly establishes a better bound that makes things go faster. But that is uncommon, not expected to happen regularly.

I'm likely the only person on the planet to have actually run 30K game matches with 1 cpu, then with 2, then with 4, and finally with 8. And I have found NO circumstance where time to depth suggests one thing, and actual games suggest another. That is, the speed of the parallel search is the thing that gains Elo, not some bizarre tree shape that happens regularly. Such is just "statistical noise" if the test is large enough...

The only exceptions I have seen are those where so few games are played, the statistical variance makes the results statistically insignificant.

The only exception to the "hyper threading is not good for chess" would be a poorly implemented program which gets an unusual boost from HT, that a well-designed implementation would not get. Such tricks (HT) tend to help poorly written code more than code that has been optimized to efficiently access memory and to reduce as much as possible unnecessary data dependencies or unnecessary computation that stalls/clogs pipelines...





Dr. Bob, I have a question. I am running two Intel i5 4-core systems. Neither has hyperthreading. I have on the way a new Intel i7 6-core system, in which the HT can be disabled- which I plan to do. But in theory- if I did not disable HT on the i7, in testing would that create a variable I don't want- 2 systems with no HT, and 1 system with HT- and the results all being thrown into the same batch?



Best,

george
My advice is as before. If you have an x86 with 6 cores, and you never use more than 6 cores at once (whether you play 6 games with ponder=off, 3 games with ponder=on, or one game with an engine using 6 cores ponder=off), the results should be reliable.

If you go further and use more than 6 cores, some of those physical cores get split into what is effectively 2 pieces, each 1/2 as fast as the original. Some programs will be running on a CPU 1/2 as fast as normal, some won't, and that can certainly add noise.

Further, since all of these things use turbo-boost, it gets a little trickier yet, since just using one core will use overclocking, but when you use several, overclocking turns off completely or almost completely. Again introducing CPU speed variability. I turn this off on whatever boxes I can, Apple being the exception where there is no BIOS to fiddle with.




Thank you Dr. I missed one thing. Did you say you turned the turbo-boost or the overclocking OFF when you could?



Thanks again,

george


PS: One last thing. If I am running 6 single-core matches, my real cores are all used up. There are 6 interfaces that in total will require a bit of a core - though maybe not much. So if a virtual core comes into play here, how do I know that the power the 6 GUIs need is not coming from a real core, with one engine getting back only a virtual core's worth of what was taken away from it to give to the GUIs? (Assuming what I have said makes any sense at all.)
User avatar
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by Mike S. »

bob wrote:One has to use some level of statistical rigor to measure these things and make statements concerning whether the test shows something to be good or bad.
Yes... I would rather have 16,000 data pairs of that quality than just 16. It is a problem, e.g. because I tested non-automated. Anyway, on second thought I was not satisfied with those few samples, so I ran a test suite, six times in total: the SwissTest 4 (64 positions), max. 10 seconds per position.

Dual-core i5-3210M / 2.5-2.9 GHz, 512 MB hash tables

Code:

            Houdini 1.5a 4T   Stockfish 100413 4T   Critter 1.6a 4T   Stockfish 100413 2T   Houdini 1.5a 2T   Critter 1.6a 2T
------------------------------------------------------------------------------------------------------------------------------
Total time: 00:02:23          00:03:11              00:03:46          00:03:18              00:03:42          00:04:04
solved:     56                55                    54                53                    50                47
All 3 engines have gained from hyperthreading, some more, some less. That is based on 6*64 = 384 single tests. I think it all points in the same direction.

(Critter's session file was off.)
Regards, Mike
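
For context on the table above, here is a rough estimate of the random noise in a "positions solved out of 64" comparison, using the Houdini column (56 vs 50) as the example. The binomial model (independent positions, fixed per-position solve probability) is a simplification, so treat the result as an order-of-magnitude check only.

Code:

import math

N = 64
solved_ht_on, solved_ht_off = 56, 50    # Houdini 1.5a, 4 threads vs 2 threads

p_on, p_off = solved_ht_on / N, solved_ht_off / N
# Standard error of the difference of two independent proportions.
se_diff = math.sqrt(p_on * (1 - p_on) / N + p_off * (1 - p_off) / N)
diff = p_on - p_off
z = diff / se_diff

print("difference: %d positions (%.1f +/- %.1f percentage points, 95%%)"
      % (solved_ht_on - solved_ht_off, 100 * diff, 196 * se_diff))
print("z-score: %.2f (|z| < 1.96 would be within ordinary noise)" % z)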
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hyperthreading and Computer Chess: Intel i5-3210M

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:I think you are hoping for "good luck". In reality, what is happening is that the parallel search is simply searching nodes that are completely unnecessary to produce the right score.
Well, you don't have to read what I write, but I'm not sure why you still bother to answer?
I responded precisely to what you wrote...

quote:
It is also not strange to believe that the tree being larger, all else being equal, contributes positively to playing strength. With pure alpha-beta it would contribute nothing, but the top engines all use a highly selective search.

A more selective search is usually good because the added depth outweighs the errors introduced by more selectivity. But if you can search a somewhat larger tree in the same time and reaching the same depth, the impact of these errors is reduced and play may be expected to be stronger.
It is most definitely "strange" to believe that a larger tree is better. Turn off alpha/beta and it will get a LOT larger. Zero better.
So you simply don't read what I wrote. I gave an explanation and you ignore it and instead come up with a comparison with minimax.
Second. If you believe that using extra CPUs to search broader rather than deeper is better, why not search broader rather than deeper with a single CPU?
Obviously if you can reach the same depth in the same time with a broader search (i.e. less pruning), your search is better. I'm not sure why you take issue with this.
As I said, this is a pretty ridiculous way of looking at a tree search issue, parallel or not.
The problem is your refusal to read.