Strongest MPI-capable (cluster) engine?

Discussion of chess software programming and technical issues.


abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

I updated the model to make it a linear relationship and to distinguish the ivy bridge CPU.
Diep's NPS can be estimated as:


<NPS> = (amd * (-5.28%) + 1) * (bulldozer * (-37.38%) + 1)
      * (nehalem * 3.82% + 1) * (sandy bridge * 10.23% + 1) * (ivy bridge * 13.92% + 1)
      * (hyperthreading * 24% + 1) * n_cores * frequency * 66080 + 96860
where n_cores is the number of cores and frequency is the clock frequency in GHz (including turbo boost with n_cores active). The other variables equal 1 if true and 0 otherwise.

For example, the i7-965 runs 4 cores/8 threads at 3.33 GHz: NPS = (3.82% + 1) * (24% + 1) * 3.33 * 4 * 66080 + 96860 = 1.23 MNPS, which is close to the measured value of 1.247 MNPS.
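A minimal Python sketch of the model above, assuming every CPU flag is 0 or 1 and that the frequency argument is the all-core (turbo) clock in GHz. The function and parameter names are my own illustration, not anything taken from Diep. It reproduces the i7-965 example:

```python
def pct(x):
    """'%' in the formula abbreviates '/ 100'."""
    return x / 100.0


def model_nps(n_cores, ghz, amd=0, bulldozer=0, nehalem=0,
              sandy_bridge=0, ivy_bridge=0, hyperthreading=0):
    """Estimated Diep NPS from the linear model; each flag is 1 if true, else 0."""
    factor = ((amd * pct(-5.28) + 1)
              * (bulldozer * pct(-37.38) + 1)
              * (nehalem * pct(3.82) + 1)
              * (sandy_bridge * pct(10.23) + 1)
              * (ivy_bridge * pct(13.92) + 1)
              * (hyperthreading * pct(24) + 1))
    return factor * n_cores * ghz * 66080 + 96860


# i7-965: Nehalem, 4 cores / 8 threads, all cores at 3.33 GHz
est = model_nps(4, 3.33, nehalem=1, hyperthreading=1)
print(f"{est / 1e6:.2f} MNPS")  # measured value was 1.247 MNPS
```

The multiplicative flags mean each architecture term simply scales the per-core, per-GHz rate of 66080 nodes; the constant 96860 is the fitted offset.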
Richard
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

abulmo wrote:I updated the model to make it a linear relationship and to distinguish the ivy bridge CPU.
Your table is wrong and numbers you quote don't make sense at all.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

diep wrote:
abulmo wrote:I updated the model to make it a linear relationship and to distinguish the ivy bridge CPU.
Your table is wrong and numbers you quote don't make sense at all.
Could you be more precise?
What table? I just wrote a formula.
Which numbers do not make sense?
Maybe there is some misunderstanding: % abbreviates "/ 100" and * is the multiplication operator.
Richard
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

abulmo wrote:
diep wrote:
abulmo wrote:I updated the model to make it a linear relationship and to distinguish the ivy bridge CPU.
Your table is wrong and numbers you quote don't make sense at all.
Could you be more precise?
What table? I just wrote a formula.
Which numbers do not make sense?
Maybe there is some misunderstanding: % abbreviates "/ 100" and * is the multiplication operator.
You keep ignoring what I posted before.
Stubborn as hell, huh?

A table based upon base frequency. That's not realistic. All we want is an IPC number, IPC per process of course, as the number of processes determines your speedup.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

diep wrote:A table based upon base frequency. That's not realistic.
If I understand correctly, you think there is some magic in Intel's Turbo Boost / AMD's Turbo Core, and that they are lying about the real frequencies of their CPUs.
Personally, I have never heard of any conspiracy theory about CPU makers lying about their CPU frequencies. To me, CPU frequencies are well-known, published data. No secrets here.
diep wrote:All we want is an IPC number, IPC per process of course, as the number of processes determines your speedup.
I think IPC is a misleading unit. Hyperthreading can compensate for a bad IPC. New instructions like hardware popcount lead to lower IPC but higher overall speed.

The only thing I care about is the overall speed of my program.
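To illustrate why a per-process figure can mislead, here is a sketch that plugs the i7-965 into the model posted earlier with hyperthreading on and off. The "off" case is only the model's extrapolation, not a measurement, and `model_nps` is a hypothetical helper restricted to the Nehalem terms:

```python
def model_nps(n_cores, ghz, hyperthreading):
    """Linear Diep NPS model, Nehalem term only (all other CPU flags are 0)."""
    factor = (3.82 / 100 + 1) * (hyperthreading * 24 / 100 + 1)
    return factor * n_cores * ghz * 66080 + 96860


GHZ = 3.33
results = {}
for ht, processes in ((1, 8), (0, 4)):
    total = model_nps(4, GHZ, ht)
    per_process = total / processes / GHZ  # nodes per process per GHz
    results[ht] = (total, per_process)
    print(f"HT={ht}: {total:,.0f} NPS total, {per_process:,.0f} per process per GHz")
```

With hyperthreading the model's total rises (about 1.23M vs 1.01M NPS) while the per-process, per-GHz figure drops (about 46.2k vs 75.9k): the program gets faster even though the "IPC per process" gets worse.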
Richard
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

abulmo wrote:
diep wrote:A table based upon base frequency. That's not realistic.
If I understand correctly, you think there is some magic in Intel's Turbo Boost / AMD's Turbo Core, and that they are lying about the real frequencies of their CPUs.
Personally, I have never heard of any conspiracy theory about CPU makers lying about their CPU frequencies. To me, CPU frequencies are well-known, published data. No secrets here.
diep wrote:All we want is an IPC number, IPC per process of course, as the number of processes determines your speedup.
I think IPC is a misleading unit. Hyperthreading can compensate for a bad IPC. New instructions like hardware popcount lead to lower IPC but higher overall speed.

The only thing I care about is the overall speed of my program.
If we ignore the rumours Intel itself spread in 2008 when the CPU was released, and just use FACTS,

then we can do a simple calculation of whether, normalized per GHz, today's Ivy Bridge in the shops is faster than the Nehalem i7-965, Intel's first released i7.

Ivy Bridge i7-3770K: turbo frequency 3.9 GHz
Nehalem i7-965: turbo frequency 3.46 GHz

Now we normalize the results to 1 GHz:

Ivy Bridge i7-3770K: 1478333.4 / 3.9 GHz / 8 processes = 47,382 nodes per process per GHz

The official statement I have is that for THIS MACHINE all cores of the i7-965 were overclocked in the BIOS to 3.33 GHz. The word 'overclock' is of course my own; Intel will say this was 'within the specs of Turbo Boost anyway'.

Old Nehalem i7-965: 1247903 / 3.33 GHz / 8 processes = 46,843.2

This is a 1.15% difference, even though Ivy Bridge has much faster DDR3-1600 memory while the older Nehalem has DDR3-1066, which is a BIG difference in performance.
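The normalization above as a short Python sketch. The NPS figures and clocks are the ones quoted in this post; the divisor of 8 is the number of search processes (4 cores / 8 threads), and the helper name is my own:

```python
def per_process_per_ghz(nps, ghz, processes):
    """Normalize a total NPS figure to nodes per process per GHz."""
    return nps / ghz / processes


ivy = per_process_per_ghz(1478333.4, 3.9, 8)     # i7-3770K, assuming 3.9 GHz on all cores
nehalem = per_process_per_ghz(1247903, 3.33, 8)  # i7-965, all cores at 3.33 GHz
gain = (ivy / nehalem - 1) * 100
print(f"{ivy:,.1f} vs {nehalem:,.1f}: {gain:.2f}% difference")
```

Note that the result depends directly on the assumed all-core clock: every 1% error in the GHz figure shifts the comparison by about 1%.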

So any claim of Ivy Bridge being better as a CPU core is nonsense, given that the machine has a huge RAM advantage over the old i7s.

Note that the above result really makes SENSE if you think about it. The SAME executable was of course benchmarked.

The advantage of newer Intel CPUs is that they are clocked higher and have more cores, yet each core can still decode only 4 instructions per clock.
According to CPU designers, the weakest part of today's CPUs is decoding instructions quickly. This is the real bottleneck; otherwise hyperthreading would of course be far faster than it is now.

This also explains why AMD's Bulldozer is nearly the same speed as Ivy Bridge and Nehalem, as it can also decode just 4 instructions per clock per module (1 module = 2 mini-cores). Bulldozer is of course a junk CPU that eats too much power (but that's another discussion); for it, too, we have a statement of the GHz it ran at, and we have its result.

Take into account that RAM latency is a factor of 2.5 worse on Bulldozer; that explains why the i7 does better.

Official links:

http://www.lostcircuits.com/mambo//inde ... itstart=13

Intel page of ivy bridge: http://ark.intel.com/products/65523
http://ark.intel.com/products/37149/Int ... Intel-QPI)
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

diep wrote:Ivy Bridge i7-3770K: 1478333.4 / 3.9 GHz / 8 processes = 47,382 nodes per process per GHz
When running on all cores, the i7-3770K frequency is 3.7 GHz, not 3.9 GHz (unless the machine is manually overclocked to other values).
diep wrote:The official statement I have is that for THIS MACHINE all cores of the i7-965 were overclocked in the BIOS to 3.33 GHz. The word 'overclock' is of course my own; Intel will say this was 'within the specs of Turbo Boost anyway'.

Old Nehalem i7-965: 1247903 / 3.33 GHz / 8 processes = 46,843.2

This is a 1.15% difference
This is more likely a 6.6% performance improvement.
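A sketch of how sensitive the comparison is to the assumed i7-3770K clock: the same per-process, per-GHz normalization with the NPS figures quoted above, evaluated at 3.9 GHz and at 3.7 GHz (the stock all-core frequency). The helper name is illustrative only:

```python
NEHALEM = 1247903 / 3.33 / 8  # i7-965 nodes per process per GHz


def gain_vs_nehalem(ivy_nps, ghz, processes=8):
    """Percent per-process, per-GHz gain of the i7-3770K over the i7-965."""
    return (ivy_nps / ghz / processes / NEHALEM - 1) * 100


for ghz in (3.9, 3.7):
    print(f"i7-3770K at {ghz} GHz: {gain_vs_nehalem(1478333.4, ghz):+.1f}%")
```

At 3.9 GHz the gain is about 1.2%, at 3.7 GHz about 6.6%, so the disagreement comes down entirely to which all-core frequency the test machine ran at.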

I have snipped the rest of the discussion, which is likely based on wrong data. I agree, however, that faster memory on Ivy Bridge may explain most if not all of the speed improvement. Then again, memory specifications are hard to separate from CPU specifications nowadays.
Richard
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

abulmo wrote:
diep wrote:Ivy Bridge i7-3770K: 1478333.4 / 3.9 GHz / 8 processes = 47,382 nodes per process per GHz
When running on all cores, the i7-3770K frequency is 3.7 GHz, not 3.9 GHz (unless the machine is manually overclocked to other values).
You are wrong. These are Intel test machines. This is NOT what you can buy in a shop. These test machines are shipped long before the CPU is officially released.

Please note that manufacturers always release a lot of CPUs prior to the official release. Usually the first few, which are clocked a lot LOWER, go to developers like Intel. Then, after all the bug fixing, they ship some test machines to the different testers. All of this is under NDA; the fines and penalties for breaking it and hurting such manufacturers would instantly bankrupt you.

As this is long before the official release, they usually have special BIOSes which later production motherboards do not have.

Intel would of course be very stupid nowadays to put a 3.9 GHz 'Turbo Boost' sticker on a CPU without shipping the very best they have to the official benchmarks, which is an i7-3770K at 3.9 GHz on all cores.

That is what has happened with Turbo Boost. Initially, under full load, only their SPECint submissions were clocked higher than the normal frequency, and only by relatively little compared to what happens nowadays...

They have a turnover of $100 billion a year you know.

You must not act as if they are kids who do not know how to get optimal performance out of their CPUs. If you put a sticker on it saying it can turbo boost to 3.9 GHz, then of course you have to achieve that on your own test machines, or you are a joke.

If you release a CPU with just 4 cores / 8 logical cores that still decodes at most 4 instructions per cycle, its IPC will of course not be better for existing chess program executables.

So Intel's approach of slowly boosting their CPUs further by means of Turbo Boost gives them something to cheer about with each new release, as they simply set it 100 MHz higher or so each time.

That's how, of course, they get additional performance from a chip with just 2 memory channels.

Please note you can easily get a lot more performance out of Ivy Bridge by replacing its heatspreader.

The reason why Sandy Bridge CPUs generally overclock better than Ivy Bridge is that, for the CPUs you can buy in the store, Intel used a cheaper way of attaching the heat spreader to the CPU. In the past it was soldered; nowadays they just use some thermal grease.

So if you want to overclock Ivy Bridge to the same chess performance as Sandy Bridge, you have to modify the CPU a lot.

Ivy Bridge from my viewpoint is a cheapskate Sandy Bridge.
Another problem with the newer 22 nm process technology seems to be that the CPUs overclock very inconsistently. Some do not overclock at all; others overclock a lot better (after removing the grease and replacing it with something better).

Very variable.

So picking out an Ivy Bridge CPU that overclocks well is much harder than with Sandy Bridge.

Yet you shouldn't act as if Intel doesn't know how to select CPUs for their test machines, and by claiming Turbo Boost they have the legal right to clock them higher for benchmarks.

So that's what happens of course. There is $100 billion at stake.
mike_bike_kite
Posts: 98
Joined: Tue Jul 26, 2011 12:18 am
Location: London

Re: Strongest MPI-capable (cluster) engine?

Post by mike_bike_kite »

ZirconiumX wrote:I will be getting another RasPi soon
Tom's Hardware recently ran an article on a cluster of 64 RasPis. The end result was 11 GHz of processing power and 1 TB of memory; not sure if this really counts as supercomputing these days, but the pictures of it are great.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

mike_bike_kite wrote:
ZirconiumX wrote:I will be getting another RasPi soon
Tom's Hardware recently ran an article on a cluster of 64 RasPis. The end result was 11 GHz of processing power and 1 TB of memory; not sure if this really counts as supercomputing these days, but the pictures of it are great.
It's not even serious.

It is more like 2 GHz of processing power if you compare it with an i7.

So a single i7 core wipes away this 'cluster'. If I remember correctly, they used USB to 'cluster'.

So that's not serious either.

Yet their intentions succeeded: they got massive publicity.

Though you can call what they built a cluster, compared with normal clusters in terms of power usage, maintenance, speed, and behaviour it is of course nothing alike.

It's not even suited to learn how to cluster as it's too different.

In total, if you bought what they produced, including the automatic boot, you'd end up paying $10k+.

That's a high price for junk that belongs somewhere in the late 80s.