Increase in Elo ..Question For The Experts

Uri Blass · Post by **Uri Blass** » Fri Dec 09, 2011 3:46 pm

MikeGL wrote:Inline with your subject I have also similar question that
I'm just curious about.

If I have an average engine, say rated 2500 running at P-III, but it has in its
disposal upto 4-5-6-7-8men TB, then, even just in theory, will it be able to
defeat stronger engines (elo 3200) running at powerful hardware?
Assuming the stronger engine has only upto 5 piece TB.

I have seen 2TB external drives even in small electronic shops, so I guess
6 & 7-men TB is already possible (although maybe hidden for private use).

Tablebases are not going to change much, and they are not relevant for most games.

In practice they are worth nothing, based on testing: probing them makes the search slower, and using them only at the root is not relevant for most games.

Better hardware can help in theory, but for a 700 Elo difference you need hardware that is clearly more than 100 times faster, and it is not practical to expect that.
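
To put the 700 Elo figure in perspective, here is a minimal sketch. The expected-score formula is the standard Elo one. The Elo gained per doubling of speed (`elo_per_doubling`) is an assumed rule-of-thumb parameter, not a number from this thread, and the real value depends on the engine and the time control.

```python
def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def speedup_needed(elo_gap: float, elo_per_doubling: float) -> float:
    """Speed factor needed to close elo_gap.

    ASSUMPTION: every doubling of speed is worth a constant elo_per_doubling.
    """
    return 2.0 ** (elo_gap / elo_per_doubling)


if __name__ == "__main__":
    print(f"expected score at +700 Elo: {expected_score(700):.3f}")
    for gain in (50, 70, 100):  # hypothetical Elo gains per doubling
        factor = speedup_needed(700, gain)
        print(f"{gain} Elo per doubling -> {factor:.0f}x faster hardware")
```

Under an assumed 100 Elo per doubling, the gap needs a 128x speedup, which is in line with "clearly more than 100 times faster". Smaller assumed gains per doubling push the required factor much higher.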

MikeGL · Post by **MikeGL** » Fri Dec 09, 2011 4:18 pm

Uri Blass wrote:
MikeGL wrote:Inline with your subject I have also similar question that
I'm just curious about.

If I have an average engine, say rated 2500 running at P-III, but it has in its
disposal upto 4-5-6-7-8men TB, then, even just in theory, will it be able to
defeat stronger engines (elo 3200) running at powerful hardware?
Assuming the stronger engine has only upto 5 piece TB.

I have seen 2TB external drives even in small electronic shops, so I guess
6 & 7-men TB is already possible (although maybe hidden for private use).
Tablebases are not going to change much and they are not relevant for most games.

practically they worth nothing based on testing because they cause the search to be slower and using them only at the root is not relevant for most games.

Better hardware can help in theory but if we talk about 700 elo difference you need hardware that is clearly more than 100 times faster and it is
not practical to expect it.

Oh ok, thanks.

bob · Post by **bob** » Fri Dec 09, 2011 5:37 pm

Werewolf wrote:Bob, thank you.

Can I ask one more question: how relevant is the size of cache in a processor? I notice Intel have gone from 2 MB /per core to 2.5 MB / per core in their xeons but some people say cache isn't relevant for chess.

Can you say whether it is or isn't with a brief (and simple ) explanation of why this is so, please?

It makes a difference, but not a huge difference. The only way I would attempt to compare them is by actually running my program. Years ago I had the choice of a PIII Xeon with 512K, 1024K or 2048K of cache (single-core chips, obviously, but this box had 4 sockets). The 512K chips were about $1K each. The 2MB chips were about $5K each. The 2MB chips were faster by maybe 10% or some such, but the price difference didn't make them reasonable.

diep · Post by **diep** » Sun Dec 11, 2011 5:56 pm

kasinp wrote:Steve,

Please have a look at this resource:

http://tldp.org/HOWTO/BogoMips/bogo-list.html

For the 486 66MHz I think the speed index is around 33.
For P3 866MHz the speed index is ca. 1730.

These numbers give a much bigger difference that the raw clock speed comparison. Bob is absolutely right - once cycle of the 486 is NOT the same as one cycle of the P3.

I used these index values to calibrate DOS Box version of the Genius 3 program to my dedicated Mephisto unit (there are Motorola 68030 speeds here as well). The results were really close to the index predictions.

Regards,
PK

BogoMips is not really interesting for computer chess, as the 486 didn't really have much of a branch misprediction penalty, while the Pentium Pro, PII and P3 do.

So some programs that did great on a 486 will, relatively speaking, suck on a P3.

As for Diep: moving from a 486/66 to a Pentium 100 gave exactly a 3.0 speedup. Moving from the Pentium 100 to the Pentium Pro was also exactly a factor 3.0 speedup. Moving from the Pentium Pro 200 to the P2-450 was not a 2.25 speedup, more like a factor 2.0. Moving from P2 to P3 was another 20% IPC difference, as the P3 is capable, if my memory doesn't fail, of reordering instructions very well.

If I add this all up, then going from a 486/66 to an 866MHz P3, at least for Diep, which is a C program, would have meant: 3.0 * 3.0 * 2.0 * 1.2 * (866/450) = factor 41.6

That's still far away from the factor 52 in BogoMips, and this is for 'the ideal program to profit from newer CPUs with bigger caches'.
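
The chain of factors above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the step labels are illustrative names, and the numbers are the ones quoted in this post and in the BogoMips list cited earlier.

```python
# Generation-to-generation speedups for Diep, as quoted in the post.
steps = {
    "486/66 -> Pentium 100": 3.0,
    "Pentium 100 -> Pentium Pro": 3.0,
    "Pentium Pro 200 -> P2-450": 2.0,
    "P2 -> P3 (IPC gain)": 1.2,
    "P3 clock, 450 -> 866 MHz": 866 / 450,
}

# Multiply the individual factors to get the end-to-end speedup.
total_speedup = 1.0
for factor in steps.values():
    total_speedup *= factor

# Ratio of the BogoMips index values quoted for the P3 866 and the 486/66.
bogomips_ratio = 1730 / 33

if __name__ == "__main__":
    print(f"Diep speedup, 486/66 -> P3 866: {total_speedup:.1f}x")
    print(f"BogoMips ratio:                 {bogomips_ratio:.1f}x")
```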

However, this speedup does not hold for old programs, usually written in 16-bit assembler, which really suffer on the Pentium Pro/PII/P3, as 16-bit code is ugly slow there. On a 1GHz P3 such code is not much faster than on a 200MHz Pentium.

Vincent

diep · Post by **diep** » Sun Dec 11, 2011 6:23 pm

Werewolf wrote:Bob, thank you.

Can I ask one more question: how relevant is the size of cache in a processor? I notice Intel have gone from 2 MB /per core to 2.5 MB / per core in their xeons but some people say cache isn't relevant for chess.

Can you say whether it is or isn't with a brief (and simple ) explanation of why this is so, please?

It's not so much the size of a cache; it's the speed of a cache that can make or break a processor. For the L1 cache, though, the size is really relevant. 8KB is really too tiny. 16KB is still tiny (i7 and bulldozer effectively have 16KB per minicore), 32KB tends to be seen as 'ok' again and 64KB is luxury for the data cache.

Usually the cache size gets quoted with instruction and data cache together, and it's also for 2 cores.

So a 64KB L1 for i7 in reality means effectively 16KB a core, as first you have to split it into 32 + 32 for data and instruction cache, and then somehow it has to be shared by 2 cores. Bulldozer has a hard split, so it splits it in a hard manner into a 16KB data cache for each core. They are a bit vaguer about the instruction cache.

Simply put, the chip with the best cache is the fastest chip. The real problem that determines how fast a chip is for computer chess is usually the speed at which it can decode integer instructions from the L1 instruction cache to the execution cores/units. The i7 can decode 4 instructions a cycle per core, which 2 logical cores share; by comparison, the P4 could decode just 1 instruction a cycle. That some trace cache could on paper retrieve 3 a cycle somehow didn't really speed it up for computer chess. A bulldozer module can decode 4 instructions a cycle. Not surprisingly, for Diep a 4-module bulldozer is the same speed as an i7 quadcore at the same GHz, with the 6-core i7 gulftown (i7-970, i7-980, i7-990) totally destroying it.

It's relatively cheap and easy to put a big cache onto a chip. A fast cache, however, is really complicated.

Right now the L1 caches are somewhat comparable in speed, say 4 cycles or so, but they get prefetched pretty well, so you don't suffer that 4-cycle penalty on an L1 access.

The real difference right now is the L2 cache. The i7 has the fastest L2 cache. It's just 256KB a core, yet for computer chess this is enough. Above 512KB you can't even measure a difference between 512KB and 1MB, let alone between 1MB and 2MB. AMD's bulldroop (bulldozer) has 2MB of cache per module (the same thing as a core in i7), and it's ugly slow: over 2 times slower than the i7's.

AMD has a fundamental problem from a performance viewpoint. Intel, on the other hand, is a tad expensive with its quadcore i7's, knowing AMD is a lot cheaper there with bulldozer. Should intel soon release their new sandy bridge sixcores, they can of course make a clean sweep of bulldozer and annihilate it, all based upon having 50% more cores, from 4 to 6, and a fast L2 cache.

Bulldozer needs a whopping 2 billion transistors; compare that with a gulftown i7 sixcore at 1.2 billion transistors. That's with everything counted, huh. Recently AMD released a vague statement that it's 1.2 billion transistors as well, but that seems to be without counting its L3 caches, which alone make up some 850 million transistors.

All those CPUs are very well optimized internally, so some simple change won't win them speed. The entire design of i7 requires a fast tiny cache, and the entire design of bulldozer requires a huge slow cache. There is no way to change that except by designing an entirely new CPU.

It's unclear to me why AMD designed this totally failed bulldozer chip, which basically achieves performance similar to the i7 quadcores that intel already released in November 2008 with the i7-965.

Sure, it's a lot cheaper. So intel will wipe those bulldozers away when they release a cheap sixcore sandy bridge. Note that the sandy bridge design is an 8-core design, but it seems they usually turn off 2 cores of it. The Xeons have the full 8 cores.

Vincent

diep · Post by **diep** » Sun Dec 11, 2011 6:36 pm

CRoberson wrote:IIRC, the Pentium 90 ran some things (neural networks with much floating point) 5 times faster than the 486 66. Don't recall the chess program gains. The Pentium III (the name really pissed off the engineering team that designed it) was the first of Intel's superscalar architecture chips. The 800 MHz version was 3 or 4 steppings after the original P III release. It got better. Thus, there should be a nonlinear gain (relative to MHz) between the two in chess performance.

To help this discussion out. I actually have 2 operational machines that can help. One is a Pentium 90 (still with the original FP bug) and a Pentium III 800 Mhz machine.

If you have any specific experiments for me to run, let me know.

It's the 60MHz pentium that had the FP bug, AFAIK.

The pentium can execute 2 instructions a cycle; therefore it's the first superscalar cpu.

The pentium pro really improved upon this, but as a drawback it has a big branch mispredict penalty, which barely existed before.

I use all this old stuff for a self-made linux firewall here, for 2 reasons:
a) it's more secure
b) it uses less power.

Even then, by the way, they really systematically try to hack the chess programmers. Recently I had massive attack waves at my linux firewall, with the IP numbers seemingly originating from Brazil (but you never know that for sure).

The new hardware really eats too much power. I remember how Jan Louwman had 36 computers testing at home. The majority were in 1 room on a single 16 ampere @ 240 volt circuit. That would be impossible nowadays.

Just the gpu would already eat too much nowadays.

Most machines are well over 220-240 watt now, and the overclocked i7's easily reach 400 watt or so.

All that hardware from the mid and late 90s is far under 100 watt. The first box I had that was over 100 watt (134 watt to be precise, it says over here) was a dual k7. However, just the Q6600 on its own already eats 134 watt...
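
The circuit argument can be turned into a rough power budget. This is a sketch under simplifying assumptions that are not in the post: the full rated load of the circuit is usable, and all 36 machines sit on that one circuit (the post only says the majority did). The helper names are hypothetical.

```python
CIRCUIT_AMPS = 16    # from the post
CIRCUIT_VOLTS = 240  # from the post


def circuit_watts(amps: float, volts: float) -> float:
    """Maximum power of the circuit, ASSUMING the full rating is usable."""
    return amps * volts


def watts_per_machine(total_watts: float, machines: int) -> float:
    """Average power budget per machine if all of them share the circuit."""
    return total_watts / machines


def machines_supported(total_watts: float, watts_each: float) -> int:
    """Number of machines of a given draw that fit within the budget."""
    return int(total_watts // watts_each)


if __name__ == "__main__":
    budget = circuit_watts(CIRCUIT_AMPS, CIRCUIT_VOLTS)
    print(f"circuit budget: {budget:.0f} W")
    print(f"budget per machine with 36 boxes: {watts_per_machine(budget, 36):.1f} W")
    for draw in (100, 220, 400):  # per-machine draws mentioned in the thread
        print(f"{draw} W machines that fit: {machines_supported(budget, draw)}")
```

With these assumptions, 36 machines only fit if each stays around 100 watt, which matches the claim that 90s hardware made such a setup possible and 220+ watt boxes do not.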

Terry McCracken · Post by **Terry McCracken** » Sun Dec 11, 2011 6:59 pm

diep wrote:
CRoberson wrote:IIRC, the Pentium 90 ran some things (neural networks with much floating point) 5 times faster than the 486 66. Don't recall the chess program gains. The Pentium III (the name really pissed off the engineering team that designed it) was the first of Intel's superscalar architecture chips. The 800 MHz version was 3 or 4 steppings after the original P III release. It got better. Thus, there should be a nonlinear gain (relative to MHz) between the two in chess performance.

To help this discussion out. I actually have 2 operational machines that can help. One is a Pentium 90 (still with the original FP bug) and a Pentium III 800 Mhz machine.

If you have any specific experiments for me to run, let me know.
It's the pentium-60Mhz that had the FP bug AFAIK
The pentium can execute 2 instructions a cycle therefore it's the first superscalar cpu.

The pentium-pro really improved upon this, but as a drawback has a big branch mispredict penalty which nearly didn't exist before.

I use all this old stuff for a selfmade linux firewall here, for 2 reasons:
a) it's more secure
b) it uses less power.

Even then by the way they really systematically try to hack the chessprogrammers. Recently i had massive attackwaves at my linux firewall with the IP numbers seemingly originating from Brazil (but you never know that for sure).

The new hardware really eats too much power. I remember how Jan Louwman had at home 36 computers testing. Majority were in 1 room at a single 16 ampere @ 240 volt circuit. That would be impossible nowadays.

Just the gpu would already eat too much nowadays

Most machines are well over 220-240 watt now and the overclocked i7's are soon a 400 watt or so.

All that hardware from mid and end 90s is far under a 100 watt. The first box i had that was over a 100 watt, to be precise 134 watt it says over here, was a dual k7. Just the Q6600 on its own already is eating 134 watt however...

OMG what nonsense! My Q6600 uses 95 watts stock speed. My i7 Sandy Bridge also uses 95 watts stock speed.

My GPU in the i7 machine can hit 210 watts at full throttle but uses little while idling.

Point, modern machines use less power per clock cycle than old machines.

Can't speak for AMD. They're crap.

diep · Post by **diep** » Sun Dec 11, 2011 7:25 pm

bob wrote:
Werewolf wrote:Bob,
Is there any chance you could provide a quick definition of 'super-scaler' and 'out-of order' and explain why the out of order approach is so much better?

Much appreciated.
Superscalar architectures simply issue two (or more) instructions per clock cycle. The original pentium issued two at a time when possible. More recent versions can issue up to 4, perhaps more on very recent versions.

It seems the real bottleneck nowadays is twofold: the decode speed of the instructions and the speed of the L2.

According to some rumours from hardware designers, the real problem in those CPUs is the complexity they need to decode instructions, as that seems to be a rather complex process nowadays.

The i7 decodes 4 instructions a cycle, and I calculated that Diep effectively runs at around IPC = 1.73 or above on it with hyperthreading. Maybe I'll make an effort sometime to improve that. The last systematic effort in Diep to really adjust the engine's code to modern processors was around 1998...

Not sure it's possible to really get above IPC = 1.73, considering the number of branches and the huge number of L1 requests per instruction.
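
To make the gap between those two figures concrete, a minimal sketch: it treats the decode width as the only ceiling on throughput, which is a simplification, and `decode_utilization` is a hypothetical helper name.

```python
def decode_utilization(ipc: float, decode_width: int) -> float:
    """Fraction of the decode bandwidth actually used.

    ASSUMPTION: decode width is the only limit on instructions per cycle.
    """
    return ipc / decode_width


if __name__ == "__main__":
    used = decode_utilization(1.73, 4)  # both figures are from the post
    print(f"share of decode slots used at IPC 1.73: {used:.1%}")
```

So at IPC = 1.73 well under half of the 4 decode slots per cycle would be doing useful work, under this simplified view.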

From what I understand, the number of execution units in modern processors is really huge. SMT and hyperthreading probably won't make it easier to understand precisely what the heck they're doing, I guess.

Where the bulldroop seems to win some 66% in speed with its second core, intel's hyperthreading is more like 20%+. So the single-threaded speed of i7 seems to be there because of a much faster L2.

Some years ago, before opteron, it seems the CPUs also had problems decoding instructions to L1; nowadays that seems less of a problem, as they also store them in L2.

But I'm pretty amazed that the instruction decoding speed is such a bottleneck. It also seems to be the limitation on scaling to more cores.

diep · Post by **diep** » Sun Dec 11, 2011 7:34 pm

Terry McCracken wrote:
diep wrote:
CRoberson wrote:IIRC, the Pentium 90 ran some things (neural networks with much floating point) 5 times faster than the 486 66. Don't recall the chess program gains. The Pentium III (the name really pissed off the engineering team that designed it) was the first of Intel's superscalar architecture chips. The 800 MHz version was 3 or 4 steppings after the original P III release. It got better. Thus, there should be a nonlinear gain (relative to MHz) between the two in chess performance.

To help this discussion out. I actually have 2 operational machines that can help. One is a Pentium 90 (still with the original FP bug) and a Pentium III 800 Mhz machine.

If you have any specific experiments for me to run, let me know.
It's the pentium-60Mhz that had the FP bug AFAIK
The pentium can execute 2 instructions a cycle therefore it's the first superscalar cpu.

The pentium-pro really improved upon this, but as a drawback has a big branch mispredict penalty which nearly didn't exist before.

I use all this old stuff for a selfmade linux firewall here, for 2 reasons:
a) it's more secure
b) it uses less power.

Even then by the way they really systematically try to hack the chessprogrammers. Recently i had massive attackwaves at my linux firewall with the IP numbers seemingly originating from Brazil (but you never know that for sure).

The new hardware really eats too much power. I remember how Jan Louwman had at home 36 computers testing. Majority were in 1 room at a single 16 ampere @ 240 volt circuit. That would be impossible nowadays.

Just the gpu would already eat too much nowadays

Most machines are well over 220-240 watt now and the overclocked i7's are soon a 400 watt or so.

All that hardware from mid and end 90s is far under a 100 watt. The first box i had that was over a 100 watt, to be precise 134 watt it says over here, was a dual k7. Just the Q6600 on its own already is eating 134 watt however...
OMG what nonsense! My Q6600 uses 95 watts stock speed. My i7 Sandy Bridge also uses 95 watts stock speed.

My GPU in the i7 machine can hit 210 watts at full throttle but uses little while idling.

Point, modern machines use less power per clock cycle than old machines.

Can't speak for AMD. They're crap.

http://www.lostcircuits.com/mambo//inde ... mitstart=9

This is measured at the chip itself. So it doesn't include the off-die memory controller, which the Q6600 needs and the i7 doesn't (there it is on chip), and it also excludes the huge losses of the PSU and mainboard.

The QX6700 in this table, not overclocked, is 114.8 watt. This is measured at the chip itself, excluding all those losses, as the Q6600/Q6700 has an off-die memory controller, as we know.

Your 95 watt claim is total crap.

Accurate measurements outside of the chip itself give it 134 watt.

The top gpu's are around 450 to 500 watt usage. On paper the pci-e specs give a max of 300 watt, and the manufacturers usually rate gpu's under 300 watt while they eat 100 watt more or so.

Both nvidia and AMD rate their current top gpu around 375 watt. You can safely add 100 watt to that effectively; note that 375 watt is far over the official pci-e specs, which support a max of 300.

Effectively, if you take the Q6600, add a mainboard, turn off the videocard, boot it and have all 4 cores active, you'll not be able to eat less than 160-166 watt per machine, depending upon which mainboard you use, and that includes the losses of a very efficient PSU. Note this is without a videocard and with a minimum of fans in the machine.

Again, this is all for non-overclocked CPUs; once you overclock, power usage of course skyrockets by hundreds of extra watts.

Vincent

p.s. you can clearly see the problem AMD has with bulldozer in this lostcircuits diagram: 115 watt or so for just the cpu, excluding psu/mainboard losses. That's huge usage. bulldozer might bankrupt AMD.

p.s.2 the reason the Q6600 series eats so much more juice than any other intel core series is that it is 2 cpu's glued together in 1 package. So it's in fact 2 core2 cpu's. Power usage also reflects that.

diep · Post by **diep** » Sun Dec 11, 2011 7:49 pm

Terry McCracken wrote:
..snip...

Point, modern machines use less power per clock cycle than old machines.
...snip...

The above is the most stupid statement I've seen anyone make in the past 25 years.

Because it means that for every factor 2 increase in transistors you're willing to pay 2x more for power. Does that mean that 20 years from now you'll accept that your computer eats a megawatt of power, provided that it is 1000x faster than today's computer?
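
The extrapolation can be sketched numerically. The assumptions here are not from the post and are only illustrative: transistor counts double every 2 years, power scales one-to-one with transistor count (the premise being criticized, not a prediction), and the starting point is the roughly 400 watt overclocked machine mentioned elsewhere in the thread.

```python
def projected_watts(watts_today: float, years: float,
                    years_per_doubling: float = 2.0) -> float:
    """Power draw after `years` if power doubles with every transistor doubling.

    ASSUMPTIONS: a fixed doubling period and power proportional to
    transistor count. Both are hypothetical, for illustration only.
    """
    doublings = years / years_per_doubling
    return watts_today * 2.0 ** doublings


if __name__ == "__main__":
    growth = projected_watts(1.0, 20)   # growth factor over 20 years
    future = projected_watts(400.0, 20)
    print(f"growth factor over 20 years: {growth:.0f}x")
    print(f"a 400 W machine would draw about {future / 1000:.0f} kW")
```

Ten doublings in 20 years give a factor of 1024, which is where a figure like "1000x faster" comes from; under these assumptions the power draw lands in the hundreds of kilowatts, the same order of magnitude as the megawatt in the argument.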

Do you realize the total stupidity of your statement?

I definitely don't like the increase in power usage of modern processors and really do not encourage manufacturers to eat more. Yet it seems that's what the average household accepts.

For running several computers at home, the increased power usage really is a problem. I remember how in 2001 Jan Louwman was testing Diep on 36 computers... today that would be a problem, as even the simplest quadcore box that's a good performer right now eats 220+ watt in juice...

Note that it's the i7 gulftown sixcores that dominate so much in computer chess online now, not your sandy bridge quadcore, as that's of course 50% slower than a gulftown; overclocked, those machines eat 400 watt or so.

At tomshardware even the i7-2600k, so just a quadcore, in fact peaks over 500 watt, and this at an intel fanboy site...

http://www.tomshardware.com/reviews/san ... 50-13.html

Vincent
