Ivy Bridge vs Sandy Bridge for computer chess

lkaufman · Post by **lkaufman** » Sat Sep 15, 2012 6:17 pm

As I mention in the matches section, I observed a marked improvement in relative nodes per second for Komodo compared to other top engines when going from a Sandy Bridge 12 core machine to an Ivy Bridge 16 core machine. Both tests were run using Ubuntu Linux, with the 16 core machine using a newer version of Ubuntu.
I don't know for sure whether the new hardware or the new Ubuntu version is the cause of this improvement, but it seems far more likely to me that it is the hardware. Does anyone have an opinion on this?
Assuming that it is the new Ivy Bridge technology that accounts for the difference, can anyone suggest an explanation as to why one engine (Komodo) would benefit much more from it than other closely rated engines? To give an idea of the magnitude of what I am talking about, I have read that the Ivy Bridge machines are supposed to run from 5-15% faster than Sandy Bridge machines at the same GHz. In round numbers, it appears that Komodo is getting close to the top end of this range, while other top engines are near the bottom of it. Considering that differences among chess engines are small compared to differences between chess engines and other software, I find this remarkable and puzzling. What could account for it?
Thanks in advance for your replies.

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Sep 15, 2012 7:27 pm

lkaufman wrote:As I mention in the matches section, I observed a marked improvement in relative nodes per second for Komodo compared to other top engines when going from a Sandy Bridge 12 core machine to an Ivy Bridge 16 core machine. Both tests were run using Ubuntu Linux, with the 16 core machine using a newer version of Ubuntu.
I don't know for sure whether the new hardware or the new Ubuntu version is the cause of this improvement, but it seems far more likely to me that it is the hardware. Does anyone have an opinion on this?
Assuming that it is the new Ivy Bridge technology that accounts for the difference, can anyone suggest an explanation as to why one engine (Komodo) would benefit much more from it than other closely rated engines? To give an idea of the magnitude of what I am talking about, I have read that the Ivy Bridge machines are supposed to run from 5-15% faster than Sandy Bridge machines at the same GHz. In round numbers, it appears that Komodo is getting close to the top end of this range, while other top engines are near the bottom of it. Considering that differences among chess engines are small compared to differences between chess engines and other software, I find this remarkable and puzzling. What could account for it?
Thanks in advance for your replies.

Maybe Komodo does a lot of division/modulo operations and profits from the 2x division throughput compared to Sandy Bridge.

Possibly, for whatever reason (e.g. a copy-make approach), the balance of mov instructions to ALU instructions in Komodo takes more advantage of mov no longer occupying an execution port, with potential for improved instruction-level parallelism (ILP).

http://www.anandtech.com/show/5626/ivy- ... i7-3770k/2

lkaufman · Post by **lkaufman** » Sun Sep 16, 2012 5:23 am

Thanks. It now appears that the large change in relative speed of Komodo vs. other engines may not be due to the Ivy Bridge technology, but rather it seems that for some unknown reason Komodo's relative speed is lower on my older twelve core machine than on all other machines checked, regardless of Sandy Bridge or Ivy Bridge. I should mention that we are running one match per thread, so that on a normal i7 we run 8 matches, on the twelve core we run 24, and on the 16 core we run 32. Somehow, Komodo slows down much more than other engines when we do this on the Sandy Bridge 12 core machine, though even if we only run one match per core the effect is still substantial. Running just a single game, the problem vanishes.
So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.
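
One way to put numbers on the effect described above is to compare, per engine, the nodes per second in a single game with the nodes per second when all matches run at once. The following is only a minimal sketch: the function names are made up for illustration, and the NPS values have to come from your own measurements on each machine.

```python
def retained_speed(nps_single, nps_loaded):
    """Fraction of single-game NPS each engine keeps under full load.

    Both arguments are dicts mapping engine name -> measured nodes per second.
    """
    return {name: nps_loaded[name] / nps_single[name] for name in nps_single}


def relative_to(retained, reference):
    """Express every engine's retained speed relative to a reference engine.

    A value below 1.0 means the engine slows down more than the reference
    when many matches run concurrently.
    """
    ref = retained[reference]
    return {name: value / ref for name, value in retained.items()}
```

Running this once per machine (i7, 12 core, 16 core) and comparing the `relative_to` tables would show whether one engine loses disproportionately on a particular box.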

F. Bluemers · Post by **F. Bluemers** » Sun Sep 16, 2012 10:12 am

lkaufman wrote:Thanks. It now appears that the large change in relative speed of Komodo vs. other engines may not be due to the Ivy Bridge technology, but rather it seems that for some unknown reason Komodo's relative speed is lower on my older twelve core machine than on all other machines checked, regardless of Sandy Bridge or Ivy Bridge. I should mention that we are running one match per thread, so that on a normal i7 we run 8 matches, on the twelve core we run 24, and on the 16 core we run 32. Somehow, Komodo slows down much more than other engines when we do this on the Sandy Bridge 12 core machine, though even if we only run one match per core the effect is still substantial. Running just a single game, the problem vanishes.
So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.

Well, 1 match on a 32 core machine is a bit on the lean side of things.
Did you try (cores - 1) matches on the machines?
That would leave a bit of "room" for other things (OS, GUI etc.).

sje · Post by **sje** » Sun Sep 16, 2012 11:34 am

As I've read, the main delta from Sandy to Ivy is a better calculation/watt efficiency and better integrated graphics. I don't see much, if any, difference in chess playing strength.

syzygy · Post by **syzygy** » Sun Sep 16, 2012 3:00 pm

lkaufman wrote:So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.

What kind of machines are these exactly? Dual socket 6-core SB Xeon and a dual socket 8-core IB Xeon? I did not know that Intel has released 8-core IB Xeons.

There are 4-core IB Xeons, but they do not support dual socket. Regular IB does not support dual socket, either.

diep · Post by **diep** » Sun Sep 16, 2012 5:05 pm

lkaufman wrote:Thanks. It now appears that the large change in relative speed of Komodo vs. other engines may not be due to the Ivy Bridge technology, but rather it seems that for some unknown reason Komodo's relative speed is lower on my older twelve core machine than on all other machines checked, regardless of Sandy Bridge or Ivy Bridge. I should mention that we are running one match per thread, so that on a normal i7 we run 8 matches, on the twelve core we run 24, and on the 16 core we run 32. Somehow, Komodo slows down much more than other engines when we do this on the Sandy Bridge 12 core machine, though even if we only run one match per core the effect is still substantial. Running just a single game, the problem vanishes.
So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.

Are they all running the same operating system, i.e. all Linux or all Windows? And if both run Windows, which version runs on which machine?

How many processes do you run at the same time? A match is 2 processes, i.e. 1 program playing another?

So 12 matches run 24 processes?

How many physical cores does each machine have and how many logical cores are enabled?

You can turn off hyperthreading - some HPC centers turn off hyperthreading on newer machines with many cores.

So are we comparing the same things here?

Maybe you ran a machine with hyperthreading, so 12 cores @ 12 logical cores and compared with 16 physical cores @ 32 logical cores.

Figuring this out is important.
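
On Linux, one way to check physical versus logical core counts is to parse `/proc/cpuinfo`. This is a rough sketch, assuming the x86 layout where each logical CPU block lists `physical id` before `core id`; the function name `count_cores` is made up for illustration, and on VMs or non-x86 systems those fields may be missing, in which case the physical count comes out as 0.

```python
import os


def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts parsed from /proc/cpuinfo text."""
    logical = 0
    physical = set()          # unique (socket, core) pairs
    socket = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":        # one block per logical CPU
            logical += 1
            socket = None
        elif key == "physical id":    # socket number
            socket = value
        elif key == "core id":        # core number within the socket
            physical.add((socket, value))
    return logical, len(physical)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            logical, physical = count_cores(f.read())
        print(f"logical CPUs: {logical}, physical cores: {physical}")
```

If the logical count is twice the physical count, hyperthreading is enabled.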

The RAM is also important. I see so many hardware vendors put total junk RAM into machines, usually clocked at the minimum the machine can handle.

For chess this makes a big difference if you run that many matches at the same time.

Add to that, most profilers do not factor in the RAM and even get the time spent in functions wrong, as they do not account for which function incurs a cache miss and when. Intel's VTune suffers relatively little from this phenomenon.

Which types of RAM does each machine have?

A simple way to find out is to run a test program that benchmarks the RAM in parallel, i.e. on all cores at the same time.

The only test I know of that does this is one I wrote; 99.9% of them run on just 1 single core. If you give me an email address I can send it to you.

I wrote it to benchmark on the supercomputer.
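
To illustrate the idea of benchmarking RAM on all cores at once, here is a rough sketch. It is not the test program mentioned above; the function names, buffer size and repeat count are assumptions, and a Python copy loop gives only a coarse picture compared with a dedicated native benchmark. It copies a large buffer in one process, then does the same in one process per logical CPU, so the per-worker numbers can be compared.

```python
import multiprocessing as mp
import time

SIZE = 32 * 1024 * 1024   # bytes per buffer; chosen to exceed typical caches
REPEATS = 10


def copy_bandwidth(size_bytes, repeats):
    """Return MB/s of buffer copied (reads plus writes move twice that)."""
    src = bytearray(b"\x01") * size_bytes   # non-zero so pages are really written
    dst = bytearray(size_bytes)
    dst[:] = src                            # warm-up pass faults in dst pages
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                        # bulk copy of the whole buffer
    elapsed = time.perf_counter() - start
    return size_bytes * repeats / elapsed / 1e6


def _worker(args):
    return copy_bandwidth(*args)


def parallel_bandwidth(workers, size_bytes=SIZE, repeats=REPEATS):
    """Run the copy test in `workers` processes at roughly the same time."""
    with mp.Pool(workers) as pool:
        return pool.map(_worker, [(size_bytes, repeats)] * workers)


if __name__ == "__main__":
    n = mp.cpu_count()
    alone = copy_bandwidth(SIZE, REPEATS)
    loaded = parallel_bandwidth(n)
    print(f"1 worker:   {alone:.0f} MB/s")
    print(f"{n} workers: min {min(loaded):.0f} MB/s each, "
          f"{sum(loaded):.0f} MB/s total")
```

If the per-worker figure under load drops far below the single-worker figure, the machine is bandwidth-bound when many matches run at once. The workers are not started with a barrier, so the overlap is only approximate.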

Now another big problem is when the machine is located in an HPC center.

What happens there is that a machine has a huge amount of RAM and the chess program eats relatively little of it.

So they also give some other user a big part of the RAM. That will screw your bandwidth. The above RAM test would notice this directly (at the moment that this happens - not if it happens later on).

Today's buzzword for screwing you that way on a machine is virtualization. Commercial parties are world champions at this.

(Did the name Amazon echo in the corridor?)

In case it's a government HPC center I assume you don't have admin access rights, but Don might be able to see this with some ps-type commands when it happens. You can easily hack all those Linux HPC machines. They always run stable, solid kernels with everything enabled, including selfhacking.

The problem in HPC centers is that you usually do not have the guarantee that you run exclusively on a given node. So as soon as a few cores seem idle, the scheduler will place other jobs there as well.

lkaufman · Post by **lkaufman** » Sun Sep 16, 2012 6:28 pm

F. Bluemers wrote:
lkaufman wrote:Thanks. It now appears that the large change in relative speed of Komodo vs. other engines may not be due to the Ivy Bridge technology, but rather it seems that for some unknown reason Komodo's relative speed is lower on my older twelve core machine than on all other machines checked, regardless of Sandy Bridge or Ivy Bridge. I should mention that we are running one match per thread, so that on a normal i7 we run 8 matches, on the twelve core we run 24, and on the 16 core we run 32. Somehow, Komodo slows down much more than other engines when we do this on the Sandy Bridge 12 core machine, though even if we only run one match per core the effect is still substantial. Running just a single game, the problem vanishes.
So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.
Well, 1 match on a 32 core machine is a bit on the lean side of things.
Did you try (cores - 1) matches on the machines?
That would leave a bit of "room" for other things (OS, GUI etc.).

No, but if we do one match per thread (rather than per core) I don't know if it would help anything to leave one thread for other things, would it? In any case the effect of using 31 rather than 32 threads would seem to be minor.

lkaufman · Post by **lkaufman** » Sun Sep 16, 2012 6:30 pm

syzygy wrote:
lkaufman wrote:So my revised question is, why would running many matches at once hurt Komodo's speed more than other engines on a two-processor, 12 core Sandy Bridge machine, as compared to either single processor systems or to a two-processor, 16 core Ivy Bridge machine? This doesn't make any sense to me.
What kind of machines are these exactly? Dual socket 6-core SB Xeon and a dual socket 8-core IB Xeon? I did not know that Intel has released 8-core IB Xeons.

There are 4-core IB Xeons, but they do not support dual socket. Regular IB does not support dual socket, either.

They are dual socket 6 and 8 core Xeon machines. The 8 cores have been out for some months. I bought mine from JNCS computers, if you want details see their website. My 16 core has a base speed of 2.6 GHz.

lkaufman · Post by **lkaufman** » Sun Sep 16, 2012 6:43 pm

diep wrote: Are they all running the same operating system, i.e. all Linux or all Windows? And if both run Windows, which version runs on which machine?

How many processes do you run at the same time? A match is 2 processes, i.e. 1 program playing another?

So 12 matches run 24 processes?

How many physical cores does each machine have and how many logical cores are enabled?

You can turn off hyperthreading - some HPC centers turn off hyperthreading on newer machines with many cores.

So are we comparing the same things here?

Maybe you ran a machine with hyperthreading, so 12 cores @ 12 logical cores and compared with 16 physical cores @ 32 logical cores.

Figuring this out is important.

The RAM is also important. I see so many hardware vendors put total junk RAM into machines, usually clocked at the minimum the machine can handle.

For chess this makes a big difference if you run that many matches at the same time.

Add to that, most profilers do not factor in the RAM and even get the time spent in functions wrong, as they do not account for which function incurs a cache miss and when. Intel's VTune suffers relatively little from this phenomenon.

Which types of RAM does each machine have?

A simple way to find out is to run a test program that benchmarks the RAM in parallel, i.e. on all cores at the same time.

The only test I know of that does this is one I wrote; 99.9% of them run on just 1 single core. If you give me an email address I can send it to you.

I wrote it to benchmark on the supercomputer.

Now another big problem is when the machine is located in an HPC center.

What happens there is that a machine has a huge amount of RAM and the chess program eats relatively little of it.

So they also give some other user a big part of the RAM. That will screw your bandwidth. The above RAM test would notice this directly (at the moment that this happens - not if it happens later on).

Today's buzzword for screwing you that way on a machine is virtualization. Commercial parties are world champions at this.

(Did the name Amazon echo in the corridor?)

In case it's a government HPC center I assume you don't have admin access rights, but Don might be able to see this with some ps-type commands when it happens. You can easily hack all those Linux HPC machines. They always run stable, solid kernels with everything enabled, including selfhacking.

The problem in HPC centers is that you usually do not have the guarantee that you run exclusively on a given node. So as soon as a few cores seem idle, the scheduler will place other jobs there as well.

The computers are my own, in my own home, no other users or uses. The new one cost just over $5,000.
Both the twelve core and the 16 core have two Xeon processors, with six and eight cores per processor respectively. Both were bought from the same company, one that uses quality parts. Both used the best RAM available at the time (excluding any hyper-expensive RAM). The 16 core is 1.5 years newer so presumably has somewhat better RAM. They all run Ubuntu Linux, each with the version that was out when the machine was made (so not the same). Normally we run one test per thread, so 24 tests on the 12 core and 32 on the 16 core. We keep hyperthreading on; it may not be helpful for an MP program but it is clearly helpful for multiple tests of a single core engine. The new machine is Ivy Bridge, the old one Sandy Bridge. Somehow, the new machine "likes" Komodo and the older one "likes" Houdini, Critter, Ivanhoe, and Stockfish more. My standard off-the-shelf i7 acts more like the new machine, i.e. is more friendly to Komodo.
Can you think of anything that would account for this?
