AMD Phenom Hex core (SMP performance problem)

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Post by michiguel »

Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: AMD Phenom Hex core (SMP performance problem)

Post by rbarreira »

michiguel wrote:Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
Does your motherboard support the Phenom II X6? I had to update my BIOS to get it working correctly. As long as all the cores are detected by the OS, you should be fine.

I don't see any such problem with my CPU (Phenom II X6 1055T). NPS with six cores is not 6x, which is to be expected given other bottlenecks, but with two cores the speedup is certainly close to 2x. Turbo mode should be in use as long as <= 3 cores are busy, so turbo shouldn't be the reason for what you're seeing.

How can I run your test here to compare?
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel »

rbarreira wrote:
michiguel wrote:Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
Does your motherboard support the Phenom II X6? I had to update my BIOS to get it working correctly. As long as all the cores are detected by the OS, you should be fine.
When I run all 6 threads for 2-3 minutes I see all of the cores detected and busy, so I think the cores are detected.

I don't see any such problem with my CPU (Phenom II X6 1055T). NPS with six cores is not 6x, which is to be expected given other bottlenecks, but with two cores the speedup is certainly close to 2x. Turbo mode should be in use as long as <= 3 cores are busy, so turbo shouldn't be the reason for what you're seeing.

How can I run your test here to compare?
Thanks, that would be fantastic. Tonight when I get home I will post a binary for you to download.

Linux? Windows? 32 or 64 bits?

I observed the problems with Linux 64.
Miguel
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: AMD Phenom Hex core (SMP performance problem)

Post by rbarreira »

Linux 64.
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: AMD Phenom Hex core (SMP performance problem)

Post by marcelk »

michiguel wrote:Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.

What this function does is throttle down a core when it is not being used. If your processes/threads idle when there is no work for them, this will hurt, because there is a delay between throttling down and ramping back up. This was the case with my program: it was effectively running at the throttled-down speed all the time.

As a bonus, when you disable Cool'n'Quiet, each core gets locked at its "Turbo Core" speed (3.6GHz for the 1090T and 3.7GHz for the 1100T). This is a slight overclock because not all 6 cores are supposed to run at this speed simultaneously, but I have had no problems running it 24/7 for a few months now.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: AMD Phenom Hex core (SMP performance problem)

Post by bob »

michiguel wrote:Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
It is possibly a cache issue. You have to be _very_ careful about what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked. That is sometimes called "false sharing". The caches transfer that block back and forth between the cores, killing performance...

As a simple test, run two separate instances of your program in two different windows and see if each runs at its normal nps. If so, then you likely have a cache issue as above. If not, then you have something going on with the hardware settings...

Make sure you are not using the Intel compiler if you are running on AMD of course.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: AMD Phenom Hex core (SMP performance problem)

Post by Joost Buijs »

marcelk wrote: I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.
In fact there is a 'Turbo Core' compatibility problem with older Linux kernels, look at - for instance - the article here: http://www.h-online.com/open/news/item/ ... 93127.html
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: AMD Phenom Hex core (SMP performance problem)

Post by Joost Buijs »

bob wrote: It is possibly a cache issue. You have to be _very_ careful about what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked.
I had cache thrashing when I first moved from 2 to many threads.
In my case the distance between the board structs in a splitpoint was too small.

The speedup on 6 cores for my program is now around 4 to 4.75, depending on the position. And there is still room for improvement, because I haven't implemented the 'helpful master' concept yet.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel »

Joost Buijs wrote:
marcelk wrote: I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.
In fact there is a 'Turbo Core' compatibility problem with older Linux kernels, look at - for instance - the article here: http://www.h-online.com/open/news/item/ ... 93127.html
Thanks Joost and Marcel, it looks like something like this was the problem!

I disabled both Turbo Core and Cool'n'Quiet and now I have a speedup of 1.7 using two threads, similar to what I get on my dual.

I probably do not need to disable Turbo Core (as Marcel seems to indicate), but this was my first test. I had the newest kernel, so that should not be a problem. I had a gut feeling that this may happen with programs that use mutexes or semaphores, which put the cores to rest and wake them up again.

This was driving me nuts. THANKS!

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel »

bob wrote:
michiguel wrote:Gaviota has a speedup of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speedup (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
It is possibly a cache issue. You have to be _very_ careful about what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked. That is sometimes called "false sharing". The caches transfer that block back and forth between the cores, killing performance...

As a simple test, run two separate instances of your program in two different windows and see if each runs at its normal nps. If so, then you likely have a cache issue as above. If not, then you have something going on with the hardware settings...

Make sure you are not using the Intel compiler if you are running on AMD of course.
I built a version that saved almost nothing in memory (counters, hashtable, killers, etc.) and still had the problem. I used gcc. I also ran two instances and they were fine (not 100% but close enough). Apparently, it was this Cool'n'Quiet thing...

Miguel