AMD Phenom Hex core (SMP performance problem)

michiguel
Posts: 6388
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

AMD Phenom Hex core (SMP performance problem)

Post by michiguel » Mon Apr 04, 2011 9:50 pm

Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
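
To put the two figures above on a common scale, here is a minimal sketch that converts a speed-up into parallel efficiency (speed-up divided by thread count). The function names are illustrative and not part of Gaviota; only the 1.7x and 1.2x figures come from the post.

```python
def speedup(nps_one_thread: float, nps_n_threads: float) -> float:
    """Ratio of multi-threaded to single-threaded nodes per second."""
    if nps_one_thread <= 0:
        raise ValueError("single-thread nps must be positive")
    return nps_n_threads / nps_one_thread


def parallel_efficiency(speedup_factor: float, threads: int) -> float:
    """Fraction of the ideal (linear) speed-up that was actually achieved."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return speedup_factor / threads


if __name__ == "__main__":
    # Speed-up figures quoted in the post; both runs used two threads.
    for label, factor in (("AMD dual", 1.7), ("Phenom II X6 1090T", 1.2)):
        eff = parallel_efficiency(factor, threads=2)
        print(f"{label}: {factor:.1f}x on 2 threads = {eff:.0%} of ideal")
```

On two threads, 1.7x is about 85% of ideal, while 1.2x is about 60%.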

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 1:48 pm

Re: AMD Phenom Hex core (SMP performance problem)

Post by rbarreira » Mon Apr 04, 2011 9:57 pm

michiguel wrote: Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
Does your motherboard support the Phenom II X6? I had to update my BIOS to get it working correctly. As long as all the cores are detected by the OS, you should be fine.

I don't see any such problem with my CPU (Phenom II X6 1055T). NPS with six cores is not 6x, which is to be expected given other bottlenecks, but with two cores the speed-up is certainly close to 2x. The turbo mode should be in use as long as <= 3 cores are being used, so turbo shouldn't be the reason for what you're seeing.

How can I run your test here to compare?

michiguel
Posts: 6388
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel » Mon Apr 04, 2011 10:12 pm

rbarreira wrote:
michiguel wrote: Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
Does your motherboard support the Phenom II X6? I had to update my BIOS to get it working correctly. As long as all the cores are detected by the OS, you should be fine.
When I run all 6 threads for 2-3 minutes I see all of the cores detected and busy, so I think the cores are detected.

I don't see any such problem with my CPU (Phenom II X6 1055T). NPS with six cores is not 6x, which is to be expected given other bottlenecks, but with two cores the speed-up is certainly close to 2x. The turbo mode should be in use as long as <= 3 cores are being used, so turbo shouldn't be the reason for what you're seeing.

How can I run your test here to compare?
Thanks, that would be fantastic. Tonight when I get home I will post a binary for you to download.

Linux? Windows? 32 or 64 bits?

I observed the problems with Linux 64.
Miguel

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 1:48 pm

Re: AMD Phenom Hex core (SMP performance problem)

Post by rbarreira » Mon Apr 04, 2011 10:17 pm

Linux 64.

marcelk
Posts: 348
Joined: Fri Feb 26, 2010 11:21 pm

Re: AMD Phenom Hex core (SMP performance problem)

Post by marcelk » Mon Apr 04, 2011 10:23 pm

michiguel wrote: Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.

What this function does is throttle down a core when it is not being used. If your processes/threads are idling when there is no work for them, this will hurt, because there is a delay between throttling down and ramping back up. This was the case with my program. It was effectively operating at the throttled-down speed all the time.

As a bonus, when you disable Cool'n'Quiet, each core gets locked at its "turbo core" speed (which is 3.6GHz for the 1090T and 3.7GHz for the 1100T). This is a slight overclock because not all 6 cores are supposed to run at this speed simultaneously, but I have had no problems with it running 24/7 for a few months now.
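
One way to check whether cores are being throttled while the engine runs is to sample their clock speed. The following is a minimal sketch, assuming a Linux system whose kernel exposes the cpufreq sysfs interface. The function name `read_cpufreq` is made up for illustration, and the cpufreq directories may be absent when frequency scaling is disabled in the BIOS.

```python
from pathlib import Path

# Standard Linux cpufreq location; present only when a cpufreq driver is loaded.
CPU_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq(root: Path = CPU_ROOT) -> dict:
    """Return {cpu_name: (governor, current_kHz)} for each core exposing cpufreq."""
    result = {}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        freq_dir = cpu_dir / "cpufreq"
        try:
            governor = (freq_dir / "scaling_governor").read_text().strip()
            cur_khz = int((freq_dir / "scaling_cur_freq").read_text().strip())
        except (OSError, ValueError):
            continue  # core offline, or no cpufreq support on this system
        result[cpu_dir.name] = (governor, cur_khz)
    return result


if __name__ == "__main__":
    for name, (governor, khz) in read_cpufreq().items():
        print(f"{name}: governor={governor}, current={khz / 1000:.0f} MHz")
```

Running this a few times during a two-thread search should show whether the busy cores stay near their nominal clock. If they sit well below it, throttling of the kind described above is a plausible suspect.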

bob
Posts: 20636
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: AMD Phenom Hex core (SMP performance problem)

Post by bob » Tue Apr 05, 2011 4:52 am

michiguel wrote: Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
It is possibly a cache issue. You have to be _very_ careful what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked. That is sometimes called "false sharing". The caches transfer that block back and forth between the cores, killing performance...
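
The layout problem described above can be sketched with Python's ctypes, which mirrors C struct layout. This is an illustration only: the struct and field names are hypothetical, the 64-byte line size is taken from the post, and the code shows where the fields land in memory rather than measuring the slowdown itself.

```python
import ctypes

CACHE_LINE = 64  # bytes; the block size quoted above, assumed for the target CPU


class PackedCounters(ctypes.Structure):
    """Two per-thread counters side by side, as in a plain C struct."""
    _fields_ = [
        ("nodes_thread0", ctypes.c_uint64),
        ("nodes_thread1", ctypes.c_uint64),
    ]


class PaddedCounters(ctypes.Structure):
    """Same counters, each padded out to occupy a full cache line."""
    _fields_ = [
        ("nodes_thread0", ctypes.c_uint64),
        ("pad0", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("nodes_thread1", ctypes.c_uint64),
        ("pad1", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
    ]


def cache_line_of(struct_type, field_name: str) -> int:
    """Index of the cache line holding a field, assuming a line-aligned struct."""
    return getattr(struct_type, field_name).offset // CACHE_LINE


def shares_line(struct_type, a: str, b: str) -> bool:
    """True when both fields fall inside the same cache line."""
    return cache_line_of(struct_type, a) == cache_line_of(struct_type, b)


if __name__ == "__main__":
    for layout in (PackedCounters, PaddedCounters):
        shared = shares_line(layout, "nodes_thread0", "nodes_thread1")
        print(f"{layout.__name__}: size={ctypes.sizeof(layout)} bytes, "
              f"counters share a cache line: {shared}")
```

In the packed layout both counters sit in the same 64-byte line, so writes from two threads would bounce that line between cores. The padded layout trades memory for separation by giving each counter its own line.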

As a simple test, run two separate instances of your program in two different windows and see if each runs at its normal nps. If so, then you likely have a cache issue as above. If not, then you have something going on with the hardware settings...

Make sure you are not using the Intel compiler if you are running on AMD of course.

Joost Buijs
Posts: 987
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: AMD Phenom Hex core (SMP performance problem)

Post by Joost Buijs » Tue Apr 05, 2011 4:58 am

marcelk wrote: I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.
In fact there is a 'Turbo Core' compatibility problem with older Linux kernels, look at - for instance - the article here: http://www.h-online.com/open/news/item/ ... 93127.html

Joost Buijs
Posts: 987
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: AMD Phenom Hex core (SMP performance problem)

Post by Joost Buijs » Tue Apr 05, 2011 5:33 am

bob wrote: It is possibly a cache issue. You have to be _very_ careful what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked.
I had cache thrashing when I first moved from 2 to many threads.
In my case the distance between the board structs in a splitpoint was too small.

The speedup on 6 cores for my program is now around 4 to 4.75, depending on the position. And there is still room for improvement because I haven't implemented the 'helpful master' concept yet.

michiguel
Posts: 6388
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel » Tue Apr 05, 2011 6:28 am

Joost Buijs wrote:
marcelk wrote: I had terrible speedups with my program until I disabled the Cool'n'Quiet in BIOS.
In fact there is a 'Turbo Core' compatibility problem with older Linux kernels, look at - for instance - the article here: http://www.h-online.com/open/news/item/ ... 93127.html
Thanks Joost and Marcel, it looks like something like this was the problem!

I disabled both Turbo Core and Cool'n'Quiet and now I have a speed-up of 1.7 using two threads, similar to what I have on my dual.

I probably do not need to disable Turbo Core (as Marcel seems to indicate), but this was my first test. I had the newest kernel, so that should not be a problem. I had a gut feeling that this may happen with programs that use mutexes or semaphores, which put the cores to rest and wake them up.

This was driving me nuts. THANKS!

Miguel

michiguel
Posts: 6388
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: AMD Phenom Hex core (SMP performance problem)

Post by michiguel » Tue Apr 05, 2011 6:31 am

bob wrote:
michiguel wrote: Gaviota has a speed-up of ~1.7 (both nps and time to ply, roughly) on an AMD dual running two threads. I tested it on an AMD hexacore 1090T and the speed-up (running two threads to make it comparable) is no more than 1.2x in nodes per second. Awful.
Does anybody have any idea why this could happen? What did I set up wrong with the hardware? Any hint? I cannot imagine it's a software problem... or is it?

Miguel
It is possibly a cache issue. You have to be _very_ careful what is shared. Remember that memory is fetched in 64-byte blocks. If you have two adjacent 4-byte or 8-byte values, each being updated by a different thread, your goose is cooked. That is sometimes called "false sharing". The caches transfer that block back and forth between the cores, killing performance...

As a simple test, run two separate instances of your program in two different windows and see if each runs at its normal nps. If so, then you likely have a cache issue as above. If not, then you have something going on with the hardware settings...

Make sure you are not using the Intel compiler if you are running on AMD of course.
I built a version that saved almost nothing in memory (counters, hash table, killers, etc.) and still had the problem. I used gcc. I also ran two instances and they were fine (not 100% but close enough). Apparently, it was this Cool'n'Quiet thing...

Miguel
