threading

flok

threading

Post by flok »

I wondered why adding more threads/cores to my chess program gave a speedup so far from linear.
So I added a thread that samples, every 250ms, the number of idle thread slots (on a 6-core + HT system) and the amount of CPU time (user + sys) used in that slice.
In a nice graph:

[Image: graph of idle thread slots and CPU time (user + sys) per 250ms sample]

x-axis is sample number, starting 250ms after the search started

So yes, there are some moments where one or more threads are idle, but they never last longer than the 250ms interval (well, maybe 499ms).
System overhead is also very low.
Conclusion: it must be a locking issue.
Next step: run it through mutrace to see which locks are holding things back.
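
For anyone who wants to try the same measurement, here is a rough sketch of such a sampling thread. It is not the code from my engine. The `idle_threads` counter, the `Sample` struct and the function names are made up for illustration, and it assumes a POSIX system where `getrusage` is available.

```cpp
#include <sys/resource.h>  // getrusage (POSIX only)
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical counter: a worker increments it before it starts waiting
// for work and decrements it when it resumes searching.
std::atomic<int> idle_threads{0};

struct Sample {
    int idle;         // idle thread slots at the moment of sampling
    double cpu_secs;  // user + sys CPU time the process used in this slice
};

// CPU time (user + sys) consumed by the whole process so far, in seconds.
double process_cpu_seconds() {
    struct rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    auto secs = [](const timeval& tv) { return tv.tv_sec + tv.tv_usec / 1e6; };
    return secs(ru.ru_utime) + secs(ru.ru_stime);
}

// Records one slice; prev_cpu carries the CPU total from the previous call.
Sample take_sample(double& prev_cpu) {
    double now = process_cpu_seconds();
    Sample s{idle_threads.load(), now - prev_cpu};
    prev_cpu = now;
    return s;
}

// Body of the sampling thread: one sample per interval until stop is set.
std::vector<Sample> run_sampler(const std::atomic<bool>& stop,
                                std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    std::vector<Sample> samples;
    double prev_cpu = process_cpu_seconds();
    while (!stop.load()) {
        std::this_thread::sleep_for(interval);
        samples.push_back(take_sample(prev_cpu));
    }
    return samples;
}
```

Note that the idle count is only a snapshot at the sample instant, so it says nothing about how long a thread stayed idle between two samples. The CPU time per slice is the more telling number: with 12 busy thread slots and a 250ms interval it should come close to 3 seconds.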
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: threading

Post by jdart »

250ms is actually a long time to have a thread idle. So I would not conclude your problem is lock performance. It sounds more like an algorithm issue.

I have found OProfile (http://oprofile.sourceforge.net) helpful for finding performance bottlenecks. Intel Parallel Studio is also very good but can be complex to use and understand.

--Jon
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: threading

Post by Joost Buijs »

I've never timed how long threads can be idle in my program. 250ms seems long, but it can happen in YBW without a helpful master, when a master is done and sits waiting for its slaves to finish.

The bad speedup you see can also be caused by different threads poking into the same cache line; this is something I ran into when I first started with SMP.
You have to make sure that the data structures for each thread are separated by at least one cache line (64 bytes on Intel i7).
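
A minimal sketch of what that separation can look like in C++11 or later. The `ThreadData` fields, the array name and `data_for` are hypothetical, and the 64-byte line size is an assumption that holds for the i7 but should be checked for other CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;   // assumed line size (Intel i7)
constexpr std::size_t kMaxThreads = 12;  // 6 cores + HT, as in this thread

// Hypothetical per-thread search state. alignas rounds both the alignment
// and the size up to a full cache line, so array neighbours never share one.
struct alignas(kCacheLine) ThreadData {
    std::uint64_t nodes = 0;
    std::uint64_t tt_hits = 0;
    int depth = 0;
};

static_assert(alignof(ThreadData) == kCacheLine, "must start on a line boundary");
static_assert(sizeof(ThreadData) % kCacheLine == 0, "must fill whole lines");

// One slot per thread; a global array honours the alignas request.
ThreadData g_thread_data[kMaxThreads];

// Returns the private slot for the given thread index.
ThreadData& data_for(std::size_t thread_id) {
    assert(thread_id < kMaxThreads);
    return g_thread_data[thread_id];
}
```

If the per-thread structures are allocated with `new` or live in a `std::vector`, the over-alignment is only guaranteed from C++17 on, so with older compilers a static array like this is the safer choice.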
flok

Re: threading

Post by flok »

Well, a thread may sit idle for 250ms. With the 250ms sample rate we can't tell.
I'll give 100ms a try.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: threading

Post by jdart »

Tools like Oprofile can use monitoring features built into the CPU. This is much more efficient and accurate than your sampling method.

--Jon
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: threading

Post by mar »

Joost Buijs wrote: The bad speedup you see can also be caused by different threads poking into the same cache-line, this is something I experienced in the past when I first started with SMP.
You have to make sure that the data structures for each thread are separated at least 1 cache-line (64 bytes on Intel i7) apart from each other.
Yes, this is called false sharing and it can totally kill performance,
but the issue doesn't arise if n threads are only reading the same block of memory.
Writes are of course problematic because they invalidate the copies of that cache line held by other cores (assuming per-core caches).
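
To see the effect of writes on your own machine, a small experiment along these lines may help. This is only a sketch: the counter types, `hammer` and `time_increments` are names invented here, the 64-byte line size is assumed, and the numbers you get will depend on the CPU and the compiler flags.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

constexpr int kThreads = 4;  // arbitrary choice for the experiment

// Neighbouring counters are packed tightly and end up in the same cache line.
struct PackedCounter { std::atomic<std::uint64_t> value{0}; };

// Each counter occupies its own (assumed) 64-byte cache line.
struct alignas(64) PaddedCounter { std::atomic<std::uint64_t> value{0}; };

// A thread only ever writes to its own counter.
template <class Counter>
void hammer(Counter& c, std::uint64_t iters) {
    for (std::uint64_t i = 0; i < iters; ++i)
        c.value.fetch_add(1, std::memory_order_relaxed);
}

// Runs kThreads writers in parallel and returns the elapsed wall time in seconds.
template <class Counter>
double time_increments(std::uint64_t iters) {
    Counter counters[kThreads];
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t)
        threads.emplace_back([&counters, t, iters] { hammer(counters[t], iters); });
    for (auto& th : threads) th.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing `time_increments<PackedCounter>(n)` with `time_increments<PaddedCounter>(n)` for some large `n` (built with `-pthread`) shows how much the shared line costs, even though no two threads ever touch the same variable. Threads that only read a shared table do not pay this cost.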