Profile multithreaded engine

Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Profile multithreaded engine

Post by Fabio Gobbato »

How do you profile your engine in multithreaded mode?
I usually develop and test everything under Linux, and gprof doesn't help with multithreaded applications.
I've read something about oprofile, but the documentation seems hard to follow and lacks examples. Does anyone use it?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Profile multithreaded engine

Post by jdart »

I have used oprofile. I don't know what's hard about it. There is a "cheat sheet" here: http://oprofile.sourceforge.net/docs/ with basic instructions.

If you want something with a better GUI try Intel Parallel Studio (free download for non-commercial use - see https://software.intel.com/en-us/qualif ... ontributor). It only works on Intel CPUs and works best on recent ones (post Sandy Bridge, if I remember right).

The problem with both these tools is that eventually you need to understand quite a bit about the internal CPU and system architecture and the associated performance counters to get the most out of them.

--Jon
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Profile multithreaded engine

Post by bob »

I use Intel VTune. I'm not sure there is a free version, as we had to pay for a license to use it. But it is REALLY good at this stuff: it is an adaptive system, so it samples most everything, then samples more when it sees a potential problem.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Profile multithreaded engine

Post by petero2 »

I sometimes use perf. "perf top" is quite good when you want to get a quick idea of where your bottlenecks are. It works similarly to "top" but on a function or instruction level instead of top's process or thread level.

I suspect it is not as good as VTune though. For my engine "perf top" reports that a lot of time is spent on the instruction after my transposition table prefetch instruction. This seems odd given that the whole point of the prefetch instruction is to run in the background, and the instruction after the prefetch is a branch instruction.

If you have NUMA hardware, numatop can be used to analyze remote and local memory accesses.
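For anyone who wants to try perf along these lines, here is a minimal sketch. The function names `perf_live` and `perf_flat` are just made up for illustration, and `./engine` in the usage note is an assumed binary path. perf follows all threads of the profiled process by default, so nothing special is needed for a multithreaded search.

```shell
# Live, top-like view of the hottest functions in an already running
# process. $1 is the process ID of the engine.
perf_live() {
    perf top -p "$1"
}

# Record samples while the given command runs, then print a
# per-function text report once it exits.
perf_flat() {
    perf record -- "$@" && perf report --stdio --sort symbol
}
```

Usage would be something like `perf_live "$(pidof engine)"` while the engine is searching, or `perf_flat ./engine` to record a whole run. Building the engine with symbols makes the function names readable.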
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Profile multithreaded engine

Post by Fabio Gobbato »

With oprofile, how do you get output similar to gprof's for a multithreaded application?
What are the commands? I tried a little but got very few results.

Thank you
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Profile multithreaded engine

Post by jdart »

You are not saying what you want to do and see in the output. There are a lot of options.

The default output with opreport --symbols is similar to gprof's "flat" output. If you want callgraphs, add -c. If you want line-level information add -g.

If you are not getting this output, something is wrong with your execution of operf.

Note also: by default operf measures execution time. There are a lot of other processor events you can measure, for example TLB and L2 cache misses, indicating memory access bottlenecks (see http://oprofile.sourceforge.net/doc/eventspec.html).
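As a rough sketch of how the options above fit together (the function names are hypothetical wrappers, and `./engine` in the usage note is an assumed binary path): call graphs have to be recorded by operf before opreport can show them, line-level output needs a binary built with debug info, and event names differ per CPU, so take them from `ophelp` rather than from this example.

```shell
# Call-graph report: record call chains, then show them per symbol.
oprofile_callgraph() {
    operf --callgraph "$@" && opreport --symbols --callgraph
}

# Line-level report: needs the engine compiled with -g.
oprofile_lines() {
    operf "$@" && opreport --symbols --debug-info
}

# Measure a specific processor event instead of the default.
# $1 is an event spec of the form name:count, as listed by ophelp.
oprofile_event() {
    ev=$1
    shift
    operf --events "$ev" "$@" && opreport --symbols
}
```

For example, `oprofile_callgraph ./engine`, or `oprofile_event "<event_name>:<count>" ./engine` with the placeholders filled in from the `ophelp` output for your CPU.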


--Jon
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Profile multithreaded engine

Post by Fabio Gobbato »

The idea is to have an output like this:

Code:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 39.50     15.73    15.73 11775988     0.00     0.00  Eval
  9.67     19.58     3.85 43976152     0.00     0.00  BestMove
  6.35     22.11     2.53 18357271     0.00     0.00  Make
  4.93     24.08     1.97 46645969     0.00     0.00  Illegal
  4.90     26.03     1.95     7625     0.00     0.01  Search
  4.86     27.96     1.94 35224965     0.00     0.00  TTLoad
  4.11     29.60     1.64 30778350     0.00     0.00  ExtractMove
  4.08     31.22     1.63 10846151     0.00     0.00  SEE
I would like to get something similar for the multithreaded search.
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Profile multithreaded engine

Post by Fabio Gobbato »

I found it! Thank you!

Code:

operf ./engine
opreport --symbols
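
The two commands above merge the samples from all threads into one flat profile. If a per-thread breakdown is wanted, one possible variation is operf's `--separate-thread` option. This is only a sketch: `oprofile_per_thread` is a made-up wrapper name and `./engine` is the same assumed path as above.

```shell
# Record with samples kept separate per thread, then print the flat
# report. opreport should then break the counts down by tgid/tid.
oprofile_per_thread() {
    operf --separate-thread "$@" && opreport --symbols
}
```

Called as `oprofile_per_thread ./engine`, this can help show whether some search threads spend their time differently from others.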