Profile multithreaded engine

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Fabio Gobbato
Posts: 132
Joined: Fri Apr 11, 2014 8:45 am

Profile multithreaded engine

Post by Fabio Gobbato » Sat Apr 09, 2016 1:00 pm

How do you profile your engine in multithreading mode?
I usually develop and test everything under Linux, and gprof doesn't help with a multithreaded application.
I've read a little about oprofile, but the documentation seems hard to follow and has no examples. Does anyone use it?

jdart
Posts: 3843
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Profile multithreaded engine

Post by jdart » Sat Apr 09, 2016 2:33 pm

I have used oprofile. I don't know what's hard about it. There is a "cheat sheet" here: http://oprofile.sourceforge.net/docs/ with basic instructions.

If you want something with a better GUI try Intel Parallel Studio (free download for non-commercial use - see https://software.intel.com/en-us/qualif ... ontributor). It only works on Intel CPUs and works best on recent ones (post Sandy Bridge, if I remember right).

The problem with both these tools is that eventually you need to understand quite a bit about the internal CPU and system architecture and the associated performance counters to get the most out of them.

--Jon

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Profile multithreaded engine

Post by bob » Sat Apr 09, 2016 4:58 pm

I use Intel VTune. I'm not sure there is a free version of it, as we had to pay for a license. But it is REALLY good at this stuff, and it is an adaptive system: it samples almost everything, then samples more when it sees a potential problem.

petero2
Posts: 587
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden

Re: Profile multithreaded engine

Post by petero2 » Sat Apr 09, 2016 10:37 pm

I sometimes use perf. "perf top" is quite good when you want to get a quick idea of where your bottlenecks are. It works similarly to "top" but on a function or instruction level instead of top's process or thread level.

I suspect it is not as good as VTune though. For my engine "perf top" reports that a lot of time is spent on the instruction after my transposition table prefetch instruction. This seems odd given that the whole point of the prefetch instruction is to run in the background, and the instruction after the prefetch is a branch instruction.

If you have NUMA hardware, numatop can be used to analyze remote and local memory accesses.
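
For a saved report rather than the live "perf top" view, something like the following might work. It is only a sketch: ./engine is a placeholder binary name, the helper function name is made up for illustration, and it assumes perf is installed and the engine was built with symbols.

```shell
# Placeholder binary name; replace with your own engine.
ENGINE=./engine

profile_with_perf() {
    # Sample the engine while it runs; threads it creates are included.
    perf record -o perf.data "$ENGINE"
    # Print a per-symbol listing to the terminal instead of the interactive UI.
    perf report -i perf.data --stdio --sort symbol
}

# Only run when perf is installed and the engine binary exists.
if command -v perf >/dev/null 2>&1 && [ -x "$ENGINE" ]; then
    profile_with_perf
fi
```

The per-symbol percentages are sample-based, so unlike gprof there is no call count column.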

Fabio Gobbato
Posts: 132
Joined: Fri Apr 11, 2014 8:45 am

Re: Profile multithreaded engine

Post by Fabio Gobbato » Sun Apr 10, 2016 6:35 am

With oprofile, how do you get output similar to gprof's for a multithreaded application?
What are the commands? I tried a little but got very few results.

Thank you

jdart
Posts: 3843
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Profile multithreaded engine

Post by jdart » Sun Apr 10, 2016 1:55 pm

You are not saying what you want to do and see in the output. There are a lot of options.

The default output with opreport --symbols is similar to gprof's "flat" output. If you want callgraphs, add -c. If you want line-level information add -g.

If you are not getting this output, something is wrong with your execution of operf.

Note also: by default operf measures execution time. There are a lot of other processor events you can measure, for example TLB and L2 cache misses, indicating memory access bottlenecks (see http://oprofile.sourceforge.net/doc/eventspec.html).
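
As a rough sketch of how those report options fit together (./engine is a placeholder name, the function name is made up, and the long option names are used in place of -c and -g for readability):

```shell
# Placeholder binary name; replace with your own engine.
ENGINE=./engine

profile_callgraph() {
    # Record call stacks during the run so that a callgraph report is possible.
    operf --callgraph "$ENGINE"
    # Flat per-symbol listing, comparable to gprof's flat profile.
    opreport --symbols
    # The same listing with caller/callee information (-c).
    opreport --symbols --callgraph
    # The same listing with source file and line per symbol (-g);
    # this assumes the engine was compiled with debug information.
    opreport --symbols --debug-info
}

# Only run when oprofile is installed and the engine binary exists.
if command -v operf >/dev/null 2>&1 && [ -x "$ENGINE" ]; then
    profile_callgraph
fi
```

The event names accepted by operf differ between CPUs; running ophelp should list the ones available on your machine.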


--Jon

Fabio Gobbato
Posts: 132
Joined: Fri Apr 11, 2014 8:45 am

Re: Profile multithreaded engine

Post by Fabio Gobbato » Sun Apr 10, 2016 7:07 pm

The idea is to have an output like this:

Code:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 39.50     15.73    15.73 11775988     0.00     0.00  Eval
  9.67     19.58     3.85 43976152     0.00     0.00  BestMove
  6.35     22.11     2.53 18357271     0.00     0.00  Make
  4.93     24.08     1.97 46645969     0.00     0.00  Illegal
  4.90     26.03     1.95     7625     0.00     0.01  Search
  4.86     27.96     1.94 35224965     0.00     0.00  TTLoad
  4.11     29.60     1.64 30778350     0.00     0.00  ExtractMove
  4.08     31.22     1.63 10846151     0.00     0.00  SEE

I would like to get something similar for the multithreaded search.

Fabio Gobbato
Posts: 132
Joined: Fri Apr 11, 2014 8:45 am

Re: Profile multithreaded engine

Post by Fabio Gobbato » Sun Apr 10, 2016 7:20 pm

I found it! Thank you!

Code:

operf ./engine
opreport --symbols
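
A possible variation, if the samples should be split by thread instead of summed over all search threads. This is a sketch only: ./engine is the same placeholder name as above and the function name is invented for illustration.

```shell
# Placeholder binary name; replace with your own engine.
ENGINE=./engine

profile_per_thread() {
    # Categorize samples by thread group ID and thread ID while recording.
    operf --separate-thread "$ENGINE"
    # Report per symbol; the separated profiles are not merged by default.
    opreport --symbols
}

# Only run when oprofile is installed and the engine binary exists.
if command -v operf >/dev/null 2>&1 && [ -x "$ENGINE" ]; then
    profile_per_thread
fi
```

Comparing the per-thread figures can hint at whether one thread spends noticeably more time in a given function than the others.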
