How do you profile your engine in multithreading mode?
I usually develop and test everything under Linux, and gprof doesn't help with multithreaded applications.
I've read a bit about oprofile, but the documentation seems hard to follow and has no examples. Does anyone use it?
Profile multithreaded engine
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Profile multithreaded engine
I have used oprofile. I don't know what's hard about it. There is a "cheat sheet" here: http://oprofile.sourceforge.net/docs/ with basic instructions.
If you want something with a better GUI try Intel Parallel Studio (free download for non-commercial use - see https://software.intel.com/en-us/qualif ... ontributor). It only works on Intel CPUs and works best on recent ones (post Sandy Bridge, if I remember right).
The problem with both these tools is that eventually you need to understand quite a bit about the internal CPU and system architecture and the associated performance counters to get the most out of them.
--Jon
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Profile multithreaded engine
Fabio Gobbato wrote: "How do you profile your engine in multithreading mode? I usually develop and test everything under Linux, and gprof doesn't help with multithreaded applications. I've read a bit about oprofile, but the documentation seems hard to follow and has no examples. Does anyone use it?"
I use Intel VTune. I'm not sure there is a free version, as we had to pay for a license to use it. But it is REALLY good at this stuff: it is an adaptive system, so it samples most everything, then samples more when it sees a potential problem.
-
- Posts: 690
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Profile multithreaded engine
Fabio Gobbato wrote: "How do you profile your engine in multithreading mode? I usually develop and test everything under Linux, and gprof doesn't help with multithreaded applications. I've read a bit about oprofile, but the documentation seems hard to follow and has no examples. Does anyone use it?"
I sometimes use perf. "perf top" is quite good when you want to get a quick idea of where your bottlenecks are. It works similarly to "top", but at the function or instruction level instead of top's process or thread level.
I suspect it is not as good as VTune though. For my engine "perf top" reports that a lot of time is spent on the instruction after my transposition table prefetch instruction. This seems odd given that the whole point of the prefetch instruction is to run in the background, and the instruction after the prefetch is a branch instruction.
If you have NUMA hardware, numatop can be used to analyze remote and local memory accesses.
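For a recorded profile rather than the live "perf top" view, the perf workflow could be scripted roughly as follows. This is only a minimal sketch, not the setup described above: the function name profile_with_perf and the ./engine path are placeholders, and it assumes perf is installed and the engine was built with symbols. As far as I know, perf record follows the threads created by the profiled program by default, so one run covers the whole multithreaded search.

```shell
# Sketch only: profile_with_perf is a hypothetical helper name.
# Usage (not run here): profile_with_perf ./engine [engine args...]
profile_with_perf() {
    engine="$1"; shift
    # Sample the program with call stacks (-g) and write the data to perf.data.
    perf record -g -o perf.data -- "$engine" "$@" || return 1
    # Print a flat, gprof-like listing: self time only, one line per symbol.
    perf report -i perf.data --stdio --no-children --sort symbol
}
```

The --no-children option makes the report show self time per function, which is the closest match to gprof's flat profile. Drop it to see cumulative time including callees.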
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Profile multithreaded engine
With oprofile, how do you get output similar to gprof's for a multithreaded application?
What are the commands? I tried a little, but with very little success.
Thank you
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Profile multithreaded engine
You are not saying what you want to do and see in the output. There are a lot of options.
The default output with opreport --symbols is similar to gprof's "flat" output. If you want callgraphs, add -c. If you want line-level information add -g.
If you are not getting this output, something is wrong with your execution of operf.
Note also: by default operf measures execution time. There are a lot of other processor events you can measure, for example TLB and L2 cache misses, indicating memory access bottlenecks (see http://oprofile.sourceforge.net/doc/eventspec.html).
--Jon
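The options mentioned in this post could be combined along these lines. Treat this as a hedged sketch rather than a recipe from the post: the function names and the ./engine path are placeholders, --callgraph and --debug-info are the long forms of opreport's -c and -g, and the operf options --callgraph and --separate-thread are taken from my reading of the operf man page, so check "operf --help" on your version. Event names are CPU-specific, which is why the event spec is left as a parameter. "ophelp" lists what your processor offers.

```shell
# Sketch only: both function names are hypothetical helpers.

# Record a time-based profile, then print the three report styles.
profile_with_operf() {
    engine="$1"; shift
    # --callgraph records call stacks; --separate-thread keeps samples per thread.
    operf --callgraph --separate-thread "$engine" "$@" || return 1
    opreport --symbols                 # flat, gprof-like listing
    opreport --symbols --callgraph     # same as adding -c
    opreport --symbols --debug-info    # same as adding -g
}

# Sample a hardware event (for example cache misses) instead of time.
profile_event_with_operf() {
    event_spec="$1"; engine="$2"; shift 2
    operf --events "$event_spec" "$engine" "$@" || return 1
    opreport --symbols
}

# Usage (not run here):
#   profile_with_operf ./engine
#   profile_event_with_operf EVENT_NAME:COUNT ./engine
```

Call graphs only show up in opreport if they were recorded in the first place, which is why the sketch passes --callgraph to operf as well.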
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Profile multithreaded engine
The idea is to have an output like this:
Code: Select all
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 39.50     15.73    15.73 11775988     0.00     0.00  Eval
  9.67     19.58     3.85 43976152     0.00     0.00  BestMove
  6.35     22.11     2.53 18357271     0.00     0.00  Make
  4.93     24.08     1.97 46645969     0.00     0.00  Illegal
  4.90     26.03     1.95     7625     0.00     0.01  Search
  4.86     27.96     1.94 35224965     0.00     0.00  TTLoad
  4.11     29.60     1.64 30778350     0.00     0.00  ExtractMove
  4.08     31.22     1.63 10846151     0.00     0.00  SEE
I would like to get something similar for the multithreaded search.
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Profile multithreaded engine
I found it! Thank you!
Code: Select all
operf ./engine
opreport --symbols