Don wrote:I don't really know the exact details, but this is from the gcc manpage:
Well, easy enough to compile two times, "gcc -O2 -S" and "gcc -O2 -S -pg", and see what the difference is:
A small part of qperft:
Code: Select all
void board_init(char *b)
{ /* make an empty board surrounded by guard band of uncapturable pieces */
int i;
for(i= -1; i<0xBC; i++) b[i] = (i-0x22)&0x88 ? GUARD : DUMMY;
}
gcc -O2 -pg -S qperft.c:
Code: Select all
.globl board_init
.type board_init, @function
board_init:
pushl %ebp
movl %esp, %ebp
pushl %ebx
call mcount
movl $-1, %edx
movl $-32, %eax
movl 8(%ebp), %ebx
leal -1(%ebx), %ecx
jmp .L2
.p2align 4,,7
.p2align 3
.L5:
leal -34(%edx), %eax
andl $-120, %eax
cmpb $1, %al
sbbl %eax, %eax
andl $-65, %eax
subl $32, %eax
leal (%ebx,%edx), %ecx
.L2:
addl $1, %edx
cmpl $188, %edx
movb %al, (%ecx)
jne .L5
popl %ebx
popl %ebp
ret
.size board_init, .-board_init
gcc -O2 -S qperft.c:
Code: Select all
.globl board_init
.type board_init, @function
board_init:
pushl %ebp
movl $-1, %edx
movl %esp, %ebp
movl $-32, %eax
pushl %ebx
movl 8(%ebp), %ebx
leal -1(%ebx), %ecx
jmp .L2
.p2align 4,,7
.p2align 3
.L5:
leal -34(%edx), %eax
andl $-120, %eax
cmpb $1, %al
sbbl %eax, %eax
andl $-65, %eax
subl $32, %eax
leal (%ebx,%edx), %ecx
.L2:
addl $1, %edx
cmpl $188, %edx
movb %al, (%ecx)
jne .L5
popl %ebx
popl %ebp
ret
.size board_init, .-board_init
A "diff -u" between the two
Code: Select all
@@ -5,11 +5,10 @@
.type board_init, @function
board_init:
pushl %ebp
- movl %esp, %ebp
- pushl %ebx
- call mcount
movl $-1, %edx
+ movl %esp, %ebp
movl $-32, %eax
+ pushl %ebx
movl 8(%ebp), %ebx
leal -1(%ebx), %ecx
jmp .L2
You can see the only extra code is the "call mcount" in the entry code of the routine, and a slight re-ordering of the instructions around it. The actual body of the routine is not touched at all.
So I don't think your skepticism towards profiling is justified. Of course there will be statistical noise, as the timer interrupt samples the code only 100 times/sec (i.e.0.01 sec/sample). So you should expect sqrt(N) noise, and the results only get 1% accurate when you spent 100 sec in the routine. (And even then, for programs with long-term very regular behavior, such as matrix inversions, the samples might not be independent.) I spent 29 sec in SEE, though:
Code: Select all
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
43.89 102.30 102.30 110301449 0.00 0.00 Evaluate
32.30 177.60 75.30 8 9.41 29.13 Search
12.59 206.94 29.34 196194872 0.00 0.00 SEE
7.31 223.99 17.05 35669452 0.00 0.00 capt_gen
2.71 230.31 6.32 8159522 0.00 0.00 move_gen
1.16 233.02 2.71 31231960 0.00 0.00 capturable
0.02 233.06 0.04 200272 0.00 0.00 make_list
0.01 233.08 0.02 135676 0.00 0.00 Attacked
0.01 233.10 0.02 main
The mcount routine probably does not only count based on the address of the mcount call (i.e. the return address on the stack), but also digs deeper in the stack to figure out who called the routine that is now counted, because gprof gives me that info too, afterwards.