Hi everybody,
I need to implement some pattern searching.
I use a list of bitboards to indentify the position and then
I use an array of base scores for each pattern.
Now the base score is not enough for each pattern, I need some additional scoring depending on several board conditions.
So I will need some scoring functions to complement the base score in the array.
I was thinking using functions pointers. So I would need a second array of functions pointers.
And for each pattern I assign a function pointer to a complement scoring function.
So the pattern score = base_score + complement_scoring_func(...)
My question is this: will functions pointers hurt performance of the cpu predictors?
In the loop through the patterns if a pattern is found I read the array of functions pointers and call it.
I still don't have the code to show.
thanks in advance.
Alvaro
Function pointers hurt performance?
Moderators: hgm, Rebel, chrisw
-
- Posts: 362
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Function pointers hurt performance?
This seems to me by far the slowest: to find a match looping across an array.Cardoso wrote:Hi everybody,
I use a list of bitboards to indentify the position
I'd guess you are looking definitely in the wrong direction regarding the source of possible slowdowns, anyhow I strongly suggest to use a profiler, even very experienced developers don't trust their intuition but use a profiler for optimizing.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Function pointers hurt performance?
Function pointers mean an indirect call, like calling a virtual method. x86 chips can only predict these if its a monomorphic call site (always calling the same function at that spot in the code). They predict it to the same address it went to last time. If you look up the function pointer in some kind of table and the values vary, its going to mispredict a lot. I've heard that some older chip designs like Pentium II and III would always predict it correctly if you loaded it into a register and left it there for ~40 cycles and then jumped or called thru that register. I would expect it to also be true of Pentium M-derived chips. I don't know if it applies to other modern chips. I have no idea about AMD chips either.
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: Function pointers hurt performance?
This seems similar to the virtual functions calling in languages like C++. Maybe the overhead could be the same plus the search in your array.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Function pointers hurt performance?
According to Agner Fog's document, Pentium M, Core2 and newer designs have something more sophisticated for predicting indirect branches, that can sometimes predict patterns with various different branch targets. Of course if the branches don't actually follow a pattern, it will mispredict most of the time.wgarvin wrote:Function pointers mean an indirect call, like calling a virtual method. x86 chips can only predict these if its a monomorphic call site (always calling the same function at that spot in the code). They predict it to the same address it went to last time. If you look up the function pointer in some kind of table and the values vary, its going to mispredict a lot. I've heard that some older chip designs like Pentium II and III would always predict it correctly if you loaded it into a register and left it there for ~40 cycles and then jumped or called thru that register. I would expect it to also be true of Pentium M-derived chips. I don't know if it applies to other modern chips. I have no idea about AMD chips either.
Also the misprediction penalty of x86 chips is not as large as I thought. Agner Fog gives details on the penalty for all of Intel and AMD's chips, and they all seem to be between 13 and 17 cycles. e.g. Pentium M and Nehalem are 15 cycles.
-
- Posts: 155
- Joined: Mon Feb 15, 2010 9:33 am
- Location: New Zealand
Re: Function pointers hurt performance?
If I call Foo()
directly and indirectly through a function pointer, the corresponding assembler is
Experiments timing a gazillion calls show no difference between the two calling methods. Function pointers do not hurt performance.
<rant>
In the old days it was easy to arrange this kind of timing experiment, but modern compilers put up obstacles. For example, at high optimization level a compiler discovers that a gazillion invocations of Foo() is equivalent toThe timing run now occupies a few nanoseconds instead of the expected seconds (or whatever), causing open mouths and squeaks of surprise. Moreover, it doesn't test what you wanted, because Foo() isn't called even once. A workaround is to put Foo()'s definition in a separate compilation unit, away from the prying eyes of the inliner.
</rant>
Robert P.
Code: Select all
SInt64 gFoo;
void Foo() { gFoo++; }
Code: Select all
callq _Foo ; direct
callq *%r14 ; function pointer
<rant>
In the old days it was easy to arrange this kind of timing experiment, but modern compilers put up obstacles. For example, at high optimization level a compiler discovers that a gazillion invocations of Foo() is equivalent to
Code: Select all
gFoo += GAZILLION;
</rant>
Robert P.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Function pointers hurt performance?
That test is not very meaningful, its a monomorphic call site. All x86 chips will easily predict that. Try pseudoeandomly selecting the function pointer from an array of 8 pointers. At startup, use a command line option to decide to either (1) set them to 8 separate functions that add 8 different constants to a global variable, or (2) set all 8 pointers to point to the same function.
Then use RKISS or something to get a 3-bit index for each iteration. Run at least 10,000 iterations (maybe 100,000?) and compare timings from (1) and (2).
Then use RKISS or something to get a 3-bit index for each iteration. Run at least 10,000 iterations (maybe 100,000?) and compare timings from (1) and (2).
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Function pointers hurt performance?
There are two parts to predicting a branch on x86. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?micron wrote:If I call Foo()directly and indirectly through a function pointer, the corresponding assembler isCode: Select all
SInt64 gFoo; void Foo() { gFoo++; }
Experiments timing a gazillion calls show no difference between the two calling methods. Function pointers do not hurt performance.Code: Select all
callq _Foo ; direct callq *%r14 ; function pointer
<rant>
In the old days it was easy to arrange this kind of timing experiment, but modern compilers put up obstacles. For example, at high optimization level a compiler discovers that a gazillion invocations of Foo() is equivalent toThe timing run now occupies a few nanoseconds instead of the expected seconds (or whatever), causing open mouths and squeaks of surprise. Moreover, it doesn't test what you wanted, because Foo() isn't called even once. A workaround is to put Foo()'s definition in a separate compilation unit, away from the prying eyes of the inliner.Code: Select all
gFoo += GAZILLION;
</rant>
Robert P.
(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.
Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
-
- Posts: 155
- Joined: Mon Feb 15, 2010 9:33 am
- Location: New Zealand
Re: Function pointers hurt performance?
Ah, right. A performance hit from scrambling the branch predictor's tiny mind is easily demonstrated.
Timings based on the code above give 10 ns/call. With all gFuncPtr[] = &Foo, branch prediction never fails and the time is only 2.4 ns/call.
A 'direct call' alternative to variable function pointers corresponds test-code like this:
This clocks in at 3.6 ns/call. The timing is identical whether the values in gFuncID[] are 0--7, or all 0, suggesting that branch prediction is not involved.
It seems that in artificial but (I think) fair comparison, calls through a function pointer are not necessarily slower than direct calls via switch, and can even be faster. Note that the tests are deliberately designed to expose small differences. It's hard to believe that they could affect an engine's n.p.s. measurably.
Robert P.
Code: Select all
gFuncPtr[0] = &Foo;
gFuncPtr[1] = &Foo1;
...
gFuncPtr[7] = &Foo7;
for ( i = 0; i < GAZILLION; i++ ) {
(*gFuncPtr[i % 8])();
}
A 'direct call' alternative to variable function pointers corresponds test-code like this:
Code: Select all
for ( i = 0; i < GAZILLION; i++ ) {
switch ( gFuncID[j % 8] ) {
case 0: Foo(); break;
case 1: Foo1(); break;
...
case 7: Foo7(); break;
}
}
It seems that in artificial but (I think) fair comparison, calls through a function pointer are not necessarily slower than direct calls via switch, and can even be faster. Note that the tests are deliberately designed to expose small differences. It's hard to believe that they could affect an engine's n.p.s. measurably.
Robert P.
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Function pointers hurt performance?
Some modern chips can apparently predict more than 1 or 2 different targets. In this case, there is a very regular pattern that is 8 values long, but I guess it can't remember as many as 8 separate branch targets so it is unable to successfully predict them on each pass through the table. That extra 7.6 ns probably represents the full penalty (15 cycles or whatever) for every call. Notice that without it, the effective cost of the call/return and whatever other entry/exit instructions was only like 5 cycles.micron wrote:Ah, right. A performance hit from scrambling the branch predictor's tiny mind is easily demonstrated.
Timings based on the code above give 10 ns/call. With all gFuncPtr[] = &Foo, branch prediction never fails and the time is only 2.4 ns/call.Code: Select all
gFuncPtr[0] = &Foo; gFuncPtr[1] = &Foo1; ... gFuncPtr[7] = &Foo7; for ( i = 0; i < GAZILLION; i++ ) { (*gFuncPtr[i % 8])(); }
You should look at the disassembly... I think its more likely that the compiler turned that switch statement into a tree of ordinary conditional branches, and then successfully predicted most or all of them.micron wrote: A 'direct call' alternative to variable function pointers corresponds test-code like this:This clocks in at 3.6 ns/call. The timing is identical whether the values in gFuncID[] are 0--7, or all 0, suggesting that branch prediction is not involved.Code: Select all
for ( i = 0; i < GAZILLION; i++ ) { switch ( gFuncID[j % 8] ) { case 0: Foo(); break; case 1: Foo1(); break; ... case 7: Foo7(); break; } }
Switch statements can be compiled in a couple of different ways, depending on how many case values there are and whether they are dense or sparse, etc.
micron wrote: It seems that in artificial but (I think) fair comparison, calls through a function pointer are not necessarily slower than direct calls via switch, and can even be faster. Note that the tests are deliberately designed to expose small differences. It's hard to believe that they could affect an engine's n.p.s. measurably.
Robert P.