Performance loss when removing unused function

OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Performance loss when removing unused function

Post by OliverBr »

When I remove an unused function, the nps (nodes per second) performance drops by about 0.5%. That may not be much, but it is reproducible and makes no sense to me.

After some refactoring, this function is no longer used:

Code:

void setBit(int f, u64 *b) {
	*b |= BIT[f];
}

When I remove this function and recompile, everything stays the same except the nps performance.
The drop is reproducible and not a fluctuation of the machine: it is a server with nothing else running on it, and the run-to-run fluctuations with the same executable are much smaller.
I made a lot of runs to be sure, and the performance really does drop when the unused function is removed. Of course, the executable is different, too.

This is the compiler command, run on an AMD EPYC 7502P 32-Core Processor:

Code:

clang -O3 -mavx2 olithink.c -o olithink

So is it now better to keep the unused function? I don't see any logic in this.

PS: An empty function in place of setBit, at the same location, keeps the performance unchanged:

Code:

void tata() {}

When I remove this function, performance drops. Is there any explanation?
Last edited by OliverBr on Mon Oct 05, 2020 11:54 pm, edited 3 times in total.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Performance loss when removing unused function

Post by Ras »

Maybe removing that function just happens to shift the following code up a little in the address space, and that may interact with the cache lines of the CPU.
Rasmus Althoff
https://www.ct800.net
OliverBr

Re: Performance loss when removing unused function

Post by OliverBr »

Ras wrote: Mon Oct 05, 2020 11:42 pm Maybe removing that function just happens to shift the following code up a little in the address space, and that may interact with the cache lines of the CPU.
Yes, this is an explanation I have been thinking about, too. Is this a known phenomenon?
human
Posts: 1
Joined: Thu Apr 02, 2020 5:25 am
Full name: Andrew Yan

Re: Performance loss when removing unused function

Post by human »

OliverBr wrote: Mon Oct 05, 2020 11:47 pm Yes, this is an explanation I have been thinking about, too. Is this a known phenomenon?
Yes: https://www.youtube.com/watch?v=koTf7u0v41o. There are some details about 15 minutes in.
OliverBr

Re: Performance loss when removing unused function

Post by OliverBr »

human wrote: Tue Oct 06, 2020 12:05 am Yes: https://www.youtube.com/watch?v=koTf7u0v41o. There are some details about 15 minutes in.
Thank you for this link. I will watch it later.

Just for amusement: after removing another now-unused function, I have this situation:
I can boost performance by adding two dummy functions:

Code:

void tata() {}
void tata2() {}

One is not enough.. 8-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Performance loss when removing unused function

Post by bob »

You might check out your compiler's many alignment optimization options...
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Performance loss when removing unused function

Post by hgm »

When I was optimizing qperft I even saw a 20% drop in speed from removing some unreachable code. I don't think caching could have been a problem for such a small program, even then. It was probably more an alignment issue. These super-scalar CPUs are very unpredictable, because they tend to decode instructions in bundles that have some specific alignment requirements. So it can matter whether you jump into an early or a late instruction in such a bundle.
OliverBr

Re: Performance loss when removing unused function

Post by OliverBr »

hgm wrote: Tue Oct 06, 2020 11:03 am These super-scalar CPUs are very unpredictable, because they tend to decode instructions in bundles that have some specific alignment requirements.
This is very true. I have narrowed such a case down to a simple, specific example:

Changing this line in OliThink's function "protV2":

Code:

else if (!strncmp(buf,"random",6));

to this:

Code:

else if (!strncmp(buf,"random",6)) random = 1;

drops performance noticeably because of alignment.
The difference becomes apparent in the disassembly:

Code:

000000000040ab10 <protV2>:
.... <code of function protV2>
000000000040b660 <isDraw>:

becomes, because of the extra handling of "random":

Code:

000000000040ab10 <protV2>:
.... <code of function protV2>
000000000040b6a0 <isDraw>:

So placing "isDraw" at 0x40b6a0 is a bad idea; only at 0x40b660 does it have the best performance.
But why exactly?
On every other address I have tested, e.g. 0x40b620, the performance is worse than at 0x40b660. Some are even worse than 0x40b6a0, and there is no apparent pattern.

This looks like some kind of lottery.
OliverBr

Re: Performance loss when removing unused function

Post by OliverBr »

The alignment issue is really something.
It's possible to gain almost 2% in speed just by adding a call to a dummy function that does virtually nothing and is never called anyway:

By adding the following line to the search (the condition is never true),

Code:

if (ply == 131) return dummy();

there is a performance boost from

Code:

Nodes: 95905038 cs: 2975 knps: 4123

to

Code:

Nodes: 95905038 cs: 2922 knps: 4198

which is a gain of nearly 2% in speed and more than 5 Elo in strength. Of course, there is nothing deterministic here; it is pure luck. After the next code change somewhere else the numbers change again, and adding such a line drops performance.

I wonder how others handle this issue?
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Performance loss when removing unused function

Post by maksimKorzh »

OliverBr wrote: Thu Oct 29, 2020 12:08 am The alignment issue is really something. [...] I wonder how others handle this issue?
Oliver, would it be the same if you compiled like this:
gcc -Ofast -fomit-frame-pointer olithink.c -o olithink
(using gcc instead of clang, without -mavx2, and with -fomit-frame-pointer)?

And how do you measure knps? By running a search from the starting position?
How can I reproduce this behavior exactly on my side?