Stack Memory Alignment an Issue?

Discussion of chess software programming and technical issues.

Moderator: Ras

rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Stack Memory Alignment an Issue?

Post by rjgibert »

This idea popped into my ahead as I was waking up this morning. I got to wondering if stack memory alignment is an issue for chess programs?

BTW, I looked up this type of topic on the web and this seems to be an area of concern that includes methods that have been patented.
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Stack Memory Alignment an Issue?

Post by marcelk »

rjgibert wrote:This idea popped into my ahead as I was waking up this morning. I got to wondering if stack memory alignment is an issue for chess programs?

BTW, I looked up this type of topic on the web and this seems to be an area of concern that includes methods that have been patented.

It matters, depending on CPU. Note that your operating system may inject random run to run offsets that make it hard to measure the effect at all.

Image[/img]
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Stack Memory Alignment an Issue?

Post by wgarvin »

marcelk wrote:It matters, depending on CPU. Note that your operating system may inject random run to run offsets that make it hard to measure the effect at all.
Not sure why it would do that (perhaps for some DEP/randomization/anti-stack-smashing reason?) You could counteract the effect by manually aligning your stack in your own function that gets called from main and runs your whole engine inside of it. E.g. you could align it on a 128-byte or 4096-byte boundary, if you wanted to. That code will be non-portable (probably needing inline assembly, etc.) but you could make it a part of your portability layer for each platform. If you spawn new threads, you might want to align their stacks as well.

To do this, you'd write a function in assembly for each platform, that takes two args: one void* and a function pointer void (*foo)(void*). (and maybe a third to specify the alignment you want?) It would align the stack as desired, then it would call the function foo with the void* argument. Now you can use this in your main and in each worker thread, to align them to a known boundary before they start doing stuff. Actually I just thought of a fourth parameter you might want: offset within the aligned block.

I'm not sure how you'd determine the optimal values for alignment size and offset, it would probably depend on where your engine's frequently-accessed data structures are such as board state, certain lookup tables, etc. You could compile a bunch of executables with different values and test them to see which is fastest, but it would probably have to be the last thing you do before a release.

It all sounds like quite a lot of work for a pretty minor gain though.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stack Memory Alignment an Issue?

Post by bob »

marcelk wrote:
rjgibert wrote:This idea popped into my ahead as I was waking up this morning. I got to wondering if stack memory alignment is an issue for chess programs?

BTW, I looked up this type of topic on the web and this seems to be an area of concern that includes methods that have been patented.

It matters, depending on CPU. Note that your operating system may inject random run to run offsets that make it hard to measure the effect at all.

Image[/img]
I'm not even sure your measurements are valid. Each time you run an executable, it is loaded into a different set of real pages in memory, which will then map into cache differently, with different aliasing issues. Too many physical pages matching in the low order N bits will map to the same cache set, and cause thrashing. next time you might get a better balance of memory to cache and run a little faster.
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Stack Memory Alignment an Issue?

Post by marcelk »

bob wrote:
I'm not even sure your measurements are valid. Each time you run an executable, it is loaded into a different set of real pages in memory, which will then map into cache differently, with different aliasing issues. Too many physical pages matching in the low order N bits will map to the same cache set, and cause thrashing. next time you might get a better balance of memory to cache and run a little faster.
Every point is the best of 10 measurements and the shape of the graph is repeatable. The measurement is valid. I would not have provided it otherwise.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Stack Memory Alignment an Issue?

Post by jwes »

marcelk wrote:
rjgibert wrote:This idea popped into my ahead as I was waking up this morning. I got to wondering if stack memory alignment is an issue for chess programs?

BTW, I looked up this type of topic on the web and this seems to be an area of concern that includes methods that have been patented.

It matters, depending on CPU. Note that your operating system may inject random run to run offsets that make it hard to measure the effect at all.

Image[/img]
I think the scale distorts the size of the effect. If I am reading the graph correctly, the speed difference is about 1%.
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Stack Memory Alignment an Issue?

Post by marcelk »

wgarvin wrote:
marcelk wrote:It matters, depending on CPU. Note that your operating system may inject random run to run offsets that make it hard to measure the effect at all.
Not sure why it would do that (perhaps for some DEP/randomization/anti-stack-smashing reason?) You could counteract the effect by manually aligning your stack in your own function that gets called from main and runs your whole engine inside of it.
You could switch randomization off instead of counter-acting, if you have access to the kernel.
I'm not sure how you'd determine the optimal values for alignment size and offset,
Brute-force works well. Just one run overnight with all combinations of interest.
It all sounds like quite a lot of work for a pretty minor gain though.
It is relatively simple compared to other aspects of engine tuning, but the gain is low. On the other hand, several such things combined may yield another elo point. As said, not all CPU types have a similar response function so YMMV.
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Stack Memory Alignment an Issue?

Post by marcelk »

jwes wrote:If I am reading the graph correctly, the speed difference is about 1%.
1% is above my threshold for accepting speed improvements.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stack Memory Alignment an Issue?

Post by bob »

marcelk wrote:
bob wrote:
I'm not even sure your measurements are valid. Each time you run an executable, it is loaded into a different set of real pages in memory, which will then map into cache differently, with different aliasing issues. Too many physical pages matching in the low order N bits will map to the same cache set, and cause thrashing. next time you might get a better balance of memory to cache and run a little faster.
Every point is the best of 10 measurements and the shape of the graph is repeatable. The measurement is valid. I would not have provided it otherwise.
Sounds better. I didn't see the "average of 10 runs for each" mentioned.