I don't know why you're all so obsessed with this left/right shift thing. People have gone as far as:
1/ duplicating the code for everything, for an alleged speed improvement. Have they really measured it? How much time is actually spent in these operations compared to all the other things an engine does?
2/ using Ippolit-style unguarded recursive inclusions, in order to emulate a template mechanism in plain C. This is ugly beyond belief...
uint64_t shift(uint64_t b, int i)
{ return i > 0 ? b << i : b >> -i; }
Has anyone actually measured how much they gain from duplicating all the code for white and black (whether directly, by using C++ templates, or by using "C templates" Ippolit-style)?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart wrote:I don't know why you're all so obsessed with this left/right shift thing.[...]
I agree!
I'm amazed this thread has gone to 3 pages. I expected a quick yes / no type answer. Nevertheless it's been interesting, although I doubt it has any impact on playing strength.
Actually this is quite revealing. There seems to be a "school of thought" that duplicating code is a great idea, because it supposedly improves performance.
1/ some people use templates everywhere (often C++ fans), instead of function parameters, especially in areas that are not even time critical. This leads to obscure templatized code, and bloatware (code is repeated all over the place in the executable). In fact it may even be slower if abused, as the executable itself becomes large, and simply calling functions results in cache misses...
2/ some people (often C fans) love to do loop unrolling. Typically these people don't really use a profiler; they just "read somewhere" that it makes code faster. They do it in one place, and then they feel that "to be consistent" they should do it everywhere, and loop unrolling spreads through their code like a cancer.
=> 1/ and 2/ are acute symptoms of brain damage.
There are cases where doing 1/ or 2/ in a FEW places can be useful, but you should always approach it like this:
* first write the unoptimized version, keep it simple, general, maintainable, clean etc.
* then use a profiler, and find the bottlenecks. If there is indeed a trick like 1/ or 2/ that would make the bottleneck faster, and you can prove it with the profiler, then ask yourself another question:
* is the cure worse than the disease? Is the pain/gain ratio worth it?
* if it is, then do the 1/ or 2/ hack.
But in reality, if we come back to the negative shift problem, it's often impossible to measure. I don't know about other engines, but in mine (DiscoCheck) the few calls to shift_bit() are not time critical, and it's not even possible to measure whether optimizing them would make anything faster. The running time (even if your program does something deterministic) is itself a random variable, with a variance bigger than we tend to imagine (other OS processes are doing things at the same time, there is the extra layer added by profiling, etc.).
uint64_t shift(uint64_t b, int i)
{ return i > 0 ? b << i : b >> -i; }
It requires a branch.
Of course the effect of a single branch on the code as a whole will be hardly measurable, but it is more a matter of programming style. If you are generous with branches in all the places where they could easily be avoided, you end up with hundreds of unnecessary branches. And that will hurt.
The compiler will inline this function. There will be no memory access (no "call", no "push", no "ret") in most cases, especially on x86-64 (extra registers). Everything is computed in registers directly.
So what do you mean by "branch" ?
A branch is a point where control flow can go two ways. (So that excludes calls and (direct) jumps, as control flow does always follow the same path there, and the only peculiarity is that this path is not contiguous in terms of memory addressing.)
Branches interfere with pipelining (on which the efficiency of modern CPUs depends), as there is no certainty about what the CPU should start doing after the branch until the branch is resolved (i.e. the condition has been calculated and tested). And that happens only very close to the end of the pipeline. In the meantime the pipeline would be stalled.
Now this is ameliorated in modern CPUs by speculative execution of the instructions on one of the possible paths, combined with prediction of what this path would be (i.e. whether the branch will be taken or fall through). But predictions can be wrong, and it is not obvious at all that in the application at hand it can be better than random. And if the prediction is wrong the speculatively executed path was wasted effort, and has to be flushed out of the pipeline.
On the highly pipelined CPUs of today this can easily cost 11-12 clock cycles (the number of stages in the pipeline before the execute stage, which resolves the branch), each containing 3-4 instructions. (Even when the condition is immediately available for testing, like it probably is here; if it takes additional time to calculate the condition, e.g. because in the example i would not already be in a CPU register but has to come from a cache, or worse yet, from main memory, you could be even further down the speculative path before you discover that it was the wrong path.) So a mis-predicted branch can easily be as expensive as 30-40 normal instructions.
ZirconiumX wrote:It needs a branch to know whether to take the left-shift or the right-shift.
Not true. One can compute both results and use a conditional move (x86 `cmov') instruction to select which value to return. The compiler is free to use either the more naive branch or the conditional move.
While you are correct, I think Matthew was responding to:
The compiler will inline this function. There will be no memory access (no "call", no "push", no "ret") in most cases, especially on x86-64 (extra registers). Everything is computed in registers directly.