I don't know why you're all so obsessed with this left/right shift thing. People have gone as far as:
1/ duplicating the code for everything, for an alleged speed improvement. Have they really measured it? How much time is actually spent in these operations compared to all the other things an engine does?
2/ using Ippolit-style unguarded recursive inclusions, in order to emulate a template mechanism in plain C. This is ugly beyond belief...
uint64_t shift(uint64_t b, int i)
{ return i > 0 ? b << i : b >> -i; }
Has anyone actually measured how much they gain from duplicating all the code for white and black (whether directly, by using C++ templates, or by using "C templates" Ippolit-style)?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart wrote:I don't know why you're all so obsessed with this left/right shift thing.[...]
I agree!
I'm amazed this thread has gone to 3 pages. I expected a quick yes / no type answer. Nevertheless it's been interesting, although I doubt it has any impact on playing strength.
Actually this is quite revealing. There seems to be a "school of thought" that duplicating code is a great idea, because it supposedly improves performance.
1/ some people use templates everywhere (often C++ fans), instead of function parameters, especially in areas that are not even time critical. This leads to obscure templatized code, and bloatware (code is repeated all over the place in the executable). In fact it may even be slower if abused, as the executable itself becomes large, and simply calling functions results in cache misses...
2/ some people (often C fans) love to do loop unrolling. Typically these people don't really use a profiler; they just "read somewhere" that it makes code faster. They do it in one place, and then they feel that "to be consistent" they should do it everywhere, and loop unrolling spreads through their code like a cancer.
=> 1/ and 2/ are acute symptoms of brain damage.
There are cases where doing 1/ or 2/ in a FEW places can be useful, but you should always approach it like this:
* first write the unoptimized version, keep it simple, general, maintainable, clean etc.
* then use a profiler, and find the bottlenecks. If there is indeed a trick like 1/ or 2/ that would make the bottleneck faster, and you can prove it with the profiler, then ask yourself another question:
* is the cure worse than the disease? Is the pain/gain ratio worth it?
* if it is, then do the 1/ or 2/ hack.
But in reality, if we come back to the negative shift problem, it's often impossible to measure. I don't know about other engines, but in mine (DiscoCheck) the few calls to shift_bit() are not time critical, and it's not even possible to measure whether optimizing them would make anything faster. The running time (even if your program does something deterministic) is itself a random variable, with a variance bigger than we tend to imagine (other OS processes are doing things at the same time, there is the extra layer added by profiling, etc.).
uint64_t shift(uint64_t b, int i)
{ return i > 0 ? b << i : b >> -i; }
It requires a branch.
Of course the effect of a single branch on the code as a whole will be hardly measurable, but it is more a matter of programming style. If you are generous with branches in all the places where they could easily be avoided, you end up with hundreds of unnecessary branches. And that will hurt.
The compiler will inline this function. There will be no memory access (no "call", no "push", no "ret") in most cases, especially on x86-64 (extra registers). Everything is computed in registers directly.
So what do you mean by "branch" ?
A branch is a point where control flow can go two ways. (So that excludes calls and (direct) jumps, as control flow does always follow the same path there, and the only peculiarity is that this path is not contiguous in terms of memory addressing.)
Branches interfere with pipelining (on which the efficiency of modern CPUs depends), as there is no certainty about what the CPU should start doing after the branch until the branch is resolved (i.e. the condition has been calculated and tested). And that happens only very close to the end of the pipeline. In the meantime the pipeline would be stalled.
Now this is ameliorated in modern CPUs by speculative execution of the instructions on one of the possible paths, combined with prediction of what this path would be (i.e. whether the branch will be taken or fall through). But predictions can be wrong, and it is not obvious at all that in the application at hand it can be better than random. And if the prediction is wrong the speculatively executed path was wasted effort, and has to be flushed out of the pipeline.
On the highly pipelined CPUs of today this can easily cost 11-12 clock cycles (the number of stages in the pipeline before the execute stage, which resolves the branch), each containing 3-4 instructions. (Even when the condition is immediately available for testing, like it probably is here; if it takes additional time to calculate the condition, e.g. because in the example i would not already be in a CPU register but has to come from a cache, or worse yet, from main memory, you could be even further down the speculative path before you discover that it was the wrong path.) So a mis-predicted branch can easily be as expensive as 30-40 normal instructions.
ZirconiumX wrote:It needs a branch to know whether to take the left-shift or the right-shift.
Not true. One can compute both results and use a conditional move (x86 `cmov') instruction to select which value to return. The compiler is free to use either the more naive branch or the conditional move.
While you are correct, I think Matthew was responding to:
The compiler will inline this function. There will be no memory access (no "call", no "push", no "ret") in most cases, especially on x86-64 (extra registers). Everything is computed in registers directly.