So the code currently checks where to shift towards.
The programmer knows because both diagonal and anti-diagonal need to be generated. So the diagonal and anti-diagonal bits would both be in the appropriate function as constants. I hope that makes sense?
Edit: I'll try to get them working for the bishop in my engine Bricabrac. As usual I'm not very speedy, but I will get started on it today.
I guess that I just do not understand templates. So let's see how much I can deduce.
sq is passed to dir_D1(sq)
The macro dir_D1(X), defined as ((X & 7) - (X >> 3)), can evaluate to either a negative or a positive value, so a decision has to be made at run time. At that point the programmer does not know which function to call. Is this correct? So maybe this?
ranks = ((fs & 7) - (fs >> 3))
mask_shiftD1[ranks > 0](ranks);
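To make the idea above concrete, here is a minimal sketch of how I read it. It assumes a1 = bit 0 and h8 = bit 63, and the names MAIN_DIAG, shift_up, shift_down and diagonal_mask are made up for illustration; only dir_D1 and mask_shiftD1 come from the posts above. The sign of file minus rank picks one of two shift functions from a small table, and each function moves the a1-h8 diagonal by whole ranks.

```cpp
#include <cassert>
#include <cstdint>

// a1..h8 main diagonal, assuming a1 = bit 0 and h8 = bit 63.
const std::uint64_t MAIN_DIAG = 0x8040201008040201ULL;

// Same expression as the dir_D1 macro: file minus rank, range -7..7.
inline int dir_D1(int sq) { return (sq & 7) - (sq >> 3); }

// Hypothetical helpers. Shifting by a multiple of 8 moves the diagonal
// by whole ranks, so bits fall off the board edge instead of wrapping.
inline std::uint64_t shift_up(int d)   { return MAIN_DIAG << (8 * -d); } // d <= 0
inline std::uint64_t shift_down(int d) { return MAIN_DIAG >> (8 * d); }  // d > 0

typedef std::uint64_t (*ShiftFn)(int);
const ShiftFn mask_shiftD1[2] = { shift_up, shift_down };

// Full diagonal through sq (including sq itself).
inline std::uint64_t diagonal_mask(int sq) {
    assert(sq >= 0 && sq < 64);
    const int d = dir_D1(sq);
    return mask_shiftD1[d > 0](d); // the comparison yields index 0 or 1
}
```

Whether the function table beats a plain ternary is something I have not measured; a compiler may well turn the ternary into a conditional move anyway.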
What is the most efficient C++ code to generate all 4 rays inside the constructor? (no external lookup is allowed)
What's faster than simply packing the vectors north_east, north_west, south_east and south_west (or north, south, east and west) with the 64-bit continuation for each of the 64 squares? Once computed, it's a simple reference. How could anything be faster?
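For reference, the precomputed approach described here could look roughly like the sketch below. The struct name BishopRays and the helper walk are hypothetical, and a1 = bit 0 is assumed. The constructor fills four 64-entry tables by stepping one square at a time, so nothing is loaded from outside the class.

```cpp
#include <cassert>
#include <cstdint>

struct BishopRays {
    std::uint64_t ne[64], nw[64], se[64], sw[64];

    // Fill all four diagonal rays for every square, once.
    BishopRays() {
        for (int sq = 0; sq < 64; ++sq) {
            ne[sq] = walk(sq, +1, +1);
            nw[sq] = walk(sq, -1, +1);
            se[sq] = walk(sq, +1, -1);
            sw[sq] = walk(sq, -1, -1);
        }
    }

    // Step from sq in direction (df, dr) until the board edge.
    // The starting square itself is not part of the ray.
    static std::uint64_t walk(int sq, int df, int dr) {
        assert(sq >= 0 && sq < 64);
        std::uint64_t ray = 0;
        int f = (sq & 7) + df;
        int r = (sq >> 3) + dr;
        while (f >= 0 && f < 8 && r >= 0 && r < 8) {
            ray |= 1ULL << (r * 8 + f);
            f += df;
            r += dr;
        }
        return ray;
    }
};
```

After construction each ray is a single array read, for example `rays.ne[sq]`, which is the "simple reference" mentioned above.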
On some architectures a simple lookup is really slow compared to calculation. CUDA proof:
33.74 Billion [lookup]
92.32 Billion [direct calculation]
So if there were a way to optimize the calculation it would be really great, but I don't see one either!
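For anyone who wants something to experiment with, here is one possible shape of a direct calculation. This is not the code behind the numbers above, just a sketch under the assumption that a1 = bit 0; the names Rays4 and bishop_rays are invented. It shifts the two long diagonals into place and then splits each one into the part above and the part below the square.

```cpp
#include <cassert>
#include <cstdint>

struct Rays4 { std::uint64_t ne, nw, se, sw; };

// Compute all four empty-board bishop rays from sq without any table.
inline Rays4 bishop_rays(int sq) {
    assert(sq >= 0 && sq < 64);
    const std::uint64_t DIAG = 0x8040201008040201ULL; // a1..h8
    const std::uint64_t ANTI = 0x0102040810204080ULL; // h1..a8

    const int file = sq & 7, rank = sq >> 3;
    const int d = file - rank;     // -7..7, 0 on the a1-h8 diagonal
    const int a = file + rank - 7; // -7..7, 0 on the h1-a8 diagonal

    // Shift by whole ranks so that bits drop off the edge, never wrap.
    const std::uint64_t diag = d > 0 ? DIAG >> (8 * d) : DIAG << (8 * -d);
    const std::uint64_t anti = a > 0 ? ANTI << (8 * a) : ANTI >> (8 * -a);

    const std::uint64_t above = (~0ULL << sq) << 1; // bits sq+1..63
    const std::uint64_t below = (1ULL << sq) - 1;   // bits 0..sq-1

    Rays4 r;
    r.ne = diag & above;
    r.sw = diag & below;
    r.nw = anti & above;
    r.se = anti & below;
    return r;
}
```

The two ternaries are the only data-dependent choices left. Whether they end up as selects or as real branches depends on the compiler and the target, so this would need measuring before drawing any conclusion.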
Is this a good translation? If it is, how is occ used? And if it is not a good translation, can you post the complete bishop code in standard C? Maybe then I can try to help!