So this sounds like BEXTR to me. But only between some bits. I don't understand it -.-
tcusr wrote: Mon Dec 13, 2021 9:03 pm
tbh i have trouble understanding the code too, but i can try
dangi12012 wrote: Mon Dec 13, 2021 8:32 pm
Nah, hgm also said on other threads that with bitboards you can select capturing moves with a single "and operation" on the enemy occupation. So you just do atk & enemy and have the moves to search first.
It's that branching 24 times for a queen, like you said above, is much, much more expensive than looking up her atk set in 2 operations and then getting the right moves with one AND. I tried it because the recursive code in my signature would be perfect for that.
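To make the "one AND" point concrete, here is a minimal sketch of splitting an attack set into captures and quiet moves. The struct and function names (MoveSets, split_moves) are made up for illustration and do not come from the code in this thread; it only assumes the usual bitboard layout with one bit per square.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two move classes; not from the original code.
struct MoveSets {
    uint64_t captures;  // targets occupied by enemy pieces: search these first
    uint64_t quiets;    // empty target squares
};

// atk: attack set of the piece; own/enemy: occupancy bitboards of each side.
inline MoveSets split_moves(uint64_t atk, uint64_t own, uint64_t enemy) {
    uint64_t targets = atk & ~own;  // a piece cannot land on its own side
    MoveSets out = { targets & enemy, targets & ~enemy };
    return out;
}

// Self-check on a made-up rank: attacks cover a1..h1, own piece on a1, enemy on h1.
inline void split_moves_demo() {
    MoveSets m = split_moves(0xFFull, 0x01ull, 0x80ull);
    assert(m.captures == 0x80ull);  // only h1 is a capture
    assert(m.quiets == 0x7Eull);    // b1..g1 are quiet targets
}
```

No per-square branching is needed: the capture set falls out of a single AND once the attack set is known.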
Anyways tcusr I need your help on understanding this line of code:
uint64_t blocked_down = 0x7FFFFFFFFFFFFFFFull >> std::countl_zero(block & mask | 1ull);
I made the inner core for arithmetic much cleaner now because I saw some intrinsics are hidden in there:
For example there was X & (1 << p) - 1, which maps perfectly to BZHI. Not all compilers find that. Now it's another 5% faster and the code shrank by 5 lines.
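For readers without BMI2, the pattern that BZHI replaces can be written portably as below. This is only a sketch: low_bits is a hypothetical name, and it is valid for p in 0..63 only, whereas the hardware instruction (as far as the Intel documentation describes it) returns x unchanged for an index of 64 or more. Note that the constant has to be 64 bits wide, since a plain int 1 cannot be shifted by 32 or more.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for _bzhi_u64(x, p): keep only the bits of x below index p.
// Precedence note: "x & (1 << p) - 1" already parses as "x & ((1 << p) - 1)",
// because binary minus binds tighter than &. The parentheses here are for readers.
inline uint64_t low_bits(uint64_t x, unsigned p) {
    return x & ((uint64_t{1} << p) - 1);  // requires p <= 63
}

inline void low_bits_demo() {
    assert(low_bits(0xFFFFull, 4) == 0x000Full);  // bits 0..3 survive
    assert(low_bits(0xF0ull, 4) == 0x00ull);      // nothing below bit 4
    assert(low_bits(~0ull, 0) == 0ull);           // p == 0 keeps nothing
}
```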
If we find an elegant solution for the blocked_down part it would be perfect. I can feel that there is an elegant solution hidden in there somewhere. I don't know where that 0x7FFFFFFFFFFFFFFFull comes from or why to shift by that amount. The lookup for any slider is getting close to 3 lines of code with a lookup table of 2 KB.
So a small table and short code without too many dependencies. I like that a lot.
Code: Select all
/* Start of code */
static const inline uint64_t slide_arithmetic(int p, uint64_t block) {
    //BZHI
    //[src & (1 << inx) - 1] ;
    // split the line into upper and lower rays
    uint64_t mask = _bzhi_u64(block, p);

    // for the bottom we use CLZ + a shift to fill in from the top
    uint64_t blocked_down = 0x7FFFFFFFFFFFFFFFull >> std::countl_zero(block & mask | 1ull);

    //_blsmsk_u64 = X^X-1
    // the intersection of the two is the move set after masking with the line
    return (_blsmsk_u64(block & ~mask) ^ blocked_down);
}

static const inline uint64_t Queen(uint64_t s, uint64_t occ) {
    const uint64_t* r = rank_mask.data() + 4 * s;
    return slide_arithmetic(s, r[0] & occ) & r[0]
         ^ slide_arithmetic(s, r[1] & occ) & r[1]
         ^ slide_arithmetic(s, r[2] & occ) & r[2]
         ^ slide_arithmetic(s, r[3] & occ) & r[3];
}
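For stepping through the routine by hand, a portable sketch without BMI intrinsics or C++20 may help. The names slide_portable and clz64 are hypothetical; clz64 stands in for std::countl_zero, and the sketch assumes, as the posted code seems to, that the line mask does not contain the slider's own square and that p is in 0..63.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for std::countl_zero on 64-bit values (returns 64 for x == 0).
inline int clz64(uint64_t x) {
    int n = 0;
    for (uint64_t probe = uint64_t{1} << 63; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

// block: occupancy already masked to one line; p: slider square (0..63).
// The caller still has to AND the result with the line mask.
inline uint64_t slide_portable(int p, uint64_t block) {
    uint64_t lower = block & ((uint64_t{1} << p) - 1);  // what _bzhi_u64(block, p) yields
    uint64_t upper = block & ~lower;                    // blockers above p
    // all bits strictly below the highest lower blocker
    uint64_t blocked_down = 0x7FFFFFFFFFFFFFFFull >> clz64(lower | 1ull);
    // bits 0 .. lowest upper blocker inclusive (all ones if there is none),
    // which is what _blsmsk_u64(upper) yields
    uint64_t up_to_first = upper ^ (upper - 1);
    return up_to_first ^ blocked_down;
}

// Made-up example on rank 1: slider on d1 (p = 3), blockers on b1 and g1.
inline void slide_portable_demo() {
    const uint64_t line = 0xF7ull;  // rank 1 without d1
    assert((slide_portable(3, 0x42ull) & line) == 0x76ull);  // b1 c1 e1 f1 g1
    assert((slide_portable(3, 0x00ull) & line) == 0xF7ull);  // empty rank: everything
}
```

The XOR of the two fills leaves exactly the squares from the nearest lower blocker up to the nearest upper blocker, both included, so blockers stay in the set as capture targets.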
this is 0x7FFFFFFFFFFFFFFFull
Code: Select all
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X |   | 8
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 7
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 6
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 5
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 4
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 3
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 2
+---+---+---+---+---+---+---+---+
| X | X | X | X | X | X | X | X | 1
+---+---+---+---+---+---+---+---+
  a   b   c   d   e   f   g   h
by shifting right it makes sure to only fill the bottom of the first encountered piece (msb), but idk if it's right
https://en.wikipedia.org/wiki/X86_Bit_m ... uction_set
block is the masked line
block & mask is the masked lower half of the line (positive ray)
block & mask | 1ull I don't understand
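One possible reading of that expression, as a sketch: & binds tighter than |, so it is (block & mask) | 1ull. Setting bit 0 guarantees the argument of the leading-zero count is never zero. std::countl_zero(0) returns 64, and shifting a 64-bit value right by 64 is undefined behaviour in C++, so the guard keeps the shift count in 0..63. The constant has 63 one-bits rather than 64, so after shifting by the leading-zero count exactly as many ones remain as the index of the highest blocker, i.e. the squares strictly below it. The names clz64 and below_msb are hypothetical, and clz64 stands in for std::countl_zero.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for std::countl_zero on 64-bit values (returns 64 for x == 0).
inline int clz64(uint64_t x) {
    int n = 0;
    for (uint64_t probe = uint64_t{1} << 63; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

// All bits strictly below the highest set bit of `lower`; 0 if `lower` is 0 or 1.
inline uint64_t below_msb(uint64_t lower) {
    // "| 1ull" keeps the shift count at 63 or less even when lower == 0
    return 0x7FFFFFFFFFFFFFFFull >> clz64(lower | 1ull);
}

inline void below_msb_demo() {
    assert(below_msb(0x28ull) == 0x1Full);  // blockers on bits 3 and 5: fill bits 0..4
    assert(below_msb(0x00ull) == 0x00ull);  // no blocker: nothing is cut off
    assert(below_msb(0x01ull) == 0x00ull);  // blocker on bit 0: same result as no blocker
}
```

Because the blocker's own bit is not part of the fill, it survives the later XOR and stays in the move set.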
Isn't there a trick to get positive rays faster?
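One known trick, for the direction toward higher bit indices, is the o ^ (o - 2r) subtraction, which needs no leading-zero count at all. The sketch below is not from the code in this thread and has not been benchmarked against the BLSMSK form; positive_ray is a hypothetical name, and the line mask is assumed not to contain the slider's own square.

```cpp
#include <cassert>
#include <cstdint>

// Attacks from square p toward higher bit indices along one line.
// occ: full occupancy; line: mask of the line without square p.
inline uint64_t positive_ray(int p, uint64_t occ, uint64_t line) {
    uint64_t o = occ & line;           // blockers on this line
    uint64_t r = uint64_t{1} << p;     // the slider
    // Subtracting 2r borrows through the empty squares above the slider until it
    // hits the first blocker; the XOR marks exactly the bits that changed.
    return (o ^ (o - 2 * r)) & line;
}

// Made-up example on rank 1: slider on d1 (p = 3), other pieces on b1 and g1.
inline void positive_ray_demo() {
    const uint64_t line = 0xF7ull;  // rank 1 without d1
    assert(positive_ray(3, 0x4Aull, line) == 0x70ull);  // e1 f1 g1
    assert(positive_ray(3, 0x08ull, line) == 0xF0ull);  // nothing above: e1..h1
}
```

The direction toward lower bit indices has no such borrow to exploit, which is why that half ends up needing the CLZ and shift.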