Hear me out

gxchess1 · Post by **gxchess1** » Sun Apr 23, 2023 11:13 pm

While I was thinking a few days ago I happend to think about an idea.

Magic bitboards create an index using the hashing function

Code: Select all

ax>>n

and index a lookup table using that index. What if we could multiply this index by another magic number, then mask the result with the attack set for an empty occupancy and get the attack set? Here's how this would look in code:

Edit: the attack set for an empty occupancy are what moves a slider would have if it was on that square on an empty board. Superset of all attack sets.

Code: Select all

u64 indexify(u64 occ, u64 mask, u64 magic, u64 shift) {
	return ((occ & mask) * magic) >> (64-shift);
}
u64 unindexify(u64 index, u64 magic, u64 full) {
	return (index * magic) & full;
}

u64 bishop_attacks(u64 occ, Magics &magics) {
	occ &= magics.mask;
	occ *= magics.factor;
	occ >>= 9;
	occ *= magic.post_factor;
	occ &= magic.postmask;
	return occ;
}
v64x8 bishop_attacks(v64x8 occ, v64x8 masks, v64x8 factors, v64x8 post_factors, v64x8 post_masks) {
	occ &= masks;
	occ *= factors;
	occ >>= 59;
	occ *= post_factors;
	occ &= post_masks;
	return occ;
};

That's right, lut-less with just a couple of operations which I can count on my hand. This doesn't use a lut so this can be vectorized to multiple squares (avx512 will be able to do 8x attack calculations in sub-ns speed).

Now the problem is finding the parameters for the magics. Unlike regular lookup tables, instead of having a narrower index range, it would be better to have less used entries (that is, we can use any range we want, but not consume a lot of entries) so when looking for post-magics, it would expontetially increase the chances for a post-magic to be one when we are searched for less entries.

For now, for the bishop square B2, I can find about 10K fixed-shift black magic-numbers which use 16 entries (and cant find 15). If use a variable shifts of 5, I can find black magics which use 14 entries with relative easy (but that's still 2.333x more than it's theoretically possible, because there are only 6 distinct attack sets!).

At first, I had trouble breaking 29 entries, but as I got the hang of it I learned some tricks. The main trick was looking at the pattern, some bits are ALWAYS off, some bits are ALWAYS on. I constructed a simple python scripts to look at patterns from magic numbers, a couple of iterations later, from generating 4 a second you generate 10k. (Maybe genetic algorithm to generate those? But it may get stuck on too many local minimums).

As for finding post magics, I use the following unindexify function:

Code: Select all

u64 unindexify(u64 index, u64 magic, u64 full) {
	return ~((index * magic) | ~full);
}

Which I guess have brought to me more success. But the thing is, I can only find post-magics which are refuted at the 7th entry (which should be ok if I find a magic where it to only 6 unique indices).

One shortcoming which I think is not a big deal is that many resulting attack bitboards have bits set in their low bits: but if the index is odd, it should have a 1-bit in the lowest bit, so the multiplication thing will figure it out nicely.

I post here because I can't find black magic fixed shift numbers which go below 16 entries, and I can't find black post-magic numbers which are ok for anything after the 7th entry.

Anyways any questions? Ideas? Not possible? Possible? Help me? I am un-understandable?

Mike Sherwin · Post by **Mike Sherwin** » Mon Apr 24, 2023 12:20 am

If you want to move something to AVX-512 why not start with something more simple that has already been proven to be a bit faster than Black Magic in C/C++?

Code: Select all

    static uint64_t Bishop(int sq, uint64_t occ)
    {
        return  dSubset[sq][(((occ & dMask[sq]) * file_b2_b7) >> 58)] |
                aSubset[sq][(((occ & aMask[sq]) * file_b2_b7) >> 58)];
    }

    static uint64_t Rook(int sq, uint64_t occ)
    {
        return hSubset[sq][(occ >> shift_horizontal_table[sq]) & 63] |
               vSubset[sq][((((occ >> (sq & 7)) & file_a2_a7) * diag_c2h7) >> 58)];
    }

gxchess1 · Post by **gxchess1** » Mon Apr 24, 2023 10:02 pm

I have seen this and this is fantastic!

However, I wonder if my idea is theoretically correct and if it is applicable. If it is actually applicable, this should perform any other technique because of being LUT-less and being also the best type of vectorizable (just a series of very basic arithmetic operations). It's just 2 multiplications a shift and a mask, what could be better?

However really no idea if it's correct even in theory, because I've yet to find the magic factors.

Mike Sherwin · Post by **Mike Sherwin** » Mon Apr 24, 2023 10:26 pm

Thanks!

I'm an old dog and AVX-512 is definitely a new trick.

Maybe post your question here. https://talkchess.com/forum3/viewtopic. ... 05#p916724

Hear me out

Hear me out

Re: Hear me out

Re: Hear me out

Re: Hear me out