Comparison of bitboard attack-getter variants

Henk · Post by **Henk** » Wed Jan 06, 2016 2:23 pm

For the rook I used this. So for now it does not use perfect hashing and magic numbers but instead looks it up in a Dictionary<ulong, ulong> (= sort of hash table).

But this solution is slightly slower than the one without bit boards. So if I use perfect hashing I doubt if there will be any gain. Looks like you can only get magic numbers using trial and error. So I still have to write code for that.

Code: Select all

        public override bool CanCapture(ulong target)
        {
         
            return ((Location.StraightMovesDict[Location.StraightOcc & Board.Occupiers] & target) != 0);
        }

hgm · Post by **hgm** » Wed Jan 06, 2016 2:26 pm

Sven Schüle wrote:In "Plain" the square is actually used as an array index. None of these implementations uses HGM's proposal of a pointer array, though.

The code Gerd posts here uses a pointer rather than an integer offset...

Note that it is quite understandable/expected that it doesn't make much difference whether you use a compacted 800KB table through an extra level of irregular indirection, or a 2MB regular 2-dimensional array. The latter only wastes address space. It doesn't occupy any extra memory; it just leaves it unused. Unused memory does not cause cache pressure. As your program will hardly ever exactly need the available 4GB (or whatever people have nowadays), there will always be a lot of unused memory, and it doesn't make any difference whether all that unused memory is spread out in chunks within your table, or a contiguous range at the end of your program. Even a small Rook table for a single square takes 8KB, which completely fills two cache ways, so it is not like your array maps only in a sub-set of the cache, and leaves the rest unused. You don't even put more pressure on the TLB, when the table is page-aligned, as the boundaries between tables for squares coincide with page boundaries (for 4KB 'small' pages).

So the only difference is how to calculate the start address of the tables: load it from memory from the indirection table, or multiply. I think both operations have a maximum throughput of 1 per cycle, and a latency of 3 or 4 clocks. So it is really "lead for old iron" (six of one, half a dozen of the other), to use a Dutch expression.
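A minimal C sketch of the two layouts being compared, mainly to show where the 2MB and 800KB figures come from. The names `rook_bits`, `plain`, `pool`, `start`, `init_start`, `lookup_plain` and `lookup_ptr` are made up for illustration and come from none of the engines discussed. Squares are assumed to be numbered 0..63 with file = sq & 7, and the tables are left unfilled; only the way the start address is obtained differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Relevant occupancy bits for a rook: 10 on inner squares, +1 per board edge. */
static int rook_bits(int sq) {
    int file = sq & 7, rank = sq >> 3;
    return 10 + (file == 0 || file == 7) + (rank == 0 || rank == 7);
}

/* Regular layout: every square gets a full 4096-entry row (64 * 4096 * 8 bytes). */
static uint64_t plain[64][4096];

/* Compacted layout: rows of 2^rook_bits(sq) entries packed back to back. */
static uint64_t pool[4 * 4096 + 24 * 2048 + 36 * 1024];
static uint64_t *start[64];            /* the extra level of indirection */

/* Fill the indirection table; returns the number of pool entries used. */
static size_t init_start(void) {
    size_t offset = 0;
    for (int sq = 0; sq < 64; sq++) {
        start[sq] = pool + offset;
        offset += (size_t)1 << rook_bits(sq);
    }
    return offset;
}

/* Start address by multiplication: plain + sq * 4096 entries. */
static uint64_t lookup_plain(int sq, unsigned index) { return plain[sq][index]; }

/* Start address by a load from the indirection table. */
static uint64_t lookup_ptr(int sq, unsigned index) { return start[sq][index]; }
```

With these definitions `sizeof plain` is 2,097,152 bytes and `sizeof pool` is 819,200 bytes, i.e. the 2MB and 800KB mentioned above.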

Henk · Post by **Henk** » Wed Jan 06, 2016 9:23 pm

Trying to find magic numbers but it fails. It already fails for rook moves on a1.

pair.Key with value 282578800083326 and 282578783371646 always seem to collide no matter what magic number I use.

Code: Select all

            int nMoves = StraightMovesDict.Count;
            NBits = ((int)Math.Log(nMoves, 2)) + 2;
            int nBuckets = (int)(Math.Pow(2, NBits));

            StraightMoves = new ulong[nBuckets];
         
            ulong ii = 1;
          
            bool found = false;
            do
            {
                found = true;

                Magic = (ulong)(ii++) << (64 - NBits);

                for (int i = 0; i < nBuckets; i++)
                {
                    StraightMoves[i] = 999;
                }


                int k = 0;
                foreach (var pair in StraightMovesDict)
                {
                    k++;
                    int index = MagicKey(pair.Key);
                    if (StraightMoves[index] != 999)
                    {
                        found = false;
                        break;
                    }
                    else StraightMoves[index] = pair.Key;
                }
            }
            while (!found);


        int MagicKey(ulong occupancy)
        {
            ulong ind = occupancy * Magic;
            int index = (int)(ind >> (64 - NBits));
            return index;
        }

So the code above always breaks for the same value of k (= 10) and the loop never ends.

Now I'm looking for aspirins.
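A note on why this loop cannot succeed, with a small check written in C. Every candidate has the form ii << (64 - NBits), so (occupancy * Magic) >> (64 - NBits) reduces to (occupancy * ii) mod 2^NBits, which depends only on the lowest NBits bits of the occupancy. The two keys above differ by exactly 0xFF0000, so they share their low 16 bits and collide for every ii as long as NBits is at most 16. The sketch assumes NBits = 14 (which is what the code gives for 4096 dictionary entries; the post does not state the value), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Index function from the post: top nbits bits of occupancy * magic. */
static unsigned magic_key(uint64_t occupancy, uint64_t magic, int nbits) {
    return (unsigned)((occupancy * magic) >> (64 - nbits));
}

/* Candidate magics as generated in the post: a counter shifted to the top. */
static uint64_t shifted_candidate(uint64_t ii, int nbits) {
    return ii << (64 - nbits);
}

/* Returns 1 if the two keys get the same index for every ii in [1, limit]. */
static int always_collide(uint64_t k1, uint64_t k2, int nbits, uint64_t limit) {
    for (uint64_t ii = 1; ii <= limit; ii++) {
        uint64_t magic = shifted_candidate(ii, nbits);
        if (magic_key(k1, magic, nbits) != magic_key(k2, magic, nbits))
            return 0;
    }
    return 1;
}
```

For example, `always_collide(282578800083326ULL, 282578783371646ULL, 14, 100000)` returns 1, and raising the limit does not change that. A usable magic needs set bits lower down in the word, so that the high occupancy bits can reach the index.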

Sven · Post by **Sven** » Wed Jan 06, 2016 9:46 pm

Henk wrote:Trying to find magic numbers but it fails. It already fails for rook moves on a1.

Now I'm looking for aspirins.

No, look for magics

Here, for instance ... The algorithm proposed a long time ago by Tord Romstad is very simple and works at once. I think you should not spend too much time on that part.

Henk · Post by **Henk** » Wed Jan 06, 2016 9:56 pm

Perhaps some (ulong) casts were missing, so I added a few. Now there is some progress, but it is much too slow: after five minutes I have only reached square c1.

Code: Select all

Magic = ((ulong)ii * 10000000) + (ulong)((ulong)(ii++) << (64 - NBits -3));

I don't understand some statements in Tord's code. For instance, what is this for?

Code: Select all

   if(count_1s((mask * magic) & 0xFF00000000000000ULL) < 6) continue;
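One reading of this line is that it is a cheap pre-filter rather than part of the correctness check. mask * magic is the product for the position in which every relevant square is occupied; if fewer than 6 of its top 8 bits are set, the candidate is unlikely to push the relevant bits towards the high end of the word, so it is skipped before the expensive collision test is run. A small C sketch of that test, as an illustration only: `count_1s` is written here as a portable loop (Tord's own definition is not shown in this thread) and `passes_prefilter` is a made-up name.

```c
#include <stdint.h>

/* Portable population count: clears the lowest set bit until none are left. */
static int count_1s(uint64_t b) {
    int n = 0;
    for (; b != 0; b &= b - 1)
        n++;
    return n;
}

/* Keep a candidate only if the fully occupied mask fills at least 6 of the
   8 most significant bits of the product. This is a heuristic: it saves time
   on hopeless candidates but may also discard a few usable ones. */
static int passes_prefilter(uint64_t mask, uint64_t magic) {
    return count_1s((mask * magic) & 0xFF00000000000000ULL) >= 6;
}
```

A candidate that fails is simply replaced by the next random number; one that passes still has to survive the full test over all occupancies.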

hgm · Post by **hgm** » Wed Jan 06, 2016 10:30 pm

Henk wrote: Looks like you can only get magic numbers using trial and error.

Not really. If you forget about sub-minimal magics (i.e. use a table of size 2^N when there are N relevant occupancy bits) it is rather straightforward to design them by hand. It is just that there are so many of them (2x64) that this becomes unattractive. E.g. if you use a rank scan with a1 in the MSB, the mask for the relevant Rook bits for, say, d3 in binary is

Code: Select all

00000000 000a0000 0bc0def0 000g0000 000h0000 000i0000 000j0000 00000000

That is 10 relevant bits. To collect these in the uppermost 10 bits, you can shift left:

Code: Select all

0000a00000bc0def0 000g0000 000h0000 000i0000 000j0000 00000000 (<< 7, catches a)
00bc0def0000g0000 000h0000 000i0000 000j0000 00000000 (<< 15, catches bcdef)
g0000000h0000 000i0000 000j0000 00000000 (<< 27, catches gh)
0i0000000j0000 00000000 (<< 42, catches ij)
______________________________________________________________________ +
gibcadefhj bcgdefi0...

So no bits collide with these shifts in the region of interest, and addition will never cause a carry there. Somewhere low down, in the region that will be right-shifted out of the word, there are collisions, but the carry they generate will never propagate into the high-order 10 bits due to the zeroes that separate them. The shifts + adds correspond to a multiplication with (1<<7 | 1<<15 | 1<<27 | 1<<42) =

Code: Select all

00000000 00000000 00000100 00000000 00001000 00000000 10000000 10000000
=0x0000040008008080ull

So that is a Rook magic for d3.
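The construction can be checked by brute force. The following C sketch uses the bit numbering from this post (a1 in the MSB, rank scan), rebuilds the d3 mask and runs all 1024 occupancies through the magic to see whether any two share an index. `square_bit`, `rook_mask_d3` and `is_collision_free` are illustrative names, not taken from any engine.

```c
#include <stdint.h>
#include <string.h>

/* Bit numbering as in the post: a1 is bit 63, then b1, ..., rank by rank. */
static uint64_t square_bit(int file, int rank) {      /* both 0-based */
    return 1ULL << (63 - (8 * rank + file));
}

/* Relevant rook occupancy for d3: inner squares of rank 3 and of the d-file. */
static uint64_t rook_mask_d3(void) {
    uint64_t mask = 0;
    for (int file = 1; file <= 6; file++)
        if (file != 3) mask |= square_bit(file, 2);
    for (int rank = 1; rank <= 6; rank++)
        if (rank != 2) mask |= square_bit(3, rank);
    return mask;
}

/* 1 if no two occupancies inside mask share an index in the top `bits` bits. */
static int is_collision_free(uint64_t mask, uint64_t magic, int bits) {
    static unsigned char seen[1 << 12];
    uint64_t occ = 0;
    memset(seen, 0, sizeof seen);
    do {
        unsigned idx = (unsigned)((occ * magic) >> (64 - bits));
        if (seen[idx]) return 0;
        seen[idx] = 1;
        occ = (occ - mask) & mask;     /* next subset of mask */
    } while (occ != 0);
    return 1;
}
```

Here `rook_mask_d3()` equals 0x00106E1010101000, the mask written out in binary above, and `is_collision_free(rook_mask_d3(), 0x0000040008008080ULL, 10)` returns 1.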

Henk · Post by **Henk** » Wed Jan 06, 2016 10:46 pm

Ok so no trial and error needed.

Before your reply I had already copied a piece of Tord's code and translated it to C#, and that already solved the problem.

Code: Select all

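        // Shared generator: on .NET Framework a fresh Random per call is seeded
        // from the clock, so calls in a tight loop can return identical values.
        static readonly Random r = new Random();

        ulong random_uint64()
        {
            // Combine four 16-bit chunks into one 64-bit value.
            ulong u1 = (ulong)r.Next(0x10000);
            ulong u2 = (ulong)r.Next(0x10000);
            ulong u3 = (ulong)r.Next(0x10000);
            ulong u4 = (ulong)r.Next(0x10000);
            return u1 | (u2 << 16) | (u3 << 32) | (u4 << 48);
        }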
 ..
       Magic = random_uint64();

Took a few seconds to compute magic numbers for rook. So maybe he uses more tricks in his code to make it run faster.

Henk · Post by **Henk** » Wed Jan 06, 2016 11:07 pm

Henk wrote:For the rook I used this. So for now it does not use perfect hashing and magic numbers but instead looks it up in a Dictionary<ulong, ulong> (= sort of hash table).

But this solution is slightly slower than the one without bit boards. So if I use perfect hashing I doubt if there will be any gain. Looks like you can only get magic numbers using trial and error. So I still have to write code for that.

Code: Select all

        public override bool CanCapture(ulong target)
        {
         
            return ((Location.StraightMovesDict[Location.StraightOcc & Board.Occupiers] & target) != 0);
        }

Now the implementation is slightly faster, say 2%.
It took a whole day (or perhaps more) to implement, but it was really worthwhile (ahem).

Code: Select all

     public override bool CanCapture(ulong target)
     {
         return ((Location.StraightMovesArr[Location.MagicKey(Location.StraightOcc & Board.Occupiers)] & target) != 0);
     }

Sven · Post by **Sven** » Wed Jan 06, 2016 11:19 pm

Henk wrote: Now the implementation is slightly faster, say 2%.
It took a whole day (or perhaps more) to implement, but it was really worthwhile (ahem).

Yeah, almost the first time something works for you - so the best post of the day in my view, and a nice start of the new year!

hgm · Post by **hgm** » Wed Jan 06, 2016 11:29 pm

Actually the method used above for d3 works for all 3rd-rank inner squares. You always need shifts 7 and 15 to shift the 3rd rank to bits 56-61, and fill the hole in it with the 2nd-rank target square. Then you have to fudge in the file bits for ranks 4-7, and shifting one 15 places with respect to the other maps ranks 6 & 7 just right of the bits in ranks 4 & 5, with a 6-wide hole between them that can embrace the rank 2 & 3 bits we already caught. So you would just have to shift that left 3 ranks (24 bits), and then again enough to get the file into the MSB. For the d-file (file number 3 if we start file counting at 0) that would be 3 more.

So 3rd-rank Rook magics are 0x8080 + (0x8001 << (24 + fileNr)). That gives us 6 in one blow.

For the 2nd-rank inner squares the 0x8080 part would also bring rank 2 & 3 into bits 56-61, so they can use the same magics. The one exception is b2: there the shifted b3 and b4 bits both land on bit 53, and their carry spills into the index at bit 54. That is 5 more. For the higher ranks the 0x8080 part would have to be left-shifted another 16 or 32 bits to position it, and you would have to calculate how much shift the remaining two rank pairs need to get them into bits 63 & 55 and bits 62 & 54. That is all.

I guess magic is easy, if you are a wizard!
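The formula can be checked by enumeration. The C sketch below assumes the numbering used in these posts (a1 in the MSB, rank scan, file numbers starting at 0) and tests whether any two occupancies of a square's mask share an index; all function names are illustrative. With this strict test the formula passes for b3 to g3 and for c2 to g2. b2 does not pass: there the shifted b3 and b4 bits both land on bit 53, and their carry runs into bit 54, the lowest of the ten index bits.

```c
#include <stdint.h>
#include <string.h>

/* Bit numbering as in the posts: a1 is bit 63, then b1, ..., rank by rank. */
static uint64_t square_bit(int file, int rank) {      /* both 0-based */
    return 1ULL << (63 - (8 * rank + file));
}

/* Relevant rook occupancy for an inner square: its rank and file, edges excluded. */
static uint64_t rook_mask(int file, int rank) {
    uint64_t mask = 0;
    for (int f = 1; f <= 6; f++)
        if (f != file) mask |= square_bit(f, rank);
    for (int r = 1; r <= 6; r++)
        if (r != rank) mask |= square_bit(file, r);
    return mask;
}

/* The formula from the post, with the shift count parenthesised explicitly. */
static uint64_t formula_magic(int fileNr) {
    return 0x8080ULL + (0x8001ULL << (24 + fileNr));
}

/* 1 if no two occupancies inside mask share an index in the top `bits` bits. */
static int is_collision_free(uint64_t mask, uint64_t magic, int bits) {
    static unsigned char seen[1 << 12];
    uint64_t occ = 0;
    memset(seen, 0, sizeof seen);
    do {
        unsigned idx = (unsigned)((occ * magic) >> (64 - bits));
        if (seen[idx]) return 0;
        seen[idx] = 1;
        occ = (occ - mask) & mask;     /* next subset of mask */
    } while (occ != 0);
    return 1;
}

/* Number of inner squares (files b..g) on a 0-based rank that pass the check. */
static int count_passing(int rank) {
    int n = 0;
    for (int file = 1; file <= 6; file++)
        n += is_collision_free(rook_mask(file, rank), formula_magic(file), 10);
    return n;
}
```

`formula_magic(3)` reproduces the d3 magic 0x0000040008008080; `count_passing(2)` covers the 3rd rank and `count_passing(1)` the 2nd.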
