Final '32 bit friendly' attack getters

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Final '32 bit friendly' attack getters

Post by Michael Sherwin »

Code: Select all

If anyone is interested in trying these 32 bit friendly sliding-piece 'folded' bitboard move generators then I suggest this as a starting point. The reason that they only return an index is so, it can be used to lookup mobility as well as an attack set. Also the index can be saved if it might be needed later.

Please note that there is not one single 64 bit access in either routine. It is optimized/targeted for 32 bit machines. People with 64 bit machines that do not mind having two sets of routines for both 32 bit and 64 bit machines are welcome to create the 64 bit counter parts.

u32 BishopIndex(square *sq, u64*occ) {                                       // b1
  u32 index  = (u32)*occ | (*(occ + 4) >> 1);                                // b2
      index |= (index & 0xffff0000) >> sq->bishopShift;                      // b3
      return sq->bPerfect1[index & 255] | sq->rPerfect2[(index >> 8) & 255]; // b4
}

U32 RookIndex(square *sq, u64 *occ) {                                        // r1
  u32 occ1   = (u32)*(occ + sq->off1);                                       // r2 
  u32 occ2   = (u32)*(occ + sq->off2);                                       // r3
  u32 index  = (occ1  & sq->rankMask) >> sq->rankShift;                      // r4
      index |= (occ1  & sq->colMask1) >> sq->colShift1;                      // r5
      index |= (occ2  & sq->colMask2) >> sq->colShift2;                      // r6
      index |= (index &   0xffff0000) >> sq->rookShift;                      // r7
      return sq->rPerfect1[index & 255] | sq->rPerfect2[(index >> 8) & 255]; // r8
}

b1) pointers are only 32 bit, so it saves one instruction to do it this way.
b2) using 32 bit instructions only retrieve both parts of occ and combine them.
b3) fold the upper half of index into lower two rows, now you have a dual index.
b4) return the combined minimal perfect index to the bishp attack/mobility table.

r1) same as b1.
r2) load occ1 with the half of the occupied bitboard that contains the rank.
r3) load occ2 with the other half.
r4) load the rank bits shifted to bit 0 of index.
r5) load the column bits of occ1 shifted into place in the index.
r6) load the column bits of occ2 shifted into place in the index.
r7) same as b3.
r8) same as b4, except it is for rooks.

* ?Perfect1 for each square is initialized with the address of that squares subtable.
* ?Perfect1 then is further initialized with the row1 bits for a subtable within the subtable.
* ?Perfect2 is initialized with the remaining bits.
* when ?Perfect1 and ?Perfect2 are or'd together they form the complete index.

I am going to use these routines in my new chess program regardless of speed, because I believe that they are more than fast enough. And the bishop is probably the fastest yet developed in 32 bit or even 64 bit. The rook may or may not be fastest in 32 bit, but it is definitly very competitive. For 64 bit I reccommend Gerd's 'Kindergarden' RookAttacks(). Or just use 'kindergarden' for rooks and 'folded' for bishops.

I will now complete the initialization code and make a follow-up post.

I hope that this code proves useful! 

Mike
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Final '32 bit friendly' attack getters

Post by Michael Sherwin »

While I was sleeping it came to me that the bishopBits were not masked off in the BishopIndex() routine. I will correct the code when I am more rested.

In the RookIndex() routine the rookBits are included in the indivigual mask that pick up various groups of bits. No change needed there.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Final '32 bit friendly' attack getters

Post by Michael Sherwin »

return sq->bPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

return sq->rPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

By changing the combining operation to '^' instead of '|' it is possible to make a perfect hash (or maybe a minimal perfect hash) to access the attack sets.

A rook on A1 neds 4096 entries if a pure 12 bit index is generated.

However, a rook on A1 only has 49 unique attack sets.

so, by assigning 6 bit random numbers to each bit combination on rows 1 and 2 and exclusive oring them together as in the code above an index to a 64 element subtable is created.

If randoms can be found so that there are no collisions then you have a perfect hash.

Hmmm, 4096 reduced to 64 sounds good to me!

If all bit combinations have no collisions and map to indexs 0-48 of the subtable then you have a minimal perfect hash.

Hmmm, 4096 reduced to 49, even better! :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Final '32 bit friendly' attack getters

Post by Gerd Isenberg »

Michael Sherwin wrote:return sq->bPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

return sq->rPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

By changing the combining operation to '^' instead of '|' it is possible to make a perfect hash (or maybe a minimal perfect hash) to access the attack sets.

A rook on A1 neds 4096 entries if a pure 12 bit index is generated.

However, a rook on A1 only has 49 unique attack sets.

so, by assigning 6 bit random numbers to each bit combination on rows 1 and 2 and exclusive oring them together as in the code above an index to a 64 element subtable is created.

If randoms can be found so that there are no collisions then you have a perfect hash.

Hmmm, 4096 reduced to 64 sounds good to me!

If all bit combinations have no collisions and map to indexs 0-48 of the subtable then you have a minimal perfect hash.

Hmmm, 4096 reduced to 49, even better! :D
I told you this is over my head ;-)

There seems to be one fundamental problem to me. Both sub-indices (or 6,7 or 8-bit indices) i1, i2 are only unique in combination i2*K+i1. While traversing all relevant occupied sets, i1 occurs number of combinations(i2) times the same and i2 number of combinations(i1) times. Despite the redundant bits of the positive collisions, where different occupancies map same attacksets, i guess it is hard to initialze the rPerfect arrays, so that

Code: Select all

 sq->rPerfect1[i1] ^(|+) sq->rPerfect2[i2]
leaves a dense unique index range far below i2*K+i1 with rPerfect1 and rPerfect2 have common bits.

Gerd
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Final '32 bit friendly' attack getters

Post by Michael Sherwin »

Gerd Isenberg wrote:
Michael Sherwin wrote:return sq->bPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

return sq->rPerfect1[index & 255] ^ sq->rPerfect2[(index >> 8) & 255];

By changing the combining operation to '^' instead of '|' it is possible to make a perfect hash (or maybe a minimal perfect hash) to access the attack sets.

A rook on A1 neds 4096 entries if a pure 12 bit index is generated.

However, a rook on A1 only has 49 unique attack sets.

so, by assigning 6 bit random numbers to each bit combination on rows 1 and 2 and exclusive oring them together as in the code above an index to a 64 element subtable is created.

If randoms can be found so that there are no collisions then you have a perfect hash.

Hmmm, 4096 reduced to 64 sounds good to me!

If all bit combinations have no collisions and map to indexs 0-48 of the subtable then you have a minimal perfect hash.

Hmmm, 4096 reduced to 49, even better! :D
I told you this is over my head ;-)

There seems to be one fundamental problem to me. Both sub-indices (or 6,7 or 8-bit indices) i1, i2 are only unique in combination i2*K+i1. While traversing all relevant occupied sets, i1 occurs number of combinations(i2) times the same and i2 number of combinations(i1) times. Despite the redundant bits of the positive collisions, where different occupancies map same attacksets, i guess it is hard to initialze the rPerfect arrays, so that

Code: Select all

 sq->rPerfect1[i1] ^(|+) sq->rPerfect2[i2]
leaves a dense unique index range far below i2*K+i1 with rPerfect1 and rPerfect2 have common bits.

Gerd
Hi Gerd,

No, unfortuately, it is my head that it is over. I just can not keep enough of it crammed into the tiny space that is my mind to get things right. I realized after I made this last post that what I posted was garbage. So, if I want to save memory then I will have to fall back to my Attack superset idea. At least I know that that works, because Zack Wegner tried it on a different attempt at a move generator and it worked, but, it was slow on 32 bits. Not tested on 64 bits. That method required the anding together of six supersets for the bishop and eight supersets for the rook. However, for this folding method only two supersets need to be anded for both the bishop and the rook, so it may be good for both 32 bit and 64 bit machines.

For those of you that do not know what an attack superset is then imagine a blocking pattern that only has two blockers one in each row or index portion. If a blocker is the only blocker then there is a specific attack set that would be generated. Same for the bit in the other index portion. Each attack set is a superset, because in reality the piece in question can not move to all those positions. However, when both supersets are anded together then all that is left is the real attack set. Combinations of blocking bits that occure in either index portion are preanded together and placed into the lookup tables.

These then return a u64 attack set.

return sq->bSuperSet1[index & 255] & sq->bSuperSet2[(index >> 8) & 255];

return sq->rSuperSet1[index & 255] & sq->rSuperSet2[(index >> 8) & 255];

The memory savings is substantial as no further lookup tables are needed. And it requires one less indirection to retrieve the attack set.

Anyway untill I actually get something up and running as well as working correctly I am not going to waste peoples time by making any further post. :oops: I'll wait untill it's working before I waste anyones time!
:D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through