Plain and fancy magic on modern hardware

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Plain and fancy magic on modern hardware

Post by Houdini »

If memory footprint is an issue, one can always apply a compacting scheme to the rook and bishop tables.

In the "fancy" magic scheme the rook table contains about 100k 64-bit entries (around 800 kB), but in reality there are only 4900 different attacking bitboards possible for a rook, so there is an important redundancy.

One can easily construct the look-up table of 4900 different attacking bitboards (about 40 kB), and then replace the 100 k rook table entries with a 16-bit index in the look-up table. This leads to a memory footprint of 40 kB + 2 * 100 kB or about 240 kB instead of the initial 800 kB.

The next step is to realize that there are maximum 144 rook attacking bitboards for a given square. This means that one can replace the 16-bit index with an 8-bit index combined with a fixed offset per square. This leads to a memory footprint of 40 kB + 100 kB or about 140 kB instead of the initial 800 kB.

In the "plain" magic scheme the savings would be even more important. The 2048 kB rook table becomes 40 kB + 256 kB or about 300 kB.
The plain magic code shown above would simply become:

Code: Select all

byte mBishopAttacks[64] [512]; // 32 K 
byte mRookAttacks  [64][4096]; // 256 K 

U64 mRookLookup[4900];  // 39 K
U64 mBishopLookup[1428];  // 11 K

struct SMagic { 
   U64 mask;  // to mask relevant squares of both lines (no outer squares) 
   U64 magic; // magic 64-bit factor 
   int offset;   // offset into lookup table
}; 

SMagic mBishopTbl[64]; 
SMagic mRookTbl  [64]; 

U64 bishopAttacks(U64 occ, enumSquare sq) { 
   occ &= mBishopTbl[sq].mask; 
   occ *= mBishopTbl[sq].magic; 
   occ >>= 64-9; 
   return mBishopLookup[mBishopTbl[sq].offset + mBishopAttack[sq][occ]];
}


For the bishop table there are 1428 attacking bitboards, so the 40 kB (fancy) bishop table compacts to a 11 kB look-up table and a 5 kB index table = 16 kB. Probably less useful.

I've tested this compacting scheme successfully in Houdini, but I'm still not sure whether it's really beneficial or not.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Plain and fancy magic on modern hardware

Post by Gerd Isenberg »

Houdini wrote:If memory footprint is an issue, one can always apply a compacting scheme to the rook and bishop tables.

In the "fancy" magic scheme the rook table contains about 100k 64-bit entries (around 800 kB), but in reality there are only 4900 different attacking bitboards possible for a rook, so there is an important redundancy.

One can easily construct the look-up table of 4900 different attacking bitboards (about 40 kB), and then replace the 100 k rook table entries with a 16-bit index in the look-up table. This leads to a memory footprint of 40 kB + 2 * 100 kB or about 240 kB instead of the initial 800 kB.

The next step is to realize that there are maximum 144 rook attacking bitboards for a given square. This means that one can replace the 16-bit index with an 8-bit index combined with a fixed offset per square. This leads to a memory footprint of 40 kB + 100 kB or about 140 kB instead of the initial 800 kB.

In the "plain" magic scheme the savings would be even more important. The 2048 kB rook table becomes 40 kB + 256 kB or about 300 kB.
The plain magic code shown above would simply become:

Code: Select all

byte mBishopAttacks[64] [512]; // 32 K 
byte mRookAttacks  [64][4096]; // 256 K 

U64 mRookLookup[4900];  // 39 K
U64 mBishopLookup[1428];  // 11 K

struct SMagic { 
   U64 mask;  // to mask relevant squares of both lines (no outer squares) 
   U64 magic; // magic 64-bit factor 
   int offset;   // offset into lookup table
}; 

SMagic mBishopTbl[64]; 
SMagic mRookTbl  [64]; 

U64 bishopAttacks(U64 occ, enumSquare sq) { 
   occ &= mBishopTbl[sq].mask; 
   occ *= mBishopTbl[sq].magic; 
   occ >>= 64-9; 
   return mBishopLookup[mBishopTbl[sq].offset + mBishopAttack[sq][occ]];
}


For the bishop table there are 1428 attacking bitboards, so the 40 kB (fancy) bishop table compacts to a 11 kB look-up table and a 5 kB index table = 16 kB. Probably less useful.

I've tested this compacting scheme successfully in Houdini, but I'm still not sure whether it's really beneficial or not.
Another variation of the memory size versus computation theme. The 16-bit indirection was tested worse by Pradu and others, maybe things changed with recent processors. Your byte lookup idea with offset is interesting, 4 KByte per rook square instead of 32K, but two dependent "semi-huge" lookups.

I am not that memory and cache expert with nehalem and its huge smart caches of 6 MByte and more. And what are the typical hit/miss ratios and miss costs of various caches - if we lookup sparse entries from 2MB (32K per square) versus 256 KByte ( 4K per square) during a "real" search. And what benefits (if any) huge pages make here.
User avatar
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Plain and fancy magic on modern hardware

Post by Houdini »

Gerd Isenberg wrote:The 16-bit indirection was tested worse by Pradu and others
Any reference to their tests?
Thanks.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Plain and fancy magic on modern hardware

Post by Gerd Isenberg »

Houdini wrote:
Gerd Isenberg wrote:The 16-bit indirection was tested worse by Pradu and others
Any reference to their tests?
Thanks.
I can not found the original CCC post by Pradu with his results any longer (seems somehow lost), it was about to define PERFECT_MAGIC_HASH in Pradu's source (Download Magic Move Bitboard Generator). It was discussed and was slightly slower. Here is another thread on that topic.
User avatar
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Plain and fancy magic on modern hardware

Post by Houdini »

Gerd Isenberg wrote:I can not found the original CCC post by Pradu with his results any longer (seems somehow lost), it was about to define PERFECT_MAGIC_HASH in Pradu's source (Download Magic Move Bitboard Generator).
Thanks!
You're correct, inspection of Pradu's code shows that the 16-bit index scheme I suggested is identical to Pradu's PERFECT_MAGIC_HASH option.
The 8-bit index with offset per square is the logical next step, as it further halves the rook tables from 200 kB to 100 kB at a very minimal cost.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Plain and fancy magic on modern hardware

Post by Gerd Isenberg »

Houdini wrote:
Gerd Isenberg wrote:I can not found the original CCC post by Pradu with his results any longer (seems somehow lost), it was about to define PERFECT_MAGIC_HASH in Pradu's source (Download Magic Move Bitboard Generator).
Thanks!
You're correct, inspection of Pradu's code shows that the 16-bit index scheme I suggested is identical to Pradu's PERFECT_MAGIC_HASH option.
The 8-bit index with offset per square is the logical next step, as it further halves the rook tables from 200 kB to 100 kB at a very minimal cost.
Yes, that could be very competitive. With post-mask one can even half the array of distinct rook attacks, even shared by bishop attacks with a resulting mRookBishopLookup of less than 20 KByte.
micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

Re: Plain and fancy magic on modern hardware

Post by micron »

Houdini wrote:
Gerd Isenberg wrote:I can not found the original CCC post by Pradu with his results any longer (seems somehow lost), it was about to define PERFECT_MAGIC_HASH in Pradu's source (Download Magic Move Bitboard Generator).
Thanks!
You're correct, inspection of Pradu's code shows that the 16-bit index scheme I suggested is identical to Pradu's PERFECT_MAGIC_HASH option.
The 8-bit index with offset per square is the logical next step, as it further halves the rook tables from 200 kB to 100 kB at a very minimal cost.
The "plain with fixed-shift tweak" attack generator as discussed earlier in this thread is obtained in Pradu's magicmoves.h with three macros commented out:
//#define MINIMIZE_MAGIC
//#define PERFECT_MAGIC_HASH unsigned short
//#define VARIABLE_SHIFT

Robert P.