On Rook tables in magic move generation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

On Rook tables in magic move generation

Post by vittyvirus »

Whoa! 800 kB for rooks, and less than _50_ kB for bishops!
I think better technique is to use a separate table for Rank and File Attacks. Rank attacks need no multiplication, and I've figured out that with a clever implementation, they take only 0.5 kB. File attacks need magic multiplication but constant shifts with 57, and they're are only 8 different magic no.s you need. Besides, they can be computed at compile time, because successive magic is just two multiplied by previous magic. With this, rook tables need less than 0.5 + 8 * ( 8 * 64) = 4.5 kB.
What do you think?
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: On Rook tables in magic move generation

Post by hgm »

This technique is known as 'kindergarten bitboards'. It requires more operations, because you have to calculate ranks and files separately. But you can hope to earn that back by using smaller tables, which fit in the L1 cache, so that you can do the multiple accesses (for ranks and files) to them in parallel, which you could not do to tables in L2.

An even less memory-consuming technique is to use 8-fold rotated bitboard, where the various rays are always stored contiguously in the direction of travel, and then use carry propagation to generate the accessible squares. This doesn't require any table lookups at all.
User avatar
vittyvirus
Posts: 646
Joined: Wed Jun 18, 2014 2:30 pm
Full name: Fahad Syed

Re: On Rook tables in magic move generation

Post by vittyvirus »

hgm wrote:This technique is known as 'kindergarten bitboards'. It requires more operations, because you have to calculate ranks and files separately. But you can hope to earn that back by using smaller tables, which fit in the L1 cache, so that you can do the multiple accesses (for ranks and files) to them in parallel, which you could not do to tables in L2.

An even less memory-consuming technique is to use 8-fold rotated bitboard, where the various rays are always stored contiguously in the direction of travel, and then use carry propagation to generate the accessible squares. This doesn't require any table lookups at all.
Then I'm yet again in a quest for an original idea ;)
I find hyperbola quintessence with reverse bits algorithm far the most easiest way to generate attack masks. But reversing bits is a slow task and hardware intrinsics aren't available for it, but reversing is only required once in rank attacks.
I think now I'd find an equation whose root is attack mask of a given piece and occupancy. Then use Newton's method or householder's iteration. Just kidding. :)
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: On Rook tables in magic move generation

Post by mvk »

hgm wrote:An even less memory-consuming technique is to use 8-fold rotated bitboard, where the various rays are always stored contiguously in the direction of travel, and then use carry propagation to generate the accessible squares. This doesn't require any table lookups at all.
I was figuring that two orientations suffice for this, because you can ripple over intermediate bits if you clear (or set) them first. Still feels like a hassle.
[Account deleted]
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: On Rook tables in magic move generation

Post by hgm »

Interesting idea. What I like is that it would also work for oblique directions, e.g. Nightriders. And it isn't that much of a hassle. It just needs to apply the usual mask before as well as after the subtraction. And it is relatively cheap to apply to larger boards.
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: On Rook tables in magic move generation

Post by hgm »

The way I had in mind (but so fare never applied) was to store the boards as individully addressible rays in 4 different directions. This would not put such a cramped limit on board size. Each ray would contain both a normal and bit-reversed copy of the occupation. Even on a 32-bit machine this could work upto 16x16 boards. The O^(O-2*R) trick could then be applied in both directions along the ray simultaneously. There would never be any need to actually reverse bits.

You would just need 4 board-size tables to indicate which rays from the ray set go through each square, and whne aan occupation changes you would have to update all rays through it.