Comparison of bitboard attack-getter variants

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Comparison of bitboard attack-getter variants

Post by ZirconiumX »

Sven, if you don't mind me asking, could you run a quick comparison of whether the extra bit twiddling of this article is worth the unified implementation?

Gerd might find the article useful too, although whether it can be suitably vectorized is not in my field of expertise (I'm a musician...).

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Comparison of bitboard attack-getter variants

Post by Gerd Isenberg »

Sven Schüle wrote: The following five approaches were tested:
- Magics
- Hyperbola Quintessence
- Dumb7Fill
- Obstruction Difference
- and an unnamed approach which I simply call "No Headache" (at least the author knows why).

A more realistic test would of course include a "real" static evaluation function, use SEE, improve the search (to reach much greater search depths), and more. Although I doubt that this would change the results significantly, I plan to repeat the same test after improving the engine substantially. Other attack-getter implementations are also on my ToDo list, including Kindergarten bitboards and a few more.

Test result overview:

Code: Select all

Attack-getter variant   Perft NPS  -% vs. Magics  Search NPS  -% vs. Magics
===========================================================================
Magics                    1509538                    4689055
Obstruction Difference    1305917         -13.5%     4480245          -4.5%
"No Headache"             1290733         -14.5%     4411828          -5.9%
Hyperbola Quintessence    1327537         -12.1%     4380272          -6.6%
Dumb7fill                 1174150         -22.2%     4102345         -12.5%
My brief summary: Magics are still on rank 1. Several other techniques are not far away from Magics, and they might serve as a good starting point if program speed is not (yet) an issue.
Hi Sven,

ahh, attack-getters! Nice test. Interesting that, besides the huge-memory approach of magic bitboards being the expected winner (inlined?), "No Headache" with its loops and branches is faster than I expected. Of course, both approaches may decline if more memory (Magics) or branch-prediction resources ("No Headache") are consumed by other areas of a further developed engine.
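
A minimal sketch of what such a plain magic lookup boils down to, assuming the magic multipliers, masks and attack tables are pre-computed at startup; the names Magic, rookMagics and rookAttacks are illustrative, not taken from Sven's engine:

Code: Select all

#include <stdint.h>

typedef uint64_t U64;

/* Per-square data for plain magic bitboards; everything here is
   assumed to be filled in once at engine startup. */
typedef struct {
    U64  mask;     /* relevant blocker squares for this square   */
    U64  magic;    /* magic multiplier found offline             */
    U64 *attacks;  /* pointer into the pre-computed attack table */
    int  shift;    /* 64 minus the number of relevant bits       */
} Magic;

extern Magic rookMagics[64];   /* assumed to be initialized elsewhere */

/* The whole attack getter: one AND, one multiply, one shift, one load. */
static inline U64 rookAttacks(int sq, U64 occ) {
    const Magic *m = &rookMagics[sq];
    return m->attacks[((occ & m->mask) * m->magic) >> m->shift];
}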

Concerning Harald's former test and Rotated Indices, it is somewhat difficult to compare approaches with different incremental-update versus on-the-fly computation costs. Rotated Indices have higher costs in make/unmake or copy/make than rotated bitboards, which in turn have higher incremental update costs than non-rotated approaches. On the other hand, this may pay off if that deconcentrated (rotated or reversed) occupancy is probed often per node. But if you frequently use temporarily modified occupancies per node, e.g. to get x-ray attacks, it is more work on the fly.

Of course, fill stuff like Dumb7Fill is not intended for single sliding pieces, but for sets of multiple pieces in a direction-wise approach, especially with AVX2 in mind à la DirGolem, or if you like to compute progressive mobility, i.e. all squares a bishop may attack in two moves. Anyway, is Kogge-Stone faster than Dumb7Fill in your engine?
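
To make the Kogge-Stone versus Dumb7Fill question concrete, here is the textbook form of both fills, shown for the south direction only (which needs no wrap masks); this is a sketch of the general technique, not code from Sven's engine:

Code: Select all

#include <stdint.h>

typedef uint64_t U64;

/* Dumb7Fill: a plain sequential flood, one shift per step. */
U64 southAttacksDumb7Fill(U64 rooks, U64 empty) {
    U64 flood = rooks;
    flood |= rooks = (rooks >> 8) & empty;
    flood |= rooks = (rooks >> 8) & empty;
    flood |= rooks = (rooks >> 8) & empty;
    flood |= rooks = (rooks >> 8) & empty;
    flood |= rooks = (rooks >> 8) & empty;
    flood |=         (rooks >> 8) & empty;
    return flood >> 8;          /* shift once more to get the attacks */
}

/* Kogge-Stone: parallel prefix, doubling the propagation distance each step. */
U64 southAttacksKoggeStone(U64 rooks, U64 empty) {
    rooks |= empty & (rooks >>  8);
    empty &=         (empty >>  8);
    rooks |= empty & (rooks >> 16);
    empty &=         (empty >> 16);
    rooks |= empty & (rooks >> 32);
    return rooks >> 8;          /* shift once more to get the attacks */
}

Both return the attack set including the first blocker; the other seven directions are analogous, with the appropriate wrap masks for the east/west components.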

Thanks,
Gerd
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Comparison of bitboard attack-getter variants

Post by hgm »

Sven Schüle wrote:System environment
- PC: Lenovo Laptop W530 running Windows 7 Pro SP1
- CPU: Intel Core i7-3610QM CPU @2.30 GHz
Unfortunately this makes the results a bit difficult to interpret on an absolute scale, as this i7 model has the 'turbo boost' feature, so that in cases where not all of its cores are fully loaded the clock frequency might rise up to 3.30 GHz. And when doing a single-core test this is likely to happen.

Just to put your results in perspective, could you measure the nps of micro-Max 1.6 (WB version), which is also an engine with material-only eval and no hash table, on your machine? On my 2.4 GHz i3 it did 3.17 Mnps (on the opening position), but the i3 does not have turbo boost. Converting that to 3.3 GHz would give 4.36 Mnps.

Micro-Max has no piece list, though, so it has to scan the board to find its own pieces, which of course is not very good for its nps.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Comparison of bitboard attack-getter variants

Post by Gerd Isenberg »

Thanks for the link. Funny, but looks a bit expensive to me.

Code: Select all

>> rank*8
*  0x0101010101010101
&  0x8040201008040201
usual HQ attack from the diagonal
*  0x0101010101010101
/  0x0100000000000000   (hopefully replaced by >> 56)
<< rank*8
compared to a "usual" rank-getter with a 512-byte lookup:

Code: Select all

BYTE arrFirstRankAttacks64x8[64*8]; // 512 Bytes = 1/2 KByte

U64 rankAttacks(U64 occ, enumSquare sq) {
   unsigned int file = sq &  7;
   unsigned int rkx8 = sq & 56; // rank * 8
   occ = (occ >> rkx8) & 2*63;
   U64 attacks = arrFirstRankAttacks64x8[4*occ + file];
   return attacks << rkx8;
}
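
The initialization of that table is not shown above; one way to fill it, consistent with the 4*occ + file indexing (the function name initFirstRankAttacks is illustrative only, not from the post):

Code: Select all

typedef unsigned char BYTE;

BYTE arrFirstRankAttacks64x8[64*8];    /* 512 bytes, as above */

/* Fills arrFirstRankAttacks64x8. The index 4*occ + file used above
   equals 8*occ6 + file, where occ6 are the six inner occupancy bits
   (files b..g) of the rank, because occ arrives already doubled.   */
void initFirstRankAttacks(void) {
    for (int occ6 = 0; occ6 < 64; occ6++) {
        BYTE occ = (BYTE)(occ6 << 1);               /* blockers on files b..g */
        for (int file = 0; file < 8; file++) {
            BYTE attacks = 0;
            for (int f = file + 1; f < 8; f++) {    /* walk east */
                attacks |= (BYTE)(1 << f);
                if (occ & (1 << f)) break;          /* stop at the blocker */
            }
            for (int f = file - 1; f >= 0; f--) {   /* walk west */
                attacks |= (BYTE)(1 << f);
                if (occ & (1 << f)) break;          /* stop at the blocker */
            }
            arrFirstRankAttacks64x8[8*occ6 + file] = attacks;
        }
    }
}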
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Comparison of bitboard attack-getter variants

Post by Sven »

ZirconiumX wrote:Sven, if you don't mind me asking, could you run a quick comparison of whether the extra bit twiddling of this article is worth the unified implementation?

Gerd might find the article useful too, although whether it can be suitably vectorized is not in my field of expertise (I'm a musician...).
I'll try that as soon as I can. Might take a few days, though.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Comparison of bitboard attack-getter variants

Post by Sven »

hgm wrote:
Sven Schüle wrote:System environment
- PC: Lenovo Laptop W530 running Windows 7 Pro SP1
- CPU: Intel Core i7-3610QM CPU @2.30 GHz
Unfortunately this makes the results a bit difficult to interpret on an absolute scale, as this i7 model has the 'turbo boost' feature, so that in cases where not all of its cores are fully loaded the clock frequency might rise up to 3.30 GHz. And when doing a single-core test this is likely to happen.

Just to put your results in perspective, could you measure the nps of micro-Max 1.6 (WB version), which is also an engine with material-only eval and no hash table, on your machine? On my 2.4 GHz i3 it did 3.17 Mnps (on the opening position), but the i3 does not have turbo boost. Converting that to 3.3 GHz would give 4.36 Mnps.

Micro-Max has no piece list, though, so it has to scan the board to find its own pieces, which of course is not very good for its nps.
I'll try that as soon as possible.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Comparison of bitboard attack-getter variants

Post by Sven »

Sven Schüle wrote:
hgm wrote:
Sven Schüle wrote:System environment
- PC: Lenovo Laptop W530 running Windows 7 Pro SP1
- CPU: Intel Core i7-3610QM CPU @2.30 GHz
Unfortunately this makes the results a bit difficult to interpret on an absolute scale, as this i7 model has the 'turbo boost' feature, so that in cases where not all of its cores are fully loaded the clock frequency might rise up to 3.30 GHz. And when doing a single-core test this is likely to happen.

Just to put your results in perspective, could you measure the nps of micro-Max 1.6 (WB version), which is also an engine with material-only eval and no hash table, on your machine? On my 2.4 GHz i3 it did 3.17 Mnps (on the opening position), but the i3 does not have turbo boost. Converting that to 3.3 GHz would give 4.36 Mnps.

Micro-Max has no piece list, though, so it has to scan the board to find its own pieces, which of course is not very good for its nps.
I'll try that as soon as possible.
On my machine, the speed of umax1_6w.exe seems to be comparable to that of my bitboard engine:

Code: Select all

tellics say         micro-Max 1.6w
tellics say         by H.G. Muller
new
level 40 120 0
post
go
 0      0        0          1 a8a8
 1     15        0         22 c2c4
 2      0        0        142 c2c4
 3     15        0       1784 c2c4
 4      0        1      15572 d2d4
 5     15        4     143800 b1c3
 6      0       43    1637236 b1c3
 7     14      291   13173109 e2e3
 8      2     3864  168516567 d2d4
 9     12    23255 1060333645 g1f3
move g1f3
13173109 / 2.91 = 4526842
168516567 / 38.64 = 4361195
1060333645 / 232.55 = 4559594

so it is roughly 4.4 .. 4.5 Mnps.
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Comparison of bitboard attack-getter variants

Post by hgm »

Seems your i7 is running in full turbo mode (3.3 GHz) in these tests.

And not bad for a slow mailbox (0x88 instead of boundary guards, no piece list).
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Comparison of bitboard attack-getter variants

Post by Henk »

How many moves do you need to store to make magic bitboards work? I think only storing the bitboards that contain the moves will be too slow, for you still have to extract the moves out of these bitboards.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Comparison of bitboard attack-getter variants

Post by Sven »

Henk wrote:How many moves do you need to store to make magic bitboards work? I think only storing the bitboards that contain the moves will be too slow, for you still have to extract the moves out of these bitboards.
When generating moves I always store the moves themselves in the move list, not just their bitboards, which may encode more than one move at once. There is one simple reason for it: move ordering would become quite difficult without per-move scoring. I don't know exactly what you mean by "how many moves do you need to store ...", but I guess the answer is: "all". Performance might be another reason, but in my opinion move ordering is the main issue here.
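
The extraction Henk is worried about is usually just a bit-scan loop over the attack set. A minimal sketch, where the Move layout and the GCC/Clang builtin are assumptions rather than Sven's actual code:

Code: Select all

#include <stdint.h>

typedef uint64_t U64;

typedef struct {
    int from, to;
    int score;                  /* filled in later by move ordering */
} Move;

/* Serializes an attack/target bitboard into individual moves and
   returns the new move count. Extracting the least significant one
   bit and then clearing it is the usual idiom.                     */
int addMoves(int from, U64 targets, Move *list, int n) {
    while (targets) {
        int to = __builtin_ctzll(targets);  /* index of the LS1B */
        targets &= targets - 1;             /* clear the LS1B    */
        list[n].from  = from;
        list[n].to    = to;
        list[n].score = 0;
        n++;
    }
    return n;
}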