Discussion of chess software programming and technical issues.
Moderators: Harvey Williamson, bob, hgm
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

mcostalba
 Posts: 2684
 Joined: Sat Jun 14, 2008 7:17 pm
Post
by mcostalba » Sat May 09, 2009 5:12 pm
In Stockfish mobility computation is a very performance critical part.
Code: Select all
// Mobility
int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
: count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:
Code: Select all
inline int count_1s(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v = ((v >> 4) + v) & 0x0F0F0F0F; // 08 in 8 bits
v += (((w >> 4) + w) & 0x0F0F0F0F); // 016 in 8 bits
v *= 0x01010101; // mul is fast on amd procs
return int(v >> 24);
}
inline int count_1s_max_15(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v += w; // 08 in 4 bits
v *= 0x11111111;
return int(v >> 28);
}
Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.
What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.
So I was thinking in hashing the mobility bitboard 'b':
Code: Select all
int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?
Thanks
Marco

bob
 Posts: 20918
 Joined: Mon Feb 27, 2006 6:30 pm
 Location: Birmingham, AL
Post
by bob » Sat May 09, 2009 5:36 pm
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.
Code: Select all
// Mobility
int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
: count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:
Code: Select all
inline int count_1s(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v = ((v >> 4) + v) & 0x0F0F0F0F; // 08 in 8 bits
v += (((w >> 4) + w) & 0x0F0F0F0F); // 016 in 8 bits
v *= 0x01010101; // mul is fast on amd procs
return int(v >> 24);
}
inline int count_1s_max_15(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v += w; // 08 in 4 bits
v *= 0x11111111;
return int(v >> 28);
}
Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.
What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.
So I was thinking in hashing the mobility bitboard 'b':
Code: Select all
int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?
Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...

mcostalba
 Posts: 2684
 Joined: Sat Jun 14, 2008 7:17 pm
Post
by mcostalba » Sat May 09, 2009 5:48 pm
bob wrote:mcostalba wrote:In Stockfish mobility computation is a very performance critical part.
Code: Select all
// Mobility
int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
: count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:
Code: Select all
inline int count_1s(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v = ((v >> 4) + v) & 0x0F0F0F0F; // 08 in 8 bits
v += (((w >> 4) + w) & 0x0F0F0F0F); // 016 in 8 bits
v *= 0x01010101; // mul is fast on amd procs
return int(v >> 24);
}
inline int count_1s_max_15(Bitboard b) {
unsigned w = unsigned(b >> 32), v = unsigned(b);
v = (v >> 1) & 0x55555555; // 02 in 2 bits
w = (w >> 1) & 0x55555555;
v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 04 in 4 bits
w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
v += w; // 08 in 4 bits
v *= 0x11111111;
return int(v >> 28);
}
Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.
What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.
So I was thinking in hashing the mobility bitboard 'b':
Code: Select all
int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?
Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...
Yes,already tried,hit count is around 50%, to low to have a performance increase considering the added cache housekeeping overhead.

Gerd Isenberg
 Posts: 2208
 Joined: Wed Mar 08, 2006 7:47 pm
 Location: Hattingen, Germany
Post
by Gerd Isenberg » Sat May 09, 2009 8:42 pm
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.
Code: Select all
// Mobility
int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
: count_1s(bb & ~p.pieces_of_color(us)));
Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.
What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.
So I was thinking in hashing the mobility bitboard 'b':
Code: Select all
int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?
Thanks
Marco
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?

mcostalba
 Posts: 2684
 Joined: Sat Jun 14, 2008 7:17 pm
Post
by mcostalba » Sat May 09, 2009 8:58 pm
Gerd Isenberg wrote:
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
Gerd Isenberg wrote:
Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?
that's what I call "A QUESTION" !! Yes, you raised a sensible point, actually I understund without taboo squares is much more simple, but to adapt to current stockfish I need to include also "taboo squares" so that we end up with non contiguos attacks sets.
Another thing, the code could be:
Code: Select all
int mob = (hashKeys[pieceType][square] * b) % 16; // 32 for queen
Not what I have posted before.

Gerd Isenberg
 Posts: 2208
 Joined: Wed Mar 08, 2006 7:47 pm
 Location: Hattingen, Germany
Post
by Gerd Isenberg » Sat May 09, 2009 9:31 pm
mcostalba wrote:Gerd Isenberg wrote:
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product

mcostalba
 Posts: 2684
 Joined: Sat Jun 14, 2008 7:17 pm
Post
by mcostalba » Sat May 09, 2009 10:38 pm
Gerd Isenberg wrote:mcostalba wrote:Gerd Isenberg wrote:
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product
Now I have understood, the important bits are the most significant ones. I was thinking that the multiplication could yield different LSB according to the bitboard count.

Ryan Benitez
 Posts: 719
 Joined: Thu Mar 09, 2006 12:21 am
 Location: Portland Oregon
Post
by Ryan Benitez » Sat May 09, 2009 10:48 pm
Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))
The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.

mcostalba
 Posts: 2684
 Joined: Sat Jun 14, 2008 7:17 pm
Post
by mcostalba » Sat May 09, 2009 10:52 pm
Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))
The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I don't think is supported on 32bit systems, where the plain count() routine is very slow, on 64bits the count function is already faster.

Zach Wegner
 Posts: 1922
 Joined: Wed Mar 08, 2006 11:51 pm
 Location: Earth

Contact:
Post
by Zach Wegner » Sat May 09, 2009 11:04 pm
Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))
The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I thought the popcnt instruction was only available on Core i7s?
Anyways, the idea is workable IMO. The magic would be something like shifting the attack bit for every attackable square up to bit 60, and then the sum would be in the top 4 bits. Overflows make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.