Page 1 of 3

fast mobility count through hashing

Posted: Sat May 09, 2009 7:12 pm
by mcostalba
In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco

Re: fast mobility count through hashing

Posted: Sat May 09, 2009 7:36 pm
by bob
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...

Re: fast mobility count through hashing

Posted: Sat May 09, 2009 7:48 pm
by mcostalba
bob wrote:
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...

Yes,already tried,hit count is around 50%, to low to have a performance increase considering the added cache housekeeping overhead.

Re: fast mobility count through hashing

Posted: Sat May 09, 2009 10:42 pm
by Gerd Isenberg
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.

Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?

Re: fast mobility count through hashing

Posted: Sat May 09, 2009 10:58 pm
by mcostalba
Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.

Gerd Isenberg wrote: Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?
that's what I call "A QUESTION" !! Yes, you raised a sensible point, actually I understund without taboo squares is much more simple, but to adapt to current stockfish I need to include also "taboo squares" so that we end up with non contiguos attacks sets.

Another thing, the code could be:

Code: Select all

int mob = (hashKeys[pieceType][square] * b) % 16; // 32 for queen
Not what I have posted before.

Re: fast mobility count through hashing

Posted: Sat May 09, 2009 11:31 pm
by Gerd Isenberg
mcostalba wrote:
Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product ;-)

Re: fast mobility count through hashing

Posted: Sun May 10, 2009 12:38 am
by mcostalba
Gerd Isenberg wrote:
mcostalba wrote:
Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product ;-)
Now I have understood, the important bits are the most significant ones. I was thinking that the multiplication could yield different LSB according to the bitboard count.

Re: fast mobility count through hashing

Posted: Sun May 10, 2009 12:48 am
by Ryan Benitez
Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.

Re: fast mobility count through hashing

Posted: Sun May 10, 2009 12:52 am
by mcostalba
Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I don't think is supported on 32bit systems, where the plain count() routine is very slow, on 64bits the count function is already faster.

Re: fast mobility count through hashing

Posted: Sun May 10, 2009 1:04 am
by Zach Wegner
Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I thought the popcnt instruction was only available on Core i7s?

Anyways, the idea is workable IMO. The magic would be something like shifting the attack bit for every attackable square up to bit 60, and then the sum would be in the top 4 bits. Overflows make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.