fast mobility count through hashing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

fast mobility count through hashing

Post by mcostalba »

In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast mobility count through hashing

Post by bob »

mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

bob wrote:
mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));
Bit count routines are the well known:

Code: Select all

inline int count_1s(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v = ((v >> 4) + v) & 0x0F0F0F0F; // 0-8 in 8 bits
  v += (((w >> 4) + w) & 0x0F0F0F0F);  // 0-16 in 8 bits
  v *= 0x01010101; // mul is fast on amd procs
  return int(v >> 24);
}

inline int count_1s_max_15(Bitboard b) {
  unsigned w = unsigned(b >> 32), v = unsigned(b);
  v -= (v >> 1) & 0x55555555; // 0-2 in 2 bits
  w -= (w >> 1) & 0x55555555;
  v = ((v >> 2) & 0x33333333) + (v & 0x33333333); // 0-4 in 4 bits
  w = ((w >> 2) & 0x33333333) + (w & 0x33333333);
  v += w; // 0-8 in 4 bits
  v *= 0x11111111;
  return int(v >> 28);
}

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
You should look at what we do in Crafty. We have a very small sort of hash/cache table for mobility values, where for a given square, we can determine if the mobility of that piece has changed since the last calculation and avoid the PopCnt() stuff if it has not...

Yes,already tried,hit count is around 50%, to low to have a performance increase considering the added cache housekeeping overhead.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: fast mobility count through hashing

Post by Gerd Isenberg »

mcostalba wrote:In Stockfish mobility computation is a very performance critical part.

Code: Select all

    // Mobility
    int mob = (Piece != QUEEN ? count_1s_max_15(bb & ~p.pieces_of_color(us))
                              : count_1s(bb & ~p.pieces_of_color(us)));

Thinking about a way to improve (and trying to aoid bitcount() intrinsics) I come up with an observation.

What we need to count is only a very small subset of the all possible ocmbinations. As example for a rook mobility bitboard will occupy only ONE rank + ONE file, for a bishop ONLY two diagonals.

So I was thinking in hashing the mobility bitboard 'b':

Code: Select all

int mob = (hashKeys[square] * b) % 16; // 32 for queen
Does anybody has experience on this?

Thanks
Marco
Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.

Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.

Gerd Isenberg wrote: Do you also intend to exclude taboo squares (f.i. defended by opponent pawns)? Or do you only consider the 144 (3*4*3*4) different attack sets per center square of a rook?
that's what I call "A QUESTION" !! Yes, you raised a sensible point, actually I understund without taboo squares is much more simple, but to adapt to current stockfish I need to include also "taboo squares" so that we end up with non contiguos attacks sets.

Another thing, the code could be:

Code: Select all

int mob = (hashKeys[pieceType][square] * b) % 16; // 32 for queen
Not what I have posted before.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: fast mobility count through hashing

Post by Gerd Isenberg »

mcostalba wrote:
Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product ;-)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Gerd Isenberg wrote:
mcostalba wrote:
Gerd Isenberg wrote: Not possible with mod 16. Guess you mean mod 15 or 17 upwards, but no power of two? I fear it is not that simple and you will end up with some magic bitboard like additional indirection with more than 4 bits needed as index.
Why not possible with 16 ? currently I use count_max_15 for sliders but the queen. Mobility count is always below 15 for the above sliders.
mod 16 is the same as "and 15" and will mask the four least significant bits, and you will lose all upper relevant bits of the "magic" product ;-)
Now I have understood, the important bits are the most significant ones. I was thinking that the multiplication could yield different LSB according to the bitboard count.
Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 1:21 am
Location: Portland Oregon

Re: fast mobility count through hashing

Post by Ryan Benitez »

Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I don't think is supported on 32bit systems, where the plain count() routine is very slow, on 64bits the count function is already faster.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: fast mobility count through hashing

Post by Zach Wegner »

Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I thought the popcnt instruction was only available on Core i7s?

Anyways, the idea is workable IMO. The magic would be something like shifting the attack bit for every attackable square up to bit 60, and then the sum would be in the top 4 bits. Overflows make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.