32 bit versions for bitscan64

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

32 bit versions for bitscan64

Post by Desperado »

Hi everyone,

because i know http://chessprogramming.wikispaces.com/BitScan where all sorts of bitscans are descriped, i missed the following
approaches.(using _32bit intrinsics_) (i also couldnt find it elsewhere)

so i thought i will post this.

Code: Select all


//****************************************************************************
//* DESCRIPTION	:	32Bit Version for _BitScanForward64						 *
//****************************************************************************

ULONG bsf64(BTB_T bb)
	{
	 UI_32 *const ptr = (UI_32*)&bb;
	 ULONG id=64;

	 _BitScanForward(&id,*(ptr+1));
	 id+=32;
	 _BitScanForward(&id,*ptr);

	 return(id);
	}

//****************************************************************************
//* DESCRIPTION	:	32Bit Version for _BitScanReverse64						 *
//****************************************************************************

ULONG bsr64(BTB_T bb)
	{
	 UI_32 *const ptr = (UI_32*)&bb;
	 ULONG id=64;

	 _BitScanReverse(&id,*(ptr+1)) ? id+=32 : _BitScanReverse(&id,*ptr);

	 return(id);
	}
These functions doing (much) better than debruijn or double conversion (and some others) _even_ on my amd K8 (where bitscans seem to be horror :shock: )

Cheers
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

or even faster forward scan...

Code: Select all


//****************************************************************************
//* DESCRIPTION	:	32Bit Version for _BitScanForward64						 *
//****************************************************************************

ULONG bsf64(BTB_T bb)
	{
	 UI_32 *const ptr = (UI_32*)&bb;
	 ULONG id=64;

	 _BitScanForward(&id,*ptr) ? id : id += 31 + _BitScanForward(&id,*(ptr+1));

	 return(id);
	}

This can avoid the second scan...

with ASSERT(bb!=0) in all cases.

:wink:
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: 32 bit versions for bitscan64

Post by mcostalba »

For 32 bit we use the following:

Code: Select all

Square first_1(Bitboard b) {
  b ^= (b - 1);
  uint32_t fold = int(b) ^ int(b >> 32);
  return Square(BitTable[(fold * 0x783a9b23) >> 26]);
}
Instead when you need to pop first bit, i.e. scan and reset then:

Code: Select all

// Use type-punning
union b_union {

    Bitboard b;
    struct {
        uint32_t l;
        uint32_t h;
    } dw;
};

// WARNING: Needs -fno-strict-aliasing compiler option
Square pop_1st_bit(Bitboard* bb) {

  b_union u;
  uint32_t b;

  u.b = *bb;

  if (u.dw.l)
  {
      b = u.dw.l;
      *((uint32_t*)bb) = b & (b - 1);
      b ^= (b - 1);
  }
  else
  {
      b = u.dw.h;
      *((uint32_t*)bb+1) = b & (b - 1); // Little endian only?
      b = ~(b ^ (b - 1));
  }
  return Square(BitTable[(b * 0x783a9b23) >> 26]);
}

Do you think yours are faster ?
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Hi Marco,

nice to meet you again.

Code: Select all

//****************************************************************************
//* PROCEDURE	: bsf64														 *
//* DESCRIPTION	: 32 bit version (Matt Taylor's Folding trick)				 *
//****************************************************************************

ULONG bsf64_MattTaylor(BTB_T bb)
	{
	 const int lsb_64_table[64] =
		{
		 63, 30,  3, 32, 59, 14, 11, 33,
		 60, 24, 50,  9, 55, 19, 21, 34,
		 61, 29,  2, 53, 51, 23, 41, 18,
		 56, 28,  1, 43, 46, 27,  0, 35,
		 62, 31, 58,  4,  5, 49, 54,  6,
		 15, 52, 12, 40,  7, 42, 45, 16,
		 25, 57, 48, 13, 10, 39,  8, 44,
		 20, 47, 38, 22, 17, 37, 36, 26
		};

	 bb ^= (bb- 1);
     unsigned int folded = (int) bb ^ (bb >> 32);
   
	 return lsb_64_table[folded * 0x78291ACF >> 26];
	}

Code: Select all


//****************************************************************************
//* PROCEDURE	: bsf64														 *
//* DESCRIPTION	: 32 bit version											 *
//****************************************************************************

ULONG bsf64(BTB_T bb)
	{
	 UI_32 *const ptr = (UI_32*)&bb;
	 ULONG id=64;

	 _BitScanForward(&id,*ptr) ? id : id += 31 + (bool)_BitScanForward(&id,*(ptr+1));

	 return(id);
	}
At least for factor 3x faster than Matt, i would say almost factor 4.
And that on my _anti_ bitscan machine! (i hope i didnt do something totally wrong!, but i dont think so).

As you can see, i put a bool typecast before the bitscan, because the ms-reference tells _BitScan is returning 0 or _nonzero_ value, and it has to be _1_ of course.

...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: 32 bit versions for bitscan64

Post by mcostalba »

Hi Michael,

actually the bitscan alone is not so important, it is called seldom.

Much more important is the pop_1st_bit() that is the _real_ routine used at 99.9% of cases.

How would you write that in this case ? I can speed test mine and the yours if you post it.

Thanks
Marco
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Hi Marco,

first i have to correct my last post !!!

Putting the table to global scope, everything changes!

The result seems to be equal(about)....(for _bitscan alone_)
BUT on _anti_ bitscan machine.

i will have a closer look on the pop_1st_bit routine, and post my results.

cheers
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: 32 bit versions for bitscan64

Post by Aleks Peshkov »

Code: Select all

    inline U32 lo&#40;U64 b&#41; &#123; return small_cast<U32>&#40;b&#41;; &#125;
    inline U32 hi&#40;U64 b&#41; &#123; return static_cast<U32>&#40;b >> 32&#41;; &#125;

    inline index_t bsf&#40;U64 value&#41; &#123;
        U32 lo = &#58;&#58;lo&#40;value&#41;;
        return bsf&#40;lo ? lo &#58; hi&#40;value&#41;) + &#40;lo ? 0 &#58; 32&#41;;
    &#125;
Sorry, for ++-ish, pure C-version is trivial.
Smart compiler can generate branchless x86-code.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: 32 bit versions for bitscan64

Post by mcostalba »

Aleks Peshkov wrote:

Code: Select all

    inline U32 lo&#40;U64 b&#41; &#123; return small_cast<U32>&#40;b&#41;; &#125;
    inline U32 hi&#40;U64 b&#41; &#123; return static_cast<U32>&#40;b >> 32&#41;; &#125;

    inline index_t bsf&#40;U64 value&#41; &#123;
        U32 lo = &#58;&#58;lo&#40;value&#41;;
        return bsf&#40;lo ? lo &#58; hi&#40;value&#41;) + &#40;lo ? 0 &#58; 32&#41;;
    &#125;
Sorry, for ++-ish, pure C-version is trivial.
Smart compiler can generate branchless x86-code.
Thanks for your version.

I am very interested in pop_first_1() function competitive with the above I posted. Have you some suggestion in this regard ?

Thanks in advance
Marco


BTW, why you didn't write directly ?

Code: Select all

    inline U32 lo&#40;U64 b&#41; &#123; return small_cast<U32>&#40;b&#41;; &#125;
    inline U32 hi&#40;U64 b&#41; &#123; return static_cast<U32>&#40;b >> 32&#41;; &#125;

    inline index_t bsf&#40;U64 value&#41; &#123;
        U32 lo = &#58;&#58;lo&#40;value&#41;;
        return lo ? bsf&#40;lo&#41; &#58; 32 + bsf&#40;hi&#40;value&#41;);
    &#125;
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: 32 bit versions for bitscan64

Post by Aleks Peshkov »

I am using C++ streams-like operator overloading

Code: Select all

    Self& operator += &#40;SelfArg a&#41; &#123; assert ((*this & a&#41;.none&#40;)); return *this ^= a; &#125;
    Self& operator -= &#40;SelfArg a&#41; &#123; assert ((*this & a&#41; == a&#41;;   return *this ^= a; &#125;
    Self& operator /= &#40;SelfArg a&#41; &#123; *this |= a; return *this -= a; &#125; //b = b & ~a

    friend bool operator >> &#40;Self& self, Index& i&#41; &#123;
        if &#40;self.none&#40;)) &#123; return false; &#125;
        i = self.first&#40;);
        self -= i;
        return true;
    &#125;

Generated code is far from optimal, but I like to manage compact STL-like code:

Code: Select all

    PieceSet pieces = op.nonPawns&#40;);
    while &#40;pieces >> moved_piece&#41; &#123;
        move_from = op&#91;moved_piece&#93;;
        BitBoard nonCaptures = op.movesOf&#40;moved_piece&#41; / occupied; //my and-not operator
        while &#40;nonCaptures >> move_to&#41; &#123;
            CUT&#40;makeMove&#40;));
        &#125;
    &#125;
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 32 bit versions for bitscan64

Post by Desperado »

Aleks Peshkov wrote:

Code: Select all

    inline U32 lo&#40;U64 b&#41; &#123; return small_cast<U32>&#40;b&#41;; &#125;
    inline U32 hi&#40;U64 b&#41; &#123; return static_cast<U32>&#40;b >> 32&#41;; &#125;

    inline index_t bsf&#40;U64 value&#41; &#123;
        U32 lo = &#58;&#58;lo&#40;value&#41;;
        return bsf&#40;lo ? lo &#58; hi&#40;value&#41;) + &#40;lo ? 0 &#58; 32&#41;;
    &#125;
Sorry, for ++-ish, pure C-version is trivial.
Smart compiler can generate branchless x86-code.
Hi Alex,

maybe my mind is totally blocked, but i have some questions to
the above code.

1. why does bsf call itself ?
2. i cannot find a small_cast operator (although i know what it should do)
3. and what is the issue between the _trivial_ c version and the c++ version ? In other words, what makes the c++ version more complex than
the pure c version?
4.i thought explicit putting the (reduced)type in front of more complex type is enough for type conversion downwards.

Hope this text is understandable

Thx


EDIT: saw at the moment you last post, so dont need any answer for 3. :lol: