pdep/pext for 128-bit integers and bit-arrays

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

pdep/pext for 128-bit integers and bit-arrays

Post by Rein Halbersma »

How can one write efficient versions of pdep_u128 and pext_u128? I'm aware of the serial versions for pdep_u64 and pext_u64:
https://www.chessprogramming.org/BMI2#S ... ementation
https://www.chessprogramming.org/BMI2#S ... entation_2

On platforms where there are 128-bit integers, all the arithmetic operations are available so the serial implementations should extend to 128-bit integers as well. But this isn't as straightforward for u64[2] or the more general version for bit-arrays u64[N].

So I wonder if it's possible to combine the 64-bit intrinsics into combined 128-bit versions (and suitable generalizations for bit-arrays). For things like popcount/countl_zero/countr_zero it's relatively easy to write bit-array versions as simple loops over the bit-arrays. But pdep/pext seem to have "carry" properties much like multiplication.

Did anyone ever experiment with this?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: pdep/pext for 128-bit integers and bit-arrays

Post by Sopel »

I'm not certain, but these should work. Untested.

Code: Select all

void pext128(int128_t a, int128_t b)
{
    int128_t out0 = pext(a.low64, b.low64);
    int128_t out1 = pext(a.high64, b.high64);
    return out0 | (out1 << popcount(b.low64));
}

Code: Select all

void pdep128(int128_t a, int128_t b)
{
    int128_t out0 = pdep(a.low64, b.low64);
    int128_t out1 = pdep((a >> popcount(b.low64)).low64, b.high64);
    return out0 | (out1 << 64);
}
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: pdep/pext for 128-bit integers and bit-arrays

Post by Rein Halbersma »

Sopel wrote: Fri Dec 03, 2021 12:32 pm I'm not certain, but these should work. Untested.

Code: Select all

void pext128(int128_t a, int128_t b)
{
    int128_t out0 = pext(a.low64, b.low64);
    int128_t out1 = pext(a.high64, b.high64);
    return out0 | (out1 << popcount(b.low64));
}

Code: Select all

void pdep128(int128_t a, int128_t b)
{
    int128_t out0 = pdep(a.low64, b.low64);
    int128_t out1 = pdep((a >> popcount(b.low64)).low64, b.high64);
    return out0 | (out1 << 64);
}
Interesting ideas! However, suppose a.low64 has more bits than b.low64 can accomodate, then the remaining bits in a.low64 should have priority to fill b.high64 over the bits in a.high64, no?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: pdep/pext for 128-bit integers and bit-arrays

Post by Sopel »

Rein Halbersma wrote: Fri Dec 03, 2021 1:55 pm
Sopel wrote: Fri Dec 03, 2021 12:32 pm I'm not certain, but these should work. Untested.

Code: Select all

void pext128(int128_t a, int128_t b)
{
    int128_t out0 = pext(a.low64, b.low64);
    int128_t out1 = pext(a.high64, b.high64);
    return out0 | (out1 << popcount(b.low64));
}

Code: Select all

void pdep128(int128_t a, int128_t b)
{
    int128_t out0 = pdep(a.low64, b.low64);
    int128_t out1 = pdep((a >> popcount(b.low64)).low64, b.high64);
    return out0 | (out1 << 64);
}
Interesting ideas! However, suppose a.low64 has more bits than b.low64 can accomodate, then the remaining bits in a.low64 should have priority to fill b.high64 over the bits in a.high64, no?
I assume you mean in pdep? That's I shift the a by the amount of bits taken according to b.low64, such that the next bits to take start from the least significant bit. To clarify, b is the mask in this code.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: pdep/pext for 128-bit integers and bit-arrays

Post by dangi12012 »

Rein Halbersma wrote: Fri Dec 03, 2021 1:55 pm
Sopel wrote: Fri Dec 03, 2021 12:32 pm I'm not certain, but these should work. Untested.

Code: Select all

void pext128(int128_t a, int128_t b)
{
    int128_t out0 = pext(a.low64, b.low64);
    int128_t out1 = pext(a.high64, b.high64);
    return out0 | (out1 << popcount(b.low64));
}

Code: Select all

void pdep128(int128_t a, int128_t b)
{
    int128_t out0 = pdep(a.low64, b.low64);
    int128_t out1 = pdep((a >> popcount(b.low64)).low64, b.high64);
    return out0 | (out1 << 64);
}
Interesting ideas! However, suppose a.low64 has more bits than b.low64 can accomodate, then the remaining bits in a.low64 should have priority to fill b.high64 over the bits in a.high64, no?
Sopels solution is sound. You have 2x 64 bit masks and the solution is to pext/pedp the lower part - and the higher part needs to be shifted by the bits you already extracted/deposited. It would be clearer if inedexes are used and not the names.
Its the perfect solution.

I personally would use __m128 registers unioned with uint64_t generally as you also get access to other cool instructions there - but i am biased towards desktop x86-x64 environments.

Code: Select all

typedef union __declspec(intrin_type) __declspec(align(16)) __m128 {
     float               m128_f32[4];
     unsigned __int64    m128_u64[2];
     __int8              m128_i8[16];
     __int16             m128_i16[8];
     __int32             m128_i32[4];
     __int64             m128_i64[2];
     unsigned __int8     m128_u8[16];
     unsigned __int16    m128_u16[8];
     unsigned __int32    m128_u32[4];
 } __m128;
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: pdep/pext for 128-bit integers and bit-arrays

Post by Rein Halbersma »

Sopel wrote: Fri Dec 03, 2021 2:09 pm
Rein Halbersma wrote: Fri Dec 03, 2021 1:55 pm
Interesting ideas! However, suppose a.low64 has more bits than b.low64 can accomodate, then the remaining bits in a.low64 should have priority to fill b.high64 over the bits in a.high64, no?
I assume you mean in pdep? That's I shift the a by the amount of bits taken according to b.low64, such that the next bits to take start from the least significant bit. To clarify, b is the mask in this code.
I think you're right. I'll have time to test it later this weekend hopefully.