Discussion of chess software programming and technical issues.
Moderator: Ras
dangi12012
Posts: 1062 Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr
Post
by dangi12012 » Sun Apr 17, 2022 8:27 pm
Yet again I need to warm this thread up because many sliding piece algorithm improvements rely on these 4 masks:
This is the optimal form to generate all 4 rays without using any lookups:
Code: Select all
#define dir_HO(X) (0xFFull << (X & 56))
#define dir_VE(X) (0x0101010101010101ull << (X & 7))
#define dir_D1(X) (mask_shift<0x8040201008040201ull>((X & 7) - (X >> 3)))
#define dir_D2(X) (mask_shift<0x0102040810204080ull>(7 - (X & 7) - (X >> 3)))
template<uint64_t bb>
static constexpr uint64_t mask_shift(int ranks) {
return ranks > 0 ? bb >> (ranks << 3) : bb << -(ranks << 3);
}
Is there any way to have X be a Bitmask (a single set bit corresponding to 1ull << square) and still generate these masks efficiently.
So not a solution would be these two:
kogge-stone(1ull << square)
std::countlzero(1ull << square) == square
My question for everyone here:
Do you see any way to generate these 4 masks starting from a single set bit as efficiently as in the macro?
tcusr
Posts: 325 Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo
Post
by tcusr » Sun Apr 17, 2022 9:42 pm
dangi12012 wrote: ↑ Sun Apr 17, 2022 8:27 pm
Yet again I need to warm this thread up because many sliding piece algorithm improvements rely on these 4 masks:
This is the optimal form to generate all 4 rays without using any lookups:
Code: Select all
#define dir_HO(X) (0xFFull << (X & 56))
#define dir_VE(X) (0x0101010101010101ull << (X & 7))
#define dir_D1(X) (mask_shift<0x8040201008040201ull>((X & 7) - (X >> 3)))
#define dir_D2(X) (mask_shift<0x0102040810204080ull>(7 - (X & 7) - (X >> 3)))
template<uint64_t bb>
static constexpr uint64_t mask_shift(int ranks) {
return ranks > 0 ? bb >> (ranks << 3) : bb << -(ranks << 3);
}
Is there any way to have X be a Bitmask (a single set bit corresponding to 1ull << square) and still generate these masks efficiently.
So not a solution would be these two:
kogge-stone(1ull << square)
std::countlzero(1ull << square) == square
My question for everyone here:
Do you see any way to generate these 4 masks starting from a single set bit as efficiently as in the macro?
https://www.chessprogramming.org/Genera ... iplication
https://www.chessprogramming.org/Kindergarten_Bitboards
this may be useful but i don't think you're going to have much luck finding ways to generate other rays
dangi12012
Posts: 1062 Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr
Post
by dangi12012 » Mon Apr 18, 2022 12:56 am
Multiplication could indeed work - but a IMUL64 is many times slower than normal bitwise arithmetic.
Lets bring out the big guns: (Release date 1988)
Code: Select all
# UC Berkeley, Espresso Version #2.3, Release date 01/31/88
# PLA is function.txt with 64 inputs and 64 outputs
# ON-set cost is c=64(64) in=4096 out=448 tot=4544
# OFF-set cost is c=64(64) in=4096 out=3648 tot=7744
# DC-set cost is c=2017(1074) in=4096 out=129088 tot=133184
# ESPRESSO Time was 0.00 sec, cost is c=64(0) in=64 out=448 tot=512
.i 64
.o 64
.ilb a63 a62 a61 a60 a59 a58 a57 a56 a55 a54 a53 a52 a51 a50 a49 a48 a47 a46 a45 a44 a43 a42 a41 a40 a39 a38 a37 a36 a35 a34 a33 a32 a31 a30 a29 a28 a27 a26 a25 a24 a23 a22 a21 a20 a19 a18 a17 a16 a15 a14 a13 a12 a11 a10 a09 a08 a07 a06 a05 a04 a03 a02 a01 a00
.ob b63 b62 b61 b60 b59 b58 b57 b56 b55 b54 b53 b52 b51 b50 b49 b48 b47 b46 b45 b44 b43 b42 b41 b40 b39 b38 b37 b36 b35 b34 b33 b32 b31 b30 b29 b28 b27 b26 b25 b24 b23 b22 b21 b20 b19 b18 b17 b16 b15 b14 b13 b12 b11 b10 b09 b08 b07 b06 b05 b04 b03 b02 b01 b00
.p 64
--------------------------------------------------------1------- 1000000010000000100000001000000010000000100000001000000000000000
---------------------------------------------------------1------ 0100000001000000010000000100000001000000010000000100000000000000
----------------------------------------------------------1----- 0010000000100000001000000010000000100000001000000010000000000000
-----------------------------------------------------------1---- 0001000000010000000100000001000000010000000100000001000000000000
------------------------------------------------------------1--- 0000100000001000000010000000100000001000000010000000100000000000
-------------------------------------------------------------1-- 0000010000000100000001000000010000000100000001000000010000000000
--------------------------------------------------------------1- 0000001000000010000000100000001000000010000000100000001000000000
---------------------------------------------------------------1 0000000100000001000000010000000100000001000000010000000100000000
------------------------------------------------1--------------- 1000000010000000100000001000000010000000100000000000000010000000
----------------------------------------1----------------------- 1000000010000000100000001000000010000000000000001000000010000000
--------------------------------1------------------------------- 1000000010000000100000001000000000000000100000001000000010000000
------------------------1--------------------------------------- 1000000010000000100000000000000010000000100000001000000010000000
----------------1----------------------------------------------- 1000000010000000000000001000000010000000100000001000000010000000
--------1------------------------------------------------------- 1000000000000000100000001000000010000000100000001000000010000000
1--------------------------------------------------------------- 0000000010000000100000001000000010000000100000001000000010000000
-------------------------------------------------1-------------- 0100000001000000010000000100000001000000010000000000000001000000
-----------------------------------------1---------------------- 0100000001000000010000000100000001000000000000000100000001000000
---------------------------------1------------------------------ 0100000001000000010000000100000000000000010000000100000001000000
-------------------------1-------------------------------------- 0100000001000000010000000000000001000000010000000100000001000000
-----------------1---------------------------------------------- 0100000001000000000000000100000001000000010000000100000001000000
---------1------------------------------------------------------ 0100000000000000010000000100000001000000010000000100000001000000
-1-------------------------------------------------------------- 0000000001000000010000000100000001000000010000000100000001000000
--------------------------------------------------1------------- 0010000000100000001000000010000000100000001000000000000000100000
------------------------------------------1--------------------- 0010000000100000001000000010000000100000000000000010000000100000
----------------------------------1----------------------------- 0010000000100000001000000010000000000000001000000010000000100000
--------------------------1------------------------------------- 0010000000100000001000000000000000100000001000000010000000100000
------------------1--------------------------------------------- 0010000000100000000000000010000000100000001000000010000000100000
----------1----------------------------------------------------- 0010000000000000001000000010000000100000001000000010000000100000
--1------------------------------------------------------------- 0000000000100000001000000010000000100000001000000010000000100000
---------------------------------------------------1------------ 0001000000010000000100000001000000010000000100000000000000010000
-------------------------------------------1-------------------- 0001000000010000000100000001000000010000000000000001000000010000
-----------------------------------1---------------------------- 0001000000010000000100000001000000000000000100000001000000010000
---------------------------1------------------------------------ 0001000000010000000100000000000000010000000100000001000000010000
-------------------1-------------------------------------------- 0001000000010000000000000001000000010000000100000001000000010000
-----------1---------------------------------------------------- 0001000000000000000100000001000000010000000100000001000000010000
---1------------------------------------------------------------ 0000000000010000000100000001000000010000000100000001000000010000
----------------------------------------------------1----------- 0000100000001000000010000000100000001000000010000000000000001000
--------------------------------------------1------------------- 0000100000001000000010000000100000001000000000000000100000001000
------------------------------------1--------------------------- 0000100000001000000010000000100000000000000010000000100000001000
----------------------------1----------------------------------- 0000100000001000000010000000000000001000000010000000100000001000
--------------------1------------------------------------------- 0000100000001000000000000000100000001000000010000000100000001000
------------1--------------------------------------------------- 0000100000000000000010000000100000001000000010000000100000001000
----1----------------------------------------------------------- 0000000000001000000010000000100000001000000010000000100000001000
-----------------------------------------------------1---------- 0000010000000100000001000000010000000100000001000000000000000100
---------------------------------------------1------------------ 0000010000000100000001000000010000000100000000000000010000000100
-------------------------------------1-------------------------- 0000010000000100000001000000010000000000000001000000010000000100
-----------------------------1---------------------------------- 0000010000000100000001000000000000000100000001000000010000000100
---------------------1------------------------------------------ 0000010000000100000000000000010000000100000001000000010000000100
-------------1-------------------------------------------------- 0000010000000000000001000000010000000100000001000000010000000100
-----1---------------------------------------------------------- 0000000000000100000001000000010000000100000001000000010000000100
------------------------------------------------------1--------- 0000001000000010000000100000001000000010000000100000000000000010
----------------------------------------------1----------------- 0000001000000010000000100000001000000010000000000000001000000010
--------------------------------------1------------------------- 0000001000000010000000100000001000000000000000100000001000000010
------------------------------1--------------------------------- 0000001000000010000000100000000000000010000000100000001000000010
----------------------1----------------------------------------- 0000001000000010000000000000001000000010000000100000001000000010
--------------1------------------------------------------------- 0000001000000000000000100000001000000010000000100000001000000010
------1--------------------------------------------------------- 0000000000000010000000100000001000000010000000100000001000000010
-------------------------------------------------------1-------- 0000000100000001000000010000000100000001000000010000000000000001
-----------------------------------------------1---------------- 0000000100000001000000010000000100000001000000000000000100000001
---------------------------------------1------------------------ 0000000100000001000000010000000100000000000000010000000100000001
-------------------------------1-------------------------------- 0000000100000001000000010000000000000001000000010000000100000001
-----------------------1---------------------------------------- 0000000100000001000000000000000100000001000000010000000100000001
---------------1------------------------------------------------ 0000000100000000000000010000000100000001000000010000000100000001
-------1-------------------------------------------------------- 0000000000000001000000010000000100000001000000010000000100000001
Alternative form:
Code: Select all
b63 = (a07) | (a15) | (a23) | (a31) | (a39) | (a47) | (a55);
b62 = (a06) | (a14) | (a22) | (a30) | (a38) | (a46) | (a54);
b61 = (a05) | (a13) | (a21) | (a29) | (a37) | (a45) | (a53);
b60 = (a04) | (a12) | (a20) | (a28) | (a36) | (a44) | (a52);
b59 = (a03) | (a11) | (a19) | (a27) | (a35) | (a43) | (a51);
b58 = (a02) | (a10) | (a18) | (a26) | (a34) | (a42) | (a50);
b57 = (a01) | (a09) | (a17) | (a25) | (a33) | (a41) | (a49);
b56 = (a00) | (a08) | (a16) | (a24) | (a32) | (a40) | (a48);
b55 = (a07) | (a15) | (a23) | (a31) | (a39) | (a47) | (a63);
b54 = (a06) | (a14) | (a22) | (a30) | (a38) | (a46) | (a62);
b53 = (a05) | (a13) | (a21) | (a29) | (a37) | (a45) | (a61);
b52 = (a04) | (a12) | (a20) | (a28) | (a36) | (a44) | (a60);
b51 = (a03) | (a11) | (a19) | (a27) | (a35) | (a43) | (a59);
b50 = (a02) | (a10) | (a18) | (a26) | (a34) | (a42) | (a58);
b49 = (a01) | (a09) | (a17) | (a25) | (a33) | (a41) | (a57);
b48 = (a00) | (a08) | (a16) | (a24) | (a32) | (a40) | (a56);
b47 = (a07) | (a15) | (a23) | (a31) | (a39) | (a55) | (a63);
b46 = (a06) | (a14) | (a22) | (a30) | (a38) | (a54) | (a62);
b45 = (a05) | (a13) | (a21) | (a29) | (a37) | (a53) | (a61);
b44 = (a04) | (a12) | (a20) | (a28) | (a36) | (a52) | (a60);
b43 = (a03) | (a11) | (a19) | (a27) | (a35) | (a51) | (a59);
b42 = (a02) | (a10) | (a18) | (a26) | (a34) | (a50) | (a58);
b41 = (a01) | (a09) | (a17) | (a25) | (a33) | (a49) | (a57);
b40 = (a00) | (a08) | (a16) | (a24) | (a32) | (a48) | (a56);
b39 = (a07) | (a15) | (a23) | (a31) | (a47) | (a55) | (a63);
b38 = (a06) | (a14) | (a22) | (a30) | (a46) | (a54) | (a62);
b37 = (a05) | (a13) | (a21) | (a29) | (a45) | (a53) | (a61);
b36 = (a04) | (a12) | (a20) | (a28) | (a44) | (a52) | (a60);
b35 = (a03) | (a11) | (a19) | (a27) | (a43) | (a51) | (a59);
b34 = (a02) | (a10) | (a18) | (a26) | (a42) | (a50) | (a58);
b33 = (a01) | (a09) | (a17) | (a25) | (a41) | (a49) | (a57);
b32 = (a00) | (a08) | (a16) | (a24) | (a40) | (a48) | (a56);
b31 = (a07) | (a15) | (a23) | (a39) | (a47) | (a55) | (a63);
b30 = (a06) | (a14) | (a22) | (a38) | (a46) | (a54) | (a62);
b29 = (a05) | (a13) | (a21) | (a37) | (a45) | (a53) | (a61);
b28 = (a04) | (a12) | (a20) | (a36) | (a44) | (a52) | (a60);
b27 = (a03) | (a11) | (a19) | (a35) | (a43) | (a51) | (a59);
b26 = (a02) | (a10) | (a18) | (a34) | (a42) | (a50) | (a58);
b25 = (a01) | (a09) | (a17) | (a33) | (a41) | (a49) | (a57);
b24 = (a00) | (a08) | (a16) | (a32) | (a40) | (a48) | (a56);
b23 = (a07) | (a15) | (a31) | (a39) | (a47) | (a55) | (a63);
b22 = (a06) | (a14) | (a30) | (a38) | (a46) | (a54) | (a62);
b21 = (a05) | (a13) | (a29) | (a37) | (a45) | (a53) | (a61);
b20 = (a04) | (a12) | (a28) | (a36) | (a44) | (a52) | (a60);
b19 = (a03) | (a11) | (a27) | (a35) | (a43) | (a51) | (a59);
b18 = (a02) | (a10) | (a26) | (a34) | (a42) | (a50) | (a58);
b17 = (a01) | (a09) | (a25) | (a33) | (a41) | (a49) | (a57);
b16 = (a00) | (a08) | (a24) | (a32) | (a40) | (a48) | (a56);
b15 = (a07) | (a23) | (a31) | (a39) | (a47) | (a55) | (a63);
b14 = (a06) | (a22) | (a30) | (a38) | (a46) | (a54) | (a62);
b13 = (a05) | (a21) | (a29) | (a37) | (a45) | (a53) | (a61);
b12 = (a04) | (a20) | (a28) | (a36) | (a44) | (a52) | (a60);
b11 = (a03) | (a19) | (a27) | (a35) | (a43) | (a51) | (a59);
b10 = (a02) | (a18) | (a26) | (a34) | (a42) | (a50) | (a58);
b09 = (a01) | (a17) | (a25) | (a33) | (a41) | (a49) | (a57);
b08 = (a00) | (a16) | (a24) | (a32) | (a40) | (a48) | (a56);
b07 = (a15) | (a23) | (a31) | (a39) | (a47) | (a55) | (a63);
b06 = (a14) | (a22) | (a30) | (a38) | (a46) | (a54) | (a62);
b05 = (a13) | (a21) | (a29) | (a37) | (a45) | (a53) | (a61);
b04 = (a12) | (a20) | (a28) | (a36) | (a44) | (a52) | (a60);
b03 = (a11) | (a19) | (a27) | (a35) | (a43) | (a51) | (a59);
b02 = (a10) | (a18) | (a26) | (a34) | (a42) | (a50) | (a58);
b01 = (a09) | (a17) | (a25) | (a33) | (a41) | (a49) | (a57);
b00 = (a08) | (a16) | (a24) | (a32) | (a40) | (a48) | (a56);
Now if someone knows how to convert a truth table or the second eqntott form into efficient C++ I would be glad to hear it!
a40 would be the 40th bit of the input.
What is visible in above output is that we have shifts + masks that *should* produce the correct output - but what is the solution?
For example b03 = 11th bit or 19th bit etc. of the input is set. We have 64 bit registers so we can solve all 64 output bits at once.
dangi12012
Posts: 1062 Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr
Post
by dangi12012 » Mon Apr 18, 2022 1:38 am
More clearer its to see the horizontal when (1ull << square) is also set:
The "-" means any of those are 1 then the output on the right is the correct one:
Code: Select all
--------00000000000000000000000000000000000000000000000000000000 1111111100000000000000000000000000000000000000000000000000000000
00000000--------000000000000000000000000000000000000000000000000 0000000011111111000000000000000000000000000000000000000000000000
0000000000000000--------0000000000000000000000000000000000000000 0000000000000000111111110000000000000000000000000000000000000000
000000000000000000000000--------00000000000000000000000000000000 0000000000000000000000001111111100000000000000000000000000000000
00000000000000000000000000000000--------000000000000000000000000 0000000000000000000000000000000011111111000000000000000000000000
0000000000000000000000000000000000000000--------0000000000000000 0000000000000000000000000000000000000000111111110000000000000000
000000000000000000000000000000000000000000000000--------00000000 0000000000000000000000000000000000000000000000001111111100000000
00000000000000000000000000000000000000000000000000000000-------- 0000000000000000000000000000000000000000000000000000000011111111
But how would a c++ function without any conditions achieve this?
dangi12012
Posts: 1062 Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr
Post
by dangi12012 » Mon Apr 18, 2022 7:37 pm
Code: Select all
........................................................1111111.
........................................................111111.1
........................................................11111.11
........................................................1111.111
........................................................111.1111
........................................................11.11111
........................................................1.111111
.........................................................1111111
................................................1111111.........
................................................111111.1........
................................................11111.11........
................................................1111.111........
................................................111.1111........
................................................11.11111........
................................................1.111111........
.................................................1111111........
........................................1111111.................
........................................111111.1................
........................................11111.11................
........................................1111.111................
........................................111.1111................
........................................11.11111................
........................................1.111111................
.........................................1111111................
................................1111111.........................
................................111111.1........................
................................11111.11........................
................................1111.111........................
................................111.1111........................
................................11.11111........................
................................1.111111........................
.................................1111111........................
........................1111111.................................
........................111111.1................................
........................11111.11................................
........................1111.111................................
........................111.1111................................
........................11.11111................................
........................1.111111................................
.........................1111111................................
................1111111.........................................
................111111.1........................................
................11111.11........................................
................1111.111........................................
................111.1111........................................
................11.11111........................................
................1.111111........................................
.................1111111........................................
........1111111.................................................
........111111.1................................................
........11111.11................................................
........1111.111................................................
........111.1111................................................
........11.11111................................................
........1.111111................................................
.........1111111................................................
1111111.........................................................
111111.1........................................................
11111.11........................................................
1111.111........................................................
111.1111........................................................
11.11111........................................................
1.111111........................................................
.1111111........................................................
Problem is I can build code with that. Its just not very efficient. The trick would be to find a function that would shift + and to solve all bits at once with a few lines of branchless code...
So how can this be optimize into a branchless function?
Code: Select all
constexpr uint64_t dir_HO(uint64_t SquareBit){
uint64_t result = 0ull;
if((SquareBit & 1ull)) result |= 254ull;
if((SquareBit & 2ull)) result |= 253ull;
if((SquareBit & 4ull)) result |= 251ull;
if((SquareBit & 8ull)) result |= 247ull;
if((SquareBit & 16ull)) result |= 239ull;
if((SquareBit & 32ull)) result |= 223ull;
if((SquareBit & 64ull)) result |= 191ull;
if((SquareBit & 128ull)) result |= 127ull;
if((SquareBit & 256ull)) result |= 65024ull;
if((SquareBit & 512ull)) result |= 64768ull;
if((SquareBit & 1024ull)) result |= 64256ull;
if((SquareBit & 2048ull)) result |= 63232ull;
if((SquareBit & 4096ull)) result |= 61184ull;
if((SquareBit & 8192ull)) result |= 57088ull;
if((SquareBit & 16384ull)) result |= 48896ull;
if((SquareBit & 32768ull)) result |= 32512ull;
if((SquareBit & 65536ull)) result |= 16646144ull;
if((SquareBit & 131072ull)) result |= 16580608ull;
if((SquareBit & 262144ull)) result |= 16449536ull;
if((SquareBit & 524288ull)) result |= 16187392ull;
if((SquareBit & 1048576ull)) result |= 15663104ull;
if((SquareBit & 2097152ull)) result |= 14614528ull;
if((SquareBit & 4194304ull)) result |= 12517376ull;
if((SquareBit & 8388608ull)) result |= 8323072ull;
if((SquareBit & 16777216ull)) result |= 4261412864ull;
if((SquareBit & 33554432ull)) result |= 4244635648ull;
if((SquareBit & 67108864ull)) result |= 4211081216ull;
if((SquareBit & 134217728ull)) result |= 4143972352ull;
if((SquareBit & 268435456ull)) result |= 4009754624ull;
if((SquareBit & 536870912ull)) result |= 3741319168ull;
if((SquareBit & 1073741824ull)) result |= 3204448256ull;
if((SquareBit & 2147483648ull)) result |= 2130706432ull;
if((SquareBit & 4294967296ull)) result |= 1090921693184ull;
if((SquareBit & 8589934592ull)) result |= 1086626725888ull;
if((SquareBit & 17179869184ull)) result |= 1078036791296ull;
if((SquareBit & 34359738368ull)) result |= 1060856922112ull;
if((SquareBit & 68719476736ull)) result |= 1026497183744ull;
if((SquareBit & 137438953472ull)) result |= 957777707008ull;
if((SquareBit & 274877906944ull)) result |= 820338753536ull;
if((SquareBit & 549755813888ull)) result |= 545460846592ull;
if((SquareBit & 1099511627776ull)) result |= 279275953455104ull;
if((SquareBit & 2199023255552ull)) result |= 278176441827328ull;
if((SquareBit & 4398046511104ull)) result |= 275977418571776ull;
if((SquareBit & 8796093022208ull)) result |= 271579372060672ull;
if((SquareBit & 17592186044416ull)) result |= 262783279038464ull;
if((SquareBit & 35184372088832ull)) result |= 245191092994048ull;
if((SquareBit & 70368744177664ull)) result |= 210006720905216ull;
if((SquareBit & 140737488355328ull)) result |= 139637976727552ull;
if((SquareBit & 281474976710656ull)) result |= 71494644084506624ull;
if((SquareBit & 562949953421312ull)) result |= 71213169107795968ull;
if((SquareBit & 1125899906842624ull)) result |= 70650219154374656ull;
if((SquareBit & 2251799813685248ull)) result |= 69524319247532032ull;
if((SquareBit & 4503599627370496ull)) result |= 67272519433846784ull;
if((SquareBit & 9007199254740992ull)) result |= 62768919806476288ull;
if((SquareBit & 18014398509481984ull)) result |= 53761720551735296ull;
if((SquareBit & 36028797018963968ull)) result |= 35747322042253312ull;
if((SquareBit & 72057594037927936ull)) result |= 18302628885633695744ull;
if((SquareBit & 144115188075855872ull)) result |= 18230571291595767808ull;
if((SquareBit & 288230376151711744ull)) result |= 18086456103519911936ull;
if((SquareBit & 576460752303423488ull)) result |= 17798225727368200192ull;
if((SquareBit & 1152921504606846976ull)) result |= 17221764975064776704ull;
if((SquareBit & 2305843009213693952ull)) result |= 16068843470457929728ull;
if((SquareBit & 4611686018427387904ull)) result |= 13763000461244235776ull;
if((SquareBit & 9223372036854775808ull)) result |= 9151314442816847872ull;
return result;
}
}
The general question for everyone here is "How to transform a truth table into C++".
I dont see an easy answer
tcusr
Posts: 325 Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo
Post
by tcusr » Mon Apr 18, 2022 8:28 pm
...
tcusr
Posts: 325 Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo
Post
by tcusr » Mon Apr 18, 2022 9:13 pm
tcusr wrote: ↑ Mon Apr 18, 2022 8:28 pm ...
i had posted an incorrect version, this works
Code: Select all
uint64_t dir_HO(uint64_t n){
uint64_t result = 0ull;
for (int i = 0; i < 8; ++i) {
auto byte = (n >> 8 * i) & 255;
if (byte)
result |= (255 - byte) << 8 * i;
}
return result;
}
the branch is needed because if the "ith" byte is empty then we should OR with 0 and not 255
tcusr
Posts: 325 Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo
Post
by tcusr » Mon Apr 18, 2022 9:59 pm
wait, i was so focused on making it work that i forgot that n only contains 1 bit. now i'll try to see other ways
dangi12012
Posts: 1062 Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr
Post
by dangi12012 » Mon Apr 18, 2022 10:09 pm
tcusr wrote: ↑ Mon Apr 18, 2022 9:59 pm
wait, i was so focused on making it work that i forgot that n only contains 1 bit. now i'll try to see other ways
Thanks - in the meantime I found these solutions - but they are not solutions because they all contain std::countr_zero / countl_zero.
I would like to avoid these because they have high latency
Code: Select all
//Solved! : (mask ^ (0xFFull << (countr_zero(mask) & countl_zero(0xFFull)))) #1702759519
//Solved!: (mask ^ (0xFFull << (countr_zero(mask) & -popcount(0xFFull)))) #2239433823
//Solved!: ((0xFFull << (countl_zero(0xFFull) & countr_zero(mask))) - mask) #22934905037
//Solved!: ((0xFFull << (countl_zero(0xFFull) & countr_zero(mask))) ^ mask) #22934905039
//Solved!: (mask ^ (0xFFull << (countl_zero(0xFFull) & countr_zero(mask)))) #23361096799
//Solved!: (mask ^ (0xFFull << (countr_zero(mask) & -popcount(0x0102040810204080ull)))) #15124335711
//Solved!: (mask ^ (0xFFull << (countr_zero(mask) & -popcount(0x0101010101010101ull)))) #6534401119
//Solved!: (mask ^ (0xFFull << (countr_zero(mask) & (countl_zero(0xFFull) & 0xFFull)))) #29620046943
//Solved!: (mask ^ (0xFFull << (countr_zero(mask) & -(popcount(0xFFull) & 0xFFull)))) #38209784927
//Solved!: (mask ^ (0xFFull << ((countr_zero(mask) & -popcount(0xFFull)) & 0xFFull))) #35831811167
//Solved!: (mask ^ (0xFFull << ((countr_zero(mask) & countl_zero(0xFFull)) & 0xFFull))) #27245022303
A bit native algorithm would be able to generate all 4 masks without using the instructions above (BSF, BSR which is - countr_zero/ countl_zero) because they are quite slow. Modern processors can execute many many of the simpler shift, and, or, xor, operations per single clock tick.
So im looking forward if you find something
But it could be that such a algo does not exists without having the square in 0...64 form. If thats the case it could be that a simple perfect hash is fastest. ([64 single set bitmasks] * magic >> (64 - 6)) = index into a struct array which contains all 4 masks. Might be faster than a single countlzero.
tcusr
Posts: 325 Joined: Tue Aug 31, 2021 10:32 pm
Full name: Mateo
Post
by tcusr » Mon Apr 18, 2022 10:55 pm
i found the pattern
Code: Select all
def find_byte(n):
for i in range(0, 8):
if (n >> 8 * i) & 0xff:
return i + 1
def dir_HO(n):
b = find_byte(n)
return (1 << b * 8) - n - (1 << (b - 1) * 8)
for n in range(0, 64):
print(1 << n, " -> ", dir_HO(1 << n))
the only problem now is that we have to find a fast function that finds the byte in which a bit is placed