Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

dangi12012 · Post by **dangi12012** » Sat Jan 01, 2022 11:51 pm

Sven wrote: ↑Sat Jan 01, 2022 3:12 pm The range of values returned by rand() depends on the constant RAND_MAX. "This value is library-dependent, but is guaranteed to be at least 32767 on any standard library implementation." https://www.cplusplus.com/reference/cstdlib/RAND_MAX/ So even the assumption of having 31 "real" bits can be wrong.

This code does not use rand() at all but the very good and well known xorshift algorithm with the same seed per algorithm to obtain the same occupancy for a fair comparison.

dangi12012 · Post by **dangi12012** » Sun Jan 02, 2022 12:16 am

Update - Here is the compiler explorer version:
https://godbolt.org/z/MqxanTecG

Also for quick understanding of this movegen:
The idea is that each square can have a configuration associated with it. Because each square can be a rook, bishop, or a knight (pawns and king are a subset) - this is every square that could ever be relevant for any piece thats there (knight is omitted for simplicity):

So we only need to have one static lookup array per piece - and 64 variable configuration numbers.
When a piece moves we just update the relevant configuration numbers.
This is literally calling 2x into a jumptable with this code per move:

Code: Select all

        and     byte ptr [rip + cfgR], -9
        and     byte ptr [rip + cfgR+2], -5
        and     byte ptr [rip + cfgR+4], -5
        and     byte ptr [rip + cfgR+6], -5
        and     byte ptr [rip + cfgR+10], -9
        and     byte ptr [rip + cfgR+12], -9
        and     byte ptr [rip + cfgR+14], -9

And now the hypercube lookup enables us to get the rook and the bishop on any square on the board for the lookup cost of a knight!

Code: Select all

uint64_t Rook(int sq) {
	return RookMoves[sq][cfgR[sq]];
}
        

Rook(int):                               # @Rook(int)
mov     rax, qword ptr [rip + RookMoves]
movsxd  rcx, edi
mov     rax, qword ptr [rax + 8*rcx]
movzx   ecx, word ptr [rcx + rcx + cfgR]
mov     rax, qword ptr [rax + 8*rcx]
ret

The additional advantage is that this is infinitely expandable. So for example the color of the squares can also easily taken into account for the configuration number. Then the lookup can contain only taking moves or whatever you wish...
But for that the generator code is needed which takes in a std::function and spits out the c++ code for the constexpr void SetSquare(int sq) function. This is also the way I generated it and why this is binary compatible with PEXT.

mar · Post by **mar** » Sun Jan 02, 2022 3:26 am

if you want to save extra bounds check on each switch at runtime, make the default case unreachable.
for gcc/clang it's __builtin_unreachable() and for msvc it's __assume(0)
not that it would help with the performance much...

also how big are the rook/bishop tables? since they consume over 90% of the source code
why I'm asking: could the tables be computed at runtime when the engine starts? because nobody would be willing
to include 4MB worth of source code.

your approach might still be interesting, like when computing mobility in HCE engines - might be worth trying it in an actual engine and then
measure the final performance to see how it fares against magic/pext

dangi12012 · Post by **dangi12012** » Thu Jan 06, 2022 3:40 pm

Question:
Who do I contact for an entry in the wiki for this sliding lookup?

I will release the generator code soon. - This will be best for mailbox engines as this removes any and all branches against any and all squares for occupancy and enemy color.

So you can define a lambda that takes color and occupancy -> and it will generate C++ code you can insert into makemove that incrementally generates the offsets into the lookup table of your choice. (like the incremental branchless pext i chose here).

Also I changed to an array of function pointers - i am not fond of __assume(0) and switches.

dangi12012 · Post by **dangi12012** » Fri Jul 21, 2023 1:43 pm

Looked through some discord and forums and this algorithm has not been understood AT ALL.

We have algorithms that calculate an index like pext or fancy magic. We have algorithms that calculate bitboards like literally all others.
But there is a different kind, very similar to PST or Zobrist this is an incrementally updated movegenerator which means the input is not the board, but the input needs to be the move taken.

Hypercube internally maintains the list of 64 pext indices (yes the same as the fastest known PEXT movegenerator) but without the use of the pext instruction. Each bit of a pext has a dependency or "sensitivity list" that is defined in the mask. Now the result of PEXT(value, mask) can also be constructed beforehand when moving a piece. This list can be seen as lines that emerge from the source square and target 14 squares. Mapped for all 64 squares gives a cube like appearance if we were to render that. Thats where the name comes from.

If we move a piece we update up to 14 bits inside all 64 rooks and bishop pext indices. This can be seen here: https://raw.githubusercontent.com/Gigan ... rchess.hpp

Actually this implementation is extremely crude and nowadays I would unconditionally set the bits with __m256_or and __m256_and meaning we can also optimize the switch in the code away and end up at 8 branchless AVX2 instructions.

And here comes the final part and why this is so cool. Once you have moved the piece now you have 64 finished indices meaning you can query all 64 squares for a queen for FREE. With 8 unconditional masking instructions upon a single move you now have all sliders solved wherever they are on the board and you can query for FREE.

There is no other algorithm like this. Every other algorithm needs to calculate something at least from OCC and sq. See in the code we dont need OCC at all anymore. Its because we dont work in the space of occupation but in the space of all previous moves.

Code: Select all

static constexpr uint64_t Rook(int sq) {
   return RookMoves[sq][cfgR[sq]];
}

Most use masks and bit twiddling to produce the results but here I created an algorithm that can in a binary compatible way produce the result of PEXT(occ, mask) for 64 squares all at once. With 8x8 chess the switch implementation is borderline but this approach scales! imagine 16x16 chess we could have dozens of sliders and 4x the amount of squares but the cost of one move only increases linearly. 64 squares solved in 8 ops. 256 squares solved in 16 ops and so on. It is truly O(1) even more so that constant work here means no work and literally looking up the result:

RookMoves[sq][cfgR[sq]];

The idea of incrementality is seen in evals() as well where nnue and pst are also incrementally updated because that approach is plain and simply more efficient. Hypercube incrementally maintains 64 pext indices in the only and truly - optimal way.

Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

Re: Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

Re: Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

Re: Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

Re: Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]

Re: Hypercube Slider Lookup - New Sliding Piece Algorithm [RELEASE]