Rotated Bitboards Part II:

Post by **dangi12012** » Tue Feb 14, 2023 3:31 pm

Reading further into what AVX512 can actually provide for chess:
Rotated bitboards, which normally require maintaining 4-8 rotated copies, can be had almost for free (1 instruction maybe?)

With AVX512 a 64x8-bit mailslot (mailbox) board fits entirely into one 512-bit register. It can be rotated cheaply via 8 compile-time lookup tables.

Code:

struct IndexArray
{
	Vec64c Brd;	// one piece byte per square
	Vec64c LU;	// per-square source index fed to the permute

	// Initializers are listed in declaration order (Brd, then LU).
	IndexArray() : Brd('.'), LU(
		0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
		32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63)
	{
	}

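	// Flip top to bottom: square ^ 56 maps rank r to rank 7 - r.
	void Vertical()
	{
		LU ^= Vec64c(56);
	}

	// Mirror left to right: square ^ 7 maps file f to file 7 - f.
	void Horizontal()
	{
		LU ^= Vec64c(7);
	}

	// Apply the accumulated index table to the stored board.
	Vec64c Board() {
		return lookup64(LU, Brd);
	}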

	std::string to_string()
	{
		auto b = Board();
		std::stringstream ss;
		for (int y = 0; y < 8; y++) {
			for (int x = 0; x < 8; x++) {
				int i = y * 8 + x;
				ss << b[i] << ' ';
			}
			ss << '\n';
		}
		return ss.str();
	}
};
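
To see why XOR on the index table is enough, here is a minimal portable sketch of the same trick without any SIMD. It assumes square = rank * 8 + file; `Board64`, `permute_bytes`, `make_index` and the three wrappers are hypothetical names for illustration, and `permute_bytes` is only a scalar stand-in for what a 64-byte permute does, not the library's implementation.

```cpp
#include <array>
#include <cstdint>

using Board64 = std::array<std::uint8_t, 64>;

// Scalar stand-in for a 64-byte permute: out[i] = table[index[i] & 63].
inline Board64 permute_bytes(const Board64& index, const Board64& table) {
    Board64 out{};
    for (int i = 0; i < 64; ++i) out[i] = table[index[i] & 63];
    return out;
}

// Identity index table 0..63 with every entry XORed by xor_mask.
inline Board64 make_index(std::uint8_t xor_mask) {
    Board64 idx{};
    for (int i = 0; i < 64; ++i) idx[i] = static_cast<std::uint8_t>(i ^ xor_mask);
    return idx;
}

// Assumes square = rank * 8 + file.
inline Board64 flip_vertical(const Board64& b)     { return permute_bytes(make_index(56), b); } // reverse ranks
inline Board64 mirror_horizontal(const Board64& b) { return permute_bytes(make_index(7), b); }  // reverse files
inline Board64 rotate_180(const Board64& b)        { return permute_bytes(make_index(63), b); } // both at once
```

Because the flips are plain XORs on the index, they compose for free: flipping twice restores the identity table, and vertical plus horizontal is the 180° rotation.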

Look at the code of Agner Fog's Vector Class Library.
With AVX512VBMI its lookup64 boils down to this, so it can rotate/mirror almost for free (1 instruction):

Code:

#ifdef  __AVX512VBMI__   // AVX512VBMI instruction set not supported yet (April 2019)
    return _mm512_permutexvar_epi8(index, table);
#else

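Transforming the 64x8-bit board into a bitboard is this (1 instruction):
_mm512_movepi8_mask
Note that it collects only the top bit of each byte; to pick out one specific piece code, _mm512_cmpeq_epi8_mask against a broadcast constant also produces the 64-bit mask directly.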
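
What a byte-to-mask instruction computes can be modeled in a few lines of portable code. This is only a scalar sketch of the semantics (bit i of the result describes byte i); `Board64`, `msb_mask` and `piece_mask` are hypothetical names, and the piece encoding used in the test values is an assumption.

```cpp
#include <array>
#include <cstdint>

using Board64 = std::array<std::uint8_t, 64>;

// Scalar model of a "move byte MSBs to mask" operation:
// bit i of the result is the most significant bit of byte i.
inline std::uint64_t msb_mask(const Board64& b) {
    std::uint64_t m = 0;
    for (int i = 0; i < 64; ++i)
        m |= static_cast<std::uint64_t>(b[i] >> 7) << i;
    return m;
}

// Scalar model of a "compare bytes for equality into mask" operation:
// bit i of the result is set when byte i equals the given piece code.
inline std::uint64_t piece_mask(const Board64& b, std::uint8_t piece) {
    std::uint64_t m = 0;
    for (int i = 0; i < 64; ++i)
        m |= static_cast<std::uint64_t>(b[i] == piece) << i;
    return m;
}
```

The first form only works as a piece extractor if the encoding reserves the top bit for the property you want; the second works for any byte encoding.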

Resolving square locations can also be done for 8 bitboards at once (this one needs AVX512CD):
_mm512_lzcnt_epi64
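
As a rough scalar model of that step: a per-lane leading-zero count turns each 64-bit bitboard into the index of its highest set square via 63 - lzcnt. The names `lzcnt64` and `highest_squares` are hypothetical, and returning -1 for an empty bitboard is a convention chosen here, not something the instruction does by itself (it returns 64 for a zero lane).

```cpp
#include <array>
#include <cstdint>

// Portable leading-zero count; returns 64 for x == 0.
inline int lzcnt64(std::uint64_t x) {
    int n = 0;
    for (std::uint64_t probe = 1ULL << 63; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

// Scalar model of a per-lane lzcnt over 8 bitboards: each lane yields
// the index of its highest set bit, or -1 if the bitboard is empty.
inline std::array<int, 8> highest_squares(const std::array<std::uint64_t, 8>& bb) {
    std::array<int, 8> out{};
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = 63 - lzcnt64(bb[lane]);
    return out;
}
```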

So getting an H/V-mirrored or rotated bitboard from a mailslot representation takes 2 instructions: one byte permute and one mask extraction.

Interesting stuff, and all of it should be available going forward, because as it looks now both Intel and AMD will carry support for AVX512VBMI (the __AVX512VBMI__ macro above).

Post by **dangi12012** » Wed Feb 15, 2023 1:29 pm

Thinking more on this topic: there are many, many chess algorithms and ideas that are only applicable when rotation by 45°, 90° or 180° and mirroring are very cheap, which is not the case for normal x86-64.

But it really is for AVX512. For example, a 45° rotation of all bytes in a mailslot representation looks like this (and it will never ever get cheaper than 1 instruction!)

Code:

board = _mm512_permutexvar_epi8(rotate45, board);  // index vector first, then the table being permuted
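
The rotate45 index vector itself is not spelled out in this post, so here is one possible way to build such a table at compile time, as a sketch only. It assumes square = rank * 8 + file and simply lists the squares anti-diagonal by anti-diagonal, so each diagonal ends up in a contiguous run of bytes; `Perm64`, `make_rotate45` and `apply_perm` are hypothetical names, and other diagonal orderings are equally valid.

```cpp
#include <array>
#include <cstdint>

struct Perm64 { std::uint8_t sq[64]; };

// Build the index table: output slot k takes the byte from square sq[k].
// Squares are enumerated by anti-diagonal d = file + rank, d = 0..14.
constexpr Perm64 make_rotate45() {
    Perm64 p{};
    int k = 0;
    for (int d = 0; d < 15; ++d) {
        for (int rank = 0; rank < 8; ++rank) {
            int file = d - rank;
            if (file >= 0 && file < 8)
                p.sq[k++] = static_cast<std::uint8_t>(rank * 8 + file);
        }
    }
    return p;
}

constexpr Perm64 rotate45 = make_rotate45();

// Scalar stand-in for the byte permute: out[k] = board[p.sq[k]].
inline std::array<std::uint8_t, 64> apply_perm(const Perm64& p,
                                               const std::array<std::uint8_t, 64>& board) {
    std::array<std::uint8_t, 64> out{};
    for (int k = 0; k < 64; ++k) out[k] = board[p.sq[k]];
    return out;
}
```

With this ordering the long anti-diagonal (squares 7, 14, ..., 56) lands in output bytes 28 to 35, so a single mask extraction afterwards yields that diagonal as 8 adjacent bits.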

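Extracting the 64x8 bytes into a specific bitboard is also 1 instruction (for example extracting a uint64_t for black knights), provided the piece encoding puts the wanted flag in the top bit of each byte, since that is the bit this intrinsic collects: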

Code:

_mm512_movepi8_mask

So going forward you can have the best of both worlds: the simplicity of mailslot together with the fast algorithms of bitboards.
If I get my hands on a Zen4 in the future I will for sure share some proof-of-concept code here.
