Question about Gigantua (and compile time)

chessbit
Posts: 9
Joined: Fri Dec 29, 2023 4:47 pm
Location: Belgium
Full name: thomas albert

Post by chessbit »

Hi,

I just discovered Gigantua, which inspired me to optimize my engine further with compile-time functions. But as I'm a newbie, I don't grasp everything just yet. At the moment my engine, which does everything at run time, is roughly twice as slow as Gigantua, so I'm hoping I can improve it further.
I was going through the source code, and the "compile time" pext lookup caught my attention.
What allows the compiler to evaluate it as a constant expression?

What is missing from this simple function to make it a constant expression? Just like in Gigantua, the parameters are initialized as const, and all my arrays are defined constexpr in a file.

Code:

__forceinline static constexpr U64 getRookAttacksC(int square, U64 occupancy) {
    return ROOK_ATTACKS[ROOK_OFFSETS[square] + _pext_u64(occupancy, ROOK_MASKS[square])];
}
Below is the relevant implementation in Gigantua: a struct holding a pointer into the attack array and a mask. Another array is then defined which, for every square, initializes an instance of this struct. Is the secret to write out the instantiation explicitly 64 times?

The struct (some irrelevant code removed):

Code:

struct SliderPext_t
{
	const uint64_t* AttackPtr;
	const uint64_t Mask;

	constexpr SliderPext_t(int offset, uint64_t mask) : AttackPtr(SliderPext + offset), Mask(mask) {

	}

	_ForceInline constexpr uint64_t operator[](const uint64_t blocker) const
	{
		if (std::is_constant_evaluated()) {
			return AttackPtr[_pext_u64_emulated(blocker, Mask)];
		}
		else {
			return AttackPtr[_pext_u64(blocker, Mask)];
		}
	}
};
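The part that makes this usable in a constant expression is the `std::is_constant_evaluated()` branch (C++20). `_pext_u64` is a compiler intrinsic that is not `constexpr`, so any constant evaluation that reaches it fails; that is most likely what stops `getRookAttacksC` above, even with all arrays `constexpr`. During constant evaluation `operator[]` takes the emulated path, which the compiler can execute, and at run time it uses the real instruction. A stripped-down sketch of the same pattern follows; `pext_soft`, `pext` and the `IS_CONSTANT_EVALUATED` macro are hypothetical names, and `pext_soft` is only an assumption about what an emulated PEXT can look like, not Gigantua's `_pext_u64_emulated`:

```cpp
#include <cstdint>
#include <type_traits>   // std::is_constant_evaluated (C++20)
#if defined(__BMI2__)
#include <immintrin.h>   // _pext_u64; __BMI2__ is set by -mbmi2 or -march=native
#endif

// std::is_constant_evaluated() is C++20; GCC/Clang also expose the builtin
// it is based on in older modes, so this sketch falls back to that.
#if defined(__cpp_lib_is_constant_evaluated)
#define IS_CONSTANT_EVALUATED() std::is_constant_evaluated()
#else
#define IS_CONSTANT_EVALUATED() __builtin_is_constant_evaluated()
#endif

// Hypothetical software PEXT: packs the bits of src selected by mask into
// the low bits of the result, like the BMI2 instruction does.
constexpr std::uint64_t pext_soft(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; mask &= mask - 1, bit <<= 1) {
        if (src & mask & (~mask + 1)) {   // src bit at the lowest set bit of mask
            result |= bit;
        }
    }
    return result;
}

// Same shape as SliderPext_t::operator[]: emulate during constant
// evaluation, use the hardware instruction (when enabled) at run time.
constexpr std::uint64_t pext(std::uint64_t src, std::uint64_t mask) {
    if (IS_CONSTANT_EVALUATED()) {
        return pext_soft(src, mask);
    }
#if defined(__BMI2__)
    return _pext_u64(src, mask);
#else
    return pext_soft(src, mask);
#endif
}

// A constexpr variable forces constant evaluation, so only pext_soft runs here.
constexpr std::uint64_t kPacked = pext(0xFF00, 0x0F00);
static_assert(kPacked == 0xF, "bits 8..11 of src end up in the low nibble");
```

Note that the check has to be a plain `if`: written as `if constexpr (std::is_constant_evaluated())`, the condition is itself always constant-evaluated and therefore always true.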
The array (only the first two rows, to keep it compact):

Code:

static const SliderPext_t Pext_RookAttacks[64] = {
	SliderPext_t(RookOffset_Pext[0], rmask[0]),
	SliderPext_t(RookOffset_Pext[1], rmask[1]),
       ......
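On writing out the 64 instantiations: a `constexpr` function can fill such a table in a loop (C++17 or later, since that is when the non-const `std::array::operator[]` became `constexpr`), so listing every square by hand is not required. A sketch under that assumption, with hypothetical names `rook_mask`, `popcount64`, `RookEntry` and `make_rook_table`, that computes each square's relevant-occupancy mask and its offset into one shared PEXT attack table at compile time; it stores an offset instead of a pointer to keep it short:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: relevant-occupancy mask of a rook on sq (a1 = 0, h8 = 63),
// i.e. its four rays with the board-edge squares left out.
constexpr std::uint64_t rook_mask(int sq) {
    const int r = sq / 8, f = sq % 8;
    std::uint64_t m = 0;
    for (int i = r + 1; i <= 6; ++i) m |= 1ULL << (i * 8 + f);  // north
    for (int i = r - 1; i >= 1; --i) m |= 1ULL << (i * 8 + f);  // south
    for (int i = f + 1; i <= 6; ++i) m |= 1ULL << (r * 8 + i);  // east
    for (int i = f - 1; i >= 1; --i) m |= 1ULL << (r * 8 + i);  // west
    return m;
}

constexpr int popcount64(std::uint64_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each step
    return n;
}

struct RookEntry {
    std::uint64_t mask;    // what would be rmask[sq]
    std::uint32_t offset;  // what would be RookOffset_Pext[sq]
};

// Fills all 64 entries in a loop; each square owns 2^popcount(mask) slots.
constexpr std::array<RookEntry, 64> make_rook_table() {
    std::array<RookEntry, 64> t{};
    std::uint32_t offset = 0;
    for (int sq = 0; sq < 64; ++sq) {
        t[sq].mask = rook_mask(sq);
        t[sq].offset = offset;
        offset += 1u << popcount64(t[sq].mask);
    }
    return t;
}

constexpr auto kRookTable = make_rook_table();
static_assert(kRookTable[0].mask == 0x000101010101017EULL, "a1 mask");
static_assert(kRookTable[1].offset == 4096, "a1 uses 2^12 slots");
```

The offsets are a running sum of `2^popcount(mask)`, which comes to 102400 entries for the whole rook table. The attack sets themselves could be generated the same way, although a table that size may run into the compiler's constant-evaluation limits (e.g. GCC's `-fconstexpr-ops-limit`).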
And the way it is called:

Code:

_Compiletime uint64_t Rook(uint64_t square, uint64_t occupy) {
	return Pext_RookAttacks[square][occupy];
}
Also a side question, is there a "catch" with this approach, that it wouldn't work on every machine or that it would not be suited for a full engine for some reason?
smatovic
Posts: 3220
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Question about Gigantua (and compile time)

Post by smatovic »

chessbit wrote: Wed Jan 03, 2024 9:36 pm ....
Also a side question, is there a "catch" with this approach, that it wouldn't work on every machine or that it would not be suited for a full engine for some reason?
https://en.wikipedia.org/wiki/X86_Bit_m ... ion_Set_2)
Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting BMI1 without BMI2; BMI2 is supported by AMD's Excavator architecture and newer.[10]
https://en.wikipedia.org/wiki/X86_Bit_m ... nd_extract
AMD processors before Zen 3[12] that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles[13] rather than (Zen 3) 3 cycles.[14] As a result it is often faster to use other instructions on these processors.[15]
AFAIK, ARMv7/ARMv8 have no PEXT equivalent; I don't know about v9:

Equivalent of PEXT instruction on ARM
https://stackoverflow.com/questions/700 ... ion-on-arm
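Because BMI2 is not available everywhere, one way to keep a single binary portable is to choose the PEXT implementation once at run time. A minimal sketch for GCC/Clang on x86-64, assuming their `__builtin_cpu_init`/`__builtin_cpu_supports` builtins and the `target("bmi2")` function attribute; `pext_soft`, `pext_hw`, `select_pext` and `pext_runtime` are hypothetical names. On other targets (e.g. ARM) it simply uses the software version:

```cpp
#include <cstdint>
#if defined(__x86_64__) && defined(__GNUC__)   // GCC or Clang on x86-64
#include <immintrin.h>
#define PEXT_X86_DISPATCH 1
#endif

// Portable software PEXT: works on any CPU, including ARM or x86 without BMI2.
constexpr std::uint64_t pext_soft(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; mask &= mask - 1, bit <<= 1) {
        if (src & mask & (~mask + 1)) {   // src bit at the lowest set bit of mask
            result |= bit;
        }
    }
    return result;
}

#ifdef PEXT_X86_DISPATCH
// BMI2 is enabled for this one function only, so the program as a whole does
// not need -mbmi2; it must only be called on CPUs that report BMI2.
__attribute__((target("bmi2")))
static std::uint64_t pext_hw(std::uint64_t src, std::uint64_t mask) {
    return _pext_u64(src, mask);
}
#endif

using PextFn = std::uint64_t (*)(std::uint64_t, std::uint64_t);

// Picks an implementation based on what the running CPU reports.
static PextFn select_pext() {
#ifdef PEXT_X86_DISPATCH
    __builtin_cpu_init();   // harmless if the CPU info is already initialized
    if (__builtin_cpu_supports("bmi2")) {
        return pext_hw;
    }
#endif
    return pext_soft;
}

// Chosen once, during static initialization.
static const PextFn pext_runtime = select_pext();
```

Two caveats: the indirect call has a cost in a hot move generator, and `__builtin_cpu_supports("bmi2")` only reports that the instruction exists, so on AMD CPUs before Zen 3 it would still select the slow microcoded PEXT. A separate build without PEXT may be preferable there.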

--
Srdja