Move structure and hash tables

Sopel · Post by **Sopel** » Sat Oct 23, 2021 6:16 pm

hgm wrote: ↑Sat Oct 23, 2021 4:33 pm You mean like warning people who call posters names just because these have an opinion they don't agree with?

Saying that 30% of stockfish's runtime is spent on move generation is not an opinion. It's a wrong statement trying to derail the discussion towards his perft generator (which is completely irrelevant to this thread). He does it in every thread possible, and it is killing the forum. "Go into their discord and look it up - or ask them." is also a useless and offensive response.

hgm · Post by **hgm** » Sat Oct 23, 2021 6:48 pm

Well, so you just point out that is for perft, and nobody cares about that, which would make him look pretty stupid. A handfull of tangential postings is not 'derailing a discussion', and a single wrong or easily misunderstood claim in a posting that did address the topic (move encoding) even less so.. If it gets too much follow-up it becomes thread hijacking, and the discussion thread can be split. But as it was the discussion continued on the original topic. 'Go look it up' is not an insult in the sense of the charter.

dangi12012 · Post by **dangi12012** » Sat Oct 23, 2021 7:03 pm

chrisw wrote: ↑Sat Oct 23, 2021 5:14 pm
gaard wrote: ↑Wed Oct 20, 2021 9:24 pm I see a lot of effort going towards packing a move in 16 bits to save space in the hash table or even for general use. But then you need special functions to determine, with the board state, whether a move is castling (although this one may be as simple as abs(orig - dest) == 2), en passant, capturing, promotion piece, etc. I suppose some of those can be encoded in the remaining 4 bits of a 16-bit move, but usually not all of them. An extra 16 bits also leaves room for flags like direct check, reveal check, etc. Has anyone every measured performance between the two schemes, that is, 16-bit move versus 32-bit. Is there any reason to believe the difference is worth more than a couple of Elo?
It makes little difference to performance.
Another disadvantage of 16-bit, apart from having to write more fiddly code for determining the characteristics of the move, is that you also start running into problems between finding move character before MakeMove() and after it, because you’re dependent on board state (after MakeMove, the moving piece type had to be got from sq-to, not sq-from and so on and so on). Or maybe for example you need to know something about the move at nply-1, or nply-2, well, you’ll also need a boardstate history, not just a move history.
For an engine in development where you probably can’t predict all possible move informations you might need, then 32 bit with OTT info would seem desirable. After the engine is fully complete, then think about cutting back to 16-bit, but I’ld predict it would by way down the list of important things to try and optimize.

I also like to add that the difference between 16bit or a bigger format will not make any difference. Since you probably need to store other metadata like the evaluation etc as well.

What many people dont think about is that X86 also has "Non Temporal" memory store which bypasses the memory hirarchy. I once wrote a raytracer where storing to memory is 2x faster and I know from a colleague that BLAS uses these as well:

MOVDQA xmmi, m128 vs MOVNTDQA xmmi, m128
https://lwn.net/Articles/255364/

So if you store - and not immediately read back from the same memory location this is a huge performance opportunity.

hgm · Post by **hgm** » Sat Oct 23, 2021 9:45 pm

It would require some redesign of TT architecture to make effective use of nontemporal memory access. Schemes that can only store a TT entry in a single location do not perform very well, as these offer no way for protecting the more valuable entries with large search depth. So an entry can be in multiple locations, meaning that multiple locations have to be probed. These are then all mapped to the same cache line. So the entire TT bucket will already be loaded into cache during the probing. And then a nontemporal store cannot bypass it, but will just dirty the cache line.

Another problem is that entries usually do not fit into a single machine word.

A possible solution is to store part of the hash signatures (e.g. 8 bits), possibly together with depth and aging info, in a single word. Then this word could be tested with a nontemporal read, and it would make it clear whether the sought entry can be in the table and where, or which entry should be replaced. This entry coul then be read and written nontemporally.

dangi12012 · Post by **dangi12012** » Sat Oct 23, 2021 10:59 pm

hgm wrote: ↑Sat Oct 23, 2021 9:45 pm So the entire TT bucket will already be loaded into cache during the probing. And then a nontemporal store cannot bypass it, but will just dirty the cache line.

Correct on all points except that non temporal writes are made to bypass any and all cache hirarchy and go directly to memory. Exactly for the reason to never evict good data because you write other unrelated data.
Non temporal is applicable where you write data and read back from that buffer later in the code. ie. not immediately.

An excellent summary on that topic can be found here.
http://www.nic.uoregon.edu/~khuck/ts/ac ... 05s03.html

Its a way of talking to the processor directly:
"a non-temporal store also hints to the processor that the program intends to write the entire cache line, completely replacing the current content, so that the cache line does not need to first be fetched from memory"

gaard · Post by **gaard** » Sat Oct 23, 2021 11:10 pm

Henk wrote: ↑Fri Oct 22, 2021 8:34 pm Yes I regret storing move in 16 bits. Just make debugging as easy as possible. At least annoying.
Why all these tricks for a few elo. If you can't invent something better than just resign.

I decided to shave the key size down to 48 bits using the u64 key : 48; trick and it seems to be working. I'm sure there some hidden overhead but it's just a few more lines of code and it looks like it gets me down to a 16-byte tt entry from 18. I suppose storing the whole key is redundant anyways, when the lower bits are already used as index.

gaard · Post by **gaard** » Sun Oct 24, 2021 3:51 am

gaard wrote: ↑Sat Oct 23, 2021 11:10 pm
Henk wrote: ↑Fri Oct 22, 2021 8:34 pm Yes I regret storing move in 16 bits. Just make debugging as easy as possible. At least annoying.
Why all these tricks for a few elo. If you can't invent something better than just resign.
I decided to shave the key size down to 48 bits using the u64 key : 48; trick and it seems to be working. I'm sure there some hidden overhead but it's just a few more lines of code and it looks like it gets me down to a 16-byte tt entry from 18. I suppose storing the whole key is redundant anyways, when the lower bits are already used as index.

Gcc lete get away with it, even allowing me to initialize the fields on construction, but msvc said no way. Something about bit fields of different types. Is there any real grave danger to just using a 32-bit key without clusters?

hgm · Post by **hgm** » Sun Oct 24, 2021 10:51 am

dangi12012 wrote: ↑Sat Oct 23, 2021 10:59 pm Correct on all points except that non temporal writes are made to bypass any and all cache hirarchy and go directly to memory. Exactly for the reason to never evict good data because you write other unrelated data.
Non temporal is applicable where you write data and read back from that buffer later in the code. ie. not immediately.

If the data is already cached the write cannot alter the memory, because then the cache woul hold an obsolete value. Which would be returned instead of the correct one on a later read.

Ras · Post by **Ras** » Sun Oct 24, 2021 11:02 am

gaard wrote: ↑Sun Oct 24, 2021 3:51 amGcc lete get away with it, even allowing me to initialize the fields on construction, but msvc said no way. Something about bit fields of different types.

Can you show the code for the type declaration of the bitfield and the assignment?

gaard · Post by **gaard** » Sun Oct 24, 2021 4:05 pm

Ras wrote: ↑Sun Oct 24, 2021 11:02 am
gaard wrote: ↑Sun Oct 24, 2021 3:51 amGcc lete get away with it, even allowing me to initialize the fields on construction, but msvc said no way. Something about bit fields of different types.
Can you show the code for the type declaration of the bitfield and the assignment?

Something like

struct TTEntry {
u64 key : 48 = 0;
u8 bound : 2 = 0;
u8 age : 6 = 0;
u8 depth;
i16 score, value;
u32 move;
}

Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables

Re: Move structure and hash tables