Saying that 30% of stockfish's runtime is spent on move generation is not an opinion. It's a wrong statement trying to derail the discussion towards his perft generator (which is completely irrelevant to this thread). He does it in every thread possible, and it is killing the forum. "Go into their discord and look it up - or ask them." is also a useless and offensive response.
Move structure and hash tables
Moderator: Ras
-
Sopel
- Posts: 392
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Move structure and hash tables
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
hgm
- Posts: 28443
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Move structure and hash tables
Well, so you just point out that is for perft, and nobody cares about that, which would make him look pretty stupid. A handfull of tangential postings is not 'derailing a discussion', and a single wrong or easily misunderstood claim in a posting that did address the topic (move encoding) even less so.. If it gets too much follow-up it becomes thread hijacking, and the discussion thread can be split. But as it was the discussion continued on the original topic. 'Go look it up' is not an insult in the sense of the charter.
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Move structure and hash tables
I also like to add that the difference between 16bit or a bigger format will not make any difference. Since you probably need to store other metadata like the evaluation etc as well.chrisw wrote: ↑Sat Oct 23, 2021 5:14 pmIt makes little difference to performance.gaard wrote: ↑Wed Oct 20, 2021 9:24 pm I see a lot of effort going towards packing a move in 16 bits to save space in the hash table or even for general use. But then you need special functions to determine, with the board state, whether a move is castling (although this one may be as simple as abs(orig - dest) == 2), en passant, capturing, promotion piece, etc. I suppose some of those can be encoded in the remaining 4 bits of a 16-bit move, but usually not all of them. An extra 16 bits also leaves room for flags like direct check, reveal check, etc. Has anyone every measured performance between the two schemes, that is, 16-bit move versus 32-bit. Is there any reason to believe the difference is worth more than a couple of Elo?
Another disadvantage of 16-bit, apart from having to write more fiddly code for determining the characteristics of the move, is that you also start running into problems between finding move character before MakeMove() and after it, because you’re dependent on board state (after MakeMove, the moving piece type had to be got from sq-to, not sq-from and so on and so on). Or maybe for example you need to know something about the move at nply-1, or nply-2, well, you’ll also need a boardstate history, not just a move history.
For an engine in development where you probably can’t predict all possible move informations you might need, then 32 bit with OTT info would seem desirable. After the engine is fully complete, then think about cutting back to 16-bit, but I’ld predict it would by way down the list of important things to try and optimize.
What many people dont think about is that X86 also has "Non Temporal" memory store which bypasses the memory hirarchy. I once wrote a raytracer where storing to memory is 2x faster and I know from a colleague that BLAS uses these as well:
MOVDQA xmmi, m128 vs MOVNTDQA xmmi, m128
https://lwn.net/Articles/255364/
So if you store - and not immediately read back from the same memory location this is a huge performance opportunity.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
hgm
- Posts: 28443
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Move structure and hash tables
It would require some redesign of TT architecture to make effective use of nontemporal memory access. Schemes that can only store a TT entry in a single location do not perform very well, as these offer no way for protecting the more valuable entries with large search depth. So an entry can be in multiple locations, meaning that multiple locations have to be probed. These are then all mapped to the same cache line. So the entire TT bucket will already be loaded into cache during the probing. And then a nontemporal store cannot bypass it, but will just dirty the cache line.
Another problem is that entries usually do not fit into a single machine word.
A possible solution is to store part of the hash signatures (e.g. 8 bits), possibly together with depth and aging info, in a single word. Then this word could be tested with a nontemporal read, and it would make it clear whether the sought entry can be in the table and where, or which entry should be replaced. This entry coul then be read and written nontemporally.
Another problem is that entries usually do not fit into a single machine word.
A possible solution is to store part of the hash signatures (e.g. 8 bits), possibly together with depth and aging info, in a single word. Then this word could be tested with a nontemporal read, and it would make it clear whether the sought entry can be in the table and where, or which entry should be replaced. This entry coul then be read and written nontemporally.
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Move structure and hash tables
Correct on all points except that non temporal writes are made to bypass any and all cache hirarchy and go directly to memory. Exactly for the reason to never evict good data because you write other unrelated data.
Non temporal is applicable where you write data and read back from that buffer later in the code. ie. not immediately.
An excellent summary on that topic can be found here.
http://www.nic.uoregon.edu/~khuck/ts/ac ... 05s03.html
Its a way of talking to the processor directly:
"a non-temporal store also hints to the processor that the program intends to write the entire cache line, completely replacing the current content, so that the cache line does not need to first be fetched from memory"
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
gaard
- Posts: 463
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W
Re: Move structure and hash tables
I decided to shave the key size down to 48 bits using the u64 key : 48; trick and it seems to be working. I'm sure there some hidden overhead but it's just a few more lines of code and it looks like it gets me down to a 16-byte tt entry from 18. I suppose storing the whole key is redundant anyways, when the lower bits are already used as index.
-
gaard
- Posts: 463
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W
Re: Move structure and hash tables
Gcc lete get away with it, even allowing me to initialize the fields on construction, but msvc said no way. Something about bit fields of different types. Is there any real grave danger to just using a 32-bit key without clusters?gaard wrote: ↑Sat Oct 23, 2021 11:10 pmI decided to shave the key size down to 48 bits using the u64 key : 48; trick and it seems to be working. I'm sure there some hidden overhead but it's just a few more lines of code and it looks like it gets me down to a 16-byte tt entry from 18. I suppose storing the whole key is redundant anyways, when the lower bits are already used as index.
-
hgm
- Posts: 28443
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Move structure and hash tables
If the data is already cached the write cannot alter the memory, because then the cache woul hold an obsolete value. Which would be returned instead of the correct one on a later read.dangi12012 wrote: ↑Sat Oct 23, 2021 10:59 pm Correct on all points except that non temporal writes are made to bypass any and all cache hirarchy and go directly to memory. Exactly for the reason to never evict good data because you write other unrelated data.
Non temporal is applicable where you write data and read back from that buffer later in the code. ie. not immediately.
-
Ras
- Posts: 2730
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Move structure and hash tables
Can you show the code for the type declaration of the bitfield and the assignment?
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net
-
gaard
- Posts: 463
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W