hgm wrote: "I certainly would not waste precious bits on extending the signature. Stockfish gets away with a 16-bit signature!"

What do you mean with "signature"? Storing parts of the hash key? Remember that Stockfish has way bigger hash tables than I have (on the target platform): I only have 4096 entries for each of the main hash tables, which gives only 11 implicit key bits in the index. If you have 8 bytes per entry and use 128 MB, that's 16.7 million entries with 24 implicit bits. But I have to store the 13 bits of difference explicitly.
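For illustration, a minimal sketch (in C; the names are mine, not the engine's) of why index bits are "implicit": the slot number reproduces the low bits of the key, so only the remaining bits need to be stored in the entry itself.

Code:

#include <stdint.h>

/* INDEX_BITS is 11 on the target per the text above; the 128 MB /
 * 8-byte-entry PC example would have 24. */
#define INDEX_BITS 11u

static unsigned hash_index(uint64_t key)
{
    /* these low key bits need no storage: the slot number implies them */
    return (unsigned)(key & ((1ull << INDEX_BITS) - 1u));
}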
I want to have at least 48 bits of total hash key because of collisions (referring to Bob's paper). The alpha/beta/exact flag needs only two bits, so I fiddle 6 additional key bits into that byte, and I'm storing the upper 32 bits of the position hash in a uint32_t. Plus the 11 implicit bits, that gives 49 bits, which is good because small hash tables produce a lot more index collisions than usual, which opens up the possibility of full collisions.
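A sketch of that flag byte (the exact bit positions are my assumption; I placed the extra key bits just above the implicit index bits): two bits of bound type plus six key bits, which together with the stored upper 32 bits and the 11 implicit bits makes the 49.

Code:

#include <stdint.h>

enum { BOUND_ALPHA, BOUND_BETA, BOUND_EXACT };

/* low 2 bits: bound type; high 6 bits: key bits 11..16 */
static uint8_t pack_flags(unsigned bound, uint64_t key)
{
    return (uint8_t)((bound & 3u) | ((unsigned)((key >> 11) & 0x3Fu) << 2));
}

static unsigned bound_of(uint8_t flags)   { return flags & 3u; }
static unsigned keybits_of(uint8_t flags) { return flags >> 2; }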
I think that even on a PC, a 16-bit signature plus 24 implicit bits is too few.
If a move is present when probing the hash tables, I'm also doing some basic sanity checking, e.g. bishops have to move diagonally. If that doesn't match, I treat the lookup as "no match", also for the position value.
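For illustration, one such plausibility test (assuming plain 0..63 square indices; the engine's internal board format may differ):

Code:

#include <stdbool.h>

/* A bishop move must be strictly diagonal: equal, nonzero rank and
 * file distances. If this fails, the whole entry counts as a miss. */
static bool bishop_move_plausible(int from, int to)
{
    int dr = (to >> 3) - (from >> 3);   /* rank distance */
    int df = (to & 7)  - (from & 7);    /* file distance */
    if (dr < 0) dr = -dr;
    if (df < 0) df = -df;
    return dr == df && dr != 0;
}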
"Note that for encoding Chess moves in a not-very-cumbersome way, 13 bits would be sufficient."

Whether it's 16 bits or 13 bits doesn't matter, because the saved bits wouldn't become useful anyway. I'll store "from" and "to" as 6 bits each, plus 4 status bits. Your bit counter already fits into the depth field, which takes 6 bits.
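A sketch of that 6+6+4 encoding (the field order is my choice):

Code:

#include <stdint.h>

static uint16_t pack_move(unsigned from, unsigned to, unsigned status)
{
    return (uint16_t)((from & 0x3Fu) | ((to & 0x3Fu) << 6)
                                     | ((status & 0xFu) << 12));
}

static unsigned move_from(uint16_t m)   { return m & 0x3Fu; }
static unsigned move_to(uint16_t m)     { return (m >> 6) & 0x3Fu; }
static unsigned move_status(uint16_t m) { return m >> 12; }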
What I intend to do is sweep through the table before each move and delete every entry carrying the oldest modulo counter: take the current counter (i.e. the one that the upcoming search is about to use), add one (modulo 4), and delete every entry that has this counter. This spares the aging overhead when hash probing in the search. I will have to benchmark, however, whether that approach becomes a killer with 1 GB of hash.
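A sketch of that sweep, with an 8-byte entry layout assembled from the numbers in this thread (the exact layout is my assumption, not the engine's):

Code:

#include <stdint.h>
#include <string.h>

#define HASH_ENTRIES 4096u

typedef struct {
    uint32_t key_hi;     /* stored upper 32 key bits            */
    uint16_t move;       /* 6+6+4 packed move                   */
    uint8_t  depth_age;  /* low 6 bits depth, high 2 bits age   */
    uint8_t  flags;      /* bound type plus 6 extra key bits    */
} HashEntry;             /* 8 bytes, naturally aligned          */

static HashEntry hash_table[HASH_ENTRIES];

/* Before searching with counter c, clear the oldest generation,
 * (c + 1) mod 4, so in-search probes never need an age check. */
static void age_sweep(unsigned c)
{
    unsigned oldest = (c + 1u) & 3u;
    for (unsigned i = 0; i < HASH_ENTRIES; i++)
        if ((unsigned)(hash_table[i].depth_age >> 6) == oldest)
            memset(&hash_table[i], 0, sizeof hash_table[i]);
}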
An interesting side fact: I did some benchmarking and profiling of the pawn hash table to get real numbers.
I have 2048 entries and get a 90.8% hit rate over the game, pretty consistent even in the opening. Doubling to 4096 entries gives a 92.8% hit rate, so there isn't much to gain.
Profiling shows that with 2048 entries, the pawn eval function takes 1.7% of the computing time (with the mcount profiler overhead already subtracted out, so this refers to real computing time). Doubling the size cuts the miss rate from 9.2% to 7.2%, i.e. by about 22%, so I could reduce that 1.7% by roughly the same factor. That would increase the overall speed by about 0.4%, which isn't really useful.
Part of that gain would also be eaten up because I would need to split the position hash uint32_t into two uint16_t to avoid struct padding (packed structs are a hardware dependency I want to avoid), and I'd have to reassemble things. Plus, the move compression and decompression would still be there.
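A sketch of that split (field names are hypothetical): with a uint32_t member, the struct gets 4-byte alignment and possible padding next to uint16_t fields; splitting keeps everything at natural 2-byte alignment without compiler-specific packed structs, at the cost of reassembly on every probe.

Code:

#include <stdint.h>

typedef struct {
    uint16_t key_lo;   /* lower half of the stored hash part */
    uint16_t key_hi;   /* upper half of the stored hash part */
    uint16_t move;     /* compressed move                    */
} Entry;               /* 2-byte alignment, no padding       */

static uint32_t entry_key(const Entry *e)
{
    /* the reassembly cost mentioned above */
    return (uint32_t)e->key_lo | ((uint32_t)e->key_hi << 16);
}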
So the overall gain would be even smaller than 0.4%. I will have to think about what to do with the 16k of additional RAM, or whether I just leave it as growth potential (in which case I can defer the implementation).
What it also shows is that for the PC version, I can limit the pawn hash table to maybe 10k entries and better spend the rest of the configured memory on the actual hash tables.
"You could even consider using a 24-bit signature, so that you can pack 8 entries in a 64-byte cache line."

The cache on the target works very differently anyway: it refers to the flash ROM, to make up for the wait states; RAM operates at full core speed and thus doesn't need caching. The whole hash table design, even before the port, treats the memory as uniform.