hgm wrote:I decided to put a Pawn hash in my new engine, so that I can quickly hack in some passer recognition through a dumb algorithm, without having to worry about efficiency. I never used Pawn hash before. Does the following seem a reasonable implemetation:
I keep an extra 32-bit hash key, which uses the same randoms as the main hash key for Pawns, but zero keys for all other pieces. (In fact I derive them by ANDing the randoms fetched for the main hash with -IS_PAWN(piece), which seemed faster than setting up seprate tables.)
The Pawn hash itself is an array of integers, indexed by the low-order 18 bits of the key. It stores the hashed evaluation score in the 10 low-order bits, and the 22 high-order key bits in the rest, as signature.
In this scheme bits 10-17 are redundant, and do not contribute to avoiding collisions, as they were already implied by the address. I guess I could XOR these with 8 corresponding bits from my main hash key, if I make those bits only depend on Pawns there too (i.e. zero them for all other pieces). Would this degrade the opertion of the main hash table? I guess it would if I take bits for it that are used drectly in the index, but I could use bits from the signature. As I store only a 32-bits signature in the main hash, I guess there are some unused bits, which I could shift to the proper location for this purpose.
Would this be worth it, or would an 18-bit address and (effectively) a 14-bit signature be good enough?
I did a study a few years ago with 32 bit keys vs 64 keys and found no measurable difference in playing strength. I used a similar scheme where part of the 32 bit was also the address key and played something like 30,000 games.
However, that was a few years and that makes a world of difference. You will get lots of collisions, but the issue for playing strength is how often do those collisions actually affect the move choice? That is worth a few bits of "extra key" but with highly selective programs I suspect that it's not worth what it used to be worth a few years ago.
On the other hand, undetected collisions with the pawn structure hash is less likely to have a seriously negative affect that having these collisions with the full position signature.
My "intuition", for what (little) it's worth is that you can probably get away with this - a few undetected collision in the pawn structure hash is not going to cost you hardly anything in terms of raw ELO and mis-evaluations. However, I am quite sure you will get a fair number of collisions - and I'm sure you know how to make that calculation. But that calculation does not tell you how much ELO you will sacrifice.
If you are using this information to find passed pawns and/or to do something dynamic with them or some other aspect of pawn structure, your program better not make any assumptions because as we know you will get a lot of mis-evaluation. Komodo for instance stores the locations of the passed pawns computed from the pawn structure hash and does stuff with it during the slower evaluation.
Of course it's better if you can find a way to economically compute a separate address hash - but I know you are a minimalists. My feeling is that what you have described will be indistiguishable from a more anal approach. In fact if you do more and it costs you 1% slowdown it may actually be a slight weakening.