Pawn Hash

bob · Post by **bob** » Mon Mar 22, 2010 2:25 pm

hgm wrote:I wonder if including Bishop color in the Pawn hash key would be a good idea. This would only require two extra Zobrist keys per side, one for the white squares, and one for the black squares, and the status would be very stable through the tree. So it might not degrade the hit-rate too much.

You could then put the good Bishop / bad Bishop distinction in the hashed pawn scores. You could also reduce the value of an extra pawn when there are unlike Bishops.

It doesn't take much. I'd suspect you lose 50% of the hits, roughly, although I have not done the math explicitly. But if you have a bishop, you can certainly trade it in enough places, and pawn structure with or without that bishop is pretty much identical, but would be lost in the hash table.

My approach has been fairly simply. For pawn-only scoring, hash the score. For pawn scoring that depends on pieces, hash whatever is needed so that it can be used in the piece scoring (non-hashed) but without having to do all the pawn-specific stuff again. King safety is a simple example. I only consider 3 king positions, kingside, queenside and center. So I evaluate all 3 areas for pawn shelter stuff and then hash it and use the one that applies when I evaluate the king safety stuff. Other ideas such as rook behind a passed pawn are done the same way, an 8-bit value indicating which files have passed pawns for one side. Easy to then see if one of your rooks is on one of those files and add a bonus if you so choose...

marcelk · Post by **marcelk** » Wed Mar 24, 2010 9:17 pm

bob wrote:
Dann Corbit wrote: Many programs include the king position in the pawn hash.
I hope not, it absolutely kills hash hit rate. In Cray Blitz we actually did a king safety hash, which included pawns + K in a separate table. The hit rate was never even 50%, whereas the normal pawn hash hit rate (for me) is always > 99%. most pawn scoring has nothing to do with king position. Giving up 50% hits could make a program many times slower than it needs to be.

Something seems wrong if you always got a lower than 50% hit rate. While including the king with the pawn hash can have drastic effects on the hit rate when almost all other pieces are gone, it should still be typically <5% all nodes that have a miss. I did a run on a test set which has both middle game and end game positions. Results in the plot below. The hit ratios vary a lot, but it doesn't come near to the 1 out of 2 miss rate of Cray Blitz.

The question whether or not the program is slower than it needs to be depends on how much work is done in the calculation that you prevent do by hashing vs. the cost of doing in another way. This is dependent on the evaluation strategy the program uses. The bad cases are further mitigated by the observation that the poorer hit ratios occur only in positions where the rest of the program is fast anyway. So, "it all depends", it may not be a bad deal at all.

bob · Post by **bob** » Wed Mar 24, 2010 11:02 pm

I may have missed it, but what is that showing? Hits vs probes or hits vs total nodes searched? Note that we had +both+ kings in the hash signature, plus all pawns (both sides). Which is just like pawn hash without the kings folded in.

marcelk · Post by **marcelk** » Wed Mar 24, 2010 11:13 pm

Nodes per pawnking hash miss. High is better. Low is poor.

hgm · Post by **hgm** » Wed Mar 24, 2010 11:26 pm

So if I read the graph correctly, the hit rate was between 95% and 99.95%? How many entries was this hash table?

marcelk · Post by **marcelk** » Wed Mar 24, 2010 11:46 pm

hgm wrote:So if I read the graph correctly, the hit rate was between 95% and 99.95%? How many entries was this hash table?

You are correct. The table size is 2**18 pawnking slots with 2 potential slots per hash. The replacement scheme for slot selection is 'random' (actually it is based on node counter's lowest bit). To avoid branch mis-predictions on probing, after a hit on the 2nd candidate slot, I swap it with the 1st candidate.

For completenesss I put the test set in the same directory.

I must say that I wasn't aware of the huge variation until I started looking at it this way. So for me, keeping the kings in there is still something I one day may reconsider. However, in my case, the penalty for redoing the calculation is quite low, because there is another layer of hashing directly behind a pawnking miss. So I might decide to keep it.

PS: I like your idea about including the bishop square colors a lot. I will play with it as well.

bob · Post by **bob** » Thu Mar 25, 2010 12:18 am

marcelk wrote:Nodes per pawnking hash miss. High is better. Low is poor.

Yes, but if it is real "nodes" then your numbers can't be compared to what I wrote. I said 50% hits, which means for every 2 king safety hash probes, I got 1 hit. I have just a few search results that are hard-copy, and the hits went from a high of 70% right on down to simple endings where it was extremely low.

marcelk · Post by **marcelk** » Thu Mar 25, 2010 12:43 am

bob wrote:
marcelk wrote:Nodes per pawnking hash miss. High is better. Low is poor.
Yes, but if it is real "nodes" then your numbers can't be compared to what I wrote. I said 50% hits, which means for every 2 king safety hash probes, I got 1 hit. I have just a few search results that are hard-copy, and the hits went from a high of 70% right on down to simple endings where it was extremely low.

In this case I did 1 pawnking probe per node, so the numbers can be compared as far as I can judge.

bob · Post by **bob** » Thu Mar 25, 2010 6:50 pm

marcelk wrote:
bob wrote:
marcelk wrote:Nodes per pawnking hash miss. High is better. Low is poor.
Yes, but if it is real "nodes" then your numbers can't be compared to what I wrote. I said 50% hits, which means for every 2 king safety hash probes, I got 1 hit. I have just a few search results that are hard-copy, and the hits went from a high of 70% right on down to simple endings where it was extremely low.
In this case I did 1 pawnking probe per node, so the numbers can be compared as far as I can judge.

One other thing I didn't think about. In the CB logs (printed) that I have, a typical middlegame search would reach 9-10 plies with queens still on. Compared to 20+ today. That also probably makes a big difference.

Don · Post by **Don** » Fri Mar 26, 2010 2:03 am

hgm wrote:I decided to put a Pawn hash in my new engine, so that I can quickly hack in some passer recognition through a dumb algorithm, without having to worry about efficiency. I never used Pawn hash before. Does the following seem a reasonable implemetation:

I keep an extra 32-bit hash key, which uses the same randoms as the main hash key for Pawns, but zero keys for all other pieces. (In fact I derive them by ANDing the randoms fetched for the main hash with -IS_PAWN(piece), which seemed faster than setting up seprate tables.)

The Pawn hash itself is an array of integers, indexed by the low-order 18 bits of the key. It stores the hashed evaluation score in the 10 low-order bits, and the 22 high-order key bits in the rest, as signature.

In this scheme bits 10-17 are redundant, and do not contribute to avoiding collisions, as they were already implied by the address. I guess I could XOR these with 8 corresponding bits from my main hash key, if I make those bits only depend on Pawns there too (i.e. zero them for all other pieces). Would this degrade the opertion of the main hash table? I guess it would if I take bits for it that are used drectly in the index, but I could use bits from the signature. As I store only a 32-bits signature in the main hash, I guess there are some unused bits, which I could shift to the proper location for this purpose.

Would this be worth it, or would an 18-bit address and (effectively) a 14-bit signature be good enough?

I did a study a few years ago with 32 bit keys vs 64 keys and found no measurable difference in playing strength. I used a similar scheme where part of the 32 bit was also the address key and played something like 30,000 games.

However, that was a few years and that makes a world of difference. You will get lots of collisions, but the issue for playing strength is how often do those collisions actually affect the move choice? That is worth a few bits of "extra key" but with highly selective programs I suspect that it's not worth what it used to be worth a few years ago.

On the other hand, undetected collisions with the pawn structure hash is less likely to have a seriously negative affect that having these collisions with the full position signature.

My "intuition", for what (little) it's worth is that you can probably get away with this - a few undetected collision in the pawn structure hash is not going to cost you hardly anything in terms of raw ELO and mis-evaluations. However, I am quite sure you will get a fair number of collisions - and I'm sure you know how to make that calculation. But that calculation does not tell you how much ELO you will sacrifice.

If you are using this information to find passed pawns and/or to do something dynamic with them or some other aspect of pawn structure, your program better not make any assumptions because as we know you will get a lot of mis-evaluation. Komodo for instance stores the locations of the passed pawns computed from the pawn structure hash and does stuff with it during the slower evaluation.

Of course it's better if you can find a way to economically compute a separate address hash - but I know you are a minimalists. My feeling is that what you have described will be indistiguishable from a more anal approach. In fact if you do more and it costs you 1% slowdown it may actually be a slight weakening.

Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash

Re: Pawn Hash