most similar hashes of two positions

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 3:03 pm
Location: British Columbia, Canada

Re: most similar hashes of two positions

Post by wgarvin » Mon Aug 17, 2015 11:09 pm

I guess you could use a set of 64 keys, one per square, and rotate them by 4*P bits, for a piece type P in [1..12]. The table of keys would then be 512 bytes. Not as tiny as some of the other "rotated key" ideas, but still pretty small, the entire table would fit in 8 cache lines. It would work well on non-x86 architectures too, since it doesn't use any misaligned access.

Compilers usually recognize some idiom for rotate and turn it into a single instruction. The part that can be tricky, is to get it to generate a single instruction, while avoiding undefined behavior from zero- or 64-bit shifts). I haven't tried it myself on any modern compiler.
You, sir, are one strange individual.
I know.. :lol:

Aleks Peshkov
Posts: 870
Joined: Sun Nov 19, 2006 8:16 pm
Location: Russia

Re: most similar hashes of two positions

Post by Aleks Peshkov » Tue Aug 18, 2015 1:06 pm

hgm wrote:The enhanced collision rate I observed might indeed very well be a property of the PRNG I was using. I did not really get to the bottom of this.
Obviously there are better keys then a pure luck of random sample of 7.
Now keys that have the same upper 40 bits only cause a collision if the PST evaluation of the positions they represent is also exactly the same.

The cost is just a single addition of a cached variable.
Thank you, now I see the idea. Nice trick to fill otherwise unused part of the hash entry. But it is better not to store unused hash part at all.

User avatar
hgm
Posts: 23490
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: most similar hashes of two positions

Post by hgm » Wed Aug 19, 2015 5:30 am

I am not sure what unused part you refer to. This is just a trick of making the key effectively longer, no matter what size hash entry you use, by considering the incremental PstEval as part of the key.

In fact by using an additive hash key, instead of an XOR-based, you could pick the 64-bit Zobrist keys such that their lower 32 bits are the PST values (16 bits end-game, 16 bits opening), so that you don't have to update a separate PST eval. Use the high part of the key for deriving the index, and use the XOR of high and low part for signature.

Aleks Peshkov
Posts: 870
Joined: Sun Nov 19, 2006 8:16 pm
Location: Russia

Re: most similar hashes of two positions

Post by Aleks Peshkov » Wed Aug 19, 2015 9:22 am

hgm wrote:I am not sure what unused part you refer to. This is just a trick of making the key effectively longer, no matter what size hash entry you use, by considering the incremental PstEval as part of the key.
Full key is 64-bit, the least 24-bits are unused (always the same) in the same hash slot. If you want real key you may calculate 128-bit key (that can be actually a speed gain on Intel platform) and use some bits for index and another for lock. PST eval is a poor man's hash substitution. But it seems that extra work is unneeded in practice.

Material imbalance hash or only pawns hash can be calculated together with the full position hash in one register. But I doubt that such micromanagement is worse itself.
In fact by using an additive hash key, instead of an XOR-based, you could pick the 64-bit Zobrist keys such that their lower 32 bits are the PST values (16 bits end-game, 16 bits opening), so that you don't have to update a separate PST eval.
It is ugly to use signed arithmetic in a part of general purpose register. PST and material values are always positive only if you calculate them separately per each side.

I think that pieces counters (per piece type for end-game knowledge) or material-based game stage or the simplest (1-3-3-5-9) material eval for SEE are better candidates to join with PST values. And better to use 128-bit datatype to extend Intel register limit and gain from parallel calculation units.

User avatar
hgm
Posts: 23490
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: most similar hashes of two positions

Post by hgm » Wed Aug 19, 2015 8:33 pm

Aleks Peshkov wrote:PST eval is a poor man's hash substitution.
My point was that it is better than using the 24 bits that were already implied by the index.
It is ugly to use signed arithmetic in a part of general purpose register. PST and material values are always positive only if you calculate them separately per each side.
I don;t see anything ugly about that. The PSTs of my new engine 'Simple' are filled with (opening-endgame)*N + endgame (where N is 2^16). The game phase is kept ofsetted by N, i.e. phase = 16*N + truePhase (because phase is in units of 1/16). Then * can simply caluclate phase*PST = truePhase*endgame + N*16*endgame +N*truePhase*(opening-endgame) + {N*N*16*(opening-endgame)}, where the part between braces is disappears because of overflow when you assign it to an int. Then all I have to do is shift right by 20 (instead of by 4) to get rid of the factor N, to be left with truePhase/16 * opening + (1 - truePhase/16) * endgame. The endgame * truePhase term is just negligible noise. If you are worried about the single unit of rounding error it could contribute to the result, you could add 8*N before the right-shift. That sitll makes for a very fast interpolation:

const int round = 1 << 19

PST * phase + round >> 20

That is just 3 operations: multiply, add, shift, where the add is close to paranoia. PST could be (int)hashKey which is an effortless cast.

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 3:02 pm
Contact:

Re: most similar hashes of two positions

Post by brtzsnr » Sun Aug 30, 2015 7:53 am

Here are the conclusions after more testing:

1. There are (obviously) many positions with the same Polyglot key. Peter has proved that before and he has given some examples.
2. The best I could find starting from the initial position were two positions with hamming distance 2.
3. After searching many hours for an invalid hash move due to a collision (using 32 bits keys) I could not find any in real game play. I think in most searched positions the score is returned so the hash is not used. It's also possible that in real game play that positions in the same search tree have lots of moves in common so the hash move happens to be valid, nevertheless.
4. I added hash move validation to my engine, but after 46000 games the score was 16534 - 16669 - 12797, almost equal (if anything a small regression).
5. However, Steven has found a 64 bits collision after 18 billion+ searched nodes (http://talkchess.com/forum/viewtopic.php?t=57307) so he wins that second category.

User avatar
hgm
Posts: 23490
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: most similar hashes of two positions

Post by hgm » Sun Aug 30, 2015 9:09 am

wgarvin wrote:I guess you could use a set of 64 keys, one per square, and rotate them by 4*P bits, for a piece type P in [1..12]. The table of keys would then be 512 bytes. Not as tiny as some of the other "rotated key" ideas, but still pretty small, the entire table would fit in 8 cache lines. It would work well on non-x86 architectures too, since it doesn't use any misaligned access.
The size of the key table starts to be a real problem in my engine HaChu, which was designed to handle the large Shogi variants, at least up to Tai Shogi (25x25 board), and perhaps even Taikyoku (36x36). As the number of piece types also increases with the board size (Tai Shogi has 93 initial types, while promotion adds another 37).

So I just use an on-the-fly calculation as key[piece][square] = pieceKey[piece]*squareKey[square]. That reduces the table size for Tai to (625 + 2*130)*8 = 7080 bytes. The fact that this nicely fits in L1 more than makes upfor the extra on-the-fly operations, especially if you realize that the access to a 2D-array does contain a hidden multiply (or an extra levelof indirection), which you now don't need.

Multiplying can be considered a more flexible generalization of rotating/shifting.

petero2
Posts: 580
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: most similar hashes of two positions

Post by petero2 » Sun Aug 30, 2015 9:48 am

syzygy wrote:
brtzsnr wrote:The second category is still open. As a reminder in the second category the positions must be reached by following a series of legal moves from the start position.
What do you mean by that? Practically any reasonably looking position is reachable by a series of legal moves from the initial position.

I did not look in detail at all positions listed by Peter, but they do seem to be reasonable. I'm sure most if not all of them are reachable by a series of legal moves from the initial position.
I think all positions in my list are reachable from the starting position. Take for example this position:
[D]2b1kqr1/p2p3p/3p4/p2PpP2/PpP2p2/6P1/8/RRB1KQ1N w - - 0 1
It can be reached from the starting position by the following sequence of moves:
[pgn]
[Event "Edited game"]
[Site "alien.localdomain"]
[Date "2015.08.30"]
[Round "-"]
[White "-"]
[Black "-"]
[Result "*"]

1. e4 c5 2. Ba6 bxa6 3. b4 g6 4. f4 Bh6 5. f5 Bf4 6. g3 Nf6 7. gxf4 Nh5 8.
c4 cxb4 9. Ne2 Ng3 10. Nd4 a5 11. hxg3 Nc6 12. Nxc6 Rb8 13. Nxb8 g5 14. d4
f6 15. Na6 gxf4 16. Nc5 Rg8 17. Nb7 Rh8 18. Nd6+ exd6 19. e5 fxe5 20. d5
Qf6 21. a4 Rg8 22. Nd2 Qf7 23. Rb1 Qf6 24. Ne4 Qf7 25. Rh2 Qf6 26. Nf2 Qf7
27. Nh1 Qf6 28. Qd3 Qf7 29. Qf1 Qf6 30. Ra2 Qf7 31. Raa1 Qf8
*
[/pgn]
Writing a program to find such move sequences could be an interesting challenge. A* or IDA* seems like good algorithms to use, but you need a powerful heuristic function to reduce the search space enough to be able to find long move sequences.

Post Reply