I store only a 32 bit key - the upper part of the 64 bit key while the lower part of the 64 bit key serves for table indexing. The reason is the pretty limited memory on my microcontroller target hardware. That's why I use the legality check to increase the equivalent key length.
I do a full pseudo legality check on the hash move, i.e. not only the offsets, but for sliders also free squares in-between. To top it off, this isn't a bitboard engine so that the free square check involves actual loops. Means, it couldn't possibly become any worse in terms of performance impact.
If the pseudo legality check fails, I treat that as a failed hash lookup. The king-in-check verification happens afterwards because that would crash the engine. If I have a hash move, I defer the full move generation to the case that it doesn't give a cut-off.
I ran a little benchmark using the PC UCI version with and without the pseudo legality check in the hash lookup. I actually removed the whole function call to even eliminate that overhead. I did however keep the king-in-check verification.
I tried two positions, endgame and middlegame. In both positions, the number of nodes, PV and score were the same with and without pseudo legality checks of the hash move, so the search tree must have been the same. The difference is the pure performance impact of the pseudo legality check. The target depth was chosen so that I get around 20-40 seconds of time to depth. PV was cleared with ucinewgame and new position command between the different hash table sizes. The engine has only one worker thread.
1) Endgame: 8/3k4/8/8/8/8/2BKN3/8 w - -
256MB hash: 1% faster without checks
1MB hash: 0.7% faster without checks
2) Middlegame: 3qr1k1/p2n1ppp/b1p1p3/8/PrpPP3/4N1P1/2Q2PBP/3RR1K1 w - -
256MB hash: 0.7% faster without checks
1MB hash: 0.7% faster without checks
My conclusion: the performance impact is negligible.