hash collisions

lucasart · Post by **lucasart** » Wed Jan 29, 2020 1:04 am

mar wrote: ↑Tue Jan 28, 2020 8:28 am If his engine crashes, then he should fix his hashmove validation.

A better test is to generate all possible moves during search (in hashmove format of course, typically 16 bits), generate all legal moves per node and make sure your validator correctly classifies these as legal/illegal.
After a couple of games in this mode your hashmove validation becomes robust.

The problem is not necessarily linked to the hash move. In fact, in Demolito, what caused crashed was not related to that at all. Demolito does not play the hash move, it simply generates moves, and sort them, giving highest score if a move happens to equal the hash move. So testing legality of hash before playing is irrelevant in Demolito.

What crashed Demolito is the fact that, using the static eval stored in the hash entry, I was deducing that we were not in check for null search. A condition like staticEval >= beta, would normally imply that staticEval > -MATE, and -MATE is the staticEval used when in check (calling evaluate in check is forbidden in Demolito, it must be handled by the search). So I was doing a null search in check, whenever a hash collision (single threaded) or race (SMP) causes this to happen (entry mixed with data from another entry in check). Of course, once you play a null move in check, the opponent captures the king, and all bets are off...

Daniel Shawul · Post by **Daniel Shawul** » Wed Jan 29, 2020 2:18 am

32-bit is not enough for me -- it crashes with a single thread search in less than 10 seconds and it is a real hash collision too,
but maybe I used a shitty RNG for generating my hashes. I used to do legality checking before switching to relying on lockless hash, but I guess it is time turn legality checking on now regardless.

These two positions have the same lower 32 bits in the hash key, and it wants to play the castling move on the second position.

[d]
[d]

jdart · Post by **jdart** » Wed Jan 29, 2020 3:30 am

it is completely unacceptable to leave a potential program crash bug

As I mentioned upthread, you could double the hash code size and then the probability of an illegal move from the hash is really vanishingly small.

However, if you want no possible probability of an illegal move from hash collision, even infinitesimally small, that is difficult to do efficiently. One of the big performance benefits of a hash move is that you can (almost always) skip full legal move generation if you have it. If you don't skip that step you can validate all moves but then you are losing speed.

--Jon

mar · Post by **mar** » Wed Jan 29, 2020 10:13 am

lucasart wrote: ↑Wed Jan 29, 2020 1:04 am The problem is not necessarily linked to the hash move. In fact, in Demolito, what caused crashed was not related to that at all. Demolito does not play the hash move, it simply generates moves, and sort them, giving highest score if a move happens to equal the hash move. So testing legality of hash before playing is irrelevant in Demolito.

What crashed Demolito is the fact that, using the static eval stored in the hash entry, I was deducing that we were not in check for null search. A condition like staticEval >= beta, would normally imply that staticEval > -MATE, and -MATE is the staticEval used when in check (calling evaluate in check is forbidden in Demolito, it must be handled by the search). So I was doing a null search in check, whenever a hash collision (single threaded) or race (SMP) causes this to happen (entry mixed with data from another entry in check). Of course, once you play a null move in check, the opponent captures the king, and all bets are off...

I see now, I forgot that you use full legal movegen in Demolito, my bad.
If you already classify checkers, then shouldn't in-check flag be for free in child node?

YUFe · Post by **YUFe** » Wed Jan 29, 2020 11:06 am

jdart wrote: ↑Wed Jan 29, 2020 3:30 amyou could double the hash code size

Maybe in the age of SSDs that is possible but I find it to be besides the point.
In this case it seem to be a implementation problem, not one of too many positions analyzed.

I find your entire thinking process / attitude frightening. I hope you do not work at Boeing.
"Our Motto: We make mistakes AND accept low risks."

hgm · Post by **hgm** » Wed Jan 29, 2020 11:08 am

It is not really important that the hash move is legal. Just that it doesn't create an inconsistent state the engine cannot handle, e.g. between board and piece list, when it gets made. Sliders jumping over pieces is usually no problem at all. Capturing of own pieces usually only gives a wrong score, which is also harmless. Applying a promotion to a non-pawn could be a problem, depending on your promotion code, e.g. cause a non-existent piece type. Moving an empty square might be a problem, if you did not define that as a valid piece type (i.e. have no piece-square table for it, or no entry in the piece list to update its location). In my engines the most sensitive point is usually capture of a King, as the in-check detection code assumes a King to be present, and could stray off board if there is none. For this reason I let the MakeMove code typically test for capture of a King, which then would cause an immediate return from the node, with a +INF score.

chrisw · Post by **chrisw** » Wed Jan 29, 2020 12:06 pm

jdart wrote: ↑Wed Jan 29, 2020 3:30 am
it is completely unacceptable to leave a potential program crash bug
As I mentioned upthread, you could double the hash code size and then the probability of an illegal move from the hash is really vanishingly small.

However, if you want no possible probability of an illegal move from hash collision, even infinitesimally small, that is difficult to do efficiently. One of the big performance benefits of a hash move is that you can (almost always) skip full legal move generation if you have it. If you don't skip that step you can validate all moves but then you are losing speed.

--Jon

Well, you can increase the size of the key, but the possibility of an illegal move remains, becoming almost an ethical issue. Are you okay with the possibility your product can knowingly crash? Me not. Two reasons, first it’s commercially unacceptable for shipment, if you ship it via mobile platform, 100,000,000 downloads may well disagree with the key width, and someone’s mobile phone crashes. Secondly, possession of a known bug is very destructive for general bug finding, it becomes too easy to explain away all rare bug occurrences as being down to random hash fail, which then psychologically stops the programmer from hunting down rare bugs in general.

Validation is not very expensive, you have to do it anyway with other history type moves, and it doesn’t require a full on move list to compare against. You can just serially test the necessary conditions. Is there a piece on from square, can this piece type move to the to square (standard attack mask) catches most stuff, then it’s just a few special cases to consider.

lucasart · Post by **lucasart** » Wed Jan 29, 2020 12:50 pm

mar wrote: ↑Wed Jan 29, 2020 10:13 am
lucasart wrote: ↑Wed Jan 29, 2020 1:04 am The problem is not necessarily linked to the hash move. In fact, in Demolito, what caused crashed was not related to that at all. Demolito does not play the hash move, it simply generates moves, and sort them, giving highest score if a move happens to equal the hash move. So testing legality of hash before playing is irrelevant in Demolito.

What crashed Demolito is the fact that, using the static eval stored in the hash entry, I was deducing that we were not in check for null search. A condition like staticEval >= beta, would normally imply that staticEval > -MATE, and -MATE is the staticEval used when in check (calling evaluate in check is forbidden in Demolito, it must be handled by the search). So I was doing a null search in check, whenever a hash collision (single threaded) or race (SMP) causes this to happen (entry mixed with data from another entry in check). Of course, once you play a null move in check, the opponent captures the king, and all bets are off...
I see now, I forgot that you use full legal movegen in Demolito, my bad.
If you already classify checkers, then shouldn't in-check flag be for free in child node?

Yes, of course. Here's the fix, no performance impact:
https://github.com/lucasart/Demolito/co ... 0ef9714a1a
I never said there was a performance tradeoff. It was just a stupid mistake, not a calculated risk.

bob · Post by **bob** » Thu Jan 30, 2020 5:44 am

I can't imagine one NOT using the lockless hashing idea in a SMP program. This interleaved update issue has been known (at least to me) since the early 80's when we started to see SMP crays. We spent a week out at the Los Alamos weapons lab (Harry and myself along with Burt Wendroff of the Lachex program) trying to track this down. We were using 12 byte hash entries packed into adjacent words of memory, which made it even worse, since every other word had a piece of two positions. As we studied the problem, we first extended each entry to 16 bytes to get rid of the nonsensical interleaving that was happing when two different processors stored to two adjacent hash entries that shared a 64 bit word of memory. This greatly reduced the error (we took the hash move, checked for legality, and complained if it was illegal, or in extreme cases we would store the actual position to compare).

Eventually we figured ut that having two (or more) threads storing to consecutive words of memory (Cray word = 64 bits of course) was problematic due to the possibilities of (a1, a2), (b1, b2) being stored, which is ok, but also (a1, b2) and b1, a2) which are not good.

The lockless hashing solution fixes this with just two performances. First, the xor (word 1 ^ word 2) which is really inconsequential in out of order execution anyway; second you can end up with an entry that will never match (but which can be overwritten with good information of course). So you miss a hash hit you would have gotten with perfect hashing. Again, this happens infrequently enough that we went with the lockless approach, which had the added advantage of avoiding the serious overhead of an atomic lock / semaphore / etc. access. Overall a big gain except for the rare case when the entry is corrupted and becomes useless. The good thing is the corrupted entry will never be used, which addresses the main concern...

It is just too easy to implement this and solve the problem until you reach the search speeds where 64 bits fails to provide reasonable hash signatures. We are a LONG way from that point today.

jstanback · Post by **jstanback** » Thu Jan 30, 2020 4:51 pm

Hi Bob, As I mentioned in my previous post, I don't think storing a 64 bit hash signature is enough even now on hardware like at TCEC. Wasp used "only" 16G of hash memory = 1G positions, so 30 bits is used up in the address. Since the table is always full of entries, that leaves 34 bits of useful verification. So with 4 probes I think the collision rate is 4 in 2^34 or 1 in 4 billion. At 100M nps if the HT is probed at every node (not quite the case for Wasp) there should be around 360 collisions per hour of search. So I think the move retrieved from the HT cannot be played without enough verification to ensure it won't cause a crash. I added a full move legality test for each successful hash fetch and surprisingly the slowdown is only about 1%.

The code is below. I could eliminate the test for leaving king in check since that is done in the moveloop as part of my normal move legality test for pseudo-legal moves from the move generator.

John

Code: Select all

int HashMoveIsLegal(POS* pos, int side, MOVE mv)
{
  SQUARE f = moveFROM(mv);
  SQUARE t = moveTO(mv);
  PIECE  fpc = PieceType(Piece(pos,f));
  PIECE  tpc = PieceType(Piece(pos,t));
  BITBOARD bbf = SQ2BB(f);
  BITBOARD bbt = SQ2BB(t);
  BITBOARD occ = Occupied(pos);
  
  // check that the color of the moving piece is correct and that
  // there is not a piece of the same color on the to_square
  if (!(pos->PieceBB[side][0] & bbf) || (pos->PieceBB[side][0] & bbt)) return(0);

  // if capture, test that the bit for the captured piece is set
  if (tpc && !(pos->PieceBB[!side][tpc] & bbt)) return(0);

  // if promotion, test that this is pawn moving to 8th rank
  if ((mv & promoteAny) && (fpc != pawn || Rank(side,t) != RANK8)) return(0);

  int cpc = tpc; // captured piece
  int csq = t;

  switch (fpc)
  {
    case pawn:
      if (BB_PawnRays[side][f] & bbt)
        {
          if (t == pos->epsquare)
            {
              cpc = pawn;
              csq = t-pawn_dir[side];
              if (tpc != 0) return(0);
              if (!(pos->PieceBB[!side][pawn] & SQ2BB(csq))) return(0);
            }
          else if (!tpc) return(0);
        }
      else if (t == f+pawn_dir[side])
        {
          if (tpc) return(0);
        }
      else if (t == f+2*pawn_dir[side] && Rank(side,f) == RANK2)
        {
          if (tpc || Piece(pos,f+pawn_dir[side])) return(0);
        }
      else return(0);
      break;
    case knight:
      if (!(BB_KnightRays[f] & bbt)) return(0);
      break;
    case bishop:
      if (!(BB_BishopRays[f] & bbt) || (BB_Between[f][t] & occ)) return(0);
      break;
    case rook:
      if (!(BB_RookRays[f] & bbt) || (BB_Between[f][t] & occ)) return(0);
      break;
    case queen:
      if (!((BB_BishopRays[f] | BB_RookRays[f]) & bbt) ||
          (BB_Between[f][t] & occ)) return(0);
      break;
    case king:
      if (!(BB_KingRays[f] & bbt))
        {
          if (t == f+2 || t == f-2)
            {
              if (Rank(side,f) != RANK1) return(0);
              SQUARE s1,s2,s3;
              if (t > f) {s1=f+1; s2=f+2; s3=f+3;}
              else       {s1=f-1; s2=f-2; s3=f-4;}
              if (Piece(pos,s1) || Piece(pos,s2)) return(0);
              if (Piece(pos,s3) != Piece(pos,f)-2) return(0);
              if (SqAtakd(pos,f,!side)) return(0);
              if (SqAtakd(pos,t,!side)) return(0);
              if (SqAtakd(pos,s1,!side)) return(0);
            }
          else return(0);
        }
  }

  // Update bitboards to remove captured piece
  if (cpc)
    {
      pos->PieceBB[!side][cpc] ^= SQ2BB(csq);
      pos->PieceBB[!side][0]   ^= SQ2BB(csq);
    }

  // Update fsq & tsq bits of occupied piece bitboard
  BITBOARD bb = (SQ2BB(f) | SQ2BB(t));
  pos->PieceBB[side][0] ^= bb;

  // test whether own king is attacked
  SQUARE ksq = pos->kingsq[side];
  if (ksq == f) ksq = t;
  int legal = !SqAtakd(pos,ksq,!side);

  // restore bitboards
  pos->PieceBB[side][0] ^= bb;
  if (cpc)
    {
      pos->PieceBB[!side][cpc] ^= SQ2BB(csq);
      pos->PieceBB[!side][0]   ^= SQ2BB(csq);
    }

  return(legal);
}

hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions

Re: hash collisions