The mailbox trials

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 2:40 pm

Maybe then there ought to be a PerfQt function which is perft + Qsearch. I think that would make sense.

hgm · Post by **hgm** » Thu Mar 04, 2021 2:51 pm

With MikeB's trick (thanks!) I get 18.14 sec for Stockfish 10's perft 7, on the same machine. Basically the same as Qperft.

Mike Sherwin wrote: ↑Thu Mar 04, 2021 2:40 pm Maybe then there ought to be a PerfQt function which is perft + Qsearch. I think that would make sense.

I suppose we could define perftq(n,m) as n ply of arbitrary moves followed by m ply of captures. I am not sure how useful this is, though. In a search stand-pat plays an essential role in QS, and move ordering is essential for exploiting that. You lose all that in perft.

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 3:52 pm

hgm wrote: ↑Thu Mar 04, 2021 2:51 pm With MikeB's trick (thanks!) I get 18.14 sec for Stockfish 10's perft 7, on the same machine. Basically the same as Qperft.

Mike Sherwin wrote: ↑Thu Mar 04, 2021 2:40 pm Maybe then there ought to be a PerfQt function which is perft + Qsearch. I think that would make sense.
I suppose we could define perftq(n,m) as n ply of arbitrary moves followed by m ply of captures. I am not sure how useful this is, though. In a search stand-pat plays an essential rule in QS, and move ordering is essential for exploiting that. You lose all that in perft.

If the main focus is QS speed due to the underlying move generation method then what does lack of stand pat matter if it is common between all the move generators? And why limit captures since that is what we want to measure the speed of? Both node types can be counted to separate all moves from just captures only. As long as it is the same for all move generators ...

hgm · Post by **hgm** » Thu Mar 04, 2021 4:37 pm

Stand-pat allows futility pruning, and futility pruning makes it possible to do 'staged' move generation, where you refrain from generating captures that would be futile. And limit the effort spend on move generation after a capture causes a cutoff. So there are many opportunities here to speed up an engine, while those speedups would probably just provide overhead that slows down perft, without providing any benefits.

jstanback · Post by **jstanback** » Thu Mar 04, 2021 7:31 pm

My favorite mailbox move generator uses a 64 square board and has a pre-computed table of Moves of 64x10x10 = 6400 bytes which is indexed by square and direction. The directions are N,S,E,W,NE,SE,NW,SW,Knight,and King. For each square and direction the table has a list of squares in that direction which is terminated by a value OFFBRD. To generate the moves for a bishop on square b2 and add them to mvlist[], the move generator calls:
Scan(pos,piece,Dir_NE,mvlist)
Scan(pos,piece,Dir_NW,mvlist)
Scan(pos,piece,Dir_SE,mvlist)
Scan(pos,piece,Dir_SW,mvlist)
And so forth for the other pieces. I think I had one Scan function for captures and another for non-captures.
For me (a long time ago) this generator was as fast for Perft() as my 0x88 and faster than my bitboard generator and I liked that it used a 64 square board so that I could use a hybrid of mailbox and bitboard functions until I got more comfortable with bitboard stuff. I had piecelists for each color and piecetype which was faster and easier for making/unmaking moves than having one piecelist for each color.

John

hgm · Post by **hgm** » Thu Mar 04, 2021 9:00 pm

But what is the point of tabulating moves in a given direction, if you can calculate the next destination by simply adding the step for that direction to the previous destination? toSqr = table[i++]; would just e a memory load extra compared to toSqr += step; as you have to increment the loop index i anyway. (And either would be kept in a register.)

It is true that on a 64-square board there wouldn't be an easy way to test whether you fall off-board, but I never saw any particular advantage of having a 64-square board (in pure mailbox). And even then, you could have used a 64x8 table range[pos][dir] to indicate how far you could slide from pos before you hit the edge. (Why do you use 10 directions, btw?) That would give something like:

Code: Select all

i = range[fromSqr][direction];
toSqr = fromSqr;
while(i) {
  toSqr += step;
  { // do something with the to-square
    int victim = board[toSqr];
    if(victim) {
      ... // here a capture could be generated
      break; // but it ends the ray scan
    }
    ... // here a non-capture could be generated
  }
  i--;
}

That would just be an addition, a decrement and the a branch based on the result of that decrement. (Plus of course what you wanted to do with toSqr.) When toSqr came from the table, you would not only have to load it from there, but also throw an extra compare on it to see if you hit the sentinel.

The loop above actually looks very much like how I did it in Inferno. Except that instead of a static range[][] table I used a similar (but dynamically updated) view-distance table, which would not hold the distance to the edge, but to the nearest blocker in the given direction. (Which could be an edge, but also a piece.) So the non-captures could be generated even without accessing the board; all generated toSqr are guaranteed to be empty. In the design with a neighbor table I was planning to tabulate the square number of the blocker itself, (instead of the distance to it), so you could do:

Code: Select all

toSqr = neighbor[fromSqr][direction];
... // here a capture could be generated
while((toSqr -= step) != fromSqr) {
  ... // generate a non-capture
}

The loading and testing of the toSqr occupant, and branching on that, has been removed from the loop. For generating captures in QS the entire loop can be omitted. (You would have to test whether the listed neighbor is a foe or friend/edge, though.)

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 9:16 pm

I don't know enough to implement the final goal of this trial in mailbox. I don't know how a neighbor table works for example. However, I know bitboards well enough so I wanted to see how much work it would be in bitboards just to get close to what is proposed in mailbox. I hope the mailbox version is a lot less work than it would be in bitboards.

Code: Select all

#include <intrin.h>

typedef signed long        s32;
typedef unsigned long      u32;
typedef unsigned long long u64;

struct Thread {
  s32 wtm;
  unsigned long qtop[2];
  u64 kings[2];
  u64 figures[2][6];
  s32 board[64];
};

#define wtm     t->wtm
#define qtop    t->qtop
#define ptop    t->ptop
#define kings   t->kings
#define figures t->figures
#define board   t->board

constexpr auto ILLEGAL = 0x8001;
constexpr auto INF = 0x8000;
constexpr auto MATE = 0x7fff;

u64 possibleAttacks[64];

s32 Eval(Thread* t) {
  s32 score;

  return score;
}

u64 AttackedByWhite(Thread* t, u64 bb) {

  return false;
}

u64 AttackedByBlack(Thread* t, u64 bb) {

  return true;
}

u64 (*AttackedBy[])(Thread* t, u64 bb) = { AttackedByBlack, AttackedByWhite };

s32 Qsearch(Thread* t, s32 alpha, s32 beta) {
  unsigned long type = 0, types, victim, guild, assassin;
  bool ok;
  u64 victims, villains, villain, attackers, attacker, assassins, guilds;
  s32 score;

  if (Eval(t) >= beta) return beta;

  if (AttackedBy[wtm](t, kings[1 - wtm])) return ILLEGAL;

  types = qtop[1 - wtm];

  while (types) {
    ok = _BitScanForward(&type, types);
    types ^= 1 << type;
    victims = figures[wtm][type];
    villains = AttackedBy[wtm](t, victims);
    if (villains) {
      while (victims) {
        ok = _BitScanForward64(&victim, victims);
        victims ^= 1ull << victim;
        attackers = villains & possibleAttacks[victim];
        while (attackers) {
          guilds = qtop[wtm];
          while (guilds) {
            ok = _BitScanReverse(&guild, guilds);
            guilds ^= 1 << guild;
            assassins = figures[wtm][guild] & attackers;
            while (assassins) {
              ok = _BitScanForward64(&assassin, assassins);
              assassins ^= 1ull << assassin;
              MakeHit(t, assassin, victim);
              score = -Qsearch(t, -beta, -alpha);
              CallOffHit(t, assassin, victim);
              if (score >= beta) return beta;
              if (score > alpha) alpha = beta;
            }
          }
        }
      }
    }
  }
  return alpha;
}

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 9:38 pm

jstanback wrote: ↑Thu Mar 04, 2021 7:31 pm My favorite mailbox move generator uses a 64 square board and has a pre-computed table of Moves of 64x10x10 = 6400 bytes which is indexed by square and direction. The directions are N,S,E,W,NE,SE,NW,SW,Knight,and King. For each square and direction the table has a list of squares in that direction which is terminated by a value OFFBRD. To generate the moves for a bishop on square b2 and add them to mvlist[], the move generator calls:
Scan(pos,piece,Dir_NE,mvlist)
Scan(pos,piece,Dir_NW,mvlist)
Scan(pos,piece,Dir_SE,mvlist)
Scan(pos,piece,Dir_SW,mvlist)
And so forth for the other pieces. I think I had one Scan function for captures and another for non-captures.
For me (a long time ago) this generator was as fast for Perft() as my 0x88 and faster than my bitboard generator and I liked that it used a 64 square board so that I could use a hybrid of mailbox and bitboard functions until I got more comfortable with bitboard stuff. I had piecelists for each color and piecetype which was faster and easier for making/unmaking moves than having one piecelist for each color.

John

Hi John, Are any of your engines available as source code?

hgm · Post by **hgm** » Thu Mar 04, 2021 10:37 pm

Perhaps it is a good idea to introduce some of the more advanced concepts before starting to code and present actual searches. So let me start with this:

Neighbor Table

The neighbor table is an array indexed by square number and direction. For every occupied square it holds the square number of the nearest occupied square in that direction. There is a rim of 'edge guards' around the board on 'virtual squares' just beyond the edge; this excludes simple 0-63 square numbering, but requires at least a 10x10 board.

For convenience of saving / restoring elements of the neighbor table, the 8 neighbors of a given square would be packed as bytes into one 64-bit word. The table would have to be updated every time a square is evacuated (easy), or occupied (hard). Fortunately in QS squares only get evacuated; on the destination the attacker replaces the victim. (Forget e.p. for now.)

Code: Select all

typedef union {
  unsigned char dir[8]; // indexed by direction number
  uint64_t all;
} Neighbors;

Neighbors neighbor[10*10]; // neigbor table (8x8 playing area, plus surrounding rim)

void Evacuate(int sqr)
{
  int i;
  for(i=0; i<4; i++) { // for all orientations
    int up = neighbor[sqr].dir[i];   // upstream neighbor
    int dn = neighbor[sqr].dir[i+4]; // downstream neighbor
    neighbor[up].dir[i+4] = dn;      // let those see each other
    neighbor[dn].dir[i] = up;
  }
}

void Reoccupy(int sqr)
{ // for 'unmaking' Evacuate()
  int i;
  for(i=0; i<4; i++) { // for all orientations
    int up = neighbor[sqr].dir[i];   // upstream neighbor
    int dn = neighbor[sqr].dir[i+4]; // downstream neighbor
    neighbor[up].dir[i+4] = sqr;     // let those see the given square
    neighbor[dn].dir[i] = sqr;
  }
}

Of course the loops over the four ray orientations would be unrolled in practice; there would be no looping or branching here.

The neighbor table is very useful for generating slider captures: it tells you directly where the potential victims are in the directions the slider can move in. No need to scan through a possibly large number of empty squares to find them. It also makes it relatively easy to calculate the mobility of a piece: just sum the distances between the square the piece is on, and its neighbors.

Attack Map

The attack map attackers[victim] indicates which pieces attack the given victim. This is indicated as an integer where each bit corresponds to a certain attacker, and would be set to 1 if that attacker indeed can capture the given victim. The bits are assigned in order of increasing attacker value, such that standard bit-extraction techniques (which find the least-significant 1 bit first) would extract the attackers in LVA order. This makes the attack map very suitable for generating captures: just run through the opponent's piece list to visit potential victims in MVV order, and for each victim extract the attackers in LVA order.

The attack map is also very helpful in updating itself after a move: discovered slider moves all went to the square evacuated by the move, and will thus be recorded in the old attack-map element for the moved piece. We can mask out the slider attackers from that, extract them one by one to get their piece numbers, look up their location, look up in which direction this is relative to the evacuated square, and displace the attack to the neighbor in the opposit direction. Sounds a bit cumbersome, but the upside is that usually you have to do it zero times, because the moved piece was not blocking any sliders.

The Present Mask

The attack map defines a mapping of pieces on bits; we can use the same mapping to keep track of which pieces are still present, and which are currently captured. If we AND the attack-map elements each time we use those with this 'present mask', there is no need to update the map for the disappearence of the moves of captured pieces. We can also use the present mask when we want to loop over the piece list, skipping the captured pieces, by extracting the bits for the pieces that are present, and only processing those.

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 11:51 pm

hgm wrote: ↑Thu Mar 04, 2021 10:37 pm Perhaps it is a good idea to introduce some of the more advanced concepts before starting to code and present actual searches. So let me start with this:

Neighbor Table

The neighbor table is an array indexed by square number and direction. For every occupied square it holds the square number of the nearest occupied square in that direction. There is a rim of 'edge guards' around the board on 'virtual squares' just beyond the edge; this excludes simple 0-63 square numbering, but requires at least a 10x10 board.

For convenience of saving / restoring elements of the neighbor table, the 8 neighbors of a given square would be packed as bytes into one 64-bit word. The table would have to be updated every time a square is evacuated (easy), or occupied (hard). Fortunately in QS squares only get evacuated; on the destination the attacker replaces the victim. (Forget e.p. for now.)
Code: Select all
typedef union {
  unsigned char dir[8]; // indexed by direction number
  uint64_t all;
} Neighbors;

Neighbors neighbor[10*10]; // neigbor table (8x8 playing area, plus surrounding rim)

void Evacuate(int sqr)
{
  int i;
  for(i=0; i<4; i++) { // for all orientations
    int up = neighbor[sqr].dir[i];   // upstream neighbor
    int dn = neighbor[sqr].dir[i+4]; // downstream neighbor
    neighbor[up].dir[i+4] = dn;      // let those see each other
    neighbor[dn].dir[i] = up;
  }
}

void Reoccupy(int sqr)
{ // for 'unmaking' Evacuate()
  int i;
  for(i=0; i<4; i++) { // for all orientations
    int up = neighbor[sqr].dir[i];   // upstream neighbor
    int dn = neighbor[sqr].dir[i+4]; // downstream neighbor
    neighbor[up].dir[i+4] = sqr;     // let those see the given square
    neighbor[dn].dir[i] = sqr;
  }
}
Of course the loops over the four ray orientations would be unrolled in practice; there would be no looping or branching here.

The neighbor table is very useful for generating slider captures: it tells you directly where the potential victims are in the directions the slider can move in. No need to scan through a possibly large number of empty squares to find them. It also makes it relatively easy to calculate the mobility of a piece: just sum the distances between the square the piece is on, and its neighbors.

Attack Map

The attack map attackers[victim] indicates which pieces attack the given victim. This is indicated as an integer where each bit corresponds to a certain attacker, and would be set to 1 if that attacker indeed can capture the given victim. The bits are assigned in order of increasing attacker value, such that standard bit-extraction techniques (which find the least-significant 1 bit first) would extract the attackers in LVA order. This makes the attack map very suitable for generating captures: just run through the opponent's piece list to visit potential victims in MVV order, and for each victim extract the attackers in LVA order.

The attack map is also very helpful in updating itself after a move: discovered slider moves all went to the square evacuated by the move, and will thus be recorded in the old attack-map element for the moved piece. We can mask out the slider attackers from that, extract them one by one to get their piece numbers, look up their location, look up in which direction this is relative to the evacuated square, and displace the attack to the neighbor in the opposit direction. Sounds a bit cumbersome, but the upside is that usually you have to do it zero times, because the moved piece was not blocking any sliders.

The Present Mask

The attack map defines a mapping of pieces on bits; we can use the same mapping to keep track of which pieces are still present, and which are currently captured. If we AND the attack-map elements each time we use those with this 'present mask', there is no need to update the map for the disappearence of the moves of captured pieces. We can also use the present mask when we want to loop over the piece list, skipping the captured pieces, by extracting the bits for the pieces that are present, and only processing those.

I understand the neighbor table now. Very cool idea.

THIS IS JUST FOR INFORMATION
But that brought something to memory. Do you remember looking at my Conundrum source? You asked me why I did something twice. But, I couldn't tell you why because I couldn't remember. I just knew there was a reason. Anyway, it is a ts incrementally updated move generator. So what triggered my memory is that the fs and ts is processed after every move. I think the neighbor table would make Conundrum even faster. And Conundrum is no slouch at 54 million moves per second during perft, making and unmaking all moves, no hash and no bulk counting. What might interest you is how moves are generated. They are generated by ts.

The code to see if the king is in check is as simple as
if (h->mclr[kingSq] & 0x0000ffff) KingIsInCheck();

Or if you want to check for queen captures once the queen square is loaded

moves = h->mclr[queenSq] & 0x0000ffff;

moves is a u32 with each bit being a piece. So

pawns = pawn[wtm] & moves;

Gives all the pawn attackers. And that is all there is to it.

Here is the white perft function to see how it works. (Should have called it Bench because it is flawed as a Perft)

Code: Select all

void Wperf()
{
  u64 squares;
  u32 moves;
  s32 id;
  s32 fs;
  s32 ts;

  register history *h;

  h = & hist[ ply];

  squares = h->whiteSquares;
  if( depth == 1)  gen = FALSE;
  while(squares)
  {
    ts = FirstOne64(squares);
    moves = h->mclr[ts] & 0x0000ffff;
    while(moves)
    {
      id = FirstOne32(moves);
      fs =  piece[id].sq;
      if( depth == 1)
      {
        if(WGameMove(fs, ts)) 
        {
          WGameBack();
        }
      }
      else
      if(WGameMove(fs, ts))
      {
         depth--;
        Bperf();
         depth++;
        WGameBack();
      }
      moves &= mask32c[id];
    }
    squares &= mask64c[ts];
  }
   gen = TRUE;
}

The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials