OpenCL perft() Technical Issues

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Oscar the Cat

Post by sje »

Oscar is a fairly laid back sort of cat, but he loves to ankle surf to distract me whenever I'm in the kitchen. It takes the delivery of a freshly opened can of cat food to make him desist.

Oscar is the sixth shelter cat I've adopted (all male, three have since passed), and is my fourth black (or mostly black) cat. Apparently due to superstition, shelters have a hard time adopting out black cats, particularly older males. So that's who I get! Black cats have all have an impressively evil appearance; it's unfortunate that they're so difficult to photograph well.

He likes to keep an eye on me when I work, so I'll have to build him a perch to keep him off of the keyboard.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Oscar and transposition tables

Post by sje »

Currently, Oscar does not use transposition tables. These can be added later after various OpenCL issues are resolved. There are at present many unknowns regarding efficiency and portability topics with respect to differing OpenCL hosts, and so more research is needed before deploying techniques which could work well on one GPU model but refuse to even load on another.

--------

I've added SAN and FEN encoders to Oscar. Although not necessary for OpenCL operation, they are helpful with testing. Also, these and any other routines not strictly required for OpenCL operation can be quite useful for those using Oscar's source for development of their own chess applications.

The program now includes code for checking FEN validity, although this is slightly incomplete.

--------

It looks like there will be only two perft() applications using the Oscar framework, not three as I had reported earlier. Both will be the functionally the same except that one will be compiled for OpenCL while the other won't. Both will use command line options, and each capable of both one shot operation and work unit processing.

So, at least these two invocation styles:

Code: Select all

operft -w <workunit-file> -o <output-file> -d <depth>

operft -p <FEN-string> -d <depth>
and maybe more.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Running on a 3.4 GHz Core i5 (i5-4670).

Post by sje »

Ajedrecista wrote:I have just checked this position with 1 GB of hash in an Intel Pentium D930 (of year 2006) and JetChess 1.0.0.0 (single core, 32-bit) computed perft(7) in 475.013 seconds (0:06:55.013), that is, around 6.0459e+8 (leaf nodes)/second.
Symbolic, dual Xeon 5150 CPUs (also from 2006):

Code: Select all

&#91;&#93; sf r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10
&#91;&#93; pctran 7
Count&#58; 287,188,994,746   Pt&#58; 10&#58;13.081   Wt&#58; 2&#58;41.902   U&#58; 0.946681   1.87374 GHz   533.691 ps
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Generating move notation

Post by sje »

For a SAN move encoder to work, the raw moves must have various flag bits to indicate check, checkmate, and file/rank disambiguation. Such is not strictly neede for perft(), but having real SAN output makes debugging easier.

Here are Oscar's notation move flag generation routines:

Code: Select all

static void PositionNotateChecks&#40;Position * const positionptr, MoveSeg * const movesegptr&#41;
&#123;
  // This routine notates checking moves in the given move segment for the given position.

  ui index;

  for &#40;index = 0; index < movesegptr->count; index++)
  &#123;
    PositionExecute&#40;positionptr, movesegptr->baseptr&#91;index&#93;);
    if &#40;positionptr->apd.incheck&#41;
      SetMf&#40;movesegptr->baseptr&#91;index&#93;, MfChck&#41;;
    PositionRetract&#40;positionptr&#41;;
  &#125;;
&#125;

static void PositionNotateCheckmates&#40;Position * const positionptr, MoveSeg * const movesegptr&#41;
&#123;
  // This routine notates checkmating moves in the given move segment for the given position.

  ui index;

  for &#40;index = 0; index < movesegptr->count; index++)
  &#123;
    const Move move = movesegptr->baseptr&#91;index&#93;;

    if &#40;TestMf&#40;move, MfChck&#41;)
    &#123;
      PositionExecute&#40;positionptr, move&#41;;
      if &#40;PositionIsCheckmate&#40;positionptr&#41;)
        SetMf&#40;movesegptr->baseptr&#91;index&#93;, MfMate&#41;;
      PositionRetract&#40;positionptr&#41;;
    &#125;;
  &#125;;
&#125;

static void PositionNotateDisambiguation&#40;const Position * const positionptr, MoveSeg * const movesegptr&#41;
&#123;
  // This routine notates disambiguation of the moves in the given move segment for the given position.

  const Move * const baseptr = movesegptr->baseptr;
  const ui limit = movesegptr->count;
  ui index0;
  Sq tosq;
  Census census;
  ui landings&#91;SqLen&#93;;

  CensusLoadFromBoard&#40;&census, &positionptr->board&#41;;

  for &#40;tosq = SqA1; tosq <= SqH8; tosq++)
    landings&#91;tosq&#93; = 0;
  for &#40;index0 = 0; index0 < limit; index0++)
    landings&#91;GetToSq&#40;baseptr&#91;index0&#93;)&#93;++;

  for &#40;index0 = 0; index0 < limit; index0++)
  &#123;
    const Move move0 = baseptr&#91;index0&#93;;
    const Man frman0 = GetFrMan&#40;move0&#41;;
    const Sq tosq0 = GetToSq&#40;move0&#41;;

    if &#40;IsManQRBN&#40;frman0&#41; && &#40;census.mancount&#91;frman0&#93; > 1&#41; && &#40;landings&#91;tosq0&#93; > 1&#41;)
    &#123;
      const Sq frsq0 = GetFrSq&#40;move0&#41;;
      const File frfile0 = MapSqToFile&#40;frsq0&#41;;
      const Rank frrank0 = MapSqToRank&#40;frsq0&#41;;
      ui index1;
      ui msc = 0, mfc = 0, mrc = 0;

      for &#40;index1 = 0; index1 < limit; index1++)
      &#123;
        const Move move1 = baseptr&#91;index1&#93;;
        const Sq frsq1 = GetFrSq&#40;move1&#41;;

        if (&#40;frman0 == GetFrMan&#40;move1&#41;) && &#40;tosq0 == GetToSq&#40;move1&#41;) && &#40;frsq0 != frsq1&#41;)
        &#123;
          msc++;
          if &#40;MapSqToFile&#40;frsq1&#41; == frfile0&#41;
            mfc++;
          if &#40;MapSqToRank&#40;frsq1&#41; == frrank0&#41;
            mrc++;
        &#125;;
      &#125;;

      if &#40;msc > 0&#41;
      &#123;
        if (&#40;mfc == 0&#41; || &#40;mrc > 0&#41;)
          SetMf&#40;movesegptr->baseptr&#91;index0&#93;, MfAndf&#41;;
        if &#40;mfc > 0&#41;
          SetMf&#40;movesegptr->baseptr&#91;index0&#93;, MfAndr&#41;;
      &#125;;
    &#125;;
  &#125;;
&#125;

static void PositionNotate&#40;Position * const positionptr, MoveSeg * const movesegptr&#41;
&#123;
  // This routine fully notates the moves in the given move segment for the given position.

  PositionNotateChecks&#40;positionptr, movesegptr&#41;;
  PositionNotateCheckmates&#40;positionptr, movesegptr&#41;;
  PositionNotateDisambiguation&#40;positionptr, movesegptr&#41;;
&#125;
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sorting moves by SAN

Post by sje »

A move sort routine is handy for a number of reasons. For perft(), a sorted move vector makes it easier to compare output from different programs, or different versions of the same program.

For Oscar, running in OpenCL mode adds a few challenges in regard to sorting. Usually, it's best to just pick one of the sort routines already present in the C/C++ standard library. But with OpenCL, there's no guarantee that the library will have the routine, or even if there is any library at all. Further, even if a library routine is available, it may be too large or it could drag in too much other code to operate under OpenCL memory constraints.

So, I wrote a SAN sort for Oscar with no library dependencies. It's just a simple bubble sort; there's not much need for speed for something which only runs at ply zero. Further, since recursion is not allowed, the more exotic sorting algorithms would need a non recursive adaptation.

Code: Select all

static void MoveSegSortSan&#40;MoveSeg * const movesegptr&#41;
&#123;
  // This routine sorts by SAN the moves in the given move storage segment.

  const ui count = movesegptr->count;

  if &#40;count > 1&#41;
  &#123;
    Move * const baseptr = movesegptr->baseptr;
    ui index;
    bool changed = true;
    const ui passcount = count - 1;
    ui pass = 0;
    SanRec sanvec&#91;MoveGenLen&#93;;

    for &#40;index = 0; index < count; index++)
      SanRecLoadFromMove&#40;&sanvec&#91;index&#93;, baseptr&#91;index&#93;);

    while &#40;changed && &#40;pass < passcount&#41;)
    &#123;
      const ui paircount = count - pass - 1;
      ui pair;

      changed = false;
      for &#40;pair = 0; pair < paircount; pair++)
      &#123;
        const ui eindex0 = pair + 0, eindex1 = pair + 1;

        if &#40;CompareStrings&#40;sanvec&#91;eindex0&#93;.chvec, sanvec&#91;eindex1&#93;.chvec&#41; > 0&#41;
        &#123;
          const Move move0 = baseptr&#91;eindex0&#93;, move1 = baseptr&#91;eindex1&#93;;
          const SanRec sanrec0 = sanvec&#91;eindex0&#93;, sanrec1 = sanvec&#91;eindex1&#93;;

          baseptr&#91;eindex0&#93; = move1;
          baseptr&#91;eindex1&#93; = move0;
          sanvec&#91;eindex0&#93; = sanrec1;
          sanvec&#91;eindex1&#93; = sanrec0;
          changed = true;
        &#125;;
      &#125;;
      pass++;
    &#125;;
  &#125;;
&#125;
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

FEN verification

Post by sje »

Oscar now has full support for verification of FEN strings. While not strictly necessary, a little extra checking never hurts and it can be done quickly without using too much space.

The FEN validation code includes full testing of the en passant target square. If the square is not nil, then there must be at least one en passant capture available in the position. All possible constraints are checked, even making sure that the half move counter is zero.

--------

Inside the move Execute() routine, Oscar will set the en passant target square to a non-nil value if and only if at least one en passant capture is available. This helps ensure correct hash signature generation and relieves the move generation and move counting routines from doing checking themselves. Further, this supports correct FEN string generation at any time during calculations; a help with debugging.

--------

An upcoming task is the development of the parameter structure definitions needed to communicate input and output values between the driver an an OpenCL kernel. There is a need for caution: the OpenCL header file has a number of scalar type definitions which are different for different platforms. Alignment may be an issue, and the endian sense may be an issue.

All OpenCL kernel/driver parameters must be passed by value in both directions; pointers are disallowed because the kernel and the driver have different memory spaces.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Transposition table assistance

Post by sje »

Oscar now has transposition table assistance along with incremental hash signature updating. Everything seems to work, including hashing of castling availability status and en passant targets.

When performing perft(), there's no need for a hash contribution based on moving color when a hash contribution based on ply is folded into the main hash prior to hitting the table.

Oscar uses a 128 bit hash although this could be changed to a 64 bit hash for those who are willing to take a larger risk of false positives. Timing shows no measurable difference between the non hash version and the hash version with 128 bit codes when fetch/store operations are deactivated. So, while there is an extra cost in space for 128 bit codes vs 64 bit codes, there's not one in time.

At present, the element count of Oscar's transposition table is determined at compile time. The reason for this is that there's no guarantee that malloc() or its pals will work well, or even exist in a truly portable OpenCL program. More research is needed on this topic.

From the header file:

Code: Select all

// Perft transposition table items

#define PTTAddrBitLen  24                 // Log2&#40;Transposition table length&#41;
#define PTTAddrMask    MX&#40;PTTAddrBitLen&#41;  // Transposition table index mask
#define PTTLen         BX&#40;PTTAddrBitLen&#41;  // Transposition table length

// Hashes

typedef struct
&#123;
  ui64 qwrd0;  // 1st quad word
  ui64 qwrd1;  // 2nd quad word
&#125; Hash;

// Perft transposition table element

typedef struct
&#123;
  Hash      msig;       // Main hash signature; adjusted for ply
  NodeCount nodecount;  // Movepath count
&#125; PerftTTRec;
Constant value key storage:

Code: Select all

static Hash ManSqHash&#91;ManRCLen&#93;&#91;SqLen&#93;;  // Hash codes indexed by man and square
static Hash CabsHash&#91;CabsLen&#93;;           // Hash codes indexed by cabs
static Hash EpFileHash&#91;FileLen&#93;;         // Hash codes indexed by en passant target file
static Hash PlyHash&#91;PlyLen&#93;;             // Hash codes indexed by ply
The table (currently, Oscar's only global variable):

Code: Select all

// Transposition table storage

static PerftTTRec PTT&#91;PTTLen&#93;;  // Perft transposition table
Fetch and store:

Code: Select all

static bool PositionPTTFetch&#40;const Position * const positionptr, const ui ply, NodeCount * const nodecountptr&#41;
&#123;
  bool found;
  Hash addrhash = positionptr->msig;

  HashFoldPly&#40;&addrhash, ply&#41;;

  const ui index0 = &#40;ui&#41; &#40;addrhash.qwrd0 & PTTAddrMask&#41;;
  PerftTTRec * const ptr0 = &PTT&#91;index0&#93;;

  if &#40;HashTestEqual&#40;&addrhash, &ptr0->msig&#41;)
  &#123;
    *nodecountptr = ptr0->nodecount;
    found = true;
  &#125;
  else
  &#123;
    const ui index1 = index0 ^ 1;
    PerftTTRec * const ptr1 = &PTT&#91;index1&#93;;
    
    if &#40;HashTestEqual&#40;&addrhash, &ptr1->msig&#41;)
    &#123;
      *nodecountptr = ptr1->nodecount;
      found = true;
    &#125;
    else
    &#123;
      *nodecountptr = 0;
      found = false;
    &#125;;
  &#125;;
  return found;
&#125;

static void PositionPTTStash&#40;const Position * const positionptr, const ui ply, const NodeCount nodecount&#41;
&#123;
  Hash addrhash = positionptr->msig;

  HashFoldPly&#40;&addrhash, ply&#41;;

  const ui index0 = &#40;ui&#41; &#40;addrhash.qwrd0 & PTTAddrMask&#41;;
  const ui index1 = index0 ^ 1;
  PerftTTRec * const ptr0 = &PTT&#91;index0&#93;;
  PerftTTRec * const ptr1 = &PTT&#91;index1&#93;;
  PerftTTRec * ptr;
  
  if &#40;ptr0->nodecount < ptr1->nodecount&#41;
    ptr = ptr0;
  else
    ptr = ptr1;
  ptr->msig = addrhash;
  ptr->nodecount = nodecount;
&#125;
The construction of the constant hashes is detailed in the next post.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Cooking hash

Post by sje »

Oscar's constant value hash codes start with the first eight primes:

Code: Select all

static const char CvOrdinalToPrime&#91;PrimeLen&#93; =
&#123;
  2, 3, 5, 7, 11, 13, 17, 19
&#125;;
The product of these primes is 9,699,690 which is also the period of the following pseudorandom bit generator which uses those primes:

Code: Select all

static ui CalcHashBit&#40;const ui bitordinal&#41;
&#123;
  ui result = 0;
  ui index;

  for &#40;index = 0; index < PrimeLen; index++)
    result ^= &#40;bitordinal % CvOrdinalToPrime&#91;index&#93;) & 0x01;
  return result;
&#125;
We want to build hash code constants from pairs of 64 bit integers, so to get a single 64 bit pseudorandom value:

Code: Select all

static ui64 CalcHashQwrd&#40;const ui qwrdordinal&#41;
&#123;
  ui64 result = 0;
  ui index;

  for &#40;index = 0; index < QwrdBitsLen; index++)
  &#123;
    result |= CalcHashBit&#40;&#40;qwrdordinal * QwrdBitsLen&#41; + index&#41;;
    result <<= 1;
  &#125;;
  return result;
&#125;
Finally, to make all the constant hashes which Oscar needs:

Code: Select all

static void InitHashes&#40;void&#41;
&#123;
  // This routine initializes the constant value for the base hash codes.

  ui ordinal = 0;
  Man man;
  Cabs cabs;
  File file;
  ui ply;

  for &#40;man = ManWhitePawn; man <= ManBlackKing; man++)
  &#123;
    Sq sq;

    for &#40;sq = SqA1; sq <= SqH8; sq++)
    &#123;
      ManSqHash&#91;man&#93;&#91;sq&#93;.qwrd0 = CalcHashQwrd&#40;ordinal++);
      ManSqHash&#91;man&#93;&#91;sq&#93;.qwrd1 = CalcHashQwrd&#40;ordinal++);
    &#125;;
  &#125;;

  for &#40;cabs = CabsX0; cabs <= CabsXf; cabs++)
  &#123;
    CabsHash&#91;cabs&#93;.qwrd0 = CalcHashQwrd&#40;ordinal++);
    CabsHash&#91;cabs&#93;.qwrd1 = CalcHashQwrd&#40;ordinal++);
  &#125;;

  for &#40;file = FileA; file <= FileH; file++)
  &#123;
    EpFileHash&#91;file&#93;.qwrd0 = CalcHashQwrd&#40;ordinal++);
    EpFileHash&#91;file&#93;.qwrd1 = CalcHashQwrd&#40;ordinal++);
  &#125;;

  for &#40;ply = 0; ply < PlyLen; ply++)
  &#123;
    PlyHash&#91;ply&#93;.qwrd0 = CalcHashQwrd&#40;ordinal++);
    PlyHash&#91;ply&#93;.qwrd1 = CalcHashQwrd&#40;ordinal++);
  &#125;;
&#125;
The above uses the first 105,472 bit values of the bit generator.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sample runs

Post by sje »

Oscar with transposition table assistance running on a single thread on a year 2006 2.66 GHz Xeon:

Code: Select all

gail&#58;tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 6
Na3 4856835
Nc3 5708064
Nf3 5723523
Nh3 4877234
a3 4463267
a4 5363555
b3 5310358
b4 5293555
c3 5417640
c4 5866666
d3 8073082
d4 8879566
e3 9726018
e4 9771632
f3 4404141
f4 4890429
g3 5346260
g4 5239875
h3 4463070
h4 5385554
119060324   55.03 MHz
gail&#58;tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 7
Na3 120142144
Nc3 148527161
Nf3 147678554
Nh3 120669525
a3 106743106
a4 137077337
b3 133233975
b4 134087476
c3 144074944
c4 157756443
d3 227598692
d4 269605599
e3 306138410
e4 309478263
f3 102021008
f4 119614841
g3 135987651
g4 130293018
h3 106678423
h4 138495290
3195901860   147.68 MHz
gail&#58;tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 8
Na3 3193522577
Nc3 3926684340
Nf3 3937354096
Nh3 3221278282
a3 2863411653
a4 3676309619
b3 3579299617
b4 3569067629
c3 3806229124
c4 4199667616
d3 6093248619
d4 7184581950
e3 8039390919
e4 8102108221
f3 2728615868
f4 3199039406
g3 3641432923
g4 3466204702
h3 2860408680
h4 3711123115
84998978956   318.24 MHz
gail&#58;tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 9
Na3 85849641909
Nc3 109418317145
Nf3 108393009416
Nh3 86659653631
a3 74950758099
a4 101265301849
b3 96577095997
b4 97442160946
c3 108697368719
c4 120549219832
d3 176976245463
d4 227220482342
e3 259522947791
e4 263561543780
f3 68094899093
f4 84792070664
g3 99646370024
g4 92281289941
h3 74778417365
h4 102853440161
2439530234167   542.32 MHz
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Then again

Post by sje »

Then again, there's nothing like making a public code posting to trigger detection of faulty code.

The quadword pseudorandom value generator was leaving bit zero reset. The corrected version is:

Code: Select all

static ui64 CalcHashQwrd&#40;const ui qwrdordinal&#41;
&#123;
  ui64 result = 0;
  ui index;

  for &#40;index = 0; index < QwrdBitsLen; index++)
  &#123;
    result <<= 1;
    result |= CalcHashBit&#40;&#40;qwrdordinal * QwrdBitsLen&#41; + index&#41;;
  &#125;;
  return result;
&#125;
Better initialization:

Code: Select all

static void HashAssign&#40;Hash * const hashptr, ui * const ordinalptr&#41;
&#123;
  // This routine assigns a pseudorandom value to the given hash object.
  
  hashptr->qwrd0 = CalcHashQwrd&#40;(*ordinalptr&#41;++);
  hashptr->qwrd1 = CalcHashQwrd&#40;(*ordinalptr&#41;++);
&#125;
The above called by:

Code: Select all

static void InitHashes&#40;void&#41;
&#123;
  // This routine initializes the constant value for the base hash codes.

  ui ordinal = 0;
  Man man;
  Sq sq;
  Cabs cabs;
  File file;
  ui ply;

  for &#40;man = ManWhitePawn; man <= ManBlackKing; man++)
    for &#40;sq = SqA1; sq <= SqH8; sq++)
      HashAssign&#40;&ManSqHash&#91;man&#93;&#91;sq&#93;, &ordinal&#41;;

  for &#40;cabs = CabsX0; cabs <= CabsXf; cabs++)
    HashAssign&#40;&CabsHash&#91;cabs&#93;, &ordinal&#41;;

  for &#40;file = FileA; file <= FileH; file++)
    HashAssign&#40;&EpFileHash&#91;file&#93;, &ordinal&#41;;

  for &#40;ply = 0; ply < PlyLen; ply++)
    HashAssign&#40;&PlyHash&#91;ply&#93;, &ordinal&#41;;
&#125;