OpenCL perft() Technical Issues

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

OpenCL perft() Technical Issues

Post by sje »

First, I'm going to give a more concise name to the OpenCL/portable ANSI C perft() program because I am a slow typist. The program has been renamed Oscar after a shelter cat who arrived at my home three months ago.

--------

Oscar the program started as an OpenCL exploratory effort and that will remain its primary purpose. That's why Oscar has a small code footprint, a small data footprint, and doesn't demand anything not available for an OpenCL kernel. So, no recursion, no function pointers, and no expectation of run time support libraries.
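
To illustrate the no-recursion constraint, here is a minimal sketch of a depth-first count driven by an explicit per-ply array instead of recursive calls. This is not Oscar's code: the names CountLeaves and MaxPly are hypothetical, and a fixed branching factor stands in for a real move generator.

```c
/* Sketch only: iterative leaf counting with an explicit stack, no recursion.
   A fixed branching factor stands in for a real move generator (assumption). */

#define MaxPly 16  /* hypothetical depth limit for this sketch */

/* Count the leaves of a uniform tree with the given branching factor and depth */
static unsigned long CountLeaves(const unsigned branching, const unsigned depth)
{
  unsigned next[MaxPly];  /* next[ply] = number of children already visited at that ply */
  unsigned ply = 0;
  unsigned long total = 0;

  if (depth == 0)
    return 1;
  if (depth > MaxPly)
    return 0;  /* out of range for this sketch */

  next[0] = 0;
  for (;;)
  {
    if (next[ply] == branching)
    {
      /* Every child at this ply is done: "retract" to the parent */
      if (ply == 0)
        break;
      ply--;
    }
    else
    {
      next[ply]++;  /* "execute" the next move at this ply */
      if (ply + 1 == depth)
        total++;  /* the child is a leaf */
      else
      {
        ply++;  /* descend one ply and start over on its children */
        next[ply] = 0;
      }
    }
  }
  return total;
}
```

The only per-ply state is one small array indexed by ply, which is the kind of fixed-size storage a kernel can hold without a call stack.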

Oscar's kernel source has no preprocessor include directives which reference any of the standard library headers. In fact, there is only one include directive: the one in Oscar's single C source file ("Oscar.c"), which references Oscar's single C header file ("Oscar.h"). It would be possible to have just a single source file with the header text simply jammed into the front of the C source. I may eventually do that, but for now it's easier to keep the two files separate. Also, Oscar's header file has a single extern declaration: the entry point for the kernel, to be called by the test driver or by the OpenCL scheduler.

To keep its footprint small and to run decently on systems less capable than those with multicore 64 bit CPUs and big memory bandwidth, Oscar does not use bitboards. This makes Oscar a good candidate for targeting inexpensive single board computers like the Arduino, the Raspberry Pi, and the BeagleBone Black.

--------

Oscar can be built and deployed in a number of different ways.

The first way is a simple test harness which runs as a single thread on a CPU. This is mostly what I've been working with so far. This program is called operft and it takes two command line arguments: a quoted FEN string and an integer giving the depth of the desired perft(). There is not much error checking as I am too lazy to provide such at present. The output of operft is a single decimal integer (no commas) printed to the standard output. The program can be built and run on any system with an ANSI C tool chain.

The second way also runs as a single thread on a CPU, but processes a whole work unit at a time. This program is called operftwu and it takes two command line arguments: the input work unit file name and the result file name. The work units and the results have the same formats as those described in my other posts. The purpose of this program is to test work unit batch processing issues which are independent of OpenCL issues. The program can also be used on platforms which have no OpenCL capability. Like operft, operftwu can be built and run on any system having an ANSI C tool chain.

The third way requires OpenCL and is called operftocl. It is invoked exactly like operftwu and has the same functionality except that all of the chess work is done through OpenCL. Note that some computers, like the old Mac I'm using right now, force OpenCL to use only the CPU because of the lack of a spiffy GPU. The idea is that the operftocl program will be the main workhorse in the perft(14) effort.

All source for all versions will be freely available, likely released with the standard GNU license.

--------

One challenge with OpenCL is the mechanism by which an OpenCL application is built. At present, I can do this only with Macs running Mac OS X 10.7 or later. I may later need some assistance with building operftocl for Linux systems. Also, the Macs I have use either OpenCL 1.0 or OpenCL 1.2; some help will be needed targeting other versions of OpenCL. I have no idea how to build an OpenCL program on Windows.

Those wanting to build operftocl on a Mac will need the Xcode IDE to set up a project file. Fortunately, Apple has many, many megabytes of helpful documentation plus sample source code online for free. It may also be possible to evade Xcode and build an OpenCL program using only command line calls.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Some issues specific to OpenCL applications

Post by sje »

There are some issues specific to OpenCL with respect to source code mangling. It appears that the OpenCL tool will eat the C source and spit out a special source file suitable for a second trip down the digestive tract. Exactly how this works I don't know and I don't want to know.

There is an OpenCL C header file that needs to be included. This file contains various typedef and other definitions needed to map target C types to target OpenCL types. There are also some non-standard attributes attached to some definitions for handling alignment requirements. To handle this and other OpenCL foolery, Oscar's source files will need some preprocessor directives to allow a single version of Oscar's source to be used for both the OpenCL version and the non OpenCL version. I need to find out which preprocessor symbol to test to indicate an OpenCL target.

I hope there won't be too much trouble; Oscar, like Symbolic, uses its own scalar type definitions and it should be easy to redefine these with the OpenCL types for an OpenCL target.
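
On the question of which preprocessor symbol to test: OpenCL C compilers predefine `__OPENCL_VERSION__`, so a kernel build can be detected with a plain `#ifdef`. Below is a minimal sketch of how the scalar typedefs might be switched on it. The si8/ui8 names follow the style seen in Oscar's code; the rest of the list (si16, ui16, si32, ui32) and the width checks are assumptions for illustration.

```c
/* Sketch only: one set of scalar typedefs for both build targets.
   Type names beyond si8/ui8 are assumptions, not taken from Oscar's source. */

#ifdef __OPENCL_VERSION__
/* Compiling as an OpenCL C kernel: use the built-in fixed-width types */
typedef char   si8;
typedef uchar  ui8;
typedef short  si16;
typedef ushort ui16;
typedef int    si32;
typedef uint   ui32;
#else
/* Compiling as ordinary C: these widths hold on common platforms,
   but the C standard does not guarantee them */
typedef signed char    si8;
typedef unsigned char  ui8;
typedef signed short   si16;
typedef unsigned short ui16;
typedef signed int     si32;
typedef unsigned int   ui32;
#endif

/* Compile-time width checks that need no library support:
   a wrong width yields a negative array size and a compile error */
typedef char CheckSi8[(sizeof(si8) == 1) ? 1 : -1];
typedef char CheckUi16[(sizeof(ui16) == 2) ? 1 : -1];
typedef char CheckUi32[(sizeof(ui32) == 4) ? 1 : -1];
```

Keeping the switch in the header means the chess code itself never has to mention either target.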

--------

OpenCL requires the use of a scheduler (provided) which activates the OpenCL processors at run time. This scheduler needs a number of inputs including a pointer to the input data records, a pointer to the output data records, the name of the kernel, and a work shape. The work shape includes a number of items including a three dimensional array specification of the work flow. For Oscar, this is set up as a single dimension vector with an element count equal to the number of FEN records loaded into temporary storage. The work shape also tells the chunk size and the maximum number of OpenCL cores to be used at a time. Fortunately, all of the details can be ignored by an end user. Only the adventurous souls who want to fiddle with the source need delve into the OpenCL pit of madness.

--------

For both the operftocl and operftwu programs, Oscar will gobble the entire contents of a perft() work unit before starting any chess calculations. The operftocl program needs to have all the data before calling the scheduler, and the scheduler won't return until all the records are processed. This may not be the best approach if a work unit requires many days of calculation, because a power outage could cause all of the work to be lost.

It may be a good idea to have both operftocl and operftwu work on small chunks of a work unit at a time. A problem here is that (apparently) the OpenCL scheduler won't return until all records in a chunk are processed, so the running time for a chunk will be no less than the time needed to handle the most difficult record. So there is a trade-off between the need for a restart capability and the total OpenCL throughput.
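
A minimal sketch of the chunked approach, under stated assumptions: Record, ProcessRecord, and RunChunks are hypothetical names, the per-record work is a stand-in for the real perft() calculation, and the checkpoint is just an index that a real driver would write to disk along with the results so far.

```c
/* Sketch only: process a work unit in chunks so a restart loses at most one chunk.
   Record, ProcessRecord and RunChunks are hypothetical stand-ins. */

typedef struct
{
  unsigned depth;        /* stand-in for a record's requested perft() depth */
  unsigned long result;  /* filled in when the record is processed */
} Record;

/* Stand-in for the real perft() calculation on one record */
static void ProcessRecord(Record *record)
{
  record->result = (unsigned long) record->depth * 2UL;
}

/* Process records[start..count-1] in chunks of chunkLen records.
   After each chunk, *resume holds the index of the first unprocessed record.
   Returns the number of chunks run. */
static unsigned RunChunks(Record *records, const unsigned count, const unsigned start,
                          const unsigned chunkLen, unsigned *resume)
{
  unsigned chunks = 0;
  unsigned base = start;

  *resume = start;
  if (chunkLen == 0)
    return 0;  /* nothing sensible to do with an empty chunk size */

  while (base < count)
  {
    const unsigned limit = (count - base < chunkLen) ? count : base + chunkLen;
    unsigned index;

    /* With OpenCL, this inner loop would be one scheduler call that returns
       only when the slowest record in the chunk has finished */
    for (index = base; index < limit; index++)
      ProcessRecord(&records[index]);

    base = limit;
    *resume = base;  /* checkpoint: a restart would pass this value as start */
    chunks++;
  }
  return chunks;
}
```

A smaller chunkLen bounds the work lost to a power outage, but each chunk still waits on its hardest record, so more chunks means more idle time for the other cores.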

--------

An OpenCL kernel does not have the standard niceties for memory protection, interactive debugging, or even the last-ditch fprintf() logging call. That's why all the chess stuff has to be debugged outside of OpenCL, and that's why the operft and operftwu programs are used for as much of the development as possible. Also, those modifying Oscar's operftocl have to be prepared to have their machine crash or freeze with a messed up display. This is reminiscent of the Old Days of programming a Mac before OS/X; a bad pointer dereference meant toggling the power switch and waiting for a reboot. A really bad pointer dereference would mess up the OS in some evil manner and contaminate the machine, leading to even more problems.

Also from the Old Days, it was possible on some machines like the IBM PC for a program to adjust the video adapter with bad frequency parameters. How bad? There were reports of malicious programs which would reconfigure the adapter to cause the monitor to heat up and catch fire. I doubt that this is possible nowadays, but even so I disclaim all responsibility ahead of time, just in case.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Some issues specific to OpenCL applications

Post by ZirconiumX »

I would be willing to have my Linux computer act as a test for Linux OpenCL. I'm using proprietary drivers, so OpenCL should be up to date.

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Oscar the Cat

Post by sje »

Oscar had a rough life on the streets but now he's enjoying a safe and comfortable retirement indoors. He got his name because of his many battle scars. Now close to 20 pounds, he leaves an Oscar-size dent on the bed.
Oscar contributes to the programming effort by sitting on the keyboard to tell me when it's time to take a rest break.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Some issues specific to OpenCL applications

Post by sje »

ZirconiumX wrote:I would be willing to have my Linux computer act as a test for Linux OpenCL. I'm using proprietary drivers, so OpenCL should be up to date.
This will be very helpful because I am clueless with respect to making OpenCL work with Linux.

I have several Linux boxes, all using Intel or AMD CPUs and all running the latest Debian distribution and updated regularly.

I also have an old iMac which I've at times used with Debian Linux for the PowerPC. This iMac has ATI Rage 128 Pro video with 8 MB RAM and it just might have OpenCL capability under Linux.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Some issues specific to OpenCL applications

Post by ZirconiumX »

sje wrote:
ZirconiumX wrote:I would be willing to have my Linux computer act as a test for Linux OpenCL. I'm using proprietary drivers, so OpenCL should be up to date.
This will be very helpful because I am clueless with respect to making OpenCL work with Linux.

I have several Linux boxes, all using Intel or AMD CPUs and all running the latest Debian distribution and updated regularly.

I also have an old iMac which I've at times used with Debian Linux for the PowerPC. This iMac has ATI Rage 128 Pro video with 8 MB RAM and it just might have OpenCL capability under Linux.
Sadly, OpenCL is only officially supported on HD 4xxx and higher, and the team have no particular interest in adding support for such an old GPU.

My GPU is a Radeon HD 6670.

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sample invocations of operft

Post by sje »

Here are a few sample invocations of operft, Oscar's test program for non OpenCL operation one position at a time. The program has correctly handled all of the perft() test positions I could find.

Code:

gail:tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 4
197281
gail:tmp sje$ ./operft "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 5
4865609
gail:tmp sje$ ./operft "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1" 4
4085603
gail:tmp sje$ ./operft "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1" 5
193690690
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Chess coding for small systems

Post by sje »

Chess coding for small systems: pre-calculated tables

Oscar's chess code is intended for small systems, those with perhaps no more than 16 KB of RAM. The idea is to allow only a few tables of pre-calculated data, those which will be of most help for doing chess.

The space parsimony also helps on much more powerful machines, for it allows most or all of the code and data to reside in level one cache.

Oscar's most referenced table is the 1,024-element signed byte array Next:

Code:

static si8 Next[DirLen][SqLen];  // For each direction, for each square: the next square
// DirLen = number of directions = 16 = 4 orthogonal + 4 diagonal + 8 crooked
// SqLen = number of squares = 64 (a1, b1, … h8)

Initialization:

Code:

static void InitNext(void)
{
  Dir dir;

  for (dir = DirE; dir <= DirESE; dir++)
  {
    const si df = DeltaFile[dir];
    const si dr = DeltaRank[dir];
    Sq sq;

    for (sq = SqA1; sq <= SqH8; sq++)
    {
      const si f1 = df + (si) MapSqToFile(sq);
      const si r1 = dr + (si) MapSqToRank(sq);

      if ((f1 < 0) || (f1 >= FileLen) || (r1 < 0) || (r1 >= RankLen))
        Next[dir][sq] = SqNil;
      else
        Next[dir][sq] = MapFileRankToSq((File) f1, (Rank) r1);
    }
  }
}

The Next array is quickly initialized at start time, but could also be initialized at compile time with a slight loss of transparency. It keeps the program from going off the edge of the board by returning a value of SqNil (= -1) where appropriate.
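
As a rough, self-contained sketch of how such a table gets used, the code below rebuilds a Next table and walks a ray until it hits SqNil. The direction order and the delta tables are assumptions chosen to be consistent with the DirE..DirESE loop bounds above; RayLength is a hypothetical helper, and squares are assumed to be numbered rank * 8 + file.

```c
/* Sketch only: rebuild a Next table and use it to walk a ray.
   Direction order and delta tables are assumptions, not Oscar's source. */

typedef signed char si8;
typedef int si;

enum { FileLen = 8, RankLen = 8, SqLen = 64, DirLen = 16 };
enum { SqNil = -1, SqA1 = 0, SqH1 = 7, SqH8 = 63 };
enum { DirE, DirN, DirW, DirS, DirNE, DirNW, DirSW, DirSE,
       DirENE, DirNNE, DirNNW, DirWNW, DirWSW, DirSSW, DirSSE, DirESE };

static const si8 DeltaFile[DirLen] = { 1, 0, -1, 0, 1, -1, -1, 1, 2, 1, -1, -2, -2, -1, 1, 2 };
static const si8 DeltaRank[DirLen] = { 0, 1, 0, -1, 1, 1, -1, -1, 1, 2, 2, 1, -1, -2, -2, -1 };

static si8 Next[DirLen][SqLen];

/* Fill Next so that stepping off the board yields SqNil */
static void InitNext(void)
{
  si dir, sq;

  for (dir = DirE; dir <= DirESE; dir++)
    for (sq = SqA1; sq <= SqH8; sq++)
    {
      const si f1 = DeltaFile[dir] + (sq % FileLen);
      const si r1 = DeltaRank[dir] + (sq / FileLen);
      const int off = (f1 < 0) || (f1 >= FileLen) || (r1 < 0) || (r1 >= RankLen);

      Next[dir][sq] = (si8) (off ? SqNil : (r1 * FileLen + f1));
    }
}

/* Count the squares reachable from sq by repeated steps in one direction
   on an empty board; the loop needs no separate edge-of-board test */
static si RayLength(const si dir, si sq)
{
  si count = 0;

  while ((sq = Next[dir][sq]) != SqNil)
    count++;
  return count;
}
```

A sweeping piece generator would add an occupancy test inside the loop; a knight or king generator would take just the first step in each of its directions.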

--------

For storing pre-calculated data regarding the direction relationships between any two squares, Oscar has the 4,096-element byte array SqSqInfo:

Code:

static ui8 SqSqInfo[SqLen][SqLen];  // For each from-square, for each to-square: directional info

Initialization:

Code:

// SqSqInfo bit/mask coding

#define InfoHasDir BX(7)
#define InfoAdjSqs BX(6)
#define InfoAtkfBP BX(5)
#define InfoAtkfWP BX(4)
#define InfoDirMsk MX(4)

static void InitSqSqInfo(void)
{
  Sq frsq, tosq;

  for (frsq = SqA1; frsq <= SqH8; frsq++)
  {
    for (tosq = SqA1; tosq <= SqH8; tosq++)
    {
      ui8 infobyte;
      const Dir dir = CalcDir(frsq, tosq);

      if (IsDirNil(dir))
        infobyte = 0;
      else
      {
        infobyte = InfoHasDir;
        if (AdjacentSquares(frsq, tosq))
        {
          infobyte |= InfoAdjSqs;
          if (IsDirDiago(dir))
          {
            if (dir >= DirSW)
              infobyte |= InfoAtkfBP;
            else
              infobyte |= InfoAtkfWP;
          }
        }
        infobyte |= (dir & InfoDirMsk);
      }
      SqSqInfo[frsq][tosq] = infobyte;
    }
  }
}

As you might expect, the SqSqInfo array is referenced in many places.
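
To make the byte layout concrete, here is a small sketch of unpacking one SqSqInfo entry. The BX and MX macros are not shown in the post, so their definitions here are assumptions (BX(n) as the single bit n, MX(n) as a mask of the low n bits), and the three accessor functions are hypothetical names. The pawn reading follows the initialization above: a northward diagonal step between adjacent squares sets InfoAtkfWP.

```c
/* Sketch only: unpacking one SqSqInfo byte.
   BX/MX definitions and the accessor names are assumptions. */

typedef unsigned char ui8;

#define BX(n) (1u << (n))    /* single bit n */
#define MX(n) (BX(n) - 1u)   /* mask of the low n bits */

#define InfoHasDir BX(7)
#define InfoAdjSqs BX(6)
#define InfoAtkfBP BX(5)
#define InfoAtkfWP BX(4)
#define InfoDirMsk MX(4)

/* Nonzero if the two squares are related by one of the 16 directions */
static int InfoSharesDir(const ui8 info)
{
  return (info & InfoHasDir) != 0;
}

/* Direction index from the low four bits; meaningful only if InfoSharesDir() */
static unsigned InfoDir(const ui8 info)
{
  return info & InfoDirMsk;
}

/* Nonzero if a white pawn on the from-square would attack the to-square */
static int InfoWhitePawnAttacks(const ui8 info)
{
  return (info & InfoAtkfWP) != 0;
}
```

Sixteen directions fit in four bits, which leaves exactly four bits for the flags, so each square pair costs a single byte and a query is one table load plus a mask.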

Some initialization helper routines:

Code:

static bool AdjacentSquares(const Sq frsq, const Sq tosq)
{
  bool result;

  if (frsq == tosq)
    result = false;
  else
  {
    const si fd = MapSqToFile(tosq) - MapSqToFile(frsq);
    const si rd = MapSqToRank(tosq) - MapSqToRank(frsq);
    const si afd = (fd >= 0) ? fd : -fd;
    const si ard = (rd >= 0) ? rd : -rd;

    result = (afd <= 1) && (ard <= 1);
  }
  return result;
}

static Dir CalcDir(const Sq frsq, const Sq tosq)
{
  Dir dir;

  if (frsq == tosq)
    dir = DirNil;
  else
  {
    const si fd = MapSqToFile(tosq) - MapSqToFile(frsq);
    const si rd = MapSqToRank(tosq) - MapSqToRank(frsq);

    if ((fd == 0) || (rd == 0))
    {
      if (fd == 0)
        dir = (rd > 0) ? DirN : DirS;
      else
        dir = (fd > 0) ? DirE : DirW;
    }
    else
    {
      const si afd = (fd >= 0) ? fd : -fd;
      const si ard = (rd >= 0) ? rd : -rd;

      if (afd == ard)
      {
        if (fd == rd)
          dir = (rd > 0) ? DirNE : DirSW;
        else
          dir = (fd > 0) ? DirSE : DirNW;
      }
      else
      {
        const bool fn = (afd == 2) && (ard == 1), rn = (afd == 1) && (ard == 2);

        if (fn || rn)
        {
          if (fn)
          {
            if (fd > 0)
              dir = (rd > 0) ? DirENE : DirESE;
            else
              dir = (rd > 0) ? DirWNW : DirWSW;
          }
          else
          {
            if (rd > 0)
              dir = (fd > 0) ? DirNNE : DirNNW;
            else
              dir = (fd > 0) ? DirSSE : DirSSW;
          }
        }
        else
          dir = DirNil;
      }
    }
  }
  return dir;
}

Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: OpenCL perft() Technical Issues

Post by Modern Times »

sje wrote: The program has been renamed Oscar after a shelter cat who arrived at my home three months ago.
I hope Oscar is settling in well and behaving himself !

You've done a wonderful thing giving him a home.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Some issues specific to OpenCL applications

Post by sje »

ZirconiumX wrote:My GPU is a Radeon HD 6670.
Thanks to the lspci command, I now remember that the video card in my Core i7-2600 Linux box is an Nvidia GT 430 (96 CUDA cores, 1 GB DDR3 RAM, 1.4 GHz processor clock) released in 2010 for about US$80.

My AMD Athlon II X4 Linux machine has an Nvidia GeForce 6150SE nForce 430. I'm not sure if it supports any version of OpenCL.