Multithreaded movepath enumeration (perft)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Multithreaded movepath enumeration (perft)

Post by sje »

For program testing and benchmarking purposes, I wrote a simpleminded multithreaded movepath enumerator. It works by launching a separate counting thread for each move in the root position and then summing the results.

All threads start at the same time, but they finish at different times because of the differences in move counts for each initial move. Each thread has its own transposition table, so there's no data sharing that would improve the overall speed.

Here's the trace for the five hour run of perft(10) = 69,352,859,712,417 on a 2.66 GHz quad core Mac Pro:

Code: Select all

Starting: Na3
Starting: Nc3
Starting: Nf3
Starting: Nh3
Starting: a3
Starting: a4
Starting: b3
Starting: b4
Starting: c3
Starting: c4
Starting: d3
Starting: d4
Starting: e3
Starting: e4
Starting: f3
Starting: f4
Starting: g3
Starting: g4
Starting: h3
Starting: h4
Finishing: f3   1945020011164
Finishing: a3   2149477156227
Finishing: h3   2142832044687
Finishing: f4   2418056589775
Finishing: Na3   2440848135252
Finishing: Nh3   2470483499345
Finishing: g4   2624128147144
Finishing: b3   2774842822463
Finishing: b4   2772533545113
Finishing: g3   2853630724145
Finishing: a4   2905552970419
Finishing: Nf3   3090773583680
Finishing: h4   2948003834105
Finishing: Nc3   3096505857746
Finishing: c3   3072577495123
Finishing: c4   3437747391692
Finishing: d3   5071006040569
Finishing: d4   6459463242656
Finishing: e3   7299373354878
Finishing: e4   7380003266234
count: 69352859712417

Done

real	302m27.128s
user	1101m13.876s
sys	2m33.075s
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Multithreaded movepath enumeration (perft)

Post by sje »

I modified the test program to restrict the number of worker threads to the number of cores. The wall time dropped from 302 minutes to 230 minutes, almost a 25 percent reduction due to a lack of thrashing.

Code: Select all

emptotal: 69352859712417

Done

real	230m6.369s
user	856m37.424s
sys	1m16.961s
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Multithreaded movepath enumeration (perft)

Post by sje »

Running perft(7) with no transpositions or bulk counting takes a little over four minutes on a lightly loaded four core 2.66 GHz Xeon Mac Pro. Here is a trace where the thread dispatch/completion move assignments are listed:

Code: Select all

Dispatch 0   Na3
Dispatch 1   Nc3
Dispatch 2   Nf3
Dispatch 3   Nh3
Complete 0   Na3   120142144
Dispatch 0   a3
Complete 3   Nh3   120669525
Dispatch 3   a4
Complete 2   Nf3   147678554
Dispatch 2   b3
Complete 1   Nc3   148527161
Dispatch 1   b4
Complete 0   a3   106743106
Dispatch 0   c3
Complete 3   a4   137077337
Dispatch 3   c4
Complete 2   b3   133233975
Dispatch 2   d3
Complete 1   b4   134087476
Dispatch 1   d4
Complete 0   c3   144074944
Dispatch 0   e3
Complete 3   c4   157756443
Dispatch 3   e4
Complete 2   d3   227598692
Dispatch 2   f3
Complete 1   d4   269605599
Dispatch 1   f4
Complete 2   f3   102021008
Dispatch 2   g3
Complete 1   f4   119614841
Dispatch 1   g4
Complete 0   e3   306138410
Dispatch 0   h3
Complete 2   g3   135987651
Dispatch 2   h4
Complete 3   e4   309478263
Complete 1   g4   130293018
Complete 0   h3   106678423
Complete 2   h4   138495290

Depth: 7   Count: 3195901860   Elapsed: 258.584
1.23592e+07 Hz   8.09111e-08 seconds
And the cryptically written code that drives the multithreaded enumeration:

Code: Select all

ui64 MetaPath::EMP(const Pos& pos, const ui depth)
{
  ui64 total;

  if (depth == 0) total = 1;
  else
  {
    PathBase *pbptrs[PathDeckLen];
    bool idle[ProcCoreLen];
    si moveindex[ProcCoreLen];
    ui dispatch = 0, completed = 0;
    MoveVec mvec;

    for &#40;ui i = 0; i < PathDeckLen; i++) pbptrs&#91;i&#93; = new PathBase&#40;);
    pos.GenerateBest&#40;mvec&#41;;
    
    const ui movecount = mvec.GetCount&#40;);
    
    for &#40;ui i = 0; i < movecount; i++) results&#91;i&#93; = 0;    
    for &#40;ui i = 0; i < corecount; i++) &#123;idle&#91;i&#93; = true; moveindex&#91;i&#93; = -1;&#125;

    while &#40;completed < movecount&#41;
    &#123;
      ui task;
    
      task = 0;
      while (&#40;task < corecount&#41; && &#40;dispatch < movecount&#41;)
      &#123;
        if &#40;idle&#91;task&#93; && &#40;dispatch <  movecount&#41;)
        &#123;
          Pos *posptr = new Pos&#40;pos&#41;;

          posptr->Execute&#40;mvec&#91;dispatch&#93;);
          idle&#91;task&#93; = false; moveindex&#91;task&#93; = dispatch;

          EventNode *enptr = new EventNode&#40;EkEnumerate&#41;;

          enptr->PutDepth&#40;depth - 1&#41;; enptr->PutPosptr&#40;posptr&#41;;
          enptr->PutCountptr&#40;&results&#91;dispatch&#93;); enptr->PutPBPtrs&#40;pbptrs&#41;;
          ptptrs&#91;task&#93;->ELAppend&#40;enptr&#41;;
          std&#58;&#58;cout <<
            "Dispatch " << task << "   " << mvec&#91;dispatch&#93;.FmtMove&#40;) << '\n';
          dispatch++;
        &#125;;
        task++;
      &#125;;
 
      task = 0;
      while (&#40;task < corecount&#41; && &#40;completed < movecount&#41;)
      &#123;
        if (!idle&#91;task&#93; && &#40;results&#91;moveindex&#91;task&#93;&#93; > 0&#41; && &#40;completed < movecount&#41;)
        &#123;
          std&#58;&#58;cout <<
            "Complete " << task << "   " << mvec&#91;moveindex&#91;task&#93;&#93;.FmtMove&#40;) <<
            "   " << results&#91;moveindex&#91;task&#93;&#93; << '\n';
          idle&#91;task&#93; = true; moveindex&#91;task&#93; = -1;
          completed++;
        &#125;;
        task++;
      &#125;;

      CatNap&#40;);
    &#125;;

    total = 0;
    for &#40;ui i = 0; i < movecount; i++) total += results&#91;i&#93;;
    for &#40;ui i = 0; i < PathDeckLen; i++) delete pbptrs&#91;i&#93;;
  &#125;;
  return total;
&#125;