Explanation for non-expert?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Explanation for non-expert?

Post by zullil »

Can someone explain in general terms the following code from the recent smp patch for SF:

Code: Select all

     SplitPoint *bestSp = NULL;
              int bestThread = 0;
              int bestScore = INT_MAX;

              for &#40;size_t i = 0; i < Threads.size&#40;); ++i&#41;
              &#123;
                  const int size = Threads&#91;i&#93;->splitPointsSize; // Local copy
                  sp = size ? &Threads&#91;i&#93;->splitPoints&#91;size - 1&#93; &#58; NULL;

                  if (   sp
                      && sp->allSlavesSearching
                      && sp->slavesCount < MAX_SLAVES_PER_SPLITPOINT
                      && available_to&#40;Threads&#91;i&#93;))
                  &#123;
                      int score = sp->spLevel * 256 * 256 + sp->slavesCount * 256 - sp->depth * 1;

                      if &#40;score < bestScore&#41;
                      &#123;
                           bestSp = sp;
                           bestThread = i;
                           bestScore = score;
                      &#125;
                  &#125;
              &#125;

              if &#40;bestSp&#41;
              &#123;
                  sp = bestSp;
 
                  // Recheck the conditions under lock protection
                  Threads.mutex.lock&#40;);
                  sp->mutex.lock&#40;);

                  if (   sp->allSlavesSearching
                      && sp->slavesCount < MAX_SLAVES_PER_SPLITPOINT
                      && available_to&#40;Threads&#91;bestThread&#93;))
                  &#123;
                      sp->slavesMask.set&#40;idx&#41;;
                      sp->slavesCount++;
                      activeSplitPoint = sp;
                      searching = true;
                  &#125;

                  sp->mutex.unlock&#40;);
                  Threads.mutex.unlock&#40;);
              &#125;
          &#125;
It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Explanation for non-expert?

Post by bob »

zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

Code: Select all

     SplitPoint *bestSp = NULL;
              int bestThread = 0;
              int bestScore = INT_MAX;

              for &#40;size_t i = 0; i < Threads.size&#40;); ++i&#41;
              &#123;
                  const int size = Threads&#91;i&#93;->splitPointsSize; // Local copy
                  sp = size ? &Threads&#91;i&#93;->splitPoints&#91;size - 1&#93; &#58; NULL;

                  if (   sp
                      && sp->allSlavesSearching
                      && sp->slavesCount < MAX_SLAVES_PER_SPLITPOINT
                      && available_to&#40;Threads&#91;i&#93;))
                  &#123;
                      int score = sp->spLevel * 256 * 256 + sp->slavesCount * 256 - sp->depth * 1;

                      if &#40;score < bestScore&#41;
                      &#123;
                           bestSp = sp;
                           bestThread = i;
                           bestScore = score;
                      &#125;
                  &#125;
              &#125;

              if &#40;bestSp&#41;
              &#123;
                  sp = bestSp;
 
                  // Recheck the conditions under lock protection
                  Threads.mutex.lock&#40;);
                  sp->mutex.lock&#40;);

                  if (   sp->allSlavesSearching
                      && sp->slavesCount < MAX_SLAVES_PER_SPLITPOINT
                      && available_to&#40;Threads&#91;bestThread&#93;))
                  &#123;
                      sp->slavesMask.set&#40;idx&#41;;
                      sp->slavesCount++;
                      activeSplitPoint = sp;
                      searching = true;
                  &#125;

                  sp->mutex.unlock&#40;);
                  Threads.mutex.unlock&#40;);
              &#125;
          &#125;
It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
In simple terms it would seem to have two effects. One, to limit the number of threads at any single split point. I've always done this since I have often used 16 to 32 CPUs even in the days of Cray Blitz. Second, it tries to pick the best split that already exists and do the usual "late join" type of operation to add this thread to that group. Ive not had much luck seeing any improvement for this, myself, so I can't comment on how well part 2 will work for stockfish.

I don't know how they test these changes, but self-play is likely not the best solution for this, this is a matter of efficiency and can be better measured, with less computational effort, buy doing measurements against a set of normal positions. It would also seem this would be more likely to be important for 16 cores and up, but I don't know if they can test under that limit...

I do not know what "spLevel" is, but a split point is weighted by taking that value * 65536 (it might by ply #, I do not know) and then adding 256 * number already helping there and then subtracting the depth of search at that split point. spLevel obviously dominates, when there is a tie in spLevel between two split points, the number of threads already working there breaks the tie. It appears, as a guess, that the split point chosen will be (a) the one at the lowest depth (which makes sense) (b) with tiebreaks going to the one with the fewest threads already there, and (c) with tie breaks there going to the one with the largest remaining search depth.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Explanation for non-expert?

Post by gladius »

zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Explanation for non-expert?

Post by mcostalba »

gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.

So here the new thing is this score formula that is made so to bias toward ones with small number of split points and only when the number of split point is equal, toward near the root of the tree. But near the root is not the key of the success, because when tested alone it failed.

IMO the key is to bias toward small number of split points (the parameter that Bob did not understand). Of course once we explain to him he will promptly say that he already did it in the '80, so, to avoid this pity, I won't tell him what spLevel means :-)
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Explanation for non-expert?

Post by Joerg Oster »

mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.

So here the new thing is this score formula that is made so to bias toward ones with small number of split points and only when the number of split point is equal, toward near the root of the tree. But near the root is not the key of the success, because when tested alone it failed.

IMO the key is to bias toward small number of split points (the parameter that Bob did not understand). Of course once we explain to him he will promptly say that he already did it in the '80, so, to avoid this pity, I won't tell him what spLevel means :-)
It seems to me, that now the split points are more evenly distributed over all available threads. And that we allow more splitting than before, about 60 - 70% more splits. Is this true?
Jörg Oster
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Explanation for non-expert?

Post by zullil »

mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.
It succeeded at fast time controls. Not clear to me that it will gain anything at traditional tournament time controls. I understand that testing is difficult. Seems like Bob had a good point, about testing using a set of positions, rather than self-play at fast time controls. Maybe I'll do some crude testing using .epd files. Too bad SF doesn't have a built-in command for epd-testing (as Critter did). Looks like I'll need to find a GUI. :wink:

Thanks for all the responses. Seems like I figured out the basic idea. Still not sure why we're using base 256, but any sufficiently large base works.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Explanation for non-expert?

Post by Joerg Oster »

zullil wrote:
mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.
It succeeded at fast time controls. Not clear to me that it will gain anything at traditional tournament time controls. I understand that testing is difficult. Seems like Bob had a good point, about testing using a set of positions, rather than self-play at fast time controls. Maybe I'll do some crude testing using .epd files. Too bad SF doesn't have a built-in command for epd-testing (as Critter did). Looks like I'll need to find a GUI. :wink:

Thanks for all the responses. Seems like I figured out the basic idea. Still not sure why we're using base 256, but any sufficiently large base works.
Not sure how and what you want to test, but bench command accepts external files containing fens, epd files should work fine.

Searching each position up to a certain depth: stockfish bench 128 8 25 filename depth

You can also specify a time limit (in ms) for each position: stockfish bench 128 8 30000 filename time
Jörg Oster
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Explanation for non-expert?

Post by zullil »

Joerg Oster wrote:
zullil wrote:
mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.
It succeeded at fast time controls. Not clear to me that it will gain anything at traditional tournament time controls. I understand that testing is difficult. Seems like Bob had a good point, about testing using a set of positions, rather than self-play at fast time controls. Maybe I'll do some crude testing using .epd files. Too bad SF doesn't have a built-in command for epd-testing (as Critter did). Looks like I'll need to find a GUI. :wink:

Thanks for all the responses. Seems like I figured out the basic idea. Still not sure why we're using base 256, but any sufficiently large base works.
Not sure how and what you want to test, but bench command accepts external files containing fens, epd files should work fine.

Searching each position up to a certain depth: stockfish bench 128 8 25 filename depth

You can also specify a time limit (in ms) for each position: stockfish bench 128 8 30000 filename time
Yeah, I've done that, benching with 16 threads and depth of 35 per position. Didn't see any real decrease in total time after the patch. Hence my wondering if this would help at all for long time control.

Folks have said that the gain comes from finding better moves, by searching the tree more "evenly". So I'm just curious if I can see any gain on epd tests, where finding a specific best move matters, rather than just searching to a specified depth or time.

In any case, I've installed polyglot, which will let me run epdtests from a command line. Now I can play around ...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Explanation for non-expert?

Post by bob »

mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.

So here the new thing is this score formula that is made so to bias toward ones with small number of split points and only when the number of split point is equal, toward near the root of the tree. But near the root is not the key of the success, because when tested alone it failed.

IMO the key is to bias toward small number of split points (the parameter that Bob did not understand). Of course once we explain to him he will promptly say that he already did it in the '80, so, to avoid this pity, I won't tell him what spLevel means :-)
I didn't understand it because spLevel was not in the Stockfish6 code I looked at. That change is not going to be worth 50 Elo or so however. But I will let YOU figure out why. And yes, if you look at my old DTS paper this kind of thing was factored in. There's very little new stuff in normal PVS/YBW type parallel searching.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Explanation for non-expert?

Post by zullil »

zullil wrote:
Joerg Oster wrote:
zullil wrote:
mcostalba wrote:
gladius wrote:
zullil wrote:Can someone explain in general terms the following code from the recent smp patch for SF:

<snip>

It seems that the idea is to order split-points lexicographically based on spLevel, slavesCount and depth, in order to allocate threads "intelligently".

Correction/elaboration welcome. Thanks.
I'll repost my comment from the fishcooking forum:

Really nice idea too. It's a great refinement of the late-join idea. Instead of just joining the first split point we find, score all the potential ones, and pick the one with the best score. The scoring metric checks how many recursive splits have been done first, then the number of slaves already on the split point, and finally the depth. So, it will distribute the slaves out more across the tree, and bias towards ones near the root of the tree.
It is not clear why it succeeds because the idea of late join was already tested without success, also the idea to join to near the root was already tested without success.
It succeeded at fast time controls. Not clear to me that it will gain anything at traditional tournament time controls. I understand that testing is difficult. Seems like Bob had a good point, about testing using a set of positions, rather than self-play at fast time controls. Maybe I'll do some crude testing using .epd files. Too bad SF doesn't have a built-in command for epd-testing (as Critter did). Looks like I'll need to find a GUI. :wink:

Thanks for all the responses. Seems like I figured out the basic idea. Still not sure why we're using base 256, but any sufficiently large base works.
Not sure how and what you want to test, but bench command accepts external files containing fens, epd files should work fine.

Searching each position up to a certain depth: stockfish bench 128 8 25 filename depth

You can also specify a time limit (in ms) for each position: stockfish bench 128 8 30000 filename time
Yeah, I've done that, benching with 16 threads and depth of 35 per position. Didn't see any real decrease in total time after the patch. Hence my wondering if this would help at all for long time control.
Maybe there's a hint of good news here after all. Some data:

Code: Select all

louis@LZsT5610&#58;~/Documents/Chess/Stockfish-f8f5dcbb682830a66a37f68f3c192bbbfc84a33a/src$ ./stockfish bench 16384 16 30 /home/louis/Documents/Chess/benchfile.txt 

===========================
Total time &#40;ms&#41; &#58; 2086034
Nodes searched  &#58; 29958515357
Nodes/second    &#58; 14361470

Code: Select all

louis@LZsT5610&#58;~/Documents/Chess/Stockfish/src$ ./stockfish bench 16384 16 30 /home/louis/Documents/Chess/benchfile.txt 

===========================
Total time &#40;ms&#41; &#58; 1442285
Nodes searched  &#58; 20159134598
Nodes/second    &#58; 13977219
benchfile.txt contains 185 = 5 * 37 positions; it is simply a concatenation of five copies of the standard SF bench positions.

The first result is for Stockfish from just before Joona's patch. The second result includes the patch and nothing more. The unpatched binary used 45% more time to complete the bench test.

16GB hash, 16 threads, depth 30 on a Dell T5610 workstation (dual Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz) running GNU/Linux 3.18.2-031802-generic x86_64

Of course, these runs are non-deterministic, and I have no idea what the variances are in these bench tests.