DTS Structure

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 7:31 pm
Location: Bonn, Germany

Re: DTS Structure

Post by Onno Garms »

Edmund wrote:A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
Iterative search is also useful for YBW. Also, it helps profiling because you don't have cycles in the call tree.

I personally don't like the macro based solutions for iterative search from other programs.

EDIT: because I dislike macros in general and try to minimize their use. I also dislike a huge case switch. The latter was fine for qsearch, but as a case switch cannot jump into the middle of loops, it requires ugly adaptions of the code.

I thought quite a long time, how to write a readable iterative search. I found a macro free solution. The price are - gotos. I hadn't used goto since my youth with C64 BASIC (not counting jumping out of nested loops) but for an iterative search I found gotos really useful.

Search looks like this:

Code: Select all


namespace Label
{
  enum _ {after_scout, after_research, ...};
}

class Node
{
  int alpha, beta;
  Move move;
  MoveList moves;
  int value;
  Label::_ ret_addr;
};  

search ()
{
  Node* node = &d_nodes[0];
  Node* child = node+1;

recurse:
  node->moves.init();
  while ((node->move = node->moves.next()))
  {
    child->alpha = ...
    child->beta = ...
    child->ret_addr = Label::after_scout;
    ++node;
    ++child;
    goto recurse;
    // when we return here, node and child have their original values
    // and child is evaluated    
after_scout:
    if &#40;child->value < -node->alpha&#41;
    &#123;
      child->alpha = ...
      child->beta = ...
      child->ret_addr = Label&#58;&#58;after_research;
      ++node
      ++child;
      goto recurse;
    after_research&#58;
      ...
    &#125;
    if &#40;cutoff&#41;
    &#123;
      node->value = ...
      goto node_done;
    &#125;
  &#125;
node_done&#58;
  --node;
  --child;
  switch &#40;node->ret_addr&#41;
  &#123;
    case Label after_research&#58;
      goto after_research;
    ...
  &#125;
&#125;
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
User avatar
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 7:31 pm
Location: Bonn, Germany

Re: DTS Structure

Post by Onno Garms »

Code: Select all

switch &#40;child->ret_addr&#41;
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: DTS Structure

Post by Edmund »

Onno Garms wrote:...
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
Thats about the algorithm I was thinking of too. I just wonder how the inline assembly should be slower than the additional indirection.

I was thinking of something like:

first a jumptable is initialized holding the addresses of all labels

Code: Select all

	// init Jumptable

	int jmptable&#91;64&#93;;

	#define ADD_JUMPER&#40;jmp&#41; 		 \
		mov edx, DWORD PTR jmp		\
		mov &#91;esi&#93;, edx				  \
		add esi, 4
	
	__asm
	&#123;
		lea esi, jmptable			;Load address of the jmptable into esi
		
		ADD_JUMPER&#40;PV&#41;
		ADD_JUMPER&#40;ZW&#41;
		ADD_JUMPER&#40;QS&#41;
&#91;...&#93;
	&#125;
	
	#undef ADD_JUMPER

&#91;...&#93;
then to jump to a specific label I would have used a macro with an inlined assembly statement.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: DTS Structure

Post by Dann Corbit »

Edmund wrote:
Onno Garms wrote:...
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
Thats about the algorithm I was thinking of too. I just wonder how the inline assembly should be slower than the additional indirection.

I was thinking of something like:

first a jumptable is initialized holding the addresses of all labels

Code: Select all

	// init Jumptable

	int jmptable&#91;64&#93;;

	#define ADD_JUMPER&#40;jmp&#41; 		 \
		mov edx, DWORD PTR jmp		\
		mov &#91;esi&#93;, edx				  \
		add esi, 4
	
	__asm
	&#123;
		lea esi, jmptable			;Load address of the jmptable into esi
		
		ADD_JUMPER&#40;PV&#41;
		ADD_JUMPER&#40;ZW&#41;
		ADD_JUMPER&#40;QS&#41;
&#91;...&#93;
	&#125;
	
	#undef ADD_JUMPER

&#91;...&#93;
then to jump to a specific label I would have used a macro with an inlined assembly statement.
I sometimes use array(s) of function pointers.

E.g. something like this:
genmoves[piecetype][color](rank, file);

where genmoves is a 6x2 array of function pointers.

It removes all the goto branches (if you don't count calls and returns).
That approach used to be a *lot* faster than a switch if it were in a critical path, but lately the difference has pretty well vanished.

Here is a benchmark program to test your compiler for pointer, array, and switch invocation (a profiler will tell you which approach is fastest):

Code: Select all

#include <math.h>
#include <stdio.h>

typedef double  (*f_t&#41; &#40;double&#41;;
static f_t      f&#91;&#93; = &#123;log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0&#125;;

static double   accum0 = 0;
static double   accum1 = 0;
static double   accum2 = 0;


void            arr&#40;void&#41;
&#123;
    int             i;
    double          d = 0;
    for &#40;i = 0; f&#91;i&#93;; i++) &#123;
        d += f&#91;i&#93; &#40;0.5&#41;;
    &#125;
    accum0 += d;
&#125;

void            poi&#40;void&#41;
&#123;
    f_t            *flist = f;
    double          d = 0;
    while (*flist&#41; &#123;
        f_t             ff = *flist;
        d += ff&#40;0.5&#41;;
        flist++;
    &#125;
    accum1 += d;
&#125;

void            swi&#40;void&#41;
&#123;
    int             i;
    double          d = 0;
    for &#40;i = 0; f&#91;i&#93;; i++) &#123;
        switch &#40;i&#41; &#123;
        case 0&#58;
            d += f&#91;0&#93; &#40;0.5&#41;;
            break;
        case 1&#58;
            d += f&#91;1&#93; &#40;0.5&#41;;
            break;
        case 2&#58;
            d += f&#91;2&#93; &#40;0.5&#41;;
            break;
        case 3&#58;
            d += f&#91;3&#93; &#40;0.5&#41;;
            break;
        case 4&#58;
            d += f&#91;4&#93; &#40;0.5&#41;;
            break;
        case 5&#58;
            d += f&#91;5&#93; &#40;0.5&#41;;
            break;
        case 6&#58;
            d += f&#91;6&#93; &#40;0.5&#41;;
            break;
        case 7&#58;
            d += f&#91;7&#93; &#40;0.5&#41;;
            break;
        case 8&#58;
            d += f&#91;8&#93; &#40;0.5&#41;;
            break;
        case 9&#58;
            d += f&#91;9&#93; &#40;0.5&#41;;
            break;
        default&#58;
            break;
        &#125;
    &#125;
    accum2 += d;
&#125;

int             main&#40;void&#41;
&#123;
    long            i;
    for &#40;i = 0; i < 1000000; i++) &#123;
        arr&#40;);
        poi&#40;);
        swi&#40;);
    &#125;
    printf&#40;"%.20g, %.20g, %.20g\n", accum0, accum1, accum2&#41;;
    return 0;
&#125;

User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: DTS Structure

Post by Zach Wegner »

Onno Garms wrote:
Edmund wrote:A main difference moving to the Dynamic Tree Splitting Algorithm is that the whole search has to be laid out iterative instead of recursive. Now my question is, how can this be achieved best?
Iterative search is also useful for YBW. Also, it helps profiling because you don't have cycles in the call tree.

I personally don't like the macro based solutions for iterative search from other programs.

EDIT: because I dislike macros in general and try to minimize their use. I also dislike a huge case switch. The latter was fine for qsearch, but as a case switch cannot jump into the middle of loops, it requires ugly adaptions of the code.

I thought quite a long time, how to write a readable iterative search. I found a macro free solution. The price are - gotos. I hadn't used goto since my youth with C64 BASIC (not counting jumping out of nested loops) but for an iterative search I found gotos really useful.

Search looks like this:

Code: Select all


namespace Label
&#123;
  enum _ &#123;after_scout, after_research, ...&#125;;
&#125;

class Node
&#123;
  int alpha, beta;
  Move move;
  MoveList moves;
  int value;
  Label&#58;&#58;_ ret_addr;
&#125;;  

search ()
&#123;
  Node* node = &d_nodes&#91;0&#93;;
  Node* child = node+1;

recurse&#58;
  node->moves.init&#40;);
  while (&#40;node->move = node->moves.next&#40;)))
  &#123;
    child->alpha = ...
    child->beta = ...
    child->ret_addr = Label&#58;&#58;after_scout;
    ++node;
    ++child;
    goto recurse;
    // when we return here, node and child have their original values
    // and child is evaluated    
after_scout&#58;
    if &#40;child->value < -node->alpha&#41;
    &#123;
      child->alpha = ...
      child->beta = ...
      child->ret_addr = Label&#58;&#58;after_research;
      ++node
      ++child;
      goto recurse;
    after_research&#58;
      ...
    &#125;
    if &#40;cutoff&#41;
    &#123;
      node->value = ...
      goto node_done;
    &#125;
  &#125;
node_done&#58;
  --node;
  --child;
  switch &#40;node->ret_addr&#41;
  &#123;
    case Label after_research&#58;
      goto after_research;
    ...
  &#125;
&#125;
Using gcc enhancements of C++ or using inline assembly in msvc, it is possible to store the address of the lables rather then the enum. This saves the switch at the end, one can directly jump to the label. However I found the inline assembly based solution in msvc slower then with the case switch. Also I could not find a way to port to 64 bit in msvc. So I decided to stay in the C++ standard and have the case switch.
That's quite close to the way it used to be in ZCT (apparently before the first public release). I had a big switch at the top, each statement just a goto (or a return in one case). Having it at the top is also marginally more flexible, you can call qsearch directly, and for when children join a node at a split point, they can go straight into the move loop.

Frankly going that far to avoid macros just makes the code ugly. At the very least you can wrap all the child-> stuff into a function, but you still have a goto and a label sitting around.

And of course switches can jump into the middle of a loop. Why couldn't they? :)
CThinker
Posts: 388
Joined: Wed Mar 08, 2006 10:08 pm

Re: DTS Structure

Post by CThinker »

My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: DTS Structure

Post by bob »

CThinker wrote:My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.
Not so easy. The problem is that the DTS algorithm is based on choosing a split point from all available nodes on _all_ other threads. That is, you choose the most beneficial split point. As an example, in Crafty (as in any YBW algorithm) you will split at a node after one node has been searched without causing a fail-high. In Crafty, about 92% of nodes fail high on the first move, if they fail high. This means that if this is a CUT node, 92% of the time we won't split here since we fail high on the first branch. Unfortunately, 8% of the time we will fail high on the second (or later) move, and doing a parallel split there does unnecessary work.

With DTS, in Cray Blitz, I circumvented this to an extent because rather than having to split at the current node of the current thread that has satisfied YBW, I could look at all nodes on all threads, find the ones that had satisfied YBW (or even better, ones that had searched multiple moves which really reduces the odds of doing a split at a CUT node).

The problem is that recursive search means you can split "here" or not at all, because of the call stack. With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here". That's the main difference. The stuff about processors not waiting is pretty comparable between Crafty and DTS. As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...
User avatar
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 7:31 pm
Location: Bonn, Germany

Re: DTS Structure

Post by Onno Garms »

bob wrote: The problem is that recursive search means you can split "here" or not at all, because of the call stack.
ACK, but that is a problem for both DTS and YBWC.
With DTS there is no call stack, so you can split at a node that is 10 plies back in the tree just as easily as you can split "here".
You mean "iterative search" rather than "DTS"?
That's the main difference.
Isnt' the main difference that in DTS the child that finishes last at a split point takes over the parent? This means, we move nodes to another thread. For this reason, an iterative search is essential for DTS (because we cannot move callstacks of recursive search). For YBW an iterative search is just a benefit on top (allowing to split anywhere in the tree).
As are the data structures and such. But the ability to see all nodes for all threads when a CPU goes idle, rather than having non-idle CPUs ask idle ones to "join in" is a significant efficiency enhancement...
ACK
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: DTS Structure

Post by Edmund »

CThinker wrote:My understanding is that DTS is really not that different from YBW, except that instead of "waiting", a thread is made available to search other candidate sub-trees. So, the common recursive search can be easily made to do DTS. Crafty, I believe, is a good example of this.
Additionally to Bob's statement that DTS is more flexible in the choice of split points I want to add that DTS also has more flexibility regarding node ownership:

YBW:
Slave takes work on splitpoint (SP)
Master returns to SP, only needs information from Slave
Master takes on other task
Slave finishes search
Master finishes other task
Master propagates the values

DTS:
Slave takes work on splitpoint (SP)
Master returns to SP, only needs information from Slave
Master takes on other task
Slave finishes search
Slave becomes the new owner of the node and propagates the values

This difference is critical if the information gained from propagating the values as soon as they were available would have saved some other threads work.
User avatar
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 7:31 pm
Location: Bonn, Germany

Re: DTS Structure

Post by Onno Garms »

Zach Wegner wrote: That's quite close to the way it used to be in ZCT (apparently before the first public release).
I think it is more similar to what Edmund wants to implement. The only difference to Eduard's approach is that I do not store the addresses of the jump labels but have a case switch at the the end to translate enums into jumps.