Iterative DTS

LoopList · Post by **LoopList** » Mon Jul 02, 2007 3:14 pm

I rewrote the main parts of LOOPs recursive search engine. Now LOOP is searching iterative in order to implement DTS. But the first tests have shown that the searchspeed decreased about 10% compared to the recursive version.

Both search engines are producing identical searchtrees, therefore I dont think to discover some bugs.

The following piece of code are the main parts of my first simple search without any heuristics or trans tables and so on...

How can I improve the pure search speed? This piece of code is definitely to slow compared to the recursive version

Code: Select all

////
//// iterative search stack
////

struct stack_c &#123;
  sintx phase;
  sintx check;
  sintx alpha;
  sintx beta;
  sintx value;
  sintx value_best;
  sintx move;
  sintx move_best;
  sintx depth;
  undo_c undo;
  sort_c sort;
&#125;;

////
//// iterative root/tree search
////

void search&#40;sintx depth, sintx ply, sintx check, int phase, int moves&#41; &#123;
  stack_c stack&#91;256&#93;;
  stack_c * current;
  int value;
  int alpha;
  int beta;

  ////
  //// initialize current
  ////

  current=stack;
  current->depth=depth;
  current->phase=phase;
  current->check=check; 
  current->alpha=-VALUE_MATE;
  current->beta=VALUE_MATE;

  ////
  //// start iteration
  ////

  while &#40;1&#41; &#123;
    switch &#40;current->phase&#41; &#123;

      ////
      //// PHASE_ROOT_START_W
      ////

    case PHASE_ROOT_START_W&#58;
      current->phase=PHASE_ROOT_LOOP_W;
      continue;

      ////
      //// PHASE_ROOT_LOOP_W
      ////

    case PHASE_ROOT_LOOP_W&#58;
      
      ////
      //// search
      ////

      current->phase=PHASE_ROOT_UNDO_W;
      depth=current->depth;
      alpha=current->alpha;
      beta=current->beta;
      ply++;
      current++;
      current->check=check;
      current->depth=depth-1;
      current->alpha=-beta;
      current->beta=-alpha;
      current->phase=PHASE_TREE_START_B;
      continue;

      ////
      //// PHASE_ROOT_UNDO_W
      ////

    case PHASE_ROOT_UNDO_W&#58;
      ply--;
      undo_move_w&#40;tree.board, current->undo, current->move&#41;;
      value=current->value;
      current->phase=PHASE_ROOT_LOOP_W;
      continue;

      ////
      //// PHASE_TREE_START_W
      ////

    case PHASE_TREE_START_W&#58;
      if &#40;current->depth <= 0&#41; &#123;

        ////
        //// return value
        ////

        value=search_q_w&#40;)
        current--;
        current->value=-value;
        continue;
      &#125;
      else &#123;

        ////
        //// initialize
        ////

        current->phase=PHASE_TREE_LOOP_W;
        continue;
      &#125;

      ////
      //// PHASE_TREE_LOOP_W
      ////

    case PHASE_TREE_LOOP_W&#58;
      current->move=sort_move_w&#40;current->sort&#41;;

      ////
      //// all moves searched
      ////

      if &#40;current->move == MOVE_NONE&#41; &#123;

        ////
        //// return mate or draw
        ////

        if &#40;current->value_best == VALUE_NONE&#41; &#123;
          if &#40;current->check == 1&#41; &#123;
            value=value_mate_ply&#40;ply&#41;;
            current--;
            current->value=-value;
            continue;
          &#125;
          else &#123;
            current--;
            current->value=-VALUE_DRAW;
            continue;
          &#125;
        &#125;

        ////
        //// return value
        ////

        value=current->value_best;
        current--;
        current->value=-value;
        continue;
      &#125;

      do_move_w&#40;tree.board, current->undo, current->move&#41;;

      ////
      //// search
      ////

      current->phase=PHASE_TREE_UNDO_W;
      depth=current->depth;
      extensions=current->extensions;
      alpha=current->alpha;
      beta=current->beta;
      ply++;
      current++;
      current->check=check;
      current->depth=depth-1;
      current->alpha=-beta;
      current->beta=-alpha;
      current->phase=PHASE_TREE_START_B;
      continue;

      ////
      //// PHASE_TREE_UNDO_W
      ////

    case PHASE_TREE_UNDO_W&#58;
      ply--;
      undo_move_w&#40;tree.board, current->undo, current->move&#41;;
      value=current->value;
      if &#40;value > current->value_best&#41; &#123;
        if &#40;value > current->alpha&#41; &#123;
          if &#40;value >= current->beta&#41; &#123;

            ////
            //// return value
            ////

            current--;
            current->value=-value;
            continue;
          &#125;
          current->alpha=value;
          current->move_best=current->move;
        &#125;
        current->value_best=value;
      &#125;
      current->phase=PHASE_TREE_LOOP_W;
      continue;
    &#125;
  &#125;
&#125;

Dann Corbit · Post by **Dann Corbit** » Mon Jul 02, 2007 7:45 pm

I suggest profiling both versions. That will show you what the problem is for sure.

Aleks Peshkov · Post by **Aleks Peshkov** » Mon Jul 02, 2007 7:56 pm

I do not know what is DTS, but

Code: Select all

      current->phase=PHASE_TREE_START_B;
      continue;

is slow comparing to simple unconditional

Code: Select all

goto PHASE_TREE_START_B;

bob · Post by **bob** » Mon Jul 02, 2007 8:17 pm

LoopList wrote:
I rewrote the main parts of LOOPs recursive search engine. Now LOOP is searching iterative in order to implement DTS. But the first tests have shown that the searchspeed decreased about 10% compared to the recursive version.

Both search engines are producing identical searchtrees, therefore I dont think to discover some bugs.

The following piece of code are the main parts of my first simple search without any heuristics or trans tables and so on...

How can I improve the pure search speed? This piece of code is definitely to slow compared to the recursive version

Code: Select all

////
//// iterative search stack
////

struct stack_c &#123;
  sintx phase;
  sintx check;
  sintx alpha;
  sintx beta;
  sintx value;
  sintx value_best;
  sintx move;
  sintx move_best;
  sintx depth;
  undo_c undo;
  sort_c sort;
&#125;;

////
//// iterative root/tree search
////

void search&#40;sintx depth, sintx ply, sintx check, int phase, int moves&#41; &#123;
  stack_c stack&#91;256&#93;;
  stack_c * current;
  int value;
  int alpha;
  int beta;

  ////
  //// initialize current
  ////

  current=stack;
  current->depth=depth;
  current->phase=phase;
  current->check=check; 
  current->alpha=-VALUE_MATE;
  current->beta=VALUE_MATE;

  ////
  //// start iteration
  ////

  while &#40;1&#41; &#123;
    switch &#40;current->phase&#41; &#123;

      ////
      //// PHASE_ROOT_START_W
      ////

    case PHASE_ROOT_START_W&#58;
      current->phase=PHASE_ROOT_LOOP_W;
      continue;

      ////
      //// PHASE_ROOT_LOOP_W
      ////

    case PHASE_ROOT_LOOP_W&#58;
      
      ////
      //// search
      ////

      current->phase=PHASE_ROOT_UNDO_W;
      depth=current->depth;
      alpha=current->alpha;
      beta=current->beta;
      ply++;
      current++;
      current->check=check;
      current->depth=depth-1;
      current->alpha=-beta;
      current->beta=-alpha;
      current->phase=PHASE_TREE_START_B;
      continue;

      ////
      //// PHASE_ROOT_UNDO_W
      ////

    case PHASE_ROOT_UNDO_W&#58;
      ply--;
      undo_move_w&#40;tree.board, current->undo, current->move&#41;;
      value=current->value;
      current->phase=PHASE_ROOT_LOOP_W;
      continue;

      ////
      //// PHASE_TREE_START_W
      ////

    case PHASE_TREE_START_W&#58;
      if &#40;current->depth <= 0&#41; &#123;

        ////
        //// return value
        ////

        value=search_q_w&#40;)
        current--;
        current->value=-value;
        continue;
      &#125;
      else &#123;

        ////
        //// initialize
        ////

        current->phase=PHASE_TREE_LOOP_W;
        continue;
      &#125;

      ////
      //// PHASE_TREE_LOOP_W
      ////

    case PHASE_TREE_LOOP_W&#58;
      current->move=sort_move_w&#40;current->sort&#41;;

      ////
      //// all moves searched
      ////

      if &#40;current->move == MOVE_NONE&#41; &#123;

        ////
        //// return mate or draw
        ////

        if &#40;current->value_best == VALUE_NONE&#41; &#123;
          if &#40;current->check == 1&#41; &#123;
            value=value_mate_ply&#40;ply&#41;;
            current--;
            current->value=-value;
            continue;
          &#125;
          else &#123;
            current--;
            current->value=-VALUE_DRAW;
            continue;
          &#125;
        &#125;

        ////
        //// return value
        ////

        value=current->value_best;
        current--;
        current->value=-value;
        continue;
      &#125;

      do_move_w&#40;tree.board, current->undo, current->move&#41;;

      ////
      //// search
      ////

      current->phase=PHASE_TREE_UNDO_W;
      depth=current->depth;
      extensions=current->extensions;
      alpha=current->alpha;
      beta=current->beta;
      ply++;
      current++;
      current->check=check;
      current->depth=depth-1;
      current->alpha=-beta;
      current->beta=-alpha;
      current->phase=PHASE_TREE_START_B;
      continue;

      ////
      //// PHASE_TREE_UNDO_W
      ////

    case PHASE_TREE_UNDO_W&#58;
      ply--;
      undo_move_w&#40;tree.board, current->undo, current->move&#41;;
      value=current->value;
      if &#40;value > current->value_best&#41; &#123;
        if &#40;value > current->alpha&#41; &#123;
          if &#40;value >= current->beta&#41; &#123;

            ////
            //// return value
            ////

            current--;
            current->value=-value;
            continue;
          &#125;
          current->alpha=value;
          current->move_best=current->move;
        &#125;
        current->value_best=value;
      &#125;
      current->phase=PHASE_TREE_LOOP_W;
      continue;
    &#125;
  &#125;
&#125;

This is most likely a cache issue. To explain:

In the recursive implementation, a block of data for local variables are allocated on the stack, and all local variables for one ply are close together. Which means that when you reference one, you suck a lot of them into a cache line at the same time, which makes referencing those pre-fetched values extremely cheap. And the probability is that you will actually use those local values at about the same time.

In the iterative version, you most likely have separate arrays for each local variable, all indexed by a "ply" variable or something similar. This means that a reference to any single local variable value sucks in several words from that array, but they are each for a different ply and are not needed.

The way to correct this is to create a structure of your local data values, and then make an array of that structure type so that once again all local data for a single ply are close together in menory and will benefit from cache much better...

Dan Andersson · Post by **Dan Andersson** » Mon Jul 02, 2007 8:29 pm

Along with the potential issues already suggested there are known bottlenecks to the 'giant switch statement' technique. What problems you face depends on what optimizations your compiler makes.
Does it pre-compute a lookup table?
Does it go through the switch statement or does optimize to a goto?
Would rewriting with conditionals get better branch prediction?

There is a lot of research available from online sources. Anton Ertl has a lot of papers on efficient VM interpreters.

MvH Dan Andersson

bob · Post by **bob** » Mon Jul 02, 2007 9:46 pm

That I don't follow. What "giant switch table"??? I didn't have one in Cray Blitz at all so I am not sure what you are talking about here...

Dan Andersson · Post by **Dan Andersson** » Mon Jul 02, 2007 9:56 pm

I'm talking about the specific example code in the root post. It's a not exactly 'giant' switch but still a switch statement implementing a state machine.

MvH Dan Andersson

bob · Post by **bob** » Mon Jul 02, 2007 11:02 pm

Dan Andersson wrote:I'm talking about the specific example code in the root post. It's a not exactly 'giant' switch but still a switch statement implementing a state machine.

MvH Dan Andersson

OK. I didn't look at the code, it was too long to scroll through. I commented only on how I have seen "iterative searches" implemented in the past, which usually wrecks performance with respect to cache.

LoopList · Post by **LoopList** » Tue Jul 03, 2007 9:00 am

I dont really understand?! I created a structure of local data values which belongs together. I managed this in an array and reference to every new ply (stack level) via pointer. Whats wrong (cache unfriendly) with this construction?

Please see below:

Code: Select all

//// 
//// iterative search stack 
//// 

struct stack_c &#123; 
  sintx phase; 
  sintx check; 
  sintx alpha; 
  sintx beta; 
  sintx value; 
  sintx value_best; 
  sintx move; 
  sintx move_best; 
  sintx depth; 
  undo_c undo; 
  sort_c sort; 
&#125;; 

...
stack_c stack&#91;256&#93;; 
stack_c * current; 
...

hgm · Post by **hgm** » Tue Jul 03, 2007 9:58 am

This is OK, Bob simply did not read your code, so you should ignore his comments.

It could be that your code is slightly less efficient because every 'local' variable has to be accessed through the pointer. Even if the compiler keeps the latter in a register permanently, it still means you have one less register available for expression evaluation.

It could also be that you are just unlucky, and that the several independent stacks grow in such a way that at the critical depth thy collide in the cache, or collide wih important global data structures. This is especially a high risk in machines with a small number of cache ways, such as AMD CPUs. Better merge all your stacks into a single one, to avoid this, and make sure the system stack does not collide with the globals (e.g Zobrist table).

Iterative DTS

Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS

Re: Iterative DTS