An argument against direct recursion in a chess tree search:
I'd guess that nearly every modern A/B search routine uses direct recursion to call itself from ply to ply. This seems to be the natural way to do things, but is it the best?
In the Old Days, programming languages and their associated calling sequences did not support direct recursion. Chess program authors either had to code in assembly language with custom calling sequences or write non-recursive versions of a search; there was no other choice (outside of Lisp). Later, with the advent of new languages like C, Pascal, C++, and the like, direct recursion became widely supported and hand-made alternatives were dropped.
But consider some advantages of a non-directly-recursive approach:
1) There is zero danger of a stack overflow.
2) As there are no local (i.e., automatic) variables, each thread in a multithreaded search needs less stack space.
3) With no formal parameters or a return result, there can be less run-time overhead and less storage use.
4) The programmer is forced to think more carefully during the design phase.
5) When a crash occurs, all the state variables can be read from a ply-indexed vector of values; there is nothing hidden because there is nothing stored in local variables, formal parameters, or return results.
6) Less L1 cache thrashing as the search routine doesn't share stack space with any routines it might call.
7) Exception processing is simplified; a search can return from twenty ply deep just by changing a state variable. And none of the setjmp/longjmp bull.
8) You could write a chess program in Cobol. (Just joking!)
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: An argument against direct recursion
I've done both, so a couple of perspectives...
(1) I would never use setjmp()/longjmp() ANYWAY. Sloppy to say the least. But for recursion, it has the nice characteristic that it makes the search very easy to understand. And it makes things like "internal iterative deepening" and such very easy to implement.
(2) an iterated search (Cray Blitz) is certainly a clean way to do things, and for a parallel implementation, one can go to a full-blown DTS design, or a "young brothers wait, but at ANY ply" approach. There's a gain there.
(3) I do not notice any speed difference. One needs local data per ply, so you end up using (hopefully) an array of structures so that each ply's local data is close together for cache efficiency. Whether the data is on the stack or in an array seems to be six of one, half-dozen of the other. The stack is slightly more efficient to use since ebp points to the local data, where for an array you end up using another register for the pointer to that data.
(4) Procedure calls are not particularly fast (or slow) but the iterated search has none, so that is a small win.
Having done both, I like the recursive negamax approach. It is simpler, needs less debugging, and you don't have those "ply=N changes data for ply=M" type bugs, as those can't happen in a recursive search.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: An argument against direct recursion
If there's a problem with an iterative search (vs. direct recursive calling), it's the possibility that the resulting code is not quite as easy to read. An example: the old Control Data mainframes did not support recursive calls, so CDC Fortran didn't, so the Northwestern Chess 3.x program didn't, and I believe this carried over to the assembly language Chess 4.x and from there to the Pascal Chess 0.5 (even though Pascal supports recursive calls). Take a look at the Pascal source for Chess 0.5 and you'll see a bunch of labels and goto statements, particularly in the non-recursive search. And that's too bad, because it didn't have to be that way; a recursive version would have been better, as would a better-structured iterative version. But I suspect that Chess 0.5 is the way it is because so much of it is adapted straight from Chess 4.x.
Concerning IID, I think that can be handled without too much trouble just by adding a few states to the state variable type that controls the main case statement. Maybe: IIDstart, IIDpick, IIDdown, IIDup, and IIDdone.
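In C terms that is nothing more than a few extra enumerators in the node-search-state type (the nss* names follow the Pascal code posted elsewhere in this thread; the IID* names are the ones guessed at above):

```c
/* Node search states: the original set plus five added for an
 * internal iterative deepening sub-search, all driving the same
 * case/switch statement in the node-processing loop. */
enum nss {
    nssplystart, nsstermtest, nssgenerate, nssmovepick,
    nssexecute, nssretract, nssendscan, nssplyfinish, nssexit,
    /* hypothetical IID sub-search states: */
    IIDstart, IIDpick, IIDdown, IIDup, IIDdone
};
```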
Concerning inter-ply bug magnets, I avoid these by having a very simple inter-ply data flow:
1) Just before starting (ply + 1), the (ply + 1) frame is given its input window (based upon the current window, reversed and downshifted).
2) Just after finishing (ply + 1), the earlier ply recovers the (ply + 1) updated window alpha value (upshifted) and, if needed, the (ply + 1) PV.
3) That's it.
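A guess at what those two shifted transfers look like in C (the function names mirror the Pascal routines in the code posted in this thread, but the mate-score encoding and the 31000 bound are assumptions):

```c
#include <assert.h>

#define MATE       32000
#define MATE_BOUND 31000  /* beyond this, a score encodes a forced mate (assumed) */

/* Step 2: recover a score from ply + 1 -- negate for the side change,
 * then age any mate distance by one ply (a mate found one ply deeper
 * is one ply farther away from here). */
static int calc_sv_up(int sv)
{
    int v = -sv;
    if (v >  MATE_BOUND) return v - 1;
    if (v < -MATE_BOUND) return v + 1;
    return v;
}

/* Step 1: pass a bound down to ply + 1 -- the exact inverse of calc_sv_up. */
static int calc_sv_dn(int sv)
{
    if (sv >  MATE_BOUND) return -(sv + 1);
    if (sv < -MATE_BOUND) return -(sv - 1);
    return -sv;
}

typedef struct { int alfa, beta; } window_t;

/* The child's input window is the parent's window reversed (alpha and
 * beta swap roles under negation) and downshifted. */
static window_t window_shift_dn(window_t w)
{
    return (window_t){ calc_sv_dn(w.beta), calc_sv_dn(w.alfa) };
}
```

The round trip is the invariant worth checking: shifting any score down and back up must return it unchanged, mate scores included.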
Concerning the need for an extra address register in addition to the routine activation frame base register, yes, that can be true. On the other hand, without having to make a recursive call for each node, there's a gain in not having to save/restore all the registers each time through. There could even be a chance that the optimizer can do a better job when it knows it can preserve register content knowledge.
I'm currently converting the demonstration checkmate search in BozoChess from direct recursion to iteration with explicit stack storage. I've already cut out all the local variables, formal parameters, and the return result; now I have to add the big case statement and the state variable which controls it. When I've got that working, I'll post it so others may judge my approach.
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: An argument against direct recursion
Several years ago I wrote a small recursive alpha-beta procedure that the GNU C++ compiler optimized into completely inlined, non-recursive code. When I added a few features and refactorings to the program, this recursion-unrolling optimization disappeared.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Sample non-recursive mate finder
A sample non-recursive mate finder (from BozoChess):
Code: Select all
procedure CpcFindMate(var cpc: cpctype; fullmovecount: Integer);
var
ply: plytype;
depth: Integer;
procedure MateNode;
procedure ApplyKillerBonus;
var
index: Integer;
begin
with cpc, ssc, pirvec[ply] do
begin
index := GmsLocateMove(gms, killers.k0move);
if index >= 0 then
Inc(gms.moves[index].sv, 16);
index := GmsLocateMove(gms, killers.k1move);
if index >= 0 then
Inc(gms.moves[index].sv, 8)
end
end; { ApplyKillerBonus }
begin
with cpc, ssc do
while pirvec[ply].nss <> nssexit do
case pirvec[ply].nss of
nssplystart:
with pirvec[ply] do
begin
{CpcTraceCV(cpc);}
Inc(nodecount);
PirClear(pirvec[ply]);
nss := nsstermtest
end;
nsstermtest:
with pirvec[ply] do
if depth = 0 then
begin
if PosIsCheckmate(currpos) then
window.alfa := svlosein0
else
window.alfa := sveven;
nss := nssplyfinish
end
else
nss := nssgenerate;
nssgenerate:
with pirvec[ply] do
begin
if depth = 1 then
PosGenOnlyChecks(currpos, gms)
else
begin
PosGenerate(currpos, gms);
PosApplyOrderAntiMobility(currpos, gms);
ApplyKillerBonus;
GmsSortByScore(gms)
end;
nss := nssmovepick
end;
nssmovepick:
with pirvec[ply] do
if (moveindex = gms.movecount) or WindowIsClosed(window) or
(ispzmover and IsSvMating(window.alfa)) or
((not ispzmover) and (not IsSvLosing(window.alfa))) then
nss := nssendscan
else
begin
srchmove := gms.moves[moveindex];
nss := nssexecute
end;
nssexecute:
begin
with pirvec[ply] do
begin
VariationAppendMove(currvar, srchmove);
WindowShiftDn(window, pirvec[ply + 1].window);
PosExecute(currpos, srchmove);
end;
Inc(ply);
Dec(depth);
pirvec[ply].nss := nssplystart
end;
nssretract:
begin
Inc(depth);
Dec(ply);
with pirvec[ply] do
begin
PosRetract(currpos);
srchsv := CalcSvUp(pirvec[ply + 1].window.alfa);
VariationRecycleTail(currvar);
if srchsv > window.alfa then
begin
bestmove := srchmove;
window.alfa := srchsv;
if srchsv < window.beta then
begin
VariationRecycle(pv);
VariationAppendMove(pv, srchmove);
VariationAppend(pv, pirvec[ply + 1].pv)
end
end;
Inc(moveindex);
nss := nssmovepick
end
end;
nssendscan:
with pirvec[ply] do
begin
if not MoveFlagTest(bestmove, mfvoid) then
KillersAddMove(killers, bestmove);
nss := nssplyfinish
end;
nssplyfinish:
with pirvec[ply] do
if ply > 0 then
nss := nssretract
else
nss := nssexit;
end { case }
end; { MateNode }
begin
with cpc, ssc do
begin
ply := 0;
depth := (fullmovecount * 2) - 1;
pirvec[ply].nss := nssplystart;
pirvec[ply].window := rootwindow;
MateNode;
predsv := pirvec[ply].window.alfa;
VariationAppend(predvar, pirvec[ply].pv);
TimerStop(ssctimer);
usedusec := TimerCurrent(ssctimer);
if IsSvMating(predsv) then
begin
VariationNotate(predvar, rootpos);
st := stforcedmate
end
else
begin
VariationRecycle(predvar);
st := stlimitdepth
end
end
end; { CpcFindMate }
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: An argument against direct recursion
I recently switched the Glass search to an iterative one. It will be in place in the soon-to-be-released version 1.8 (not that you would notice).
I cannot report any speed gain at all from the changes. I even had some trouble in the beginning because the code turned out slower: I needed to get the tree stack properly aligned to 64 bytes, otherwise it was some 5-10% slower. Still, I really like this style for debugging, as at any point in the search you have access to all the variables from all the plies in the tree.
In the past I have read that profilers would be more effective on iterative than on recursive code. I cannot support that claim - no measurable difference.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: An argument against direct recursion
Edmund wrote:After all I really like this style for debugging, as at any point in search you have access to all the variables from all the plies in the tree.
In the past I have read that profilers would be more effective for iterative over recursive code. I cannot support that claim - no measurable difference.

I don't see any speed difference, although such might appear with a more complicated general search.
In the above scheme, there are several states that could be merged, but I chose not to do this as it would make the code more complicated.
The only really tricky parts are the two places where the ply changes as the store/load of the state variable depends on this, and everything else depends on the state variable.
There are only three lines where the code references data outside the current ply, and these can be quickly found as each has a "pirvec[ply + 1]" selector vs a "pirvec[ply]" selector.
The kick-off code in the main routine can be simplified further:
Code: Select all
with pirvec[ply] do
begin
nss := nssplystart;
window := rootwindow;
MateNode;
predsv := window.alfa;
VariationAppend(predvar, pv)
end;
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Stuff that lives inside a ply-indexed record
Stuff that lives inside a ply-indexed record:
Code: Select all
{ Ply indexed record }
pirtype =
record
isplyzero: Boolean; { Constant ply zero flag; set only for the first ply }
islastply: Boolean; { Constant last ply flag; set only for the last ply }
ispzmover: Boolean; { Constant ply zero mover flag; set for even ply }
nss: nsstype; { Node search state }
window: windowtype; { Input window; score result returned as alfa value }
pv: variationtype; { Predicted variation result }
moveindex: Integer; { Index into move generation vector }
killers: killerstype; { Killer move data }
bestmove: movetype; { Best move }
srchmove: movetype; { Move currently searched }
srchsv: svtype; { Returned score for the searched move }
gms: gmstype { Moves }
end;
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Another thing: continuation support
Another thing in the iterative scheme: continuation support. See the wiki article http://en.wikipedia.org/wiki/Continuation
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: Another thing: continuation support
sje wrote:Another thing in the iterative scheme: continuation support. See the wiki article http://en.wikipedia.org/wiki/Continuation

We have to distinguish here ...
1) There are the WinBoard commands pause/resume (I don't know whether any interface or engine actually has them implemented). Upon receiving the pause command, the engine just enters a loop whose only exit is receiving resume as input. You don't need an iterative search for that.
2) The engine stores its current search state on disk and exits; the next time it is loaded, it is able to continue its search from where it left off. This is a more extreme form of persistent hash, where not only position scores and best moves are stored but the whole search state gets saved. For this to work properly you indeed need an iterative search.
How would this be implemented in a most user friendly way for UCI / Winboard engines?