An argument against direct recursion in a chess tree search:
I'd guess that nearly every modern A/B search routine uses direct recursion to call itself from ply to ply. This seems to be the natural way to do things, but is it the best?
In the Old Days, programming languages and their associated calling sequences did not support direct recursion. Chess program authors either had to code in assembly language with custom calling sequences or write non-recursive versions of a search; there was no other choice (outside of Lisp). Later, with the advent of new languages like C, Pascal, C++, and the like, direct recursion became widely supported and hand-made alternatives were dropped.
But consider some advantages of a non-directly-recursive approach:
1) There is zero danger of a stack overflow.
2) As there are no local (i.e., automatic) variables, each thread in a multithreaded search needs less stack space.
3) With no formal parameters or a return result, there can be less run-time overhead and less storage use.
4) The programmer is forced to think more carefully during the design phase.
5) When a crash occurs, all the state variables can be read from a ply-indexed vector of values; there is nothing hidden because there is nothing stored in local variables, formal parameters, or return results.
6) Less L1 cache thrashing as the search routine doesn't share stack space with any routines it might call.
7) Exception processing is simplified; a search can return from twenty ply deep just by changing a state variable. And none of the setjmp/longjmp bull.
8) You could write a chess program in Cobol. (Just joking!)
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: An argument against direct recursion
I've done both, so a couple of perspectives...
(1) I would never use setjmp()/longjmp() ANYWAY. Sloppy to say the least. But for recursion, it has the nice characteristic that it makes the search very easy to understand. And it makes things like "internal iterative deepening" and such very easy to implement.
(2) an iterated search (Cray Blitz) is certainly a clean way to do things, and for a parallel implementation, one can go to a full-blown DTS design, or a "young brothers wait, but at ANY ply" approach. There's a gain there.
(3) I do not notice any speed difference. One needs local data per ply, so you end up using (hopefully) an array of structures so that each ply's local data is close together for cache efficiency. Whether the data is on the stack or in an array seems to be six of one, half-dozen of the other. The stack is slightly more efficient to use since ebp points to the local data, where for an array you end up using another register for the pointer to that data.
(4) Procedure calls are not particularly fast (or slow) but the iterated search has none, so that is a small win.
Having done both, I like the recursive negamax approach. It is simpler, needs less debugging, and you don't have those "ply=N changes data for ply=M" type bugs, as those can't happen in a recursive search.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: An argument against direct recursion
If there's a problem with an iterative search (vs. direct recursive calling), it's the possibility that the resulting code is not quite as easy to read. An example: the old Control Data mainframes did not support recursive calls, so CDC Fortran didn't, so the Northwestern Chess 3.x program didn't, and I believe this carried over to the assembly language Chess 4.x and from there to the Pascal Chess 0.5 (even though Pascal supports recursive calls). Take a look at the Pascal source for Chess 0.5 and you'll see a bunch of labels and goto statements, particularly in the non-recursive search. And that's too bad, because it didn't have to be that way; a recursive version would have been better, as would a better-structured iterative version. But I suspect that Chess 0.5 is the way it is because so much of it is adapted straight from Chess 4.x.
Concerning IID, I think that can be handled without too much trouble just by adding a few states to the state variable type that controls the main case statement. Maybe: IIDstart, IIDpick, IIDdown, IIDup, and IIDdone.
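In C terms that is nothing more than a few extra enumerators in the node-search-state type (the nss* names follow the Pascal code posted elsewhere in this thread; the IID* names are the ones guessed at above):

```c
/* Node search states: the original set plus five added for an
 * internal iterative deepening sub-search, all driving the same
 * case/switch statement in the node-processing loop. */
enum nss {
    nssplystart, nsstermtest, nssgenerate, nssmovepick,
    nssexecute, nssretract, nssendscan, nssplyfinish, nssexit,
    /* hypothetical IID sub-search states: */
    IIDstart, IIDpick, IIDdown, IIDup, IIDdone
};
```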
Concerning inter-ply bug magnets, I avoid these by having a very simple inter-ply data flow:
1) Just before starting (ply + 1), the (ply + 1) frame is given its input window (based upon the current window, reversed and downshifted).
2) Just after finishing (ply + 1), the earlier ply recovers the (ply + 1) updated window alpha value (upshifted) and, if needed, the (ply + 1) PV.
3) That's it.
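A guess at what those two shifted transfers look like in C (the function names mirror the Pascal routines in the code posted in this thread, but the mate-score encoding and the 31000 bound are assumptions):

```c
#include <assert.h>

#define MATE       32000
#define MATE_BOUND 31000  /* beyond this, a score encodes a forced mate (assumed) */

/* Step 2: recover a score from ply + 1 -- negate for the side change,
 * then age any mate distance by one ply (a mate found one ply deeper
 * is one ply farther away from here). */
static int calc_sv_up(int sv)
{
    int v = -sv;
    if (v >  MATE_BOUND) return v - 1;
    if (v < -MATE_BOUND) return v + 1;
    return v;
}

/* Step 1: pass a bound down to ply + 1 -- the exact inverse of calc_sv_up. */
static int calc_sv_dn(int sv)
{
    if (sv >  MATE_BOUND) return -(sv + 1);
    if (sv < -MATE_BOUND) return -(sv - 1);
    return -sv;
}

typedef struct { int alfa, beta; } window_t;

/* The child's input window is the parent's window reversed (alpha and
 * beta swap roles under negation) and downshifted. */
static window_t window_shift_dn(window_t w)
{
    return (window_t){ calc_sv_dn(w.beta), calc_sv_dn(w.alfa) };
}
```

The round trip is the invariant worth checking: shifting any score down and back up must return it unchanged, mate scores included.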
Concerning the need for an extra address register in addition to the routine activation frame base register, yes, that can be true. On the other hand, without having to make a recursive call for each node, there's a gain in not having to save/restore all the registers each time through. There could even be a chance that the optimizer can do a better job when it knows it can preserve register content knowledge.
I'm currently converting the demonstration checkmate search in BozoChess from direct recursion to iteration with explicit stack storage. I've already cut out all the local variables, formal parameters, and the return result; now I have to add the big case statement and the state variable which controls it. When I've got that working, I'll post it so others may judge my approach.
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: An argument against direct recursion
Several years ago I wrote a small recursive alpha-beta procedure that the GNU C++ compiler optimized into completely inlined, non-recursive code. When I added a few features and refactorings to the program, this recursion-unrolling optimization disappeared.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Sample non-recursive mate finder
A sample non-recursive mate finder (from BozoChess):
Code: Select all
procedure CpcFindMate(var cpc: cpctype; fullmovecount: Integer);
var
ply: plytype;
depth: Integer;
procedure MateNode;
procedure ApplyKillerBonus;
var
index: Integer;
begin
with cpc, ssc, pirvec[ply] do
begin
index := GmsLocateMove(gms, killers.k0move);
if index >= 0 then
Inc(gms.moves[index].sv, 16);
index := GmsLocateMove(gms, killers.k1move);
if index >= 0 then
Inc(gms.moves[index].sv, 8)
end
end; { ApplyKillerBonus }
begin
with cpc, ssc do
while pirvec[ply].nss <> nssexit do
case pirvec[ply].nss of
nssplystart:
with pirvec[ply] do
begin
{CpcTraceCV(cpc);}
Inc(nodecount);
PirClear(pirvec[ply]);
nss := nsstermtest
end;
nsstermtest:
with pirvec[ply] do
if depth = 0 then
begin
if PosIsCheckmate(currpos) then
window.alfa := svlosein0
else
window.alfa := sveven;
nss := nssplyfinish
end
else
nss := nssgenerate;
nssgenerate:
with pirvec[ply] do
begin
if depth = 1 then
PosGenOnlyChecks(currpos, gms)
else
begin
PosGenerate(currpos, gms);
PosApplyOrderAntiMobility(currpos, gms);
ApplyKillerBonus;
GmsSortByScore(gms)
end;
nss := nssmovepick
end;
nssmovepick:
with pirvec[ply] do
if (moveindex = gms.movecount) or WindowIsClosed(window) or
(ispzmover and IsSvMating(window.alfa)) or
((not ispzmover) and (not IsSvLosing(window.alfa))) then
nss := nssendscan
else
begin
srchmove := gms.moves[moveindex];
nss := nssexecute
end;
nssexecute:
begin
with pirvec[ply] do
begin
VariationAppendMove(currvar, srchmove);
WindowShiftDn(window, pirvec[ply + 1].window);
PosExecute(currpos, srchmove);
end;
Inc(ply);
Dec(depth);
pirvec[ply].nss := nssplystart
end;
nssretract:
begin
Inc(depth);
Dec(ply);
with pirvec[ply] do
begin
PosRetract(currpos);
srchsv := CalcSvUp(pirvec[ply + 1].window.alfa);
VariationRecycleTail(currvar);
if srchsv > window.alfa then
begin
bestmove := srchmove;
window.alfa := srchsv;
if srchsv < window.beta then
begin
VariationRecycle(pv);
VariationAppendMove(pv, srchmove);
VariationAppend(pv, pirvec[ply + 1].pv)
end
end;
Inc(moveindex);
nss := nssmovepick
end
end;
nssendscan:
with pirvec[ply] do
begin
if not MoveFlagTest(bestmove, mfvoid) then
KillersAddMove(killers, bestmove);
nss := nssplyfinish
end;
nssplyfinish:
with pirvec[ply] do
if ply > 0 then
nss := nssretract
else
nss := nssexit;
end { case }
end; { MateNode }
begin
with cpc, ssc do
begin
ply := 0;
depth := (fullmovecount * 2) - 1;
pirvec[ply].nss := nssplystart;
pirvec[ply].window := rootwindow;
MateNode;
predsv := pirvec[ply].window.alfa;
VariationAppend(predvar, pirvec[ply].pv);
TimerStop(ssctimer);
usedusec := TimerCurrent(ssctimer);
if IsSvMating(predsv) then
begin
VariationNotate(predvar, rootpos);
st := stforcedmate
end
else
begin
VariationRecycle(predvar);
st := stlimitdepth
end
end
end; { CpcFindMate }
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: An argument against direct recursion
I recently switched the Glass search to an iterative one. It will be in place in the soon-to-be-released version 1.8 (not that you would notice).
I cannot report any speed gain at all from the changes. I even had some trouble in the beginning because the code turned out slower: I needed to get the tree stack properly aligned to 64 bytes, otherwise it was some 5-10% slower. Still, I really like this style for debugging, as at any point in the search you have access to all the variables from all the plies in the tree.
In the past I have read that profilers would be more effective on iterative than on recursive code. I cannot support that claim - no measurable difference.
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: An argument against direct recursion
Edmund wrote:After all I really like this style for debugging, as at any point in search you have access to all the variables from all the plies in the tree.
In the past I have read that profilers would be more effective for iterative over recursive code. I cannot support that claim - no measurable difference.

I don't see any speed difference, although such might appear with a more complicated general search.
In the above scheme, there are several states that could be merged, but I chose not to do this as it would make the code more complicated.
The only really tricky parts are the two places where the ply changes as the store/load of the state variable depends on this, and everything else depends on the state variable.
There are only three lines where the code references data outside the current ply, and these can be quickly found as each has a "pirvec[ply + 1]" selector vs a "pirvec[ply]" selector.
The kick-off code in the main routine can be simplified further:
Code: Select all
with pirvec[ply] do
begin
nss := nssplystart;
window := rootwindow;
MateNode;
predsv := window.alfa;
VariationAppend(predvar, pv)
end;
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Stuff that lives inside a ply-indexed record
Stuff that lives inside a ply-indexed record:
Code: Select all
{ Ply indexed record }
pirtype =
record
isplyzero: Boolean; { Constant ply zero flag; set only for the first ply }
islastply: Boolean; { Constant last ply flag; set only for the last ply }
ispzmover: Boolean; { Constant ply zero mover flag; set for even ply }
nss: nsstype; { Node search state }
window: windowtype; { Input window; score result returned as alfa value }
pv: variationtype; { Predicted variation result }
moveindex: Integer; { Index into move generation vector }
killers: killerstype; { Killer move data }
bestmove: movetype; { Best move }
srchmove: movetype; { Move currently searched }
srchsv: svtype; { Returned score for the searched move }
gms: gmstype { Moves }
end;
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Another thing: continuation support
Another thing in the iterative scheme: continuation support. See the wiki article http://en.wikipedia.org/wiki/Continuation
- Posts: 670
- Joined: Mon Dec 03, 2007 3:01 pm
- Location: Barcelona, Spain
Re: Another thing: continuation support
sje wrote:Another thing in the iterative scheme: continuation support. See the wiki article http://en.wikipedia.org/wiki/Continuation

We have to distinguish here ...
1) There are the WinBoard commands pause/resume (I don't know whether any interface or engine actually has them implemented). Upon receiving the pause command, the engine just enters a loop whose only exit is receiving resume as input. You don't need an iterative search for that.
2) The engine stores its current search state on disk and exits; the next time it is loaded, it is able to continue its search from where it left off. This is a more extreme form of persistent hash, where not only position scores and best moves are stored but the whole search state gets saved. For this to work properly you indeed need an iterative search.
How would this be implemented in a most user friendly way for UCI / Winboard engines?