Move generation: staged vs all-at-once

Greg Strong · Post by **Greg Strong** » Fri May 01, 2009 1:48 am

What I've been meaning to try is to generate moves in stages at expected CUT nodes and generate all moves at expected ALL and PV nodes. That seems the most logical to me.

sje · Post by **sje** » Fri May 01, 2009 4:51 am

1) O(n^2) should be avoided where feasible. I have been experimenting with the gcc version of the standard function qsort() for ordering move-score pairs (eight bytes per record, four byte key) and it works well running O(n * log n). I suspect this is true in part because of 64 bit hardware and in part because it avoids the worst cases when the entire move vector needs to be scanned and searched that would otherwise cost the full O(n^2) effort.

2) Consider the case of when there are a typical 32 moves present. Repeated linear scanning will cost 1,024 operations in the worst case while qsort() will cost about 160 operations an any case. If more than five passes are needed by a linear scan, then qsort() wins. If the linear scan is at an ALL flavor node, qsort() wins big.

3) A homebrew version of qsort() with hard coded sizes and an inline comparator function should be even faster than the regular qsort(). I haven't tried this yet.

4) A lot of move ordering ideas that appear in public implementations work on a per-move basis and so fail to look at more global issues with the other moves and with the board. It is unreasonable to expect much of MVV/LVA or the like when the mover's queen is hanging.

5) How about a conditional compilation of the move ordering mechanism that would track the success of the various ordering heuristics? This could be more efficient than the alternative of many, many test runs. It could be possible to correlate the relative success of the ordering functions with certain characteristics of the position, the last move or two, etc. and use this data to adjust move ordering weighting dynamically.

CThinker · Post by **CThinker** » Fri May 01, 2009 5:14 am

bob wrote: I've not seen anyone physically sort the move list for history moves.

This makes for a very compact code. Each phase is nothing more than a call to some move generator. It is assumed that the move generator will order the moves (or it can choose not to).

Its something like:

Code: Select all

TMoveGenerators MoveGenerators = &#123;HashMoveGen,KillerMoveGen,GainingCapturesMoveGen,NonCapturesMoveGen,LosingCapturesMoveGen&#125;;

GetNextMove&#40;)
&#123;
    while&#40;true&#41;
    &#123;
        if &#40;there is a move from the current move list&#41;
        &#123;
            get &#40;pop&#41; the move from the list;
            return the move;
        &#125;

        go to the next move generator;
        if there are no more move generators
            return "no more moves";

        use the current move generator to populate the move list';
    &#125;
&#125;

The GainingCapturesMoveGen and the LosingCapturesMoveGen are related, in that, the later simply returns the moves that the former does not.

The use of such 'patters' is the reason for the simplicity and compactness of Thinker.

hgm · Post by **hgm** » Fri May 01, 2009 8:12 am

sje wrote:Yet how many chess programs do this? Some might have some code to try and capture the last piece moved, but that's about it.

The point is that in Chess programs this des not work in 99.9% of the tree nodes, because the are positions a Human would never consider. Because the side playing the all nodes tries everything, after a few moves he has pieces hanging all over the place. And the side playing the cut-nodes has hnging pieces too, because he failed to properly defend against attacks, and the opponent would have failed to folow up on the attack with an actual capture.

So the 'tactical delta' does not give you the most relevant part of the tactics. For positions that engines search the recipy is:
1) Do nothing, as the previous opponent moves are likely cr*p.
2) In the rare cases he makes trouble, execute the most valuable 'hostage' that he was so friendly to hand you on one of his previous moves.

Only for checks the tactical delta is meaningful, as it is not allowed to leave your King in check. And this is indeed often used in check detection.

The tactical-delta approach might work in PV nodes, but I doubt if there are enough of those to have an impact on performance.

bob · Post by **bob** » Fri May 01, 2009 8:24 am

CThinker wrote:
bob wrote: I've not seen anyone physically sort the move list for history moves.
This makes for a very compact code. Each phase is nothing more than a call to some move generator. It is assumed that the move generator will order the moves (or it can choose not to).

Its something like:
Code: Select all
TMoveGenerators MoveGenerators = &#123;HashMoveGen,KillerMoveGen,GainingCapturesMoveGen,NonCapturesMoveGen,LosingCapturesMoveGen&#125;;

GetNextMove&#40;)
&#123;
    while&#40;true&#41;
    &#123;
        if &#40;there is a move from the current move list&#41;
        &#123;
            get &#40;pop&#41; the move from the list;
            return the move;
        &#125;

        go to the next move generator;
        if there are no more move generators
            return "no more moves";

        use the current move generator to populate the move list';
    &#125;
&#125;
The GainingCapturesMoveGen and the LosingCapturesMoveGen are related, in that, the later simply returns the moves that the former does not.

The use of such 'patters' is the reason for the simplicity and compactness of Thinker.

I am not sure "compactness" for "speed" is a good tradeoff here. In about 1/2 the nodes a sort is totally wasted because they are ALL nodes. In the other 1/2, by the time you get to the hash move, good captures, then killers, sorting doesn't seem to buy a whole lot.

bob · Post by **bob** » Fri May 01, 2009 8:27 am

sje wrote:1) O(n^2) should be avoided where feasible. I have been experimenting with the gcc version of the standard function qsort() for ordering move-score pairs (eight bytes per record, four byte key) and it works well running O(n * log n). I suspect this is true in part because of 64 bit hardware and in part because it avoids the worst cases when the entire move vector needs to be scanned and searched that would otherwise cost the full O(n^2) effort.

2) Consider the case of when there are a typical 32 moves present. Repeated linear scanning will cost 1,024 operations in the worst case while qsort() will cost about 160 operations an any case. If more than five passes are needed by a linear scan, then qsort() wins. If the linear scan is at an ALL flavor node, qsort() wins big.

Generally you won't do that. After the first N passes, you quit and just take the rest of the moves in order since this is likely an ALL node by the time you have gotten this far into the move list. Or if not, history info or whatever is not going to help you find the right move.

3) A homebrew version of qsort() with hard coded sizes and an inline comparator function should be even faster than the regular qsort(). I haven't tried this yet.

4) A lot of move ordering ideas that appear in public implementations work on a per-move basis and so fail to look at more global issues with the other moves and with the board. It is unreasonable to expect much of MVV/LVA or the like when the mover's queen is hanging.

5) How about a conditional compilation of the move ordering mechanism that would track the success of the various ordering heuristics? This could be more efficient than the alternative of many, many test runs. It could be possible to correlate the relative success of the ordering functions with certain characteristics of the position, the last move or two, etc. and use this data to adjust move ordering weighting dynamically.

Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

More thoughts

Re: Move generation: staged vs all-at-once

Re: Implementing the "What just happened?" heurist

Re: Move generation: staged vs all-at-once

Re: More thoughts