Move generation: staged vs all-at-once

sje · Post by **sje** » Thu Apr 30, 2009 8:49 am

Another sampling:

	13.6%	13.6%	Symbolic	BD::AttackDel(Sq, Color)	
	13.5%	13.5%	Symbolic	SearchTask::ABNodeFull(Window, unsigned int, int, MoveList&)	
	13.4%	13.4%	Symbolic	BD::AttackAdd(Sq, Color, Man)	
	7.7%	7.7%	Symbolic	Pos::Execute(PPD&, Move)	
	6.6%	6.6%	Symbolic	Pos::Retract(PPD const&)	
	5.8%	5.8%	Symbolic	Pos::Evaluate(TrBaseEval*, TrBasePawn*) const	
	4.7%	4.7%	Symbolic	GMVec::AssignBasicPrelimScores(Pos const&)	
	4.7%	4.7%	Symbolic	SearchTask::ABNodeGain(Window, unsigned int, int, MoveList&)	
	4.5%	4.5%	Symbolic	BD::AttackCut(Sq)	
	3.8%	3.8%	Symbolic	Pos::RegenPinBitboards()	
	3.7%	3.7%	Symbolic	Pos::GenerateFull(GMVec&) const	
	3.7%	3.7%	Symbolic	BD::AttackPro(Sq)	
	3.4%	3.4%	Symbolic	SearchTask::ABNodeEvad(Window, unsigned int, int, MoveList&)	
	2.0%	2.0%	Symbolic	Pos::GenerateGain(GMVec&) const	
	1.9%	1.9%	Symbolic	Pos::BestPossibleGain() const	
	1.6%	1.6%	Symbolic	SearchTask::ABGate(Window, unsigned int, int, MoveList&)	
	1.5%	1.5%	Symbolic	GMVec::FindMoveIndex(Move) const	
	1.2%	1.2%	Symbolic	Pos::CountPriorOccurances(unsigned int) const	
	0.9%	0.9%	Symbolic	Pos::GenerateEvad(GMVec&) const	
	0.9%	0.9%	Symbolic	Window::DownShift()	
	0.2%	0.2%	Symbolic	MinUnsignedInt(unsigned int, unsigned int)	
	0.2%	0.2%	Symbolic	TrBasePawn::Probe(Pos const&, Score&)	
	0.1%	0.1%	Symbolic	Pos::GenerateAndScoreGain(GMVec&) const	
	0.1%	0.1%	Symbolic	Pos::TestLegalEnPassantMove(Move) const	
	0.0%	0.0%	Symbolic	Pos::GenerateAndScoreEvad(GMVec&) const	
	0.0%	0.0%	Symbolic	Pos::GenerateAndScoreFull(GMVec&) const

bob · Post by **bob** » Thu Apr 30, 2009 7:29 pm

sje wrote:
bob wrote: There are three levels of this...

(1) generate all moves and move captures to the front. We did this in Cray Blitz because we used the vector hardware to move things around, which was way fast.

(2) generate one at a time. I've never liked this. I tried it in early Crafty versions, as it is somewhat faster since you don't have to take the bit moves and unload them into an array. But you lose flexibility for ordering, if you want to do anything cute. Currently I don't do any ordering but used to use the history heuristic which doesn't translate very well to generating one at a time.

(3) The crafty approach, which is to generate captures only, and then to generate non-captures. The advantage of this is that the q-search doesn't need the non-captures, so not producing those saves time. Generating captures first also provides a ton of quick cutoffs and avoids the bigger job of generating non-captures. I've been doing this for many years because of the speed advantage it gives.
In the quiescence search it's a different story. The new Symbolic has a gainer generator (captures plus promotions) that's used there and this may be extended to include some special generators seen in the old Symbolic (gainers plus checks, gainer checks, checks, etc.) as appropriate.

The new Symbolic, like the old one, has a gainer generator and a holder generator (like its predecessor Spector and like Crafty) along with a separate all-moves generator. The all-moves generator call is always faster than a gainer generation call followed by a holder generation call. So if at least one holder move needs searched, then the all-moves mode is a win. And because of the factors mentioned earlier along with the program's bitboard database implementation, the all-moves call is pretty much a win everywhere else. Even in the endgame where the transposition table hit rate is high and the cutoff rate is also high, the all-moves call wins because there are usually fewer moves and they are generated quickly.

I'm not convinced. I did a ton of testing before I decided to split the move generator into two parts. The q-search is obviously a no-brainer. But the normal search has a majority of its nodes where a capture is the best move to refute a stupid move at the previous ply.

I have four generators.

GenerateCaptures()
GenerateNonCaptures()
GenerateChecks()
GenerateCheckEvasions()

To generate all moves you have to call the first two in succession. Since captures are so often the best move, I do that first, and after having exhausted the good-looking captures, I generate the rest, saving that effort if a capture is good enough.

GenerateCheckEvasions() is a legal-move generator and is only used when a node starts off with the side to move in check. GenerateChecks() is used in the first level of the q-search only to include checking moves there.

The only real downside to the all-moves approach is the processing time incurred for incremental updating of the bitboard database. This is significant in Symbolic to be sure. But it pays off in many other areas of the program including move generation, evaluation, pins (detection and effects), swap analysis, and move ordering. Also, Symbolic never has to test for a pseudolegal move or an insane move from a table.

Perhaps the biggest gains from the all-moves approach will be seen in move ordering. One idea taken from Chess 4.x is to search all moves, both gainer and holder, by a particular piece. Tough to do in some modes, but easy with all-moves in play. Similar: all moves that attack a given piece, all moves that defend a given piece, etc.

If I were doing any of that, I would agree. However Crafty is still "brute force" and doesn't try targeted generation for anything at all, nor is it likely to do so in the near future if at all.

bob · Post by **bob** » Thu Apr 30, 2009 7:32 pm

Bill Rogers wrote:Steven
Memories of Z80, with me the TRS-80. I always generate a comple ply but I evaluate each move as it is made then sort them according to their weights. In this way Captures are examined first and piece square table moves come in last. As the ply depth deepens this a great many times gives a quicker cut off and makes for faster plays.
This is basically the same idea used in the "Rebel" program and most likely most others.
Bill

IMHO a chess program is a classic example of where "procrastination" works perfectly. The ideal chess program design philosophy is "never do now what you can put off until later, because with alpha/beta, sometimes later never comes." That has proven to be true for me in move generation. Just look at how many times a capture is the best move because searching all possible moves hangs pieces all over the place. If you don't use bitboards, then generating only captures is not so good, because you still have to iterate down each ray for sliders, skipping over empty squares. For bitboard engines, this is not true and the idea becomes a big win in terms of speed.

And in chess, like automobiles, "speed kills".

CThinker · Post by **CThinker** » Thu Apr 30, 2009 7:52 pm

I have done both, and in the end, I went with the phased move generation scheme. Version 3.x of Thinker, and older versions of 4.x always generate all moves. Later versions of 4.x and all of 5.x generate moves in phases.

First, as has been mentioned, most of the time, the non-captures will never be used. The captures/promotions are usually good enough. After that, the killer moves are usually good enough.

Second, I sort the non-captures using history counts. The history counts get better after the captures and killer moves have been searched. So, the ordering of non-captures is better.

Third, in Thinker, sorting N moves and then sorting M moves is faster than sorting N + M moves, considering that the sorting algorithm in Thinker is order N^2 (the simplistic bubble sort).

Bill Rogers · Post by **Bill Rogers** » Thu Apr 30, 2009 9:12 pm

I have a puzzeling question and that is how do you generate capture moves only without generating all moves to see if it is a capture?
And ( I know this is not the proper wat to start a sentence ) if you have to generate all moves then why isn't a center control squares included in the initial sort as this then gives you a good starting point for all future generated moves?
Bill

bob · Post by **bob** » Thu Apr 30, 2009 9:36 pm

Bill Rogers wrote:I have a puzzeling question and that is how do you generate capture moves only without generating all moves to see if it is a capture?
And ( I know this is not the proper wat to start a sentence ) if you have to generate all moves then why isn't a center control squares included in the initial sort as this then gives you a good starting point for all future generated moves?
Bill

For bitboards it is trivial. Normally, to generate all moves, you create a "target square mask" that contains a 1 everywhere except for squares that contain your own pieces. You then complement this mask and hang on to it. When you generate any moves, the first thing you do is AND the move bitmap with the mask you saved, which excludes any move that would capture your own piece.

To generate captures, you change the mask to contain only 1 bits for the opponent's occupied squares, 0's everywhere else. Now you generate moves, AND with this bitmap, and it only leaves moves that capture opponent pieces, all others are "gone".

bob · Post by **bob** » Thu Apr 30, 2009 9:40 pm

CThinker wrote:I have done both, and in the end, I went with the phased move generation scheme. Version 3.x of Thinker, and older versions of 4.x always generate all moves. Later versions of 4.x and all of 5.x generate moves in phases.

First, as has been mentioned, most of the time, the non-captures will never be used. The captures/promotions are usually good enough. After that, the killer moves are usually good enough.

Second, I sort the non-captures using history counts. The history counts get better after the captures and killer moves have been searched. So, the ordering of non-captures is better.

Third, in Thinker, sorting N moves and then sorting M moves is faster than sorting N + M moves, considering that the sorting algorithm in Thinker is order N^2 (the simplistic bubble sort).

I've not seen anyone physically sort the move list for history moves. I always used a "selection" sort approach, which means making one pass over the list to pick out the move with the highest history counter. Then repeating when you want another move. After a few of these, you stop doing the history test and just take 'em in order since apparently order doesn't mean anything since this is an ALL node.

Doing this, you get (say) O(4*N) rather than O(N*N) if you just use history to select the first four moves and then give up and take the remainder in order generated. This is certainly more efficient.

mcostalba · Post by **mcostalba** » Thu Apr 30, 2009 10:34 pm

bob wrote: I've not seen anyone physically sort the move list for history moves.

Toga does this, while Glaurung does as you explained in your post, in Glaurung the first 12 moves are picked up in order the others are taken as they come.

I have tried to change that value, but it seem 12 is a lucky number here

wgarvin · Post by **wgarvin** » Fri May 01, 2009 1:25 am

bob wrote:
Bill Rogers wrote:I have a puzzeling question and that is how do you generate capture moves only without generating all moves to see if it is a capture?
And ( I know this is not the proper wat to start a sentence ) if you have to generate all moves then why isn't a center control squares included in the initial sort as this then gives you a good starting point for all future generated moves?
Bill
For bitboards it is trivial. Normally, to generate all moves, you create a "target square mask" that contains a 1 everywhere except for squares that contain your own pieces. You then complement this mask and hang on to it. When you generate any moves, the first thing you do is AND the move bitmap with the mask you saved, which excludes any move that would capture your own piece.

To generate captures, you change the mask to contain only 1 bits for the opponent's occupied squares, 0's everywhere else. Now you generate moves, AND with this bitmap, and it only leaves moves that capture opponent pieces, all others are "gone".

I think the other part of the answer, is that with bitboards, if you know which squares are "occupied", there are various techniques (such as magic move generation) that with O(1) effort, can produce a bitboard of all the squares your slider can attack. You then AND that with the "squares containing enemy pieces" bitboard and you have all of the pieces that can be captured by this slider, *without* having to iterate over all the intermediate squares like you might in a mailbox engine.

sje · Post by **sje** » Fri May 01, 2009 1:27 am

A chess primer will tell the beginner to always examine an opponent's last move and use that to help select a move. This works for humans who can determine the "tactical delta" created by a move: some squares get new attacks, some squares lose old defenses, etc.

Yet how many chess programs do this? Some might have some code to try and capture the last piece moved, but that's about it.

With all-at-once generation, it's much easier to fashion move ordering to take advantage of the beginner's advice. Bitboards that describe changes due to the last move are quickly and easily constructed, and these can be use in ordering moves when all moves are guaranteed to be present.

Ideas:

1) Capturing the last moved piece, preferably by a piece newly attacked.

2) Adding defenses to newly attacked pieces, or moving those insufficiently defended.

3) Capturing or attacking newly undefended pieces, or moving pieces to newly undefended squares.

Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Implementing the "What just happened?" heuristic