Move generation: staged vs all-at-once

sje · Post by **sje** » Thu Apr 30, 2009 3:18 am

In my very first chess program, a z80 assembly language piece of junk written long ago, all of the moves at a node were generated before any were searched. Later on I read about staged generation as was employed in Chess 4.x and it seemed to me that this was the better way.

But I'm not so sure anymore. Consider the main, non-quiescence portion of a traditional A/B search where the program maintains a complete bitboard array AttackFrom[Sq] and other handy database variables like pinned man bitboards. Legal-only move generation is extremely fast given an efficient NextSq() routine and a decent move representation. There is some set-up time, but this is constant and becomes insignificant when charged on a per-move basis. (Exception: check evasion, where all moves are usually generated anyway.) But if a staged move generation scheme is used, then there's a charge added for each call for a similar set-up effort.

Also, perhaps half the nodes in the main region of a traditional A/B search are ALL flavor nodes and all of the moves need to be examined anyway. No benefit for staged generation there.

Having all the moves available before any are searched has additional benefits when applying transposition, refutation, and killer moves. It is always possible that a move from one of these heuristic sources may not be legal at the current node, so they must always be checked for sanity and then legality. These checks can be skipped if all the moves are available; a simple scan of the move list will pick out the desired match -- if it's there.

Move ordering using other techniques can be better managed when all moves are generated and present. Some ideas on how to do this are presented in Ed Schröder's Inside REBEL and these could be adapted in a more modern environment (i.e., 64 bit, big memory PC).

A further advantage of all-at-once generation is that it's potentially easier to keep track of which moves have been searched and which ones remain.

Given a legal-only move generator, an all-at-once approach will immediately show when a checkmate/stalemate is present. This will simplify the remaining code and speed up things slightly. And if there's no mate but only one or two moves, this also will be known before any moves are searched and so the at information can be used to perform extension/reduction management.

I also suspect that all-at-once generation has a better instruction cache hit rate although I don't have hard data to prove it.

bob · Post by **bob** » Thu Apr 30, 2009 6:45 am

sje wrote:In my very first chess program, a z80 assembly language piece of junk written long ago, all of the moves at a node were generated before any were searched. Later on I read about staged generation as was employed in Chess 4.x and it seemed to me that this was the better way.

But I'm not so sure anymore. Consider the main, non-quiescence portion of a traditional A/B search where the program maintains a complete bitboard array AttackFrom[Sq] and other handy database variables like pinned man bitboards. Legal-only move generation is extremely fast given an efficient NextSq() routine and a decent move representation. There is some set-up time, but this is constant and becomes insignificant when charged on a per-move basis. (Exception: check evasion, where all moves are usually generated anyway.) But if a staged move generation scheme is used, then there's a charge added for each call for a similar set-up effort.

Also, perhaps half the nodes in the main region of a traditional A/B search are ALL flavor nodes and all of the moves need to be examined anyway. No benefit for staged generation there.

Having all the moves available before any are searched has additional benefits when applying transposition, refutation, and killer moves. It is always possible that a move from one of these heuristic sources may not be legal at the current node, so they must always be checked for sanity and then legality. These checks can be skipped if all the moves are available; a simple scan of the move list will pick out the desired match -- if it's there.

Move ordering using other techniques can be better managed when all moves are generated and present. Some ideas on how to do this are presented in Ed Schröder's Inside REBEL and these could be adapted in a more modern environment (i.e., 64 bit, big memory PC).

A further advantage of all-at-once generation is that it's potentially easier to keep track of which moves have been searched and which ones remain.

Given a legal-only move generator, an all-at-once approach will immediately show when a checkmate/stalemate is present. This will simplify the remaining code and speed up things slightly. And if there's no mate but only one or two moves, this also will be known before any moves are searched and so the at information can be used to perform extension/reduction management.

I also suspect that all-at-once generation has a better instruction cache hit rate although I don't have hard data to prove it.

There are three levels of this...

(1) generate all moves and move captures to the front. We did this in Cray Blitz because we used the vector hardware to move things around, which was way fast.

(2) generate one at a time. I've never liked this. I tried it in early Crafty versions, as it is somewhat faster since you don't have to take the bit moves and unload them into an array. But you lose flexibility for ordering, if you want to do anything cute. Currently I don't do any ordering but used to use the history heuristic which doesn't translate very well to generating one at a time.

(3) The crafty approach, which is to generate captures only, and then to generate non-captures. The advantage of this is that the q-search doesn't need the non-captures, so not producing those saves time. Generating captures first also provides a ton of quick cutoffs and avoids the bigger job of generating non-captures. I've been doing this for many years because of the speed advantage it gives.

Bill Rogers · Post by **Bill Rogers** » Thu Apr 30, 2009 6:52 am

Steven
Memories of Z80, with me the TRS-80. I always generate a comple ply but I evaluate each move as it is made then sort them according to their weights. In this way Captures are examined first and piece square table moves come in last. As the ply depth deepens this a great many times gives a quicker cut off and makes for faster plays.
This is basically the same idea used in the "Rebel" program and most likely most others.
Bill

sje · Post by **sje** » Thu Apr 30, 2009 7:09 am

bob wrote: There are three levels of this...

(1) generate all moves and move captures to the front. We did this in Cray Blitz because we used the vector hardware to move things around, which was way fast.

(2) generate one at a time. I've never liked this. I tried it in early Crafty versions, as it is somewhat faster since you don't have to take the bit moves and unload them into an array. But you lose flexibility for ordering, if you want to do anything cute. Currently I don't do any ordering but used to use the history heuristic which doesn't translate very well to generating one at a time.

(3) The crafty approach, which is to generate captures only, and then to generate non-captures. The advantage of this is that the q-search doesn't need the non-captures, so not producing those saves time. Generating captures first also provides a ton of quick cutoffs and avoids the bigger job of generating non-captures. I've been doing this for many years because of the speed advantage it gives.

In the quiescence search it's a different story. The new Symbolic has a gainer generator (captures plus promotions) that's used there and this may be extended to include some special generators seen in the old Symbolic (gainers plus checks, gainer checks, checks, etc.) as appropriate.

The new Symbolic, like the old one, has a gainer generator and a holder generator (like its predecessor Spector and like Crafty) along with a separate all-moves generator. The all-moves generator call is always faster than a gainer generation call followed by a holder generation call. So if at least one holder move needs searched, then the all-moves mode is a win. And because of the factors mentioned earlier along with the program's bitboard database implementation, the all-moves call is pretty much a win everywhere else. Even in the endgame where the transposition table hit rate is high and the cutoff rate is also high, the all-moves call wins because there are usually fewer moves and they are generated quickly.

The only real downside to the all-moves approach is the processing time incurred for incremental updating of the bitboard database. This is significant in Symbolic to be sure. But it pays off in many other areas of the program including move generation, evaluation, pins (detection and effects), swap analysis, and move ordering. Also, Symbolic never has to test for a pseudolegal move or an insane move from a table.

Perhaps the biggest gains from the all-moves approach will be seen in move ordering. One idea taken from Chess 4.x is to search all moves, both gainer and holder, by a particular piece. Tough to do in some modes, but easy with all-moves in play. Similar: all moves that attack a given piece, all moves that defend a given piece, etc.

sje · Post by **sje** » Thu Apr 30, 2009 7:30 am

Bill Rogers wrote:Memories of Z80, with me the TRS-80.

I had a couple of these I bought for a few dollars some twenty years after they were first put on the market. For my z80 work In the early days I had a couple of machines including a 16 KB Exidy (a pinball company) Sorcerer.

Much effort, little produced. Almost too painful to remember.

If I had to do a micro based chess program today, I'd pick the H8 (very much like a pdp11) and use a whole array of them. And that would be AFTER I had debugged everything on a simulator.

Gerd Isenberg · Post by **Gerd Isenberg** » Thu Apr 30, 2009 8:04 am

Trying (legally testing) hashmove before generating all moves and killers before generating quiet moves makes sense. Pawn takes (heavy) piece is also cheap to determine and an indicator to don't generate other moves in the first place. I prefer staged move-gen with the option to generate / sort all moves at once controlled by initial movegen state, which depends on expected node-type and depth left. What I do, is to generate 16 direction wise legal move target sets as a kind of unsorted move list on the fly.

Roman Hartmann · Post by **Roman Hartmann** » Thu Apr 30, 2009 8:12 am

I'm using a legal move generator that generates all the moves in every position for the AB-search and another move generator which generates pseudo-legal captures only when in Q-search unless in check then the QS will switch to legal move generation.

The nice thing about a move generator that generates all moves in a certain position is that it can be used to detect mate or stalemate without searching any moves. Move ordering is also very easy if you have all the moves available.

The drawback is that a lot of generated moves are never searched at all so generating most of those moves is just waste of time.

But anyway testing a next-move generator for the AB-search is rather high on my priority list as in some positions (almost) any move will do to get a quick cut-off.

best regards
Roman

rjgibert · Post by **rjgibert** » Thu Apr 30, 2009 8:24 am

What percentage of total time spent does your move generator use in your program?

sje · Post by **sje** » Thu Apr 30, 2009 8:42 am

rjgibert wrote:What percentage of total time spent does your move generator use in your program?

Subject to wide variance:

Code: Select all

	4.6%	4.6%	Symbolic	Pos&#58;&#58;GenerateFull&#40;GMVec&) const	
	3.8%	3.8%	Symbolic	Pos&#58;&#58;GenerateGain&#40;GMVec&) const	
	0.3%	0.3%	Symbolic	Pos&#58;&#58;GenerateEvad&#40;GMVec&) const

hgm · Post by **hgm** » Thu Apr 30, 2009 8:47 am

In HaQiKi D I currently generate all moves in advance. Captures and non-captures go in a different list, though, so I can easily sort the captures. Hash, and killer moves are extracted from the non-captures before the remaining non-captures are sorted; hash move is transferred to the capture list (if needed), and receives a sort key that would top any MVV/LVA value of the other captures. Captures are extracted one by one as te next move needs to be searched, and are then subjected to a kind of SEE. If it is obvious that the capture is bad, its searching is deferred until after the other captures, by reducing its sort key to a value that is lower than any MVV/LVA value.

Currently I even push non-captures on the stack in QS; the non-captures are generated as a side effect from generating captures anyway. I could make a seperte move generator for QS that saves a little time by not pushing them, but my eventual goal is to generate the moves in Q in an entirely different manner, from a differentially updated attack table. Then I would loop through the opponent's piece list high-to-low, in stead of through my own, look in the attack tables for attackers of the squares these pieces are on. This would give very efficient move-by-move generation of the captures. The attack table would also make it much easier to test for bad captures, and these could simply be skipped.

Move generation: staged vs all-at-once

Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once

Re: Move generation: staged vs all-at-once