What is the best known speed of move generation?

bob · Post by **bob** » Sun Oct 14, 2007 10:20 pm

hgm wrote:
bob wrote:OK. I did not look at his code and generally do not look at code of others due to a lack of time.
This policy apparrently also applies to what people post here...

Ok, in that tone, I would agree but would only ask why you are talking about yourself here?

If he generates legal moves that's fine. Sounds like a waste of time, but that's a different matter.
... as it was clearly explained above that it actually saves a lot of time, and that it is the generation of such illegal moves that is a big time waster.

Then explain it in a way that is theoretically sound. Moving a piece that is absolutely pinned is rare. Because absolute pins themselves are rare. So which is more efficient, to discover later that the piece was pinned and spend a little extra time doing this, or spending a little extra time on _every_ move to determine the same thing?

Make the common case fast...

Sorry, but you don't get to redefine something _I_ defined first. Perft was designed with two goals. (1) enumerate a tree of legal moves to a fixed depth; (2) test performance (hence the name "perf from performance, and t from time) by measuring move generation and make/unmake speed. Full make/unmake, whatever it is you do in your actual program is what was intended to be done here.
Not redefine. Just implement it better. They want to test the performance of their engine, and thus they do the performance test to mimic the behavior of their engine as closely as possible. But it is still a performance test, as it fulfills exactly the functions you defined.

Are you reading what people are doing? they are _not_ doing what they normally do in their engines. Not updating hash signature in Make/Unmake. Hashing move counts which is not used anywhere in the tree search. Etc. So how is _that_ reflective of the actual gen/make/unmake overhead? Why not just measure it as it really is?

bob wrote:Fine. But let me _again_ point out. "perft". "performance test". Originally defined in Cray Blitz, and then in Crafty in 1995. It was originally used after Crafty was originally released, as a way for others to (a) test move generator for correctness (it is easy to screw up en passant captures, promotions, etc) and (b) a way to measure the speed of the move generator and make/unmake code which also tests the data structures for efficiency as well.

Comparing two programs that do the computations differently is useless when time is measured... Yet everyone is doing that and it is wrong.
Unless, of course, the programs that are being compared are doing it the same way. Then it doesn't matter so much which way this is.

You only have to look at the speed differences to figure out they are _not_ testing it in the same way...

Everyone is well aware of this, and this is why we always report if we are making the last ply, hashing counts, classifying moves, detecting checks...

hgm wrote:
And, like I said, engines usually don't make the moves they generate in the end-leaves either. So for timing an engine under construction, Make / UnMake of the last ply would invite you to make completely wrong and counterproductive 'optimizations'.
Eh? Why don't you make/unmake moves in the last ply? I do, unless there are none generated to make...
How do you know there are none generated to make? Wouldn't that be by running your move generator? I was under the impression that Crafty would not search bad captures in end leaves. So you must generate the captures, in order to subject them to see, and then _not_ make them...

bob wrote: Again I don't follow. Perft has _nothing_ to do with an alpha/beta search.
They are both tree walks. So "nothing" must be a relative notion here...

The real engine spends a small part of its time generating and making/unmaking moves. It also does a real tree search, with extensions, reductions, evaluations, etc. perft has _none_ of that. So the perft numbers have zero to do with how the overall engine performs. Particularly when the perft algorithm is further hacked up to be as fast as possible, without regard to how the engine itself does the same tasks...

There are no cutoffs. No evaluations. no extensions. No hashing. No repetitions. No 50-move rule and other drawing conditions. My move generation and make/unmake generally take less than 15% of the total search time when profiled... So the speed of perft doesn't even relate to the speed of the chess engine.
None of the tasks you mention takes significant time. Except the hashing, but as you must have figured by now we already did this in the perft we were comparing with. And, knowing how much you value purity of the terms used, most of them really don't have anything to do with alpha-beta.

So extensions take no time? My evaluation takes more than 50% of the total search time as measured by profile runs. That is "no time"?

Please get serious. The only reason perft was even timed in the first place was to let us quantify the differences in various approaches to generate moves and represent the board. Rotated bitmaps, for example, have extra make/unmake overhead that a non-rotated bitboard program doesn't have. But optimizing perft itself is pointless. It has no use in and of itself.

bob · Post by **bob** » Sun Oct 14, 2007 10:23 pm

smrf wrote:
bob wrote:
smrf wrote:I agree with H.G.. If Perft (Performance Test) is done to validate a move generator's true performance, it should be done under realistic circumstances. That is to make use not only of the existence of a move but of all related necessary properties, e.g. if it would be a check threat or castling etc. - thus to prove that ability, it seems to be necessary to also provide statistics about that kind of information. SMIRF has done such a thing as an example, avoiding best move count results by intention, but providing a lot of move related information.

Reinhard.
I don't see how HG said what you are supposedly agreeing with. Many programs hash the move counts. We don't do that in a _real_ search. I make moves and unmake them in the last ply of search, unless I don't generate any moves in the first place and just return the static eval.

None of that makes any sense to me in the context of a "real search"...
This is strange to me. Even when a (that way) fast move generator has only a very small influence on an engine's final speed, it is a signifcant indicator for the usability of the move's themselves including their bundled information and thus also for the quality of the underlaying data structure. This is even more important if the tested move generator has been developed for a more abstract target, that is to generate moves for different geometries and piece sets like done in ChessV or SMIRF, where e.g. 8x8 and 10x8 Chess variants ar supported by unique move generators.

Reinhard.

How does it say anything at all other than about the speed of the generation and make/unmake? The quality of the other stuff produced by the generator has to be discovered elsewhere, say in the evaluation where incremental updating of some values might save time if done in Make/Unmake rather than re-computed over and over. But for perft, what "qualities" could possibly be important? and, indeed, some even bypass updating the hash signature and such to make perft even faster. That certainly won't make the program any better.

smrf · Post by **smrf** » Sun Oct 14, 2007 11:04 pm

bob wrote:
smrf wrote:
bob wrote:
smrf wrote:I agree with H.G.. If Perft (Performance Test) is done to validate a move generator's true performance, it should be done under realistic circumstances. That is to make use not only of the existence of a move but of all related necessary properties, e.g. if it would be a check threat or castling etc. - thus to prove that ability, it seems to be necessary to also provide statistics about that kind of information. SMIRF has done such a thing as an example, avoiding best move count results by intention, but providing a lot of move related information.

Reinhard.
I don't see how HG said what you are supposedly agreeing with. Many programs hash the move counts. We don't do that in a _real_ search. I make moves and unmake them in the last ply of search, unless I don't generate any moves in the first place and just return the static eval.

None of that makes any sense to me in the context of a "real search"...
This is strange to me. Even when a (that way) fast move generator has only a very small influence on an engine's final speed, it is a signifcant indicator for the usability of the move's themselves including their bundled information and thus also for the quality of the underlaying data structure. This is even more important if the tested move generator has been developed for a more abstract target, that is to generate moves for different geometries and piece sets like done in ChessV or SMIRF, where e.g. 8x8 and 10x8 Chess variants ar supported by unique move generators.

Reinhard.
How does it say anything at all other than about the speed of the generation and make/unmake? The quality of the other stuff produced by the generator has to be discovered elsewhere, say in the evaluation where incremental updating of some values might save time if done in Make/Unmake rather than re-computed over and over. But for perft, what "qualities" could possibly be important? and, indeed, some even bypass updating the hash signature and such to make perft even faster. That certainly won't make the program any better.

Using cache for making Perft faster is misleading. But calculating Perft on cached move generation makes sense for to compare the results with values generated in uncached mode to verify the correctness of caching.

To name those "mysterious" qualities of moves: knowing, whether a move is: capture / e.p. / check / mate / castling / promotion. All that knowledge could simplify follow up procedures making use of generated moves.

Reinhard.

nczempin · Post by **nczempin** » Sun Oct 14, 2007 11:09 pm

bob wrote: The only reason perft was even timed in the first place was to let us quantify the differences in various approaches to generate moves and represent the board. Rotated bitmaps, for example, have extra make/unmake overhead that a non-rotated bitboard program doesn't have. But optimizing perft itself is pointless. It has no use in and of itself.

But if the timing was incidental, why was it called a performance test? Wouldn't it have been more accurate to call it "poscount" or something like that. Sorry to dwell on it, it really is not that important, I'm just a curious guy.

hgm · Post by **hgm** » Sun Oct 14, 2007 11:20 pm

bob wrote:Then explain it in a way that is theoretically sound. Moving a piece that is absolutely pinned is rare. Because absolute pins themselves are rare. So which is more efficient, to discover later that the piece was pinned and spend a little extra time doing this, or spending a little extra time on _every_ move to determine the same thing?

Make the common case fast...

It seems you did still not read what I wrote...

I spend no time on _any_ move determining anything. So obviously what I am doing is more efficient then what you are doing, as you sometimes generate a move with a pinned pieces, and have to discover it later. It might be a rare event, but the time generating that move was non-zero, and something is more than nothing. Plus that doing the illegal move to find out that your King is captured in the daughter node is an outrageously expensive way to discover it, so that the product of the frequency of the effect and the cost to correct it when it occurs is not nearly so insignificant as the rarity of the event would suggest.

In my move generator moves with pinned pieces are simply not generated, which saves time. And because of that the moves that are generated do not have to be tested for being pinned on the King, which saves even more time. And finally I never make moves with pieces that are pinned on the King, which saves enorously more time compared to the earlier two savings.

Are you reading what people are doing? they are _not_ doing what they normally do in their engines. Not updating hash signature in Make/Unmake. Hashing move counts which is not used anywhere in the tree search. Etc. So how is _that_ reflective of the actual gen/make/unmake overhead? Why not just measure it as it really is?

If they are hashing the move counts, they obviously must be updating the hash key. A normal search stores the score in the hash table, and the perft scores the move count in stead. Not storing anything would be a gross deviation from what you do in a real engine.

You only have to look at the speed differences to figure out they are _not_ testing it in the same way...

Which is fine, if you are aware how everyone is testing, and only compare the tests that are done the same way (e.g. all numbers with hashing and not making the final ply). Then they reflect the true speed differences.

So extensions take no time? My evaluation takes more than 50% of the total search time as measured by profile runs. That is "no time"?

If that is the time it takes you to do a material-only + PST evaluation, it must be very poorly coded...

Extensions have virtually no effect on the nps. Which is what we are talking about here. So indeed they take no time.

Please get serious. The only reason perft was even timed in the first place was to let us quantify the differences in various approaches to generate moves and represent the board. Rotated bitmaps, for example, have extra make/unmake overhead that a non-rotated bitboard program doesn't have. But optimizing perft itself is pointless. It has no use in and of itself.

This is exactly the reason why it should be timed with the proper ratio of make/unmake vs move generation, or you would get a totally wrong impression how the different methods would compare in the engine, spedwise.

Many people would disagree with your last statement. They badly need the perft(N) numbers of various test positions, and they couldn't care less how they are generated, and if the program generating them has any resemblance or relation to a Chess engine, or wat that relation is. They want the numbers, and they want them fast.

bob · Post by **bob** » Mon Oct 15, 2007 1:15 am

nczempin wrote:
bob wrote: The only reason perft was even timed in the first place was to let us quantify the differences in various approaches to generate moves and represent the board. Rotated bitmaps, for example, have extra make/unmake overhead that a non-rotated bitboard program doesn't have. But optimizing perft itself is pointless. It has no use in and of itself.
But if the timing was incidental, why was it called a performance test? Wouldn't it have been more accurate to call it "poscount" or something like that. Sorry to dwell on it, it really is not that important, I'm just a curious guy.

Initially it had no name. I don't remember the command name in Cray Blitz although I could probably find it. But it wasn't "perft". For lack of a better name, that was the command I used and it stuck. Not that it was a good one, but it has been a consistent one. Eventually some of used it as a performance measure, to compare the speed of make/unmake/generate using different data structures and different algorithms (there are at least a dozen ways of generating moves with bitboards, for example). The assumption that went into the original comparisons was that evaluation and search were independent of the code that generates moves and updates the board. So if algorithm X could make/unmake/generate faster than another approach, it was a better solution...

bob · Post by **bob** » Mon Oct 15, 2007 1:27 am

hgm wrote:
bob wrote:Then explain it in a way that is theoretically sound. Moving a piece that is absolutely pinned is rare. Because absolute pins themselves are rare. So which is more efficient, to discover later that the piece was pinned and spend a little extra time doing this, or spending a little extra time on _every_ move to determine the same thing?

Make the common case fast...
It seems you did still not read what I wrote...

I spend no time on _any_ move determining anything. So obviously what I am doing is more efficient then what you are doing, as you sometimes generate a move with a pinned pieces, and have to discover it later. It might be a rare event, but the time generating that move was non-zero, and something is more than nothing. Plus that doing the illegal move to find out that your King is captured in the daughter node is an outrageously expensive way to discover it, so that the product of the frequency of the effect and the cost to correct it when it occurs is not nearly so insignificant as the rarity of the event would suggest.

So you can avoid generating moves of absolute-pinned pieces, with _zero_ cost?

Then why doesn't your purely-legal move generator generate moves at an infinite speed since there is no cost in making the test at move generation time?

If your make/unmake is a small fraction of total cost, then it is not "outrageously expensive" since it is a _very_ rare occurrence. On the other hand, avoiding generating them in the first place means a conditional test is done for every move generated...

In my move generator moves with pinned pieces are simply not generated, which saves time. And because of that the moves that are generated do not have to be tested for being pinned on the King, which saves even more time. And finally I never make moves with pieces that are pinned on the King, which saves enorously more time compared to the earlier two savings.

You keep saying (a) pinned piece moves are simply not generated and (b) there is no overhead in doing that. Doesn't compute to me, never will. But there is no point in arguing about it. Nothing is free, so let's just leave it at that..

Are you reading what people are doing? they are _not_ doing what they normally do in their engines. Not updating hash signature in Make/Unmake. Hashing move counts which is not used anywhere in the tree search. Etc. So how is _that_ reflective of the actual gen/make/unmake overhead? Why not just measure it as it really is?
If they are hashing the move counts, they obviously must be updating the hash key. A normal search stores the score in the hash table, and the perft scores the move count in stead. Not storing anything would be a gross deviation from what you do in a real engine.

I agree. Also not making moves at the last ply makes zero sense. And using the hash counts to avoid generating moves in other parts of the tree makes no sense. The idea was to compare total nodes, and total time, to see how fast you can generate/make/unmake moves. Hashing completely invalidates that comparison because now you are not generating/making/unmaking all the moves, yet claiming with the perft node count that those nodes were actually done.

that's what is wrong with this. There are _many_ ways to make perft run faster, if that is the goal. But it isn't, because perft by itself is completely useless. There is no game where the fastest perft wins.

You only have to look at the speed differences to figure out they are _not_ testing it in the same way...
Which is fine, if you are aware how everyone is testing, and only compare the tests that are done the same way (e.g. all numbers with hashing and not making the final ply). Then they reflect the true speed differences.

So extensions take no time? My evaluation takes more than 50% of the total search time as measured by profile runs. That is "no time"?
If that is the time it takes you to do a material-only + PST evaluation, it must be very poorly coded...

If that is all that is in your evaluation, it is poorly _designed_. My evaluation has a whole lot more than just material and PST data.

Extensions have virtually no effect on the nps. Which is what we are talking about here. So indeed they take no time.

They have a lot to do with the tree that is searched, which _is_ what we are talking about. And extensions _do_ affect the NPS. I can certainly see a change when I search simple positions vs positions with lots of checks...

Please get serious. The only reason perft was even timed in the first place was to let us quantify the differences in various approaches to generate moves and represent the board. Rotated bitmaps, for example, have extra make/unmake overhead that a non-rotated bitboard program doesn't have. But optimizing perft itself is pointless. It has no use in and of itself.
This is exactly the reason why it should be timed with the proper ratio of make/unmake vs move generation, or you would get a totally wrong impression how the different methods would compare in the engine, spedwise.

Many people would disagree with your last statement. They badly need the perft(N) numbers of various test positions, and they couldn't care less how they are generated, and if the program generating them has any resemblance or relation to a Chess engine, or wat that relation is. They want the numbers, and they want them fast.

So the idea becomes important to design the fastest perft possible to provide perft numbers for _others_ so that they can compare to those numbers? That sounds highly logical. Most people use perft for a short while and then never again, once they get their move generator debugged. The idea of spending a lot of time optimizing something that I will never use myself again leaves me blinking...

bob · Post by **bob** » Mon Oct 15, 2007 1:31 am

smrf wrote:
bob wrote:
smrf wrote:
bob wrote:
smrf wrote:I agree with H.G.. If Perft (Performance Test) is done to validate a move generator's true performance, it should be done under realistic circumstances. That is to make use not only of the existence of a move but of all related necessary properties, e.g. if it would be a check threat or castling etc. - thus to prove that ability, it seems to be necessary to also provide statistics about that kind of information. SMIRF has done such a thing as an example, avoiding best move count results by intention, but providing a lot of move related information.

Reinhard.
I don't see how HG said what you are supposedly agreeing with. Many programs hash the move counts. We don't do that in a _real_ search. I make moves and unmake them in the last ply of search, unless I don't generate any moves in the first place and just return the static eval.

None of that makes any sense to me in the context of a "real search"...
This is strange to me. Even when a (that way) fast move generator has only a very small influence on an engine's final speed, it is a signifcant indicator for the usability of the move's themselves including their bundled information and thus also for the quality of the underlaying data structure. This is even more important if the tested move generator has been developed for a more abstract target, that is to generate moves for different geometries and piece sets like done in ChessV or SMIRF, where e.g. 8x8 and 10x8 Chess variants ar supported by unique move generators.

Reinhard.
How does it say anything at all other than about the speed of the generation and make/unmake? The quality of the other stuff produced by the generator has to be discovered elsewhere, say in the evaluation where incremental updating of some values might save time if done in Make/Unmake rather than re-computed over and over. But for perft, what "qualities" could possibly be important? and, indeed, some even bypass updating the hash signature and such to make perft even faster. That certainly won't make the program any better.
Using cache for making Perft faster is misleading. But calculating Perft on cached move generation makes sense for to compare the results with values generated in uncached mode to verify the correctness of caching.

To name those "mysterious" qualities of moves: knowing, whether a move is: capture / e.p. / check / mate / castling / promotion. All that knowledge could simplify follow up procedures making use of generated moves.

Reinhard.

I don't know about mates in my move generator and don't want to. captures, ep captures, promotions, etc are all a part of a standard move generator output I would think. And then make/unmake might update information such as which files are open or now closed, or which pawns are passed, etc (we did all of that in Cray Blitz, I have never taken the time to add it to Crafty).

But who cares if the "cachine" is correct? It's _only_ purpose is to make perft faster. Once you have perft working, so that you can test your move generator and then move on, why bother going back to make it faster and faster. For me, for every bit of effort expended, I expect a payoff. I don't always get it of course, but at least the potential is always there. Speeding up perft beyond working correctly has zero payoff... no matter how the improvement works.

Jacob · Post by **Jacob** » Mon Oct 15, 2007 4:50 am

So you can avoid generating moves of absolute-pinned pieces, with _zero_ cost? Smile

At the end of MakeMove(), my engine (magic bitboards) examines slider rays to the king to determine distance check, pins, and (I could add) potential discovery checks. Knight/pawn checks are handled in their respective MakeMove() switch blocks. Using the pin and check information, I generate only legal moves and don't have to waste a make/unmake to test their legality later. The overhead required to maintain pin data and generate pinned-piece moves almost cancels with the time no longer wasted on failed makes (in my tests). But I get the extra information, most notably checks in leaf nodes.

However, it adds some complexity to the move generation code. H.G., I think we have a problem

Code: Select all

5q1q/4PPP1/5K2/8/8/5k2/4ppp1/5Q1Q w - - 0 1

(bitboard perft)
...
 4.         771124    0.03s    1.00 mnps
 5.       22802123    0.94s   24.31 mnps

(Quick Perft)
...
perft(4)=792239 ( 0.093 sec)
perft(5)=23866992 ( 2.282 sec)

(Hamsters)
...
4 = 796217
5 = 23973750

(Crafty)
...
total moves=796217  time=0.14
total moves=23973750  time=4.58

I get similar profile results for move generation in Etude, ~15%. There isn't much point in making my perft faster. But I find it entertaining. ... and I do need to fix this apparent bug,...

Alex Brunetti · Post by **Alex Brunetti** » Mon Oct 15, 2007 5:09 am

Jacob wrote:
Code: Select all
5q1q/4PPP1/5K2/8/8/5k2/4ppp1/5Q1Q w - - 0 1

(bitboard perft)
...
 4.         771124    0.03s    1.00 mnps
 5.       22802123    0.94s   24.31 mnps
...and I do need to fix this apparent bug,...

You can do that in a very fast time by implementing a divide function and a perftsuite function. I debugged my move generator in a few minutes that way. In my engine full package you can find a nice perftsuite file with useful positions (and the engine has a divide function too.)

Alex

What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?