What is the best known speed of move generation?

nczempin · Post by **nczempin** » Mon Oct 15, 2007 7:13 am

Jacob wrote: There isn't much point in making my perft faster. But I find it entertaining. ... and I do need to fix this apparent bug,...

I think it is a perfectly legitimate reason to be able to say "my perft is bigger than your perft"

hgm · Post by **hgm** » Mon Oct 15, 2007 1:02 pm

bob wrote:So you can avoid generating moves of absolute-pinned pieces, with _zero_ cost?

practically so, yes. The absolutely pinned pieces are not in the move list during move generation.

I know which pieces are pinned as an almost free side-effect of the in-check test. For every enemy slider aligned with the King I have to scan the board to see if there are obstacles in between King and slider. If in that process (starting from the King) the first thing I encounter is not the enemy slider, but an own piece, that piece is almost always pinned.

Moving absolutely pinned pieces is _only_ rare because the alignment is rare. But if the alignment is already an established fact (and I am not in check), an own piece blocking the check is _almost always_ pinned. So not taking action at that point will lead almost certainly to much more expensive action later, even just generating moves with that piece is comparatively expensive. If no alignment is detected, the pin test (and the entire board-ray scan) is never performed.

Now you might argue that testing for in-check by going through the enemy slider list is needlessly expensive, and that it is better to make the test differentially, based on the previous move. Apart from the fact that we have entered the realm of totally negligible effort, even this is not true. To derive in-check information from the previous move you would have to test both alignment with the ToSquare, and alignment _over_ the FromSquare. The latter test is more expensive, as it requires a board scan even in cases when no slider was lurking behind the From square, and it is in general cheaper to scan through the slider list to test the sliders for alignment, than it is to step over the board and test the squares for the presence of sliders. Especially if the board population thins: the slider list might be empty, and the board ray would continue to the edge. And of course you would always have to test if the last move was an e.p. capture, because then the capture square might reveal checks as well.

All this has been metaculously designed, timed, and tested. This is why the speed of qperft is so hard to beat.

Then why doesn't your purely-legal move generator generate moves at an infinite speed since there is no cost in making the test at move generation time?

Babble... Testing and generation are independent tasks.

If your make/unmake is a small fraction of total cost, then it is not "outrageously expensive" since it is a _very_ rare occurrence. On the other hand, avoiding generating them in the first place means a conditional test is done for every move generated...

You have to do better than that. Normal Making / Unmaking will not at all tell you if the move exposed you to check. You either would have to do some testing on it after the Make (which you would then have to do for _every_ move you make). Or you would just blindly stumble into the daughter node, starting move generation from an illegal position, and discovering it only after generating a King capture. And only if you would actually test _every_ generated move for King capture.

Doesn't sound competative at all...

You keep saying (a) pinned piece moves are simply not generated and (b) there is no overhead in doing that. Doesn't compute to me, never will. But there is no point in arguing about it. Nothing is free, so let's just leave it at that..

Not true. Some things incur negative 'cost', if your reference is highly suboptimal.

If we define move generation as the task of filling the move stack (or list) to hand to the move sorter, based on a description of the current position, then testing for a pin each time the in-check detection finds an alignment costs on the average less time than you save on not putting the moves with the pinned pieces on the move stack. So if the reference is a move generator that tests for in-check does pseudo-legal generation if not in check, implementing the pin test takes negative time. It is the pseudo-legal move generation that wastes time.

If the reference is a move generator without check awareness (as uMax uses, for simplicity), then adding the check awarenes of course represent an even much larger negative cost. Not to mention the enormous cost it saves later, outside the move generator: it is already bad enough having to test each move for not exposing the King, but it would be far worse to have to test in addition if there aren't any pre-existing checks that are not solved by it...

I agree. Also not making moves at the last ply makes zero sense. And using the hash counts to avoid generating moves in other parts of the tree makes no sense.

It is not reall clear what you agree to, then. But it is interesting to know that all those things your engine does actually make no sense...

The idea was to compare total nodes, and total time, to see how fast you can generate/make/unmake moves.

And if you don't generate and make/unmake in a realistic ratio, you will get the wrong result.

Hashing completely invalidates that comparison because now you are not generating/making/unmaking all the moves, yet claiming with the perft node count that those nodes were actually done.

Bullshit. No one is claiming that. The oonly thing that is claimed is that a certain number of move paths exist, so that people can test their own engines to see if they actually find that number of move paths. And the can test if they find it without hashing, but they also can test to see if they find it with hashing, and how fast, to test if their hashing works.

that's what is wrong with this. There are _many_ ways to make perft run faster, if that is the goal. But it isn't, because perft by itself is completely useless. There is no game where the fastest perft wins.

There is also no game where the best score on a gauntlet of Silver matches wins... What kind of narrow-mindedness is that???

If that is the time it takes you to do a material-only + PST evaluation, it must be very poorly coded...
If that is all that is in your evaluation, it is poorly _designed_. My evaluation has a whole lot more than just material and PST data.

hgm wrote:
Aleks Peshkov wrote:What is average NPS drop ratio when converting average Perft engine to Alpha-Beta?
Almost none (with material + piece-square table differential eval only).

So contesting other peoples statements for you is not based on what they actually claim about the thing they are talking about, but on if those statements would also hold for what you do in stead? Are you never concerned that this makes you look like an idiot?

Extensions have virtually no effect on the nps. Which is what we are talking about here. So indeed they take no time.
They have a lot to do with the tree that is searched, which _is_ what we are talking about.

NO! That is what _you_ are talking about, but completely off topic to everyone else.

And extensions _do_ affect the NPS. I can certainly see a change when I search simple positions vs positions with lots of checks...

Well, it would be interesting if you could point out why that is the case. As a first assumption I would say that this is merely because in-check positions have fewer legal moves, and not directly an effect of the extensions.

So the idea becomes important to design the fastest perft possible to provide perft numbers for _others_ so that they can compare to those numbers?

Exactly. For others, and for myself when I am building a new engine from scratch. But I like to share stuff that might also be useful to others, and this is why I put it on my web-site.

That sounds highly logical. Most people use perft for a short while and then never again, once they get their move generator debugged. The idea of spending a lot of time optimizing something that I will never use myself again leaves me blinking...

Well, different people have different hobbies. And doing something that might help others is never illogical to me. My hobby happens to be to optimize things to perfection. Who are you to say that this is a less worthy pass-time than trying to play better moves in a silly game with wooden dolls?

Aleks Peshkov · Post by **Aleks Peshkov** » Mon Oct 15, 2007 2:23 pm

hgm wrote:
bob wrote:So you can avoid generating moves of absolute-pinned pieces, with _zero_ cost?
practically so, yes. The absolutely pinned pieces are not in the move list during move generation.

I know which pieces are pinned as an almost free side-effect of the in-check test. For every enemy slider aligned with the King I have to scan the board to see if there are obstacles in between King and slider. If in that process (starting from the King) the first thing I encounter is not the enemy slider, but an own piece, that piece is almost always pinned.

Scanning for obstacles in between King is similar to ordinal in-check test. It may be faster or slower depending of board representation. In BitBoard engine direct in-check function for all opponent checkers together is as cheap as a test for single potential pinner.

grant · Post by **grant** » Mon Oct 15, 2007 4:56 pm

Hi

Why spend time optimizing perft if you cannot use it in your program? Well, my perft runs at 18.2million nps, make/unmake all moves, no hashing, check testing every move, optimized to the best of my ability, and I thoughly enjoyed the experience which on it's own is a good enough reason for me.

But by splitting the real_move_generator into three fully optimized functions - 1) generate_moves 2) update_hash 3) update_piece_counts etc., I am able to use my perft in my program. This increased my real_move_generator speed by 25% so it was not all a waste of time. Also the functions can be used individually e.g. when installing the pv into the hash table I just use the update_hash function.

Grant

bob · Post by **bob** » Mon Oct 15, 2007 6:58 pm

hgm wrote:
bob wrote:So you can avoid generating moves of absolute-pinned pieces, with _zero_ cost?
practically so, yes. The absolutely pinned pieces are not in the move list during move generation.

I know which pieces are pinned as an almost free side-effect of the in-check test.

OK, I think it is time for me to exit stage right. This discussion has taken more twists and turns than the "twisty passages" in the old adventure game. _I_ don't have any "in check test" in my move generation, unless I am in check before I start. My move generator simply enumerates moves. There is no way to get "pinned piece" information at zero cost. If you are doing something _else_ most of us don't do, that's fine. But the pinned piece cost has to be factored into that extra overhead.

If you think it better to eliminate such moves up front, where they are rare cases, rather than after discovery where they cost very little and the cost is incurred infrequently, that's OK.

But I don't want to continually be involved in discussions where the topic shifts around all over. If we are going to compare movgen costs, we have to compare _exactly_ that. And anything you use to eliminate moves has to be part of that cost or the comparison is meaningless. My Make/Unmake is running at 50M cycles per second. That includes all the overhead to reach the next ply, and then return back to the current point when I discover the move was illegal due to an absolute pin. If yours is more efficient, particularly in normal positions where absolute pins are infrequent, then good. But I find it difficult to discuss when you keep introducing new angles (it is free because my in-check test does most of the work) when "in-check" was not even part of the discussion.

For every enemy slider aligned with the King I have to scan the board to see if there are obstacles in between King and slider. If in that process (starting from the King) the first thing I encounter is not the enemy slide

"almost" isn't good enough here..

r, but an own piece, that piece is almost always pinned.[/q

Moving absolutely pinned pieces is _only_ rare because the alignment is rare. But if the alignment is already an established fact (and I am not in check), an own piece blocking the check is _almost always_ pinned. So not taking action at that point will lead almost certainly to much more expensive action later, even just generating moves with that piece is comparatively expensive. If no alignment is detected, the pin test (and the entire board-ray scan) is never performed.

Apparently we disagree about the "much more expensive" point above.

Now you might argue that testing for in-check by going through the enemy slider list is needlessly expensive, and that it is better to make the test differentially, based on the previous move. Apart from the fact that we have entered the realm of totally negligible effort, even this is not true. To derive in-check information from the previous move you would have to test both alignment with the ToSquare, and alignment _over_ the FromSquare. The latter test is more expensive, as it requires a board scan even in cases when no slider was lurking behind the From square, and it is in general cheaper to scan through the slider list to test the sliders for alignment, than it is to step over the board and test the squares for the presence of sliders. Especially if the board population thins: the slider list might be empty, and the board ray would continue to the edge. And of course you would always have to test if the last move was an e.p. capture, because then the capture square might reveal checks as well.

the speed of qperft is so hard to beat.

My point is, "who cares"? the perft test has no impact on how the program plays chess at all. Even if you make it run to depth 12 in 0 time, it won't win one more game for you...

Then why doesn't your purely-legal move generator generate moves at an infinite speed since there is no cost in making the test at move generation time?

Babble... Testing and generation are independent tasks.[/quote]

More babble. I asked a simple question that apparently can not be answered??? Anything you do to generate moves is a part of generating moves...

If your make/unmake is a small fraction of total cost, then it is not "outrageously expensive" since it is a _very_ rare occurrence. On the other hand, avoiding generating them in the first place means a conditional test is done for every move generated...
You have to do better than that. Normal Making / Unmaking will not at all tell you if the move exposed you to check. You either would have to do some testing on it after the Make (which you would then have to do for _every_ move you make). Or you would just blindly stumble into the daughter node, starting move generation from an illegal position, and discovering it only after generating a King capture. And only if you would actually test _every_ generated move for King capture.

The last part is wrong. When generating captures, if one just notices (after sorting based on SEE) that the most valuable piece is a king, that search won't be necessary. If the event is very rare, and it is, then the overhead is small. Again I will note that if I start off in check, most moves are illegal so I have a specific move generator that only generates legal moves in that particular case. And unfortunately it only makes my code less than 1% faster, showing just how low the overhead for making/unmaking an illegal move actually is.

So "stumbling into the next ply" is not that expensive, depending on how you design the code...

Doesn't sound competative at all...

Can we run some comparative tests and look at the tree sizes, time to solution, and NPS? That should point out which way is fastest... then you can _really_ determine what is competitive or not, rather than guessing.

You keep saying (a) pinned piece moves are simply not generated and (b) there is no overhead in doing that. Doesn't compute to me, never will. But there is no point in arguing about it. Nothing is free, so let's just leave it at that..
Not true. Some things incur negative 'cost', if your reference is highly suboptimal.

OK, to play that same game, sometimes _anything_ is an improvement if the original design is so poor...

If we define move generation as the task of filling the move stack (or list) to hand to the move sorter, based on a description of the current position, then testing for a pin each time the in-check detection finds an alignment costs on the average less time than you save on not putting the moves with the pinned pieces on the move stack. So if the reference is a move generator that tests for in-check does pseudo-legal generation if not in check, implementing the pin test takes negative time. It is the pseudo-legal move generation that wastes time.

Of course, a few dozen years of computer chess _still_ sees most programs generating pseudo-legal moves. Ever wonder why that is???

If the reference is a move generator without check awareness (as uMax uses, for simplicity), then adding the check awarenes of course represent an even much larger negative cost. Not to mention the enormous cost it saves later, outside the move generator: it is already bad enough having to test each move for not exposing the King, but it would be far worse to have to test in addition if there aren't any pre-existing checks that are not solved by it...

You keep resorting to hyperbole that is fictitious. "enormous cost"... I don't have any "enormous cost". If I turn off my legal-move generator completely, it saves me less than 1% of the total search time except for specific tactical circumstances where there are _many_ checks. <1% is hardly "enormous" by any measure I know of...

I agree. Also not making moves at the last ply makes zero sense. And using the hash counts to avoid generating moves in other parts of the tree makes no sense.
It is not reall clear what you agree to, then. But it is interesting to know that all those things your engine does actually make no sense...

A little re-reading will clearly show what I "agreed with". One only has to comprehend, rather than be thinking about what he is going to write next, to get the point... it wasn't disguised.. nor unclear.

The idea was to compare total nodes, and total time, to see how fast you can generate/make/unmake moves.
And if you don't generate and make/unmake in a realistic ratio, you will get the wrong result.

That is simply nonsense. There is no "realistic ratio". For each position you have to generate moves and then make/unmake each one. That's what the test was originally defined as doing...

Hashing completely invalidates that comparison because now you are not generating/making/unmaking all the moves, yet claiming with the perft node count that those nodes were actually done.
Bullshit. No one is claiming that. The oonly thing that is claimed is that a certain number of move paths exist, so that people can test their own engines to see if they actually find that number of move paths. And the can test if they find it without hashing, but they also can test to see if they find it with hashing, and how fast, to test if their hashing works.

So if someone says "my perft 7" runs in 5 seconds, while yours runs in 30 seconds..." they are not saying that their generate/make/unmake is 7x faster? That is _really_ "bullshit" to use your terminology. The implication is clear, because perft has been clearly defined for 20 years...

that's what is wrong with this. There are _many_ ways to make perft run faster, if that is the goal. But it isn't, because perft by itself is completely useless. There is no game where the fastest perft wins.
There is also no game where the best score on a gauntlet of Silver matches wins... What kind of narrow-mindedness is that???

Sorry. If you want to resort to ignorance, go ahead. If you feel that doing better on the silver or nunn positions does not help your program play better chess, then hang on to that belief. I _know_ it is a false one. One of these days I will take a 100,000 game match between crafty and the 5 opponents I usually use and post the results, before/after silver tuning. I originally played 100K games with all programs using their own books. I will run that again one of these days. Then you can see just how stupid that comment actually is...

If that is the time it takes you to do a material-only + PST evaluation, it must be very poorly coded...
If that is all that is in your evaluation, it is poorly _designed_. My evaluation has a whole lot more than just material and PST data.

hgm wrote:
Aleks Peshkov wrote:What is average NPS drop ratio when converting average Perft engine to Alpha-Beta?
Almost none (with material + piece-square table differential eval only).

Again, "bullshit" comes to mind. "swap" (SEE) takes 13% of my total search time, exactly the same as make/unmake/movgen added together. Search() takes 10.5% of the time. I am now up to 25% after you add in HashProbe() which is almost 3%. So those pieces take 1/4 of my total search time and are nowhere to be found in the perft code. And my search will run as fast as the perft numbers?

Try doing a profile run every now and again to keep grounded in reality. There are other parts. My eval is about 50% total, so there is more stuff I haven't enumerated.

So contesting other peoples statements for you is not based on what they actually claim about the thing they are talking about, but on if those statements would also hold for what you do in stead? Are you never concerned that this makes you look like an idiot?

No, you seem to have that market cornered so I'll let that continue, thank you. I actually do produce real data, do profile runs, analyze what is going on, etc. I don't "guess". Something you would do well to try it seems.

Extensions have virtually no effect on the nps. Which is what we are talking about here. So indeed they take no time.
They have a lot to do with the tree that is searched, which _is_ what we are talking about.
NO! That is what _you_ are talking about, but completely off topic to everyone else.

So even though I had the first such "test" known to man, and clearly defined what it should do, _I_ am discussing it improperly?

And extensions _do_ affect the NPS. I can certainly see a change when I search simple positions vs positions with lots of checks...
Well, it would be interesting if you could point out why that is the case. As a first assumption I would say that this is merely because in-check positions have fewer legal moves, and not directly an effect of the extensions.
And that is surprising? That "extensions" make the "tactical" parts of the tree (where there are more checks) bigger, leading to more nodes with fewer moves leading out (which drops NPS since the cost of going to a new node is amortized over fewer outgoing branches? I see the same thing in endgames...

So the idea becomes important to design the fastest perft possible to provide perft numbers for _others_ so that they can compare to those numbers?
Exactly. For others, and for myself when I am building a new engine from scratch. But I like to share stuff that might also be useful to others, and this is why I put it on my web-site.
That sounds highly logical. Most people use perft for a short while and then never again, once they get their move generator debugged. The idea of spending a lot of time optimizing something that I will never use myself again leaves me blinking...
Well, different people have different hobbies. And doing something that might help others is never illogical to me. My hobby happens to be to optimize things to perfection. Who are you to say that this is a less worthy pass-time than trying to play better moves in a silly game with wooden dolls?

perhaps because the topic is "computer chess" from the get-go???

bob · Post by **bob** » Mon Oct 15, 2007 7:00 pm

grant wrote:Hi

Why spend time optimizing perft if you cannot use it in your program? Well, my perft runs at 18.2million nps, make/unmake all moves, no hashing, check testing every move, optimized to the best of my ability, and I thoughly enjoyed the experience which on it's own is a good enough reason for me.

But by splitting the real_move_generator into three fully optimized functions - 1) generate_moves 2) update_hash 3) update_piece_counts etc., I am able to use my perft in my program. This increased my real_move_generator speed by 25% so it was not all a waste of time. Also the functions can be used individually e.g. when installing the pv into the hash table I just use the update_hash function.

Grant

I have the same sort of design. But I designed that way in trying to make the search as fast as possible, not "perft" which was (for me) a pure debugging tool that I did not use for years, until I switched to magic move generation where I used it to validate the generator.

hgm · Post by **hgm** » Mon Oct 15, 2007 8:25 pm

bob wrote:OK, I think it is time for me to exit stage right. This discussion has taken more twists and turns than the "twisty passages" in the old adventure game. _I_ don't have any "in check test" in my move generation, unless I am in check before I start.

Very intersting. And how do you know that you are in check without a check test???

My move generator simply enumerates moves. There is no way to get "pinned piece" information at zero cost. If you are doing something _else_ most of us don't do, that's fine. But the pinned piece cost has to be factored into that extra overhead.

If you think it better to eliminate such moves up front, where they are rare cases, rather than after discovery where they cost very little and the cost is incurred infrequently, that's OK.

But I don't want to continually be involved in discussions where the topic shifts around all over. If we are going to compare movgen costs, we have to compare _exactly_ that. And anything you use to eliminate moves has to be part of that cost or the comparison is meaningless.

Of course. I always did that.

My Make/Unmake is running at 50M cycles per second. That includes all the overhead to reach the next ply, and then return back to the current point when I discover the move was illegal due to an absolute pin.

That still leaves the question: how do you discover that? That can't take zero time. So how much shorter would that 20 ns it takes you to make and unmake be if you would not have to do that test?

If yours is more efficient, particularly in normal positions where absolute pins are infrequent, then good. But I find it difficult to discuss when you keep introducing new angles (it is free because my in-check test does most of the work) when "in-check" was not even part of the discussion.

This was not the claim. It is free because it is free in the check test.

For every enemy slider aligned with the King I have to scan the board to see if there are obstacles in between King and slider. If in that process (starting from the King) the first thing I encounter is not the enemy slider, but an own piece, that piece is almost always pinned.
"almost" isn't good enough here..

Yes it is. The cost / gain balance is determined by how much you save times how often you save, vs how much it costs times how often it costs. So if it is "almost always", say 90% of the cases, a pinned piece, what I do to actually test that can by 9 times as expensive then what I save on generating illegal moves when the piece is pinned. But of course in practice it is more the other way around: the test is far cheaper than generating and making + testing the moves. So in fact it would already be good enough if the pin occurered "reasonably often", rather than "almost always"

Apparently we disagree about the "much more expensive" point above.

Well, this is a simple matter of counting instructions / clock cycles.

My point is, "who cares"? the perft test has no impact on how the program plays chess at all. Even if you make it run to depth 12 in 0 time, it won't win one more game for you...

As my engine uses the same move generator and make/unmake code as the perft, the relation is somewhat stronger than you imagine.

Then why doesn't your purely-legal move generator generate moves at an infinite speed since there is no cost in making the test at move generation time?
Babble... Testing and generation are independent tasks.
More babble. I asked a simple question that apparently can not be answered??? Anything you do to generate moves is a part of generating moves...

Do you really expect me to address you like an idiot? Very well then:

The move generator does not run at infinite speed because it takes a finite time to generate a move. Testing and generating is not the same. So if testing does not take any time, that does not imply that generating takes any time.

Capice?

The last part is wrong. When generating captures, if one just notices (after sorting based on SEE) that the most valuable piece is a king, that search won't be necessary.

Ah, so the test involves generating captures, which is done completely in vain if the move was illegal. At least, by not generating illegal moves I save that time.

If the event is very rare, and it is, then the overhead is small.

And indeed it is. About as rare as that I have to do the pin test because there a blocker is found on a potential check ray. Except that doint that test is an order of magnitude cheaper than generating captures (about the same as generating a single capture), and almost always results in the suppression of generation of many captures. And each capture would have triggered a complete capture generation for the opponent in your scheme.

That you can get away with sloppy and inefficient handling of rare cases is obvious and does not need discussion. But never claim that this is faster than more careful handling of the situation. There just isn't so much to be gained that you would care. Fine, but I like a tidy engine.

Again I will note that if I start off in check, most moves are illegal so I have a specific move generator that only generates legal moves in that particular case. And unfortunately it only makes my code less than 1% faster, showing just how low the overhead for making/unmaking an illegal move actually is.

Well, 1% is 1%, not to be despised.

So "stumbling into the next ply" is not that expensive, depending on how you design the code...

Only two orders more expensive than the pin test, which you started making a fuss about...

Can we run some comparative tests and look at the tree sizes, time to solution, and NPS? That should point out which way is fastest... then you can _really_ determine what is competitive or not, rather than guessing.

I would say it would only point out which engine has the better evaluation, extension scheme, move ordering. But I know a better solution, to eliminate all those variables: compare the perft speed!

OK, to play that same game, sometimes _anything_ is an improvement if the original design is so poor...

Exactly.

Of course, a few dozen years of computer chess _still_ sees most programs generating pseudo-legal moves. Ever wonder why that is???

Chess programmers are not very innovaive, and all copy each other?

You keep resorting to hyperbole that is fictitious. "enormous cost"... I don't have any "enormous cost". If I turn off my legal-move generator completely, it saves me less than 1% of the total search time except for specific tactical circumstances where there are _many_ checks. <1% is hardly "enormous" by any measure I know of...

Well, like I said, compared to the pin test. You seem to reverse role know, as originally it was you who started complaining of an alleged inefficiency of suppressing generation of moves with pinned pieces, which is a very small fraction of this time. (And then earned back as a multiple.)

That is simply nonsense. There is no "realistic ratio". For each position you have to generate moves and then make/unmake each one. That's what the test was originally defined as doing...

Not at all. I am sure you generate many moves that need not be made. Because they are bad captures, or futile captures.

So if someone says "my perft 7" runs in 5 seconds, while yours runs in 30 seconds..." they are not saying that their generate/make/unmake is 7x faster? That is _really_ "bullshit" to use your terminology. The implication is clear, because perft has been clearly defined for 20 years...

Apparently not.

Sorry. If you want to resort to ignorance, go ahead. If you feel that doing better on the silver or nunn positions does not help your program play better chess, then hang on to that belief. I _know_ it is a false one. One of these days I will take a 100,000 game match between crafty and the 5 opponents I usually use and post the results, before/after silver tuning. I originally played 100K games with all programs using their own books. I will run that again one of these days. Then you can see just how stupid that comment actually is...

I consider it in the same league as your remark of a faster move generation, make/unmake or hashing not making your engine do better...

Again, "bullshit" comes to mind. "swap" (SEE) takes 13% of my total search time, exactly the same as make/unmake/movgen added together. Search() takes 10.5% of the time. I am now up to 25% after you add in HashProbe() which is almost 3%. So those pieces take 1/4 of my total search time and are nowhere to be found in the perft code. And my search will run as fast as the perft numbers?

I was not saying it could never be made to run slower. I am sure you are doing a very good job on that, by including all kind of stuff. But it is all dispensible.

Try doing a profile run every now and again to keep grounded in reality. There are other parts. My eval is about 50% total, so there is more stuff I haven't enumerated.

Guess what? Routines that are not there never take any time in my profile!

So contesting other peoples statements for you is not based on what they actually claim about the thing they are talking about, but on if those statements would also hold for what you do in stead? Are you never concerned that this makes you look like an idiot?
No, you seem to have that market cornered so I'll let that continue, thank you. I actually do produce real data, do profile runs, analyze what is going on, etc. I don't "guess". Something you would do well to try it seems.

Well, people can read, (most of them, anyway), so I am not too concerned about remarks like that.

So even though I had the first such "test" known to man, and clearly defined what it should do, _I_ am discussing it improperly?

If we are discussing an engine without eval, and you keep babbling about the cost of eval, I would say: Yes!

And that is surprising? That "extensions" make the "tactical" parts of the tree (where there are more checks) bigger, leading to more nodes with fewer moves leading out (which drops NPS since the cost of going to a new node is amortized over fewer outgoing branches? I see the same thing in endgames...

Well, that confirms my guess then, doesn't it. It is just branching ratio, not extensions causing this effect.

perhaps because the topic is "computer chess" from the get-go???

Well, if you think discussions about perft have nothing to do with computer Chess, your opinion is noted...

bob · Post by **bob** » Mon Oct 15, 2007 9:52 pm

hgm wrote:
bob wrote:OK, I think it is time for me to exit stage right. This discussion has taken more twists and turns than the "twisty passages" in the old adventure game. _I_ don't have any "in check test" in my move generation, unless I am in check before I start.
Very intersting. And how do you know that you are in check without a check test???

Err, I do it somewhere _else_??? Determining "in check" is not very complicated in bitboards... I do, and always have, generates pseudo-legal moves. For many reasons:

(1) at every other ply, only one move needs to be searched. Why screen all moves for legality as they are generated if I am only going to search one or two? In fact, I avoid generating moves a significant part of the time because I can try the hash move prior to generating _anything_.

(2) most moves are legal. Testing for legality when generating them is a waste. And when you consider (1) above culls about 1/2 of them on average anyway, this is further justification for putting off those tests until I am certain they are needed, so that I avoid doing it most of the time.

(3) I certainly don't care about "in check" in the q-search as that is a simple part of the search only following captures. Most captures are not illegal, so again, testing them for legality wastes time over discovering this later, at least in my code. This gives me the _same_ move generator for normal search and quiescence search, which is cache-friendlier.

My move generator simply enumerates moves. There is no way to get "pinned piece" information at zero cost. If you are doing something _else_ most of us don't do, that's fine. But the pinned piece cost has to be factored into that extra overhead.

If you think it better to eliminate such moves up front, where they are rare cases, rather than after discovery where they cost very little and the cost is incurred infrequently, that's OK.

But I don't want to continually be involved in discussions where the topic shifts around all over. If we are going to compare movgen costs, we have to compare _exactly_ that. And anything you use to eliminate moves has to be part of that cost or the comparison is meaningless.
Of course. I always did that.
My Make/Unmake is running at 50M cycles per second. That includes all the overhead to reach the next ply, and then return back to the current point when I discover the move was illegal due to an absolute pin.
That still leaves the question: how do you discover that? That can't take zero time. So how much shorter would that 20 ns it takes you to make and unmake be if you would not have to do that test?

Here's a hint. Not one single design feature in Crafty was done "just for the hell of it"... _EVERYTHING_ I (and now we) do in Crafty is thoroughly tested, both for performance in terms of speed, and then for performance in terms of playing against other opponents. I used to not do in_check at all, letting capturing the king at the next ply indicate that the previous move was illegal. But when I added null-move, making such when you are in check is bad and has to be avoided. But I at least follow the old rule "never compute today what you can put off until tomorrow, because tomorrow might never come." That is a golden rule when dealing with alpha/beta and the cutoffs it produces.

If yours is more efficient, particularly in normal positions where absolute pins are infrequent, then good. But I find it difficult to discuss when you keep introducing new angles (it is free because my in-check test does most of the work) when "in-check" was not even part of the discussion.
This was not the claim. It is free because it is free in the check test.

Then last year I got myself a free truck. Didn't have to pay a penny the day I drove it off the lot. Of course I had to start paying some later on. So the definition of "free" you are using is bad. Free means _zero cost ever_. Not "zero cost now but I either did some work earlier or will do some more later."

For every enemy slider aligned with the King I have to scan the board to see if there are obstacles in between King and slider. If in that process (starting from the King) the first thing I encounter is not the enemy slider, but an own piece, that piece is almost always pinned.
"almost" isn't good enough here..
Yes it is. The cost / gain balance is determined by how much you save times how often you save, vs how much it costs times how often it costs. So if it is "almost always", say 90% of the cases, a pinned piece, what I do to actually test that can by 9 times as expensive then what I save on generating illegal moves when the piece is pinned. But of course in practice it is more the other way around: the test is far cheaper than generating and making + testing the moves. So in fact it would already be good enough if the pin occurered "reasonably often", rather than "almost always"
Apparently we disagree about the "much more expensive" point above.
Well, this is a simple matter of counting instructions / clock cycles.
I do that all the time. But I do it with profile runs rather than looking at raw ASM code, but then that is just as accurate either way. My overhead for advancing to the next ply is _really_ minimal. I don't do copy/make for example.

My point is, "who cares"? the perft test has no impact on how the program plays chess at all. Even if you make it run to depth 12 in 0 time, it won't win one more game for you...
As my engine uses the same move generator and make/unmake code as the perft, the relation is somewhat stronger than you imagine.

Unless you do like _some_ and change them for perft. That is the case I was arguing against. Including the idea of hashing perft results from one node and using that if you reach the same position again rather than walking that part of the tree. That can even hide bugs...

Then why doesn't your purely-legal move generator generate moves at an infinite speed since there is no cost in making the test at move generation time?
Babble... Testing and generation are independent tasks.
More babble. I asked a simple question that apparently can not be answered??? Anything you do to generate moves is a part of generating moves...
Do you really expect me to address you like an idiot? Very well then:

The move generator does not run at infinite speed because it takes a finite time to generate a move. Testing and generating is not the same. So if testing does not take any time, that does not imply that generating takes any time.

Capice?

Nope. If you can "test for free" can you not formulate that into a methodology to generate moves for free as well? The question is similar (which squares are occupied vs empty, and what is on them). There is absolute no free "test" on a computer. Unless you can show me a few instructions that execute in zero cycles...

The last part is wrong. When generating captures, if one just notices (after sorting based on SEE) that the most valuable piece is a king, that search won't be necessary.
Ah, so the test involves generating captures, which is done completely in vain if the move was illegal. At least, by not generating illegal moves I save that time.
Ah, but most of the time it is not done. So you spend time trying to eliminate something that is not there, when I don't. That's my point. With alpha/beta, it is _always_ better to defer when possible/practical, because often it is a permanent deferral because of a cutoff. You are hung up on avoiding something I do that doesn't happen very often. I don't carry a meteor umbrella around since I don't expect to be hit by one very often. That's the idea here.

If the event is very rare, and it is, then the overhead is small.
And indeed it is. About as rare as that I have to do the pin test because there a blocker is found on a potential check ray. Except that doint that test is an order of magnitude cheaper than generating captures (about the same as generating a single capture), and almost always results in the suppression of generation of many captures. And each capture would have triggered a complete capture generation for the opponent in your scheme.

Still going in circles however. If we don't do that very often, but you do your test for _every_ move, then I wonder which takes the most time? That's my point.

That you can get away with sloppy and inefficient handling of rare cases is obvious and does not need discussion. But never claim that this is faster than more careful handling of the situation. There just isn't so much to be gained that you would care. Fine, but I like a tidy engine.

I want the fastest/strongest possible. Tidy and optimized are at opposite ends of the spectrum.

Again I will note that if I start off in check, most moves are illegal so I have a specific move generator that only generates legal moves in that particular case. And unfortunately it only makes my code less than 1% faster, showing just how low the overhead for making/unmaking an illegal move actually is.
Well, 1% is 1%, not to be despised.

No, but that is worst-case for my approach as well. If you start off in check, almost all moves are illegal. Trying them only adds 1%. What about normal positions where most positions are not "in check" positions? The difference can't be measured, at least on my code. I only see advantage to the "legal generator only" when the position is highly tactical with lots of deep checking lines. Then I get 1%. I didn't know that until I wrote the code or I would not have even done that...

So "stumbling into the next ply" is not that expensive, depending on how you design the code...
Only two orders more expensive than the pin test, which you started making a fuss about...

I doubt it. but I suppose we will never know. I am certain enough that moving an absolute pinned piece is rare enough it is not worth worrying about. If you feel differently, go for it. I suppose I could add a counter and run some positions to see how often it happens... I am pretty sure I know the answer already however...

Can we run some comparative tests and look at the tree sizes, time to solution, and NPS? That should point out which way is fastest... then you can _really_ determine what is competitive or not, rather than guessing.
I would say it would only point out which engine has the better evaluation, extension scheme, move ordering. But I know a better solution, to eliminate all those variables: compare the perft speed!

Fine by me, so long it is the originally defined perft algorithm. Start at an initial position, generate all moves and make each one. After each move is made, change sides, and generate all moves and make each one. Repeat until some fixed depth. Illegal moves must be discarded and not counted. Give me the position and I'll run it.

OK, to play that same game, sometimes _anything_ is an improvement if the original design is so poor...
Exactly.

Of course, a few dozen years of computer chess _still_ sees most programs generating pseudo-legal moves. Ever wonder why that is???
Chess programmers are not very innovaive, and all copy each other?

Perhaps there is a less arcane reason? Or perhaps you are the most innovative chess programmer of all time. That should become evident within a year or two if it is true. Otherwise, perhaps there's a lesson to be learned...

You keep resorting to hyperbole that is fictitious. "enormous cost"... I don't have any "enormous cost". If I turn off my legal-move generator completely, it saves me less than 1% of the total search time except for specific tactical circumstances where there are _many_ checks. <1% is hardly "enormous" by any measure I know of...
Well, like I said, compared to the pin test. You seem to reverse role know, as originally it was you who started complaining of an alleged inefficiency of suppressing generation of moves with pinned pieces, which is a very small fraction of this time. (And then earned back as a multiple.)

Actually my original complaint was about the corrupted approach many are using to do perft in the first place. Which makes the numbers (at least the speed numbers) impossible to compare.

[quote[

That is simply nonsense. There is no "realistic ratio". For each position you have to generate moves and then make/unmake each one. That's what the test was originally defined as doing...

Not at all. I am sure you generate many moves that need not be made. Because they are bad captures, or futile captures.[/quote]

Can I say this any clearer than:

"NOT IN PERFT"

We are talking about perft. We are not talking about alpha/beta. Not about anything else. Just "perft" which has been around since 1981 or 1982 I don't remember which.

So if someone says "my perft 7" runs in 5 seconds, while yours runs in 30 seconds..." they are not saying that their generate/make/unmake is 7x faster? That is _really_ "bullshit" to use your terminology. The implication is clear, because perft has been clearly defined for 20 years...
Apparently not.

Not to some, I agree.

Sorry. If you want to resort to ignorance, go ahead. If you feel that doing better on the silver or nunn positions does not help your program play better chess, then hang on to that belief. I _know_ it is a false one. One of these days I will take a 100,000 game match between crafty and the 5 opponents I usually use and post the results, before/after silver tuning. I originally played 100K games with all programs using their own books. I will run that again one of these days. Then you can see just how stupid that comment actually is...
I consider it in the same league as your remark of a faster move generation, make/unmake or hashing not making your engine do better...

Pretty infantile, eh? I spend 10% of my time or less in make move. If I make it 50% faster, my program becomes 5% faster. That is not going to be a big Elo boost, and I am not going to get 50% anyway. I don't believe I ever said that faster movgen had _zero_ effect. I did (and still do) say it has _minimal_ effect. That can be proven easily enough.

So what is your point? That you can make gratuitous comments when you run out of real data to talk about?

Again, "bullshit" comes to mind. "swap" (SEE) takes 13% of my total search time, exactly the same as make/unmake/movgen added together. Search() takes 10.5% of the time. I am now up to 25% after you add in HashProbe() which is almost 3%. So those pieces take 1/4 of my total search time and are nowhere to be found in the perft code. And my search will run as fast as the perft numbers?
I was not saying it could never be made to run slower. I am sure you are doing a very good job on that, by including all kind of stuff. But it is all dispensible.

Now I really have no idea where this is heading. SEE can be disposed of? And blow up my tree by a factor of 3-4? If you want, I can easily hack up my eval and compare the perft speed to actual search speed using just material to make it as favorable for you as possible. In fact:

rnbq1b1r/p1pp1p1p/4k3/1p1NP1p1/2QP1p2/5N2/PP1B1KPP/n6R w - - 0 1

perft 5 ->
total moves=69252968 time=11.10

By my reckoning, that is 6.2M nodes per second. Searching that position:

time=11.57 mat=-10 n=23518178 fh=98% nps=2.0M

So I search (material only) at just under 1/3 the speed of the perft code. According to my recent numbers I posted, eval was 50%. No eval in the above. That leaves 3x the work being done that is inside perft.

Since perft is doing make/unmake/movgen, the other 2/3 of the time being spent has to be the search itself, hash probing, repetition checking, etc.

So my comment about converting perft NPS to search was dead on. Saying "almost none" is actually pretty comical.

Try doing a profile run every now and again to keep grounded in reality. There are other parts. My eval is about 50% total, so there is more stuff I haven't enumerated.
Guess what? Routines that are not there never take any time in my profile!

Good. Then I don't have to worry about direct competition from your program for many years. If I am doing things you are not doing, and I consider myself to be doing the bare minimum to keep speed up, where does that leave you? Or, again, does your comment become gratuitous but meaningless???

So contesting other peoples statements for you is not based on what they actually claim about the thing they are talking about, but on if those statements would also hold for what you do in stead? Are you never concerned that this makes you look like an idiot?
No, you seem to have that market cornered so I'll let that continue, thank you. I actually do produce real data, do profile runs, analyze what is going on, etc. I don't "guess". Something you would do well to try it seems.
Well, people can read, (most of them, anyway), so I am not too concerned about remarks like that.
So even though I had the first such "test" known to man, and clearly defined what it should do, _I_ am discussing it improperly?
If we are discussing an engine without eval, and you keep babbling about the cost of eval, I would say: Yes!

OK, back up a few dozen lines. I now provided you with some data without evaluation included. Now dance and bobble around (again) real data, not what you "think". perft speed is only marginally related to actual search speed. I removed my eval, made a run, and the perft numbers were _still_ 3x faster than my program. Exactly as I had said they would be. Begin to think that I _do_ know what my program is doing???

And that is surprising? That "extensions" make the "tactical" parts of the tree (where there are more checks) bigger, leading to more nodes with fewer moves leading out (which drops NPS since the cost of going to a new node is amortized over fewer outgoing branches? I see the same thing in endgames...
Well, that confirms my guess then, doesn't it. It is just branching ratio, not extensions causing this effect.

Jeez. No extensions, no effect. So extensions are not causing the effect. Let me get this right. Extensions on, effect observed. Extensions off, effect not observed. Yet the two are independent?

perhaps because the topic is "computer chess" from the get-go???
Well, if you think discussions about perft have nothing to do with computer Chess, your opinion is noted...

Aleks Peshkov · Post by **Aleks Peshkov** » Mon Oct 15, 2007 10:27 pm

I want to ask Robert Hyatt about Crafty timing profile:
25% movegen/makemove/search
50% static eval

What is the rest? What part (search or eval) used the rest of time?

bob · Post by **bob** » Mon Oct 15, 2007 11:38 pm

Aleks Peshkov wrote:I want to ask Robert Hyatt about Crafty timing profile:
25% movegen/makemove/search
50% static eval

What is the rest? What part (search or eval) used the rest of time?

I'm not sure what you are asking. I have already provided these:

50% eval.
15% make/unmake/movgen
13% SEE
11% Search()
7% Quiesce()

The rest is in things like NextMove() and other relatively small contributions...

What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?