move count based pruning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

I was thinking about this after my last post:

Let me put it another way. If you suspect that EVERY move drops material, you still have to search the position in order for the alpha/beta procedure to properly score ancestor positions. You cannot just say, "all of these moves stink so I don't have to try any of them."
Sure I can. If the moves are bad, and the current beta bound tells me that these moves are not going to help, I can dump every last one of them. Safely.
I believe this is in error. Think about it.

You said "if the moves are bad ... ", but how do you know they are bad? You have to search them to prove they are bad. You cannot just say the moves cannot reach alpha and dump them "safely." You do not know that the moves cannot reach the window unless you search them.

If the moves all drop material, but the score is so good that you can still take a beta cutoff, you still have to try ONE of the moves. Of course you might get a null move cutoff but we are talking about LMR, not null move pruning.
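As a minimal sketch of this point (assuming a bare negamax framework; evaluate, generate and make below are hypothetical helpers, not any particular engine's code), the score a node hands back to its ancestors only exists because at least one move was actually searched:

Code:

#include <algorithm>
#include <vector>

struct Position { /* board state elided for this sketch */ };

int evaluate(const Position& pos);                  // assumed static eval
std::vector<int> generate(const Position& pos);     // assumed move generator
Position make(const Position& pos, int move);       // assumed make-move

// Even if every move here "drops material", the value alpha/beta backs up
// to the ancestors only exists because at least one move was searched.
int search(const Position& pos, int alpha, int beta, int depth) {
    if (depth == 0)
        return evaluate(pos);

    int best = -32000;                              // nothing searched yet
    for (int move : generate(pos)) {
        int score = -search(make(pos, move), -beta, -alpha, depth - 1);
        best  = std::max(best, score);              // backed-up value comes from here
        alpha = std::max(alpha, score);
        if (alpha >= beta)
            break;                                  // one searched move sufficed to cut
    }
    return best;                                    // meaningful only after >= 1 search
}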

No matter how you spin this, it's out of context anyway - I'm talking about the general case where you are trying to make decisions about what to reduce with LMR, with the emphasis on L for LATE. In other words you are at a node where you need to evaluate (at least in a reduced way) ALL the moves.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: move count based pruning

Post by jdart »

Unless you have a _really_ strong ordering algorithm, using the count is not very good. I have tried using a static evaluation call, and ordering based on those values, and found no gain at all. No loss either. But no gain. (measured in Elo on cluster-testing).
Most of the programs I've seen that do order based pruning are sorting the moves based on history scores, which I would consider a fairly weak ordering algorithm. Plus Stockfish and others exclude moves that for one reason or another look likely to be worth examining (pawn push is one example).
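For reference, a minimal sketch of the kind of history-score ordering being described here; the table size, the depth-squared bonus and the names are illustrative, not taken from Stockfish or any specific program:

Code:

#include <algorithm>
#include <vector>

struct Move { int from, to; };          // squares 0..63

static int history[64][64];             // bumped when a quiet move causes a cutoff

void update_history(const Move& m, int depth) {
    history[m.from][m.to] += depth * depth;     // common depth-weighted bonus
}

void order_quiets(std::vector<Move>& quiets) {
    std::stable_sort(quiets.begin(), quiets.end(),
        [](const Move& a, const Move& b) {
            return history[a.from][a.to] > history[b.from][b.to];
        });
}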

I tried this - the version that played in the recent ACCA tourney included it - but I didn't like it and took it out. Now I have a fairly non-aggressive version back in. Empirically it seems to work for many programs, but it still seems risky to me, more so than LMR or futility pruning, because you are trusting the ordering + selection rules and those are not reliable.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote:
bob wrote: I don't think like that at all. I _always_ consider a move in the context of the current position. What I don't see you explaining at all is this: You have a list of moves. The moves at the bottom are considered "bad" and can be reduced or pruned. Why? Suppose the actual score difference between move 1 and move 40, after you do a normal search for each, is only 0.05??? So what makes reducing the last move a good idea compared to reducing all but the first, since they end up so close together? There are lots of similar cases. The point being that the last move in the list can be a lemon, or it can be good, but only slightly worse than the first. One can't tell based on the position in the list.
You continually made it seem that you don't have any interest in the values of the moves relative to each other.

Can you give me a quote where you believe I have suggested such a thing? I have clearly stated that I do not believe that a move's relative _position_ with respect to where it is in the move list means anything. I have also quite clearly said that the _value_ of each move _is_ important. Look at the title of the thread, "move count based pruning" which is about using the move's position within the move list as the criterion for whether to reduce or not. So again, to be clear _again_, I am definitely saying that the score is a useful thing for making pruning or reduction decisions. Just not the move's position number within the move list...

So it's a relief to see that you are not quite as clueless as I had believed about this particular point.
Had you simply noticed the thread's subject/title, you could have figured that out a whole lot quicker and a whole lot easier. :)


I think you are now trying to make the point that if the scores of the moves are close, you should not reduce any of them. Is that right or wrong?
Maybe or maybe not. I am thinking that if the scores are "close" then the same rule should apply to all, rather than just using a move's relative position in the list.


My answer is that if moves late in the list are scoring very close to moves early in the list, it doesn't matter if you prune. I don't care if I miss a good move but instead play a different move that is just as good. The results are the same.

So I guess I'm saying the distribution of "plausibility scores" has only a minor effect on how many moves you should look at. In ALL cases, if you want a really strong program you must prune aggressively.

But I say it's undefined unless I am allowed to also consider the other moves too. If Ng5 hangs the knight, but opens up the king and every other move leads to checkmate, then Ng5 is a GOOD move. Do you see the difference now?
See my above point. You simply do not get the point that the move's position in the list means _nothing_ except that by some quantitative measure, it is worse than the moves above it and better than the ones below. How much better or worse? Nothing you are saying tells me, because the L in LMR just means "moves near the bottom get reduced, moves near the top don't". That's the part I do not like.
I now understand your point of view - even if I don't fully agree with you.

But I believe you are obsessing over this unduly. It's probably the case that the relative difference in scores (by some quantifiable measurement) has some impact on which exact moves you look at, but as I just pointed out there is a bit of a feedback or corrective effect. If ALL the moves are good, you can prune quite severely and still know you are including a good move. So I don't see what the big deal is.

I've played enough chess to realize that there are positions where _accurate_ calculation is essential, else you lose. And in lots of positions, you do not have the flexibility to exclude the moves that are critical to produce correct analysis. Fine's book is full of such positions where one single misstep is all that is needed to turn a 1 into a 0. I'm not a fan of pruning moves that might influence the actual PV. So the concept of "good move" becomes relative to what I have found so far, and what I have measured for each of the moves I am about to consider. In a lost position, you might reach a position where every move gains material, but still leaves you in a lost position. I'd reduce those in an instant. It is all about what is happening at the time.


One thing I want to point out is that it is FAR easier to compare moves than to produce accurate estimates of their scores. This is sort of a fundamental principle of computer chess, in my opinion. I'm not saying there is not some value in pruning what superficially appears to be "really bad" moves, but it would be easy to take this too far.

And I still think it's wasteful not to take advantage of good move ordering. This is not useless information by any means and it should be taken advantage of.
By the same token, choosing a "magic number" where you don't reduce until that number of moves has been examined is _also_ throwing away good information, namely the individual move scoring information you used to order the moves.


You also said this:
And if a frog had pockets, he'd carry a gun and never worry about snakes in the pond. But move ordering is _not_ perfect. In reality, we hope that moves near the top are better than moves near the bottom. And I'd hope I don't need to point out how many moves near the bottom turn out to be good. Take WAC #141 and look at the Qf4 move that is best, but is almost certainly ordered right at the bottom of the list.
LMR is a form of highly speculative pruning so what you do is play the odds. I know that move ordering is not perfect but you HAVE to reduce something in order to get the huge performance (in speed) benefit and I'm certainly not going to ignore useful information that I can utilize just because I can find examples of where it is wrong once in a while.


If you go back to my question above, you'll see you didn't answer it. What is it that makes moves below some magic point bad, and moves above that magic point good, particularly when the magic point is _static_ for _all_ positions in the tree???
I'm not going to sit here and tell you I have found the perfect formula, I haven't. What I actually do is not perfectly consistent with my philosophy but it's almost impossible to make that happen.

My philosophy in a nutshell is this. First of all, try not to reduce the best move. Secondly, try to reduce every move you can, even really good moves, as long as you have not reduced the best one. And finally, if you DO reduce the very best move, it's probably ok if you have still included a very good move.

The devil, as they say, is in the details. Notice that I never mentioned the actual score relative to other moves. There could be some slight gain by doing so, but I'm pretty sure I would have a difficult time making anything major out of that.
So you have two reasonable moves, why does one get searched deeper than the other, just because one is on one side of some oddball boundary (the L value in LMR) and the other is on the other side? Their ordering scores are close. Why not treat both the same since the ordering score says they are the same? Yet that is not what is being done.
If you have a serious hangup with oddball boundaries, you should not be involved in computer chess. I assume you do some form of futility pruning no? How did you choose your cutoff value?
I don't _just_ have a cutoff value. There are other rules that are in play at the same time. The cutoff values are quite obviously statistical averages. The real formula to choose which moves to prune and which to search is not a linear function. Yet we approximate using such. And tune it to produce the best result we can. With absolutely no clue about how far we are from the "optimal" solution.
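For concreteness, a hedged sketch of the sort of futility test being referred to; the margins are invented placeholder numbers, not Crafty's or Komodo's actual values:

Code:

// Quiet moves within a few plies of the horizon are skipped when even an
// optimistic margin cannot lift the static eval back to alpha. The margin
// table is exactly the kind of tuned "statistical average" being discussed.
static const int futility_margin[4] = { 0, 125, 300, 500 };   // indexed by depth left

bool futile(int static_eval, int alpha, int depth_left,
            bool is_capture, bool gives_check) {
    if (depth_left >= 4 || depth_left < 1 || is_capture || gives_check)
        return false;                   // only quiet moves near the frontier
    return static_eval + futility_margin[depth_left] <= alpha;
}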

In operating systems, when discussing demand paging, we always discuss a pure random replacement strategy, and the "optimal strategy" (which needs access to all future page references before they happen.) Now we have a lower bound (any algorithm worse than random would be thrown out and replaced by random) and an upper bound which can't possibly be improved on. Now we find out where our new replacement strategy stands with respect to those two bounds to discover whether there is room for improvement or not.

That same idea holds true here. If we were doing these 24+ ply searches, and _not_ reducing good moves, the programs would be hundreds of Elo stronger than they are today.


If you are so uncomfortable with "boundaries" then how is it that you can forward prune a whole tree branch if the score is X, but if the score is X-1 you have to search it? Isn't that a rather artificial boundary?

To answer your question, if they are both the same score then if I search one and reduce the other I save time. Isn't that the whole idea of LMR?
The basic idea of LMR is to simply reduce more and more as a node is searched and you become more and more convinced it is an ALL node. I don't do what one would call "LMR" in that context. I exclude hash, good captures and killers for a start. And then have been experimenting with various other move-specific rules, as opposed to being concerned about where a move sits in the list.



I think it's stupid to search moves endlessly brute force just because some score estimate is nearly the same. At some point you have to be a man, make a decision and pick some arbitrary (but presumably well tuned) set of rules that works well. Please don't try to tell me that nothing you do in Crafty is arbitrary or boundary-less.
I can certainly say nothing we do is "arbitrary". Everything has been done via testing. So any "magic numbers" at least have some basis in their origin. I've been working on this from both ends, as I mentioned, because I also felt it was stupid to extend moves just because they are checks. There's Elo to be had by not doing that. In fact, I discovered that all the extensions I had except the check extension were Elo losers, not gainers, and they have been gone for a couple of years.

I personally believe that any "magic number" whether produced by testing or guessing, leaves a hole that can be repaired with effort. Our evaluation certainly has lots of specific code to handle specific things, rather than just finding the optimal number to return for pawn structure, regardless of pawn positions. Seems to me the same sort of very specific/explicit rules can be used everywhere to make decisions more accurate.


I think you have this idea that if there are a lot of good moves that they all MUST be searched.

Yes I do have that idea. Good moves need to be searched to be sure that my PV is as accurate as possible and I don't make mistakes due to simple lack of searching the good moves my opponent might play against me. If I get a cutoff on one, of course the rest can be ignored. But I want to reduce effort on lousy moves, and search normal moves to usual depth, and perhaps even extend a few really good moves to search them even deeper than normal.
I think this is wrong. Everything is a tradeoff, so sure, you can search more moves in order to be "sure" that you don't throw out a move, but you don't get a free ride - you have to pay for this safety with a slower search.

I have an idea for you. Don't reduce ANY moves, then you won't make any mistakes due to accidentally throwing out a good move.
Here's an idea for you. Do you have any idea what doing that would do to Elo? I don't have an "idea" I have an _exact_ answer. -40 Elo. At least the last time I tested that explicitly. So we gain 8-10 plies of depth, for maybe 40 Elo. Doesn't that sound just the least bit like something that could be improved???

And if you randomly search some of those moves to D, and some to D-2, and a few more to D-3, and even a few more to D-4, you _really_ have a good picture of what is going on?
Yes.
Only if those reduced moves really are not going to be a part of what happens.
Please don't keep saying "randomly search" as there is nothing random about it.
When you order a list of moves by some useful value (say static eval after the move minus static eval before the move), but then simply say that the first N moves get searched to normal depth, the next M get reduced by 2, the next O get reduced by 3, etc., that is the definition of randomness. You totally ignore the score information you had, and instead use a move's position within the list, regardless of how many moves there are that might be good, bad or ugly or anywhere in between. I call choosing the values for N, M, etc. a pretty random thing.
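Spelled out as code, the scheme being criticized looks roughly like this - N, M and the reduction amounts are arbitrary placeholders, which is exactly the objection:

Code:

// The reduction depends only on the move's index in the sorted list; the
// scores used to build that order are never consulted again.
int lmr_reduction(int move_index) {
    const int N = 4;                    // first N moves: full depth
    const int M = 12;                   // next M moves: reduce by 2
    if (move_index < N)     return 0;
    if (move_index < N + M) return 2;
    return 3;                           // everything later: reduce by 3
}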

But yes, this is basically how Komodo works. I reduce more as I progress through the move list.
And as I said, you are doing pure LMR. Which is a solution that offers some gains. But I happen to believe there is a much better way than the "L" in LMR, a way that actually uses the move information itself, rather than just the position of the move in the list, to make those decisions.


I think you believe that if a move is losing it should not be searched, but that is erroneous. A move should not be searched if it's WORSE than the best move, but its absolute value is not relevant.
I almost believe what you say. I believe, for example, that if I am within 3-4 plies of the frontier, and I am down a queen, I can safely prune moves that don't recapture a major part of that missing material. But notice that I am using the score (material gain or loss of a specific move) along with the alpha bound, which reflects the current state of the game, to decide whether the move can be tossed. Not the move's position in the move list, which is not used in the pruning or reduction decision at all...
I will think of giving something like that a try in Komodo.


Let me put it another way. If you suspect that EVERY move drops material, you still have to search the position in order for the alpha/beta procedure to properly score ancestor positions. You cannot just say, "all of these moves stink so I don't have to try any of them."
Sure I can. If the moves are bad, and the current beta bound tells me that these moves are not going to help, I can dump every last one of them. Safely.
For the sake of this discussion, alpha and beta should be ignored. It's understood that all kinds of shortcuts are possible when you factor in alpha and beta, but it's not relevant to this discussion. Komodo knows about alpha and beta of course and takes the window into consideration in its decisions about which moves to reduce or forward prune.
Why is it irrelevant to this discussion? I believe it is an integral part of the overall process. We are talking about an alpha/beta search, correct?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote: I was thinking about this after my last post:

Let me put it another way. If you suspect that EVERY move drops material, you still have to search the position in order for the alpha/beta procedure to properly score ancestor positions. You cannot just say, "all of these moves stink so I don't have to try any of them."
Sure I can. If the moves are bad, and the current beta bound tells me that these moves are not going to help, I can dump every last one of them. Safely.
I believe this is in error. Think about it.

You said "if the moves are bad ... ", but how do you know they are bad?


by doing the analysis exactly as I said.

If alpha = 0, the current material score is -20 pawns, and the move I am considering is within 4 plies of the horizon and doesn't recapture _anything_, doesn't check the opponent (to possibly fork something), then it is a pretty safe move to dump. Could a non-capture non-check win more than two queens? Possibly. Once a century when you only have 4 plies including the current one to work with.

So, quite simply, "bad" is with respect to (a) current lower bound; (b) current material score; (c) expected material gain or threat level of the current move...
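As a sketch, a test along those lines might look like this (the thresholds and the helper name are illustrative, not Crafty's actual code):

Code:

// "Bad" here combines (a) the current lower bound, (b) the material balance
// and (c) what this particular move can win back or threaten. Values are in
// centipawns; the 4-ply window and the safety margin are made-up numbers.
bool can_dump(int alpha, int material_score, int move_gain,
              bool gives_check, int plies_to_horizon) {
    if (plies_to_horizon > 4 || gives_check)
        return false;                   // only quiet-ish moves near the tips
    return material_score + move_gain + 200 /* margin */ < alpha;
}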

You have to search them to prove they are bad. You cannot just say the moves cannot reach alpha and dump them "safely." You do not know that the moves cannot reach the window unless you search them.
I believe that when doing a 24+ ply search, and you get to ply 21 down major material, there is little chance that a quiet move is going to recover the missing material. Seems safe enough. Yes there is a risk. Yes you can set the threshold so that you make fewer errors. I would almost bet that over the next 20-30 years, search will go backward. We started off selective in 1968 (my program, Greenblatt's, etc). We moved to full-width in 1975 or so, starting with Slate. Now we are back to selective search again. And as hardware speeds continue to improve, it wouldn't surprise me to see some start to drive out this inherent error by backing off of the pruning.

If the moves all drop material, but the score is so good that you can still take a beta cutoff, you still have to try ONE of the moves. Of course you might get a null move cutoff but we are talking about LMR, not null move pruning.

No matter how you spin this, it's out of context anyway - I'm talking about the general case where you are trying to make decisions about what to reduce with LMR, with the emphasis on L for LATE. In other words you are at a node where you need to evaluate (at least in a reduced way) ALL the moves.
Where you "hope" you need to evaluate all moves. And if you reduce them, you guarantee this to be true, where in reality a move way down the list is a winning tactical shot that goes overlooked.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: move count based pruning

Post by mcostalba »

bob wrote: In operating systems, when discussing demand paging, we always discuss a pure random replacement strategy, and the "optimal strategy" (which needs access to all future page references before they happen.) Now we have a lower bound (any algorithm worse than random would be thrown out and replaced by random) and an upper bound which can't possibly be improved on. Now we find out where our new replacement strategy stands with respect to those two bounds to discover whether there is room for improvement or not.

That same idea holds true here. If we were doing these 24+ ply searches, and _not_ reducing good moves, the programs would be hundreds of Elo stronger than they are today.
In chess we have yet to find the "pure random replacement strategy".

When you search millions of nodes per second there are some powerful statistics involved, and we have just scratched the surface.

This is the key point that I think both you and Don fail to properly evaluate.

Let's consider the case of totally unsorted non-capture moves.

Here is a rule: "Reduce 1 ply after 5 moves and 2 plies after 15 moves."

Here is another rule: "Reduce 1 ply after 3 moves."

Which is the best?

We can write thousands of rules for LMR and for pruning that work in different ways and yield completely different results _even_ if the underlying non-capture moves are always generated in a random / unsorted way.
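Written out, the two example rules are just two points in a very large parameter space; the functions below merely restate them, and the names are illustrative:

Code:

int rule_a(int moves_searched) {        // "reduce 1 ply after 5 moves, 2 after 15"
    if (moves_searched > 15) return 2;
    if (moves_searched > 5)  return 1;
    return 0;
}

int rule_b(int moves_searched) {        // "reduce 1 ply after 3 moves"
    return moves_searched > 3 ? 1 : 0;
}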

So, coming back to your (not appropriate) OS example: we still have to find the "pure random replacement strategy" for chess engines.

My guess is that the understanding of the pure statistical behaviour of LMR and pruning still has unexploited potential, and that it is possible to get a bigger gain from there than from some "magic" ad-hoc knowledge about which moves to prune/reduce and which not.

A recent example is the multi-reduction LMR idea that you have copied in your last Crafty version, even though you stated above that it makes no sense (that's the reason I used the verb "to copy"), and it has proven to be a big improvement. This improvement is 100% statistically based: a different compromise between tree size and accuracy, modeled purely on statistical behaviour in real games.

This has very little to do with move ordering. Of course good ordering helps, but it could work even in the case of completely unsorted non-captures (as in Crafty, where the last version proved it works the same).

I think this is the real _new_ gold mine that was too deep for 90's engines, which, due to hardware limitations, were forced to dig in the more accessible but poorer mine of extensions, ad-hoc rules, pseudo-knowledge, etc.

Today we have access to this new powerful source and we have just scratched the surface; looking at "which type of move to prune/reduce and which not" is looking backward, IMHO.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

mcostalba wrote:
bob wrote: In operating systems, when discussing demand paging, we always discuss a pure random replacement strategy, and the "optimal strategy" (which needs access to all future page references before they happen.) Now we have a lower bound (any algorithm worse than random would be thrown out and replaced by random) and an upper bound which can't possibly be improved on. Now we find out where our new replacement strategy stands with respect to those two bounds to discover whether there is room for improvement or not.

That same idea holds true here. If we were doing these 24+ ply searches, and _not_ reducing good moves, the programs would be hundreds of Elo stronger than they are today.
In chess we have yet to find the "pure random replacement strategy".

When you search millions of nodes per second there are some powerful statistics involved, and we have just scratched the surface.

This is the key point that I think both you and Don fail to properly evaluate.
There is no doubt that I'm missing a lot, but I'm not sure I'm missing this point - then again, I'm also not sure I understand what you are saying.

My program uses a statistical technique to order the moves. It's similar to how Stockfish does it but there are a lot of details that are different.

The moves are sorted according to these statistics. I'm not evaluating them in the traditional sense, for example I'm not applying a static evaluation or piece square tables or anything like that.

But the statistics are powerful; they tend to place the best moves first with surprising accuracy. In fact I studied this by removing LMR and taking samples. I looked at cutoff positions and positions that did not cut off.

This same technique could be applied in "bob-like" fashion by not ordering the moves at all and somehow using the scores that fall out of this procedure. In this way Bob would be happy because I would be going by some kind of magic score instead of some kind of magic move number.
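One hedged sketch of that "bob-like" variant: use the ordering statistic's value directly as the reduction criterion instead of the move's index in the list (the thresholds and names below are invented for illustration, not Komodo's code). It is the same information, just consumed as a value rather than as a rank:

Code:

// Decide the reduction from the ordering statistic's value instead of the
// move's index: moves whose statistic is close to the best one's are treated
// alike, however many of them there are. Thresholds are placeholders.
int reduction_from_score(int ordering_score, int best_ordering_score) {
    int gap = best_ordering_score - ordering_score;
    if (gap < 100)  return 0;           // practically as good as the best: full depth
    if (gap < 400)  return 1;
    if (gap < 1000) return 2;
    return 3;
}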

None of this changes my belief that you must try to always include the best move and that with any reasonable move ordering you can expect to find this move early in the list. The law of diminishing returns kicks in with a vengeance.


Let's consider the case of totally unsorted non-capture moves.

Here is a rule: "Reduce 1 ply after 5 moves and 2 plies after 15 moves."

Here is another rule: "Reduce 1 ply after 3 moves."

Which is the best?

We can write thousands of rules for LMR and for pruning that work in different ways and yield completely different results _even_ if the underlying non-capture moves are always generated in a random / unsorted way.

So, coming back to your (not appropriate) OS example: we still have to find the "pure random replacement strategy" for chess engines.

My guess is that the understanding of the pure statistical behaviour of LMR and pruning still has unexploited potential, and that it is possible to get a bigger gain from there than from some "magic" ad-hoc knowledge about which moves to prune/reduce and which not.

A recent example is the multi-reduction LMR idea that you have copied in your last Crafty version, even though you stated above that it makes no sense (that's the reason I used the verb "to copy"), and it has proven to be a big improvement. This improvement is 100% statistically based: a different compromise between tree size and accuracy, modeled purely on statistical behaviour in real games.

This has very little to do with move ordering. Of course good ordering helps, but it could work even in the case of completely unsorted non-captures (as in Crafty, where the last version proved it works the same).

I think this is the real _new_ gold mine that was too deep for 90's engines, which, due to hardware limitations, were forced to dig in the more accessible but poorer mine of extensions, ad-hoc rules, pseudo-knowledge, etc.

Today we have access to this new powerful source and we have just scratched the surface; looking at "which type of move to prune/reduce and which not" is looking backward, IMHO.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: move count based pruning

Post by mcostalba »

Don wrote: None of this changes my belief that you must try to always include the best move and that with any reasonable move ordering you can expect to find this move early in the list. The law of diminishing returns kicks in with a vengeance.
Also with no ordering at all you can expect to find a cut-off move early in the list... or not find one at all.

IMHO good ordering is just a second-order optimization on top of something that already works independently of the order of the moves. And that something that works is move-count-based pruning / reducing.

But my point is that IMHO we are _still_ far from finding the optimal move-counting rule (out of possible thousands) to prune / reduce moves under the hypothesis of totally unsorted non-captures. We had a breakthrough about one year ago when Ippo showed a working multi-reduction LMR approach that has now been adopted by almost all engines. This demonstrates that IMHO there is still big room for improvement in statistically shaping the reducing / pruning rules, before going on to consider better non-capture ordering.

Strangely, people are concentrating on finding a better ordering (possibly because it is a more traditional and well understood topic) rather than on finding a rule that better takes advantage of the statistical properties of the search, as the multi-reduction LMR was.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

mcostalba wrote:
bob wrote: In operating systems, when discussing demand paging, we always discuss a pure random replacement strategy, and the "optimal strategy" (which needs access to all future page references before they happen.) Now we have a lower bound (any algorithm worse than random would be thrown out and replaced by random) and an upper bound which can't possibly be improved on. Now we find out where our new replacement strategy stands with respect to those two bounds to discover whether there is room for improvement or not.

That same idea holds true here. If we were doing these 24+ ply searches, and _not_ reducing good moves, the programs would be hundreds of Elo stronger than they are today.
In chess we have yet to find the "pure random replacement strategy".

When you search millions of nodes per second there are some powerful statistics involved, and we have just scratched the surface.

This is the key point that I think both you and Don fail to properly evaluate.

Let's consider the case of totally unsorted non-capture moves.

Here is a rule: "Reduce 1 ply after 5 moves and 2 plies after 15 moves."

Here is another rule: "Reduce 1 ply after 3 moves."

Which is the best?

We can write thousands of rules for LMR and for pruning that work in different ways and yield completely different results _even_ if the underlying non-capture moves are always generated in a random / unsorted way.

So, coming back to your (not appropriate) OS example: we still have to find the "pure random replacement strategy" for chess engines.

That is simply wrong. I have _already_ tested a purely random algorithm. One just picks a move from the list and then randomly reduces or doesn't reduce that move. Where do you think I came up with the idea that the current approach is nowhere near optimal???

There are a couple of simple observations one can make, if interested.

(1) Can we do better? Simplest idea is to turn LMR completely off and see what we lose in search depth. Then turn it on and see what we gain in depth. Then compare the loss in Elo with the gain. Then map Elo gain to search depth gain. And finally, go back to pre-reduction / pre-forward-pruning days and see what we got per ply back then.

(2) given a tactical chess position, run it with LMR off, and then with LMR on. How much deeper do we need to search to find the solution with LMR on? Optimally, zero. Actually, quite a bit. There's lots of room for improvement there.

My guess is that the understanding of the pure statistical behaviour of LMR and pruning still has unexploited potential, and that it is possible to get a bigger gain from there than from some "magic" ad-hoc knowledge about which moves to prune/reduce and which not.
Intuition says otherwise. If you have a complex mathematical function (and deciding which moves to reduce or not reduce certainly is one) but you approximate it with a 1st degree (or 2nd degree and beyond) function, there will always be inherent error, until your approximating function is actually the real function you are trying to model. At present, we are using a simple linear function at best.

I do not believe that statistical sampling will ever beat "the real deal" in terms of what to reduce or not reduce. It could be improved, but more direct application of chess knowledge makes more sense. Otherwise we could do evaluation in the same way and not have so many different terms for various chess features.


A recent example is the multi-reduction LMR idea that you have copied in your last Crafty version, even though you stated above that it makes no sense (that's the reason I used the verb "to copy"), and it has proven to be a big improvement. This improvement is 100% statistically based: a different compromise between tree size and accuracy, modeled purely on statistical behaviour in real games.
Not quite sure what you mean about "copied". My code addresses one _very_ specific issue that no one else does. I have a couple of cases that I wanted to deal with:

(1) If I have a hash move, or non-losing captures, or killers, I am trying those first because I expect a cutoff. And I don't reduce them because I hope for that cutoff. I then start reducing moves immediately after those moves are searched. Even if there is just one of those moves, I reduce moves beyond those by 3 plies.

(2) special (and odd) case where there is no hash move, no good capture, and no killer at all. I do not want to reduce the first move I search by the full 3 plies, because if this is a CUT node, I am taking a big risk. So for this case, if the move passes my normal "ok to reduce" tests, I reduce it by 2 plies. This gave 2-3 Elo over what I was doing previously. Since there is no hash move, no good capture, and most specifically, no killer, this is not a common position, and testing showed this happened most often in ALL nodes, not CUT nodes. Hence the current idea.

But notice, if I have any move that has a reasonable prospect of producing a cutoff, all others get reduced by 3 plies. If I have no hint about a potential good move, the first is reduced by 2, the rest by 3. I have tried other approaches; none has worked any better. I still believe there is a far better way to choose what to reduce or not, one that doesn't rely on linear position in a list, which mine still does.
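Paraphrased into code, the two cases described above read roughly as follows (a sketch of the description, not Crafty's actual source; all names are illustrative):

Code:

// Moves expected to produce a cutoff (hash move, non-losing captures,
// killers) are never reduced. Once those are exhausted, later moves that pass
// the normal "ok to reduce" tests get 3 plies, except that when the node had
// no expected-cutoff move at all, the first searched move only gets 2.
int reduction(bool is_expected_cutoff_move,    // hash move / good capture / killer
              bool node_had_expected_cutoff_move,
              bool passes_reduction_tests,     // the usual "ok to reduce" checks
              int  moves_already_searched) {
    if (is_expected_cutoff_move || !passes_reduction_tests)
        return 0;
    if (!node_had_expected_cutoff_move && moves_already_searched == 0)
        return 2;                              // no hint of a good move: be careful
    return 3;
}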


This has very little to do with move ordering. Of course good ordering helps, but it could work even in the case of completely unsorted non-captures (as in Crafty, where the last version proved it works the same).

I think this is the real _new_ gold mine that was too deep for 90's engines, which, due to hardware limitations, were forced to dig in the more accessible but poorer mine of extensions, ad-hoc rules, pseudo-knowledge, etc.

Today we have access to this new powerful source and we have just scratched the surface; looking at "which type of move to prune/reduce and which not" is looking backward, IMHO.
The thing is, statistical analysis gives us good averages. Not best-case behaviour. The distance between where we are, and where we could be, is enormous. And is a product of the error we introduce. Yes, adding a new idea that works better 51% of the time will give us a boost. But that missing 49% is still there for the taking. But it does take significant effort to take it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

mcostalba wrote:
Don wrote: None of this changes my belief that you must try to always include the best move and that with any reasonable move ordering you can expect to find this move early in the list. The law of diminishing returns kicks in with a vengeance.
Also with no ordering at all you can expect to find a cut-off move early in the list... or not find one at all.

IMHO good ordering is just a second-order optimization on top of something that already works independently of the order of the moves. And that something that works is move-count-based pruning / reducing.

But my point is that IMHO we are _still_ far from finding the optimal move-counting rule (out of possible thousands) to prune / reduce moves under the hypothesis of totally unsorted non-captures. We had a breakthrough about one year ago when Ippo showed a working multi-reduction LMR approach that has now been adopted by almost all engines. This demonstrates that IMHO there is still big room for improvement in statistically shaping the reducing / pruning rules, before going on to consider better non-capture ordering.

Strangely, people are concentrating on finding a better ordering (possibly because it is a more traditional and well understood topic) rather than on finding a rule that better takes advantage of the statistical properties of the search, as the multi-reduction LMR was.
This is still mixing apples and oranges.

There are two _different_ issues. I am talking about the first, below.

(1) what to reduce or not to reduce? we want to reduce moves that are irrelevant, while not reducing moves that will influence the final score or PV.

(2) we want to reduce effort, and here the count-based idea works, because at any node in the tree, you can categorize it as PV, CUT or ALL. If ALL, we want to avoid as much effort as possible since ALL nodes (real ALL nodes of course, not nodes that become artificial ALL nodes because of improper reductions or pruning decisions) don't influence anything and cost us in terms of effort. The more moves you search at any node in the tree, the more likely it is to be a real ALL node. And the more convinced you become it is an ALL node, the more you want to reduce the effort expended. Current count-based ideas capture this idea effectively. Whether you use a single reduction step, or a continuous function that ramps up the reduction as you search more and more moves, does not matter. If you are confident enough in this idea, you can stop reducing and start pruning, which really saves effort (you might experiment here, I have some interesting notes about this idea that are food for thought.) But doing any of these introduces error. So we have to tune the idea to control error and effort simultaneously, and end up with a non-optimal (but workable) solution.
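A minimal sketch of the ramping idea in (2), with purely illustrative step sizes: the more moves a node has searched without a cutoff, the more it behaves like an ALL node and the harder the remaining moves are reduced, eventually pruned:

Code:

enum Action { SEARCH_FULL, REDUCE_1, REDUCE_2, PRUNE };

// Reduction grows with the number of moves already searched without a
// cutoff; past some point the remaining quiet moves are pruned outright.
Action ramped_action(int moves_searched_without_cutoff, bool is_quiet) {
    if (moves_searched_without_cutoff < 3)  return SEARCH_FULL;
    if (moves_searched_without_cutoff < 8)  return REDUCE_1;
    if (moves_searched_without_cutoff < 20 || !is_quiet) return REDUCE_2;
    return PRUNE;                            // confident this is a real ALL node
}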

I give two cases because they are completely orthogonal to each other. If this is an ALL node, then reduction or no-reduction is a moot issue, as all moves should be reduced or pruned away entirely. But we can't predict ALL vs CUT better than about 90% of the time. Which means that 9 out of 10 moves that we reduce are ok to reduce. But what about that last one out of every 10 moves that is _not_ safe to reduce? Doing so introduces error. Significant error.

This is one reason that things seem to go by so quickly when you find a strong tactical shot in a game. Alpha goes way up, everything that should be ALL remains ALL, and our effort reduction works well. I see positions where the fail-high-on-first-move in Crafty almost reaches 100%. But I also see positions where it is in the 80's, which leaves room for a _lot_ of reduction-based or pruning-based error to creep in. And, as always, error is cooperative and creeps in whenever asked, gleefully...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: move count based pruning

Post by mcostalba »

bob wrote:I still believe there is a far better way to choose what to reduce or not, which doesn't rely on linear position in a list, which mine still does.
You have to forget the "position in a list" point of view. To understand the concept you have to look at it as a diminishing probability of finding a cut-off move: that's what move-count pruning / reducing is all about. The list is just an implementation detail.

The point is that "after I have tried n moves, I have a smaller probability of finding a cut-off in the remaining m-n moves than I had at move 0."

That's the concept in its plain and simple terms. The list and the counter are just an implementation that fits well.
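One way to make that probability concrete, as a hedged sketch: instrument the search to record, per move index, how often that index is reached and how often the cutoff arrives exactly there (the counters and helpers below are invented for illustration):

Code:

#include <cstdint>

constexpr int MAX_INDEX = 64;
static std::uint64_t reached[MAX_INDEX];   // nodes that searched the move at this index
static std::uint64_t cut_here[MAX_INDEX];  // nodes whose cutoff came at exactly this index

void record(int move_index, bool caused_cutoff) {
    if (move_index < MAX_INDEX) {
        ++reached[move_index];
        if (caused_cutoff)
            ++cut_here[move_index];
    }
}

// Empirical P(cutoff at this index | the search got this far) -- the curve
// that move-count rules implicitly model.
double cutoff_probability(int move_index) {
    return reached[move_index]
         ? double(cut_here[move_index]) / double(reached[move_index])
         : 0.0;
}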