When does a cut-node become an all-node?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote:
lkaufman wrote: 3. What is your opinion of the idea of increasing the null move reduction with increased depth? Stockfish pushes this to the extreme, adding another ply of reduction for each 4 plies of search. I imagine you will think it is a bad idea, but I don't like to assume such things.

Larry
Your 3 would of course deserve its own subthread.

I look at it maybe from a different viewpoint than you do.
My viewpoint is: "what actually are we doing with nullmove?"

In principle we're still doing a move. So we are replacing a real move by one of the worst possible moves: passing.

In fact we replace our normal search by a refutation search carried out by the opponent, and we do that RECURSIVELY. So we only need to look at THIS position, not at the total search length. The question to ask is: how MUCH can we reduce THIS move, assuming it's the worst move in history?

So calculating a mathematical optimum there is very complicated, as we make some heavy assumptions and effectively give the opponent the move.

That's not possible in the same 'simple manner' as with reductions. This is a totally different type of thing.

Yet in the end we simply replace a move by a search for the opponent.

If we're at search depth D, then nullmove gets carried out at depth D - R - 1.
Don't forget that '-1' as well.

In principle, however, we continue our normal search, just at a reduced depth. We do this basically to prove how strong our position "at least" is.
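To make the arithmetic concrete, here is a minimal C++ sketch of a null-move search of this shape; Position, qsearch and searchMoves are hypothetical stand-ins, not Diep's or any other engine's actual code:

struct Position {
    bool inCheck() const;
    void makeNull();      // "make" the null move: hand the opponent the turn
    void unmakeNull();
    // ... real move generation / make / unmake omitted
};

int qsearch(Position& pos, int alpha, int beta);                 // assumed
int searchMoves(Position& pos, int depth, int alpha, int beta);  // normal move loop, assumed

// At depth D the null-move refutation search runs at D - R - 1:
// R plies of reduction plus the 1 ply that making the (null) move
// costs, exactly like any other move.
int search(Position& pos, int depth, int alpha, int beta) {
    if (depth <= 0)
        return qsearch(pos, alpha, beta);

    const int R = 3;  // null-move reduction, a constant here
    if (!pos.inCheck() && depth > R + 1) {
        pos.makeNull();
        // Refutation search by the opponent with a zero-width window.
        // Note the recursion: the reduced subtree may null-move again.
        int score = -search(pos, depth - R - 1, -beta, -beta + 1);
        pos.unmakeNull();
        if (score >= beta)
            return beta;  // proved the position is "at least" this good
    }
    return searchMoves(pos, depth, alpha, beta);
}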

So the first observation of course is that our R is going to be dependent upon how much we reduce with our normal reductions.

If you take more risk there, then it makes sense to do that in nullmove as well.

Not that they are at all similar, but let's put it this way: if we have some crappy way of hashing, say we hash positions to 32 bits, it doesn't make sense to use ECC memory.

Now if I assume a reduction of 1 ply for the normal reductions,
we see that a reduced move gets searched at most at

D - Red - 1 = D - 1 - 1 = D - 2

With R=2 for nullmove we get to:

D - R - 1 = D - 3

Since for a given depth we can risk the nullmove reducing by 100% more than that, we can double our reduction effort.

So that means D - 4 is OK, which is R=3 for nullmove.

Of course this is for the main search; as for what happens in the last few plies in chess engines, no one knows what is best to do there. That also changes every few years. In the '80s they razored. In the early '90s Ed Schroder had optimized it even further and simply forward pruned even bigger search depths.

Then in the mid '90s, with a tad more computing power, it was nullmove only, no razoring at all, as nullmove then picked up more tactics as well. To quote Frans Morsch: "We don't have the system time to do any sophisticated pruning".

So I basically skip the last few plies in the guessing above.

Now if you are doing reductions of 2 plies,
that means effectively D - 2 - 1 = D - 3.

So I would then use R=5 everywhere for nullmove,
as R + 1 = 6, which is double the 3 of D - 3.

That would mean that you search 1 ply at:

1 = D - R - 1 => D = 2 + R = 2 + 5 = 7 ply
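Vincent's rule of thumb compresses to one line: a reduced move is searched at D - red - 1, and the null move may reduce twice that total, so R + 1 = 2*(red + 1), i.e. R = 2*red + 1. As a self-contained check:

#include <cstdio>

// R = 2*red + 1: double the total reduction (red + 1) of a normally
// reduced move, minus the 1 ply the null move itself costs.
int nullR(int red) {
    return 2 * red + 1;
}

int main() {
    std::printf("1-ply reductions -> R = %d\n", nullR(1));  // R = 3
    std::printf("2-ply reductions -> R = %d\n", nullR(2));  // R = 5
}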

So I'd then design some sort of super-razoring for the last so many plies, say the last 4 plies, and at ply depths 5..6 I'd do some fast tactics-only nullmove, maybe just a sophisticated qsearch, to pick up some tactics.

From my viewpoint a nullmove R=2,3,4 doesn't make much sense if you are going to reduce by a ply or 2 anyway (on top of the normal 1-ply reduction), as that means that just two reduced moves in a row already reduce more than a nullmove does, making nullmove pretty much worthless.

Of course it's unclear what to do in the last few plies.

Maybe this works for you.

Does this answer your question?

What I intend to do with Diep is not such a thin search for now. I intend to use R=3 everywhere with nullmove and to reduce by 1 ply. Making an efficient parallel search on a cluster is yet another challenge compared to a shared-memory box; if I lose too much to the GCC compiler, to the MPI overhead and to clumsy parallel overhead, then already a 2-socket Sandy Bridge is going to outsearch my 64 power-efficient oldie L5420 Xeon cores.

Then just improve the evaluation. My blindfolded guess was that if I can get a ply or 23 and improve the evaluation big time, that's enough to win.

Vincent

OK, so your answer to my question seems to be that the null reduction should not depend on depth remaining, except on the last few plies (we already do special stuff on those plies). This is very hard to test as it requires testing at deep levels where it is not very practical to get large sample sizes.
You also say that the amount of null reduction should go up with the amount of LMR reduction. But do you mean maximum reduction, average reduction, or what? A typical program might not reduce the first 3 moves, might reduce the next seven by one ply, and might reduce the rest by two plies. So how would you calculate the "proper" null reduction for such a program?
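For concreteness, the "typical program" in this question could carry its scheme as a small table like the sketch below; the thresholds are just the ones from the example:

// Hypothetical LMR table from the example above: no reduction for the
// first 3 moves, 1 ply for the next seven, 2 plies for the rest.
int lmrReduction(int moveNumber) {  // moveNumber is 1-based
    if (moveNumber <= 3)  return 0;
    if (moveNumber <= 10) return 1;
    return 2;
}

The open question is then which of these values, the maximum 2 or some average, should drive the null-move R.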

Larry
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

lkaufman wrote:
diep wrote: 2. It's not a proof of anything of course, mathematically speaking. It's a simple calculation, yet quite lengthy. Realize that fractional ply depths were also very popular back then, so when calculating the optimum we can use broken depths as well. That makes it a linear programming problem, which as we know is solvable even without complex numbers.

The answer to this problem is not 42 in this case but 1, assuming the normal reduction for the depth-limited search is also 1. So that makes the total reduction 2 in case of a reduction, and 1 in case of not reducing or re-searching. Note that back in 1999 I didn't re-search in case of a fail high.

Vincent
Not re-searching after a fail high is a huge difference. I wouldn't want to reduce more than one ply either in that case.
Look, there is no disagreement here as far as the present is concerned.

However, we are talking about 1999 now.

Larry, I need to quote Ed Schroder there. Back then search depths were not so huge, and re-searching really hardly mattered. With today's search depths that definitely is different. Also realize that for LMR doing a re-search is more crucial than when you reduce based upon chess knowledge, which is what I did.
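The re-search in question is just this: if a reduced move unexpectedly fails high, repeat the search at full depth before trusting the score. A sketch with hypothetical declarations:

struct Move;
struct Position {
    void make(const Move&);
    void unmake(const Move&);
};
int search(Position& pos, int depth, int alpha, int beta);  // assumed, as in the earlier sketch

// Zero-width search of one reduced move; on a fail high, re-search at
// full depth. Omitting the re-search means trusting the reduced result,
// which was affordable at 1999 search depths but not at today's.
int searchReduced(Position& pos, const Move& m, int depth,
                  int alpha, int red) {
    pos.make(m);
    int score = -search(pos, depth - 1 - red, -alpha - 1, -alpha);
    if (red > 0 && score > alpha)
        score = -search(pos, depth - 1, -alpha - 1, -alpha);  // re-search
    pos.unmake(m);
    return score;
}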

Ed also reported that he didn't re-search back then, that he had tested it carefully (in contrast to me), and that re-searching won him no Elo, it just cost him nodes.

Diep back then got very few nps, so the search depth won by reductions was really small. It just didn't break even back then.

What I do nowadays isn't LMR, just aggressive reductions; it is far more sophisticated and wins a lot of plies.

Back then the problem really was that if I didn't reduce nearly everything, it won worst case just 1 ply of search depth, as the nps was that low.

GCP: "Reductions are total trivial to invent"

He did however invent them when he got quite a bit more nps with DeepSjeng - lucky him.

Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

lkaufman wrote: You also say that the amount of null reduction should go up with the amount of LMR reduction. But do you mean maximum reduction, average reduction, or what? A typical program might not reduce the first 3 moves, might reduce the next seven by one ply, and might reduce the rest by two plies. So how would you calculate the "proper" null reduction for such a program?

Larry
Larry, to say it less carefully: what I claim is that because we do nullmove recursively, I would never want to use a huge reduction factor for it myself.

However, if you already take huge other risks in search, then, knowing nullmove is a very strong assumption, much stronger than any reduction, it doesn't make sense to let the reductions reduce more effectively than nullmove; so I would not want small nullmove reduction factors at all then.

With a 1-ply reduction: R=3 for nullmove as a constant.
With a 2-ply reduction: R=5 for nullmove as a constant.

Note that I'm not a big fan of 2-ply reductions right now.
That might change in time, but right now I'm not.

So no R that gradually becomes larger: a constant reduction.

In fact I find R=3 already pretty tricky. It's a huge reduction for just 1 bad move, knowing that we do this RECURSIVELY.
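To set this against the depth-dependent scheme from Larry's original question 3 ("another ply of reduction for each 4 plies of search"), here are the two recipes side by side; the base value in the dynamic version is a placeholder, only the depth/4 term comes from this thread:

// Depth-dependent: R grows with remaining depth (the Stockfish-style
// scheme as described in this thread; the base of 3 is illustrative).
int nullR_dynamic(int depth) {
    return 3 + depth / 4;
}

// Constant R, tied only to the size of the normal reductions:
// 1-ply reductions -> R=3, 2-ply reductions -> R=5.
int nullR_constant(int red) {
    return 2 * red + 1;
}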
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: When does a cut-node become an all-node?

Post by bob »

sedicla wrote:
bob wrote:
sedicla wrote: According to the chess programming wiki it is when all candidate moves have been tried, but Fruit does it after the first move is tried.

I'm thinking of trying to do things differently according to node type, for example:
cut-node - use static null move (like Stockfish) and don't do razoring (eval + margin < beta).
all-node - no static null move, try razoring, no move ordering for quiet moves...

What do you guys think is a good time to switch a cut-node to an all-node?

Thanks.
First, why does it matter? Are you somehow trying to type a node as PV, CUT or ALL, and then do things differently if it is CUT as opposed to ALL? A "CUT" node becomes an ALL node when there is no cutoff to be found. The most common reason is poor move ordering: this move is not the best one here although we thought it was, so another move will cause the cutoff later. On occasion it is not ordering but a depth issue: suddenly you search deep enough to see that the score is worse than you expected, no matter which move you try, so none causes a cutoff...
Well, that's what I'm trying to figure out, whether it matters or not. I can see that your opinion is that it does not matter.
My idea is exactly that: do different things at ALL and CUT nodes.
Here's the BIG question:

Why?

More commonly, people seem to be doing different things at PV vs non-PV nodes. I don't follow that reasoning either... the idea of the search is to find the best move. If you spend less effort on non-PV nodes, you will miss better moves. But for ALL vs CUT, the only advantage I see is the one I used in DTS, namely that you want to split at an ALL node, never at a CUT node. Beyond that, I can't think of any reason why I might prune differently, reduce differently, or extend differently just because a node was ALL rather than CUT or vice versa...
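For reference, the usual a priori node typing this discussion assumes, plus the kind of type-specific pruning sedicla proposed, might be sketched like this; the helper names and margins are invented for illustration:

enum NodeType { PV, CUT, ALL };

// Classic a priori typing: the first child of a PV node is PV and the
// rest are expected CUT; children of CUT nodes are expected ALL, and
// children of ALL nodes are expected CUT.
NodeType childType(NodeType parent, int moveNumber) {
    if (parent == PV)  return moveNumber == 1 ? PV : CUT;
    if (parent == CUT) return ALL;
    return CUT;  // parent == ALL
}

// sedicla's idea, sketched: static null move only at expected CUT
// nodes, razoring only at expected ALL nodes.
bool staticNullMove(NodeType t, int eval, int beta, int margin) {
    return t == CUT && eval - margin >= beta;
}

bool razor(NodeType t, int eval, int beta, int margin) {
    return t == ALL && eval + margin < beta;
}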
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: When does a cut-node become an all-node?

Post by diep »

Look, what I feel is the crucial question, the one that will determine what form of search is best, is: "how deep do we need to search for strategic plans?"

I never researched this subject very well, nor did anyone else as far as I know.

Yet it's pretty crucial.

If a plan is on average, say, 20 plies, it really makes little sense to search 40 plies. 40 plies give more accuracy than 20, sure, but they won't change the plan.

Vincent
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

bob wrote:
sedicla wrote:
bob wrote:
sedicla wrote: According to the chess programming wiki it is when all candidate moves have been tried, but Fruit does it after the first move is tried.

I'm thinking of trying to do things differently according to node type, for example:
cut-node - use static null move (like Stockfish) and don't do razoring (eval + margin < beta).
all-node - no static null move, try razoring, no move ordering for quiet moves...

What do you guys think is a good time to switch a cut-node to an all-node?

Thanks.
First, why does it matter? Are you somehow trying to type a node as PV, CUT or ALL, and then do things differently if it is CUT as opposed to ALL? A "CUT" node becomes an ALL node when there is no cutoff to be found. The most common reason is poor move ordering: this move is not the best one here although we thought it was, so another move will cause the cutoff later. On occasion it is not ordering but a depth issue: suddenly you search deep enough to see that the score is worse than you expected, no matter which move you try, so none causes a cutoff...
Well, that's what I'm trying to figure out, whether it matters or not. I can see that your opinion is that it does not matter.
My idea is exactly that: do different things at ALL and CUT nodes.
Here's the BIG question:

Why?

More commonly, people seem to be doing different things at PV vs non-PV nodes. I don't follow that reasoning either... the idea of the search is to find the best move. If you spend less effort on non-PV nodes, you will miss better moves. But for ALL vs CUT, the only advantage I see is the one I used in DTS, namely that you want to split at an ALL node, never at a CUT node. Beyond that, I can't think of any reason why I might prune differently, reduce differently, or extend differently just because a node was ALL rather than CUT or vice versa...
Bob, I understand why you don't see the logic behind treating PV nodes differently from non-PV nodes. But it is quite obvious now that it does work. Every strong program makes this distinction now, and we would all drop a ton of Elo points if we didn't. I'm sure Crafty would go up a lot if you made the distinction. I can't explain why it works, but it does; that's a fact. It's clearly a probabilistic thing: the odds of a reduced or pruned move being important are higher at PV nodes, so the rules should be different.
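A minimal sketch of the kind of PV/non-PV asymmetry described here; every number is a placeholder rather than any engine's actual tuning:

// Reduce later and by less at PV nodes than at non-PV nodes.
// All thresholds are invented placeholders.
int lmr(bool pvNode, int depth, int moveNumber) {
    if (depth < 3 || moveNumber <= (pvNode ? 5 : 3))
        return 0;                      // only late moves get reduced
    int r = (moveNumber > 10) ? 2 : 1;
    return pvNode ? r - 1 : r;         // one ply less on the PV
}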
But the questioner asked about CUT vs. ALL nodes. Here too there could be probabilistic reasons why one type should be reduced or pruned more than the other, but in this case the empirical evidence is much less clear, and the top programs do not all agree. I believe that in the specific case of SINGULAR EXTENSIONS the distinction is considered important, even according to the Deep Blue originators of the idea. But then you don't believe in those either.
My opinion is that making the CUT/ALL distinction does have some value if you have the resources to detect small gains over many thousands of games. It's not worthwhile for most hobbyists I think.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

diep wrote:
lkaufman wrote: You also say that the amount of null reduction should go up with the amount of LMR reduction. But do you mean maximum reduction, average reduction, or what? A typical program might not reduce the first 3 moves, might reduce the next seven by one ply, and might reduce the rest by two plies. So how would you calculate the "proper" null reduction for such a program?

Larry
Larry, to say it less carefully: what I claim is that because we do nullmove recursively, I would never want to use a huge reduction factor for it myself.

However, if you already take huge other risks in search, then, knowing nullmove is a very strong assumption, much stronger than any reduction, it doesn't make sense to let the reductions reduce more effectively than nullmove; so I would not want small nullmove reduction factors at all then.

With a 1-ply reduction: R=3 for nullmove as a constant.
With a 2-ply reduction: R=5 for nullmove as a constant.

Note that I'm not a big fan of 2-ply reductions right now.
That might change in time, but right now I'm not.

So no R that gradually becomes larger: a constant reduction.

In fact I find R=3 already pretty tricky. It's a huge reduction for just 1 bad move, knowing that we do this RECURSIVELY.
Okay, so you are not talking about LMR, but about reducing moves based on chess criteria, for example moves that don't attack or defend anything or improve the score in some obvious way (I'm sure your own criteria are more complex than this). So it sounds like you mean the reduction that is applied to the majority of moves, excluding CUT nodes where you never need to try such moves. If so, in my above example the reduction would most typically be 2 plies at ALL nodes, so you would advocate R = 5. Is that correct?
I never thought that null move reduction and LMR reduction were that closely related, but perhaps you are right.

Larry
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

I haven't followed the discussion but I agree it doesn't make much difference to distinguish between CUT/ALL. If at a possible CUT node you don't get a cutoff within the first 3 moves, then it is 99% likely that it was in fact an ALL node. So to benefit from a CUT/ALL distinction you have to make pruning decisions within those 3 moves. Most engines have n_moves >= 3 for LMR, which makes this rare. In fact switching the node type after searching 1 move may also be very safe. A typical example is the YBW algorithm, which starts the parallel search after searching exactly 1 move. ALL nodes are good for YBW. A YBW variant with a CUT/ALL node distinction improved on it only slightly (about <= 5%, from memory?) according to the original inventors. PV/non-PV is a gamble where you try to have a longer PV compared to other parts of the tree: do more extensions and fewer reductions at PV nodes, etc. Don't be surprised if that didn't work for some! You are usually comparing a certain class of engines (engines close to Komodo) which may all be exploiting a certain optimum. The rest could be working on another hill! See the discussion on singular extensions some time back that supposedly gave 40 Elo to some, but actually didn't for others...
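Daniel's "switch after 1 move" heuristic in sketch form; the enum repeats the earlier sketch's, and the threshold of 1 versus 3 moves is exactly the parameter under discussion:

enum NodeType { PV, CUT, ALL };

// Re-type an expected CUT node once its first move(s) fail to cut off.
// With threshold 1 this is the YBW condition: after one move has been
// searched without a cutoff, treat the node as ALL (safe to split).
NodeType retype(NodeType expected, int movesSearched, int threshold = 1) {
    if (expected == CUT && movesSearched >= threshold)
        return ALL;
    return expected;
}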
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

Daniel Shawul wrote: I haven't followed the discussion but I agree it doesn't make much difference to distinguish between CUT/ALL. If at a possible CUT node you don't get a cutoff within the first 3 moves, then it is 99% likely that it was in fact an ALL node. So to benefit from a CUT/ALL distinction you have to make pruning decisions within those 3 moves. Most engines have n_moves >= 3 for LMR, which makes this rare. In fact switching the node type after searching 1 move may also be very safe. A typical example is the YBW algorithm, which starts the parallel search after searching exactly 1 move. ALL nodes are good for YBW. A YBW variant with a CUT/ALL node distinction improved on it only slightly (about <= 5%, from memory?) according to the original inventors. PV/non-PV is a gamble where you try to have a longer PV compared to other parts of the tree: do more extensions and fewer reductions at PV nodes, etc. Don't be surprised if that didn't work for some! You are usually comparing a certain class of engines (engines close to Komodo) which may all be exploiting a certain optimum. The rest could be working on another hill! See the discussion on singular extensions some time back that supposedly gave 40 Elo to some, but actually didn't for others...
I'm not so sure that your reasoning is correct. Just because you are (let's say) on the 7th move at an expected CUT node which will probably turn out to be an ALL node, it does not follow that the previous node (or the one before that) was mis-typed; the next move tried on the previous ply may produce the cut. So it may well be that reducing the 7th move more (or less) at a CUT node than at an ALL node is still justified based on the earlier node type. At least, if this is not so, then Rybka, Ippo, Ivanhoe, Houdini, and Critter are all doing something silly. I don't believe this.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: When does a cut-node become an all-node?

Post by BubbaTough »

lkaufman wrote:At least, if this is not so, then Rybka, Ippo, Ivanhoe, Houdini, and Critter are all doing something silly. I don't believe this.
Your conclusion may (or may not) be right, but I abhor your reasoning. You give much too much respect to the techniques used in the top couple of engines, in my opinion. I think it's more useful, and more accurate, to assume almost everything in the top engines can be improved until you prove to your satisfaction otherwise. If you look at the best programs of 10 years ago and identify the things they got perfect, you will find very few. There is no reason the same won't be true 10 years from now.

-Sam