Questions on SMP search

Discussion of chess software programming and technical issues.


bob

Re: Questions on SMP search

Post by bob »

Gian-Carlo Pascutto wrote:
bob wrote: Please listen again. For best first you have to store _everything_. I am not talking about UCT, minimax, alpha/beta, or anything else. The question was about best-first. It is completely unrelated to depth-first search as we use in chess programs.
Bob, I already pointed out that there is a hard, mathematical proof that many best-first algorithms, including those relevant to chess, search an identical tree to alpha-beta with a sufficiently large hashtable. So what you say above is already proven wrong in the strictest way possible.

Your statement is simply wrong. Here is why.

For best-first, I _MUST_ store the entire tree as it is searched, because there is no way to know when you will need to come back to a lousy-looking position, simply because all the other positions eventually become worse-than-lousy.

With alpha/beta there is no such requirement at all. And, in fact, alpha/beta works with no transposition table at all if you are willing to give up the efficiency.

Yes, you can prove that they search the same tree under exceptional circumstances. But alpha/beta is still order(D) memory usage while best-first has to store the entire thing. That is a significant difference.

Proofs that consider "no bounds" on resources are not very useful. Alpha/beta works with any memory bound you care to impose, so long as you have enough memory to store state for each ply [O(D) in space]. Best-first won't give you that option.

That was, and is, my point on the difference between the two. Try to do a best-first search with enough memory to store the state for 25 nodes, and get to any remotely acceptable depth, while I do an alpha/beta search to depth=25 with that same amount of memory...
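To make the storage point concrete, here is a minimal sketch (not real engine code; children() and evaluate() are toy stand-ins and the usual enhancements are omitted). The depth-first search keeps nothing but its recursion stack, while a generic best-first loop has to keep every expanded node, because a currently lousy-looking node may later become the best one to expand:

[code]
import heapq, itertools

INF = float("inf")

def children(pos):       # hypothetical successor generator (toy stand-in)
    return [pos * 2 + 1, pos * 2 + 2] if pos < 30 else []

def evaluate(pos):       # hypothetical static evaluation (toy stand-in)
    return (pos * 37) % 101 - 50

def alphabeta(pos, depth, alpha, beta):
    """Depth-first: the only state is the recursion stack, O(depth) memory,
    with or without a transposition table."""
    kids = children(pos)
    if depth == 0 or not kids:
        return evaluate(pos)
    best = -INF
    for child in kids:
        score = -alphabeta(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break        # cutoff: the refuted subtree is simply forgotten
    return best

def best_first(root, node_budget):
    """Best-first: every expanded node stays in memory, since any of them may
    later become the most promising one. Memory grows with the whole tree."""
    tie = itertools.count()                       # tie-breaker so the heap never compares positions
    stored = {root: evaluate(root)}               # all visited nodes live here
    frontier = [(-stored[root], next(tie), root)]
    while frontier and len(stored) < node_budget:
        _, _, node = heapq.heappop(frontier)      # most promising node anywhere in the tree
        for child in children(node):
            stored[child] = evaluate(child)
            heapq.heappush(frontier, (-stored[child], next(tie), child))
    return stored
[/code]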



Even if you fail to grasp this, saying that UCT is not best-first is insane. It has the exact start-at-root, pick-best-node, expand-best-node, backtrack-node-result-up-to-root cycle that characterizes best-first algorithms.
Did I say that? If so, it was simply improperly worded. It certainly was not intentional.

And you know what? It can be rewritten as depth-first!

Wait, let's take another example. Proof-number-search. Another classical example of a best-first algorithm.

You know what? It's been rewritten as depth-first, too!

Notice a pattern here?
Most theory books first prove results for a 1-tape Turing machine, then for an N-tape Turing machine, and then prove that the 1-tape TM can exactly emulate the N-tape machine, although at a huge cost.

Nobody has said that you can't make BF and DF searches search the same tree, "in theory". In practice it is another issue entirely, with different constraints (the storage requirement being one big one).
Best-first is more human-like, to be sure. But it has never approached depth-first in chess or checkers. Perhaps in Go or another game, who knows.
Again, this is already proven wrong. Chinook uses MTD(f), which is best-first (again, a hard, mathematical proof is in the thesis of Aske Plaat). Cilkchess also used MTD(f). All the strongest suicide and losers chess engines (which btw has a very NARROW tree): best-first.
MTD(F) has nothing to do with best-first, so I am not sure what you are talking about there. MTD(F) is simply a null-window alpha/beta (depth-first) search that requires several passes at each iteration to zero in on the best score. What on earth does that have to do with "best first"?
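For reference, a minimal sketch of an MTD(f) driver, assuming alphabeta(pos, depth, alpha, beta) is an ordinary fail-soft alpha/beta search and first_guess is the score from the previous iteration; it is nothing more than a loop of null-window depth-first searches that squeeze the bounds together:

[code]
INF = float("inf")

def mtdf(root, depth, first_guess, alphabeta):
    """Converge on the minimax score with repeated null-window alpha/beta calls.
    Each call is a plain depth-first search; only the two bounds are remembered."""
    g, lower, upper = first_guess, -INF, +INF
    while lower < upper:
        beta = g + 1 if g == lower else g              # window of width 1 around the current guess
        g = alphabeta(root, depth, beta - 1, beta)     # fail-soft null-window search
        if g < beta:
            upper = g                                  # failed low: g is a new upper bound
        else:
            lower = g                                  # failed high: g is a new lower bound
    return g
[/code]

With integer scores the loop typically needs a handful of passes per iteration, which is the "several passes" overhead described above.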

Your "argument" about storing everything is a complete straw man.
On one side, we can make the hashtable large enough (or use a normal std::map) and just store everything, which works until we run out of memory. On the other side, *practical implementations* of best-first searches have to deal with the possibility of memory running out and will do garbage collection. There is no difference between the two.

The distinction between best-first and depth-first is an artifact of days long gone when it was not realized yet that they were one and the same.

But that ended in 1995. It's time to move on.
They are far from "one and the same" except at the theoretical level. However, I am more interested in your claim that "MTD(F) is best-first", as I did an MTD(F) Crafty years ago when Don Dailey was a big proponent. It didn't work very well for me and was significantly slower. I notice he no longer uses it either.
Laskos

Re: Questions on SMP search

Post by Laskos »

Rein Halbersma wrote: Just to abstract from the name-calling and confusion.

Bob's definition of EBF is EBF(depth) = nodes(depth) / nodes(depth-1). For a perfectly regular tree, nodes(depth) = width^depth and EBF(depth) = width, independent of depth. So far so good. Marco's observation was that in irregular trees the extra nodes at the next depth iteration are not all leaf nodes. The pyramid analogy really explains this nicely: adding a layer to the sides, some stones to the bottom, etc. In such irregular trees, a single EBF doesn't contain all the information about the shape of the tree.

Generalizing slightly, you can define an EBF at each ply level of the tree as EBF(depth, ply) = nodes(depth, ply) / nodes(depth-1, ply). The claim is that for fixed ply, EBF(depth, ply) is increasing in depth, simply because LMR and NMR are reducing more heavily at lower remaining depth. This is what people mean by "widening the tree". A better term would perhaps be that high in the tree, the tree is "thicker" or "denser". Similarly, for fixed depth, EBF(depth, ply) is decreasing in ply. So lower in the tree (near the leaves), the tree is quite "thin".

So at ply=1, (the root moves) the EBF is probably around 30+, whereas at ply=depth (the leaf nodes) the EBF might be 1. The overall EBF of a tree can still remain fairly constant around 2.

I guess these observations are fairly trivial and could have been made a lot earlier if people were to take the courtesy of interpreting each other's words a bit more constructively...

Thanks.

Rein
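To make the two EBF notions in the quoted post concrete, a small sketch; the node counts are assumed inputs (measured from a search or produced by a model), with nodes_by_depth mapping depth -> total node count and nodes_by_depth_ply mapping (depth, ply) -> node count at that ply level:

[code]
def ebf(nodes_by_depth, depth):
    """Overall EBF: total tree size at this iteration over the previous one."""
    return nodes_by_depth[depth] / nodes_by_depth[depth - 1]

def ebf_per_ply(nodes_by_depth_ply, depth, ply):
    """Per-ply EBF: how much a single ply level grew between iterations.
    Near the root it tends to rise with depth ("thicker"), near the leaves
    it tends to fall ("thin")."""
    return nodes_by_depth_ply[(depth, ply)] / nodes_by_depth_ply[(depth - 1, ply)]
[/code]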
Just to rehash this old discussion.
I have just constructed a model of the tree shape of engines using logarithmic reductions à la Stockfish. The math is a bit complicated and is not very illuminating. The shape of the tree as a function of (depth, ply) is given by
(depth)^(log(2)) * log(ply+1)/4,
with the normalization factor chosen so that EBF(depth) is constant at 2. The tree shapes are plotted here for depths 10, 20, 35.
[plot: node counts per ply for depths 10, 20, 35]

From depth 20 to 21, the number of nodes going to widening is ~2.5 million and to deepening ~2.1 million, so 54% goes to widening.
From depth 30 to 31, the number of nodes going to widening is ~3.0 billion and to deepening ~2.1 billion, so 59% goes to widening. The larger the depth, the greater the proportion of nodes going to widening.
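The widening/deepening split above is just bookkeeping between two successive iterations; a minimal sketch, assuming nodes_by_depth_ply maps (depth, ply) to a node count from the model (or from a measured tree):

[code]
def widening_deepening_share(nodes_by_depth_ply, d):
    """Split the nodes added between iteration d and d+1 into widening
    (extra nodes at plies that already existed) and deepening (the new ply)."""
    widening = sum(nodes_by_depth_ply[(d + 1, p)] - nodes_by_depth_ply[(d, p)]
                   for p in range(1, d + 1))
    deepening = nodes_by_depth_ply[(d + 1, d + 1)]
    total = widening + deepening
    return widening / total, deepening / total
[/code]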

And here are EBF(depth, ply) with a constant EBF(depth)=2
[plot: EBF(depth, ply) with constant EBF(depth)=2]

At shallow plies, the deeper you go, the closer the search comes to full width. At the same time, EBF(depth, ply), for example at depth=35, drops below 1 at ply 19, so calling iteration 35 a depth-35 search is misleading, as the tree is searched only very sporadically all the way down to depth 35. If we take "depth" to mean the region where EBF(depth, ply) is larger than 1, then iteration 35 gives something like a depth-19 search.
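A small sketch of that "effective depth" reading, again assuming per-(depth, ply) node counts from the model or from measurements (the helper name is made up):

[code]
def effective_depth(nodes_by_depth_ply, depth):
    """Deepest ply at which EBF(depth, ply) is still >= 1, i.e. the ply level
    did not shrink relative to the previous iteration."""
    eff = 0
    for ply in range(1, depth + 1):
        prev = nodes_by_depth_ply.get((depth - 1, ply), 0)
        curr = nodes_by_depth_ply.get((depth, ply), 0)
        if prev > 0 and curr / prev >= 1.0:
            eff = ply
        else:
            break
    return eff
[/code]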