syzygy wrote:
duncan wrote:
syzygy wrote: Bob's argument is logically flawed in more than one way.
To avoid what Bob will call hand-waving, perhaps you could spell it out, or reference it if you have already done so, so that Bob can respond.
It has been spelled out quite a few times already and there is no use in Bob responding to it once more (although I know he will do it).
But since you are asking, I will respond for you (not for Bob).
The usual single-threaded search algorithms are inherently serial, which means a parallel search cannot be the same algorithm. The parallel search algorithm is inherently less efficient: an N-threaded search on N cores is not as effective as a 1-threaded search on a single core that is N times faster.
This inefficiency (and I am talking about common-sense inefficiency, not about Bob's out-of-context textbook (in)efficiency), although unavoidable, is not a fixed quantity. It depends on the SMP implementation, and it may also depend on the underlying single-threaded search.
You obviously don't understand parallel search. There is no "out-of-context" inefficiency here. There is only ONE place in a modern engine where the parallel search loses efficiency. Only ONE: extra nodes searched due to inaccurate split point selection. Nothing else. That overhead (extra nodes) is the ONLY thing that prevents a 4x speedup on 4 cores. There is no waiting or synchronization overhead to speak of, certainly nothing amounting to even 1% of the total cost.
So how about sticking to factual information for a change?
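The claim above can be put into a toy formula (a sketch with illustrative numbers, not measurements from any engine): if extra nodes searched are the only loss, then the speedup on N cores is simply N divided by the node-overhead factor.

```python
def smp_speedup(cores, extra_node_fraction):
    """Speedup over one core when the parallel search must visit
    (1 + extra_node_fraction) times as many nodes as the serial search."""
    return cores / (1.0 + extra_node_fraction)

# Illustrative: 4 cores searching 30% extra nodes.
print(smp_speedup(4, 0.30))  # -> ~3.08, close to but below 4x
```

Under this model, any measured speedup below N maps directly to a search-overhead percentage, which is why the debate keeps returning to how many extra nodes each scheme searches.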
Mimicking the single-threaded search as much as possible (so purely going for added depth) may, depending on these factors, be problematic.
It may be difficult or even impossible to take the same reduction/extension decisions as the single-threaded search would, so it may turn out to be cheaper, or even necessary, to accept a wider search (at the cost of depth, but in return for higher quality at the same depth). If this is the case, then Bob's argument does not apply: we would have loved to have the same selectivity as the default single-threaded search, but it was simply not possible / too hard / too expensive to get right.
It may also be the case that the "SMP inefficiency" just happens to be significantly greater for the serial reduction/extension scheme of a particular engine than it is for a less aggressive reduction scheme. Bob has stated multiple times that the speedup values from old papers no longer apply to today's highly selective searches. If a slightly less selective search has better SMP efficiency (in Bob's textbook sense) when going parallel than the default single-threaded selective search does, then the gain in SMP efficiency may outweigh the loss due to lower selectivity. If this is the case, then Bob's argument clearly does not hold: going wider on a single core would only incur the loss of lower selectivity without profiting from better SMP efficiency, simply because there is no SMP on a single core.
What is with all the "ifs"??? One can absolutely do the same reductions and pruning in an SMP search that one does in a serial search. Such a statement is simply ridiculous on the surface, much more so if you go deeper into the concept.
I've asked for technical information. You again produce hand-waving. If you go wider, you lose depth. Period. The only way to recover that depth is through a traditional parallel search. If the implementation only produces a 1.4x speedup on 4 cores, you are NOT going to regain what you lose by taking the EBF from 2.0 to 2.1 (numbers others have mentioned).
So, again, no hand-waving. Explain EXPLICITLY how this is going to be done, giving a little math to go with the hand-waving, so that it can be analyzed and critiqued. At present, you offer ZERO details, just "this must be so."
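Some of the arithmetic being asked for can be sketched as follows (the EBF and speedup values are the illustrative numbers quoted in the thread, not measurements): with an effective branching factor b, a time speedup of s buys roughly log(s)/log(b) extra plies.

```python
import math

def extra_plies(speedup, ebf):
    """Extra depth bought by a time speedup at a given effective
    branching factor: solves ebf**d == speedup for d."""
    return math.log(speedup) / math.log(ebf)

# Illustrative numbers from the discussion:
print(extra_plies(3.0, 2.0))  # ~1.58 plies from a >3x speedup
print(extra_plies(1.4, 2.0))  # ~0.49 plies from a 1.4x speedup
```

This is where the "extra 1.5 plies" figure later in the thread comes from: at EBF 2.0, a 3x speedup is worth about 1.6 plies, while a 1.4x speedup is worth only about half a ply.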
It is quite clear that a speedup of >3.0 with 4 cores is easy enough to produce. No one cares about 128 cores at the moment; such machines are not available to 99.999% of users due to cost. But 4- and 8-core boxes are readily available, and >3x on 4 cores or >6x on 8 cores is pretty common.
All I have claimed, regarding older programs, was that it was somewhat easier to get closer to optimal because the trees were more uniform. But they were far from perfectly uniform. I have been tweaking reduction and pruning rules for a while, and none of those changes influences the SMP speedup in any significant way. So I do not believe that is an issue worth discussing. Houdini seems to be highly selective, yet it speeds up pretty normally from what I have seen. I know my program does. So again, a pointless debate about an unimportant detail.
What also must be kept in mind is that the development of a chess engine is full of trade-offs. It is safe to assume that Komodo's current SMP implementation is the best that Don could achieve in the time he had available for it. It might be that further improvements will allow him to go purely for depth (at the cost of an investment in development time). We have no way of knowing this, but one thing should be clear: nobody can ever be criticised for not yet achieving the optimal implementation.
Nobody (at least not me) is criticizing ANYTHING here other than the vague, hand-waving arguments about what is going on and why. The ONLY viable place one can use SMP hardware is to parallel search the tree. Parallel move generation and such simply are not nearly as effective. One can change the shape of the tree all they want, but the SMP search still has to produce a speedup or it is worthless.
If you want to claim "OK, Komodo searches a bigger tree so that the time-to-depth speedup is only 1.4x, but if it had gone purely for depth it would see >3x," then fine. Now you have to explain why the bigger tree is better than the extra ~1.5 plies the >3x would produce, because you are now factually trading depth for more width. And that tradeoff would be just as effective for the serial search.
Just define the parameters, and this can be mathematically examined, without all the "ifs and buts".
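One hypothetical way to define those parameters: fix a node budget, then compare a pure-depth SMP search (EBF 2.0 with a 3x speedup) against a wider one (EBF 2.1 with a 1.4x speedup). All numbers are the illustrative ones from the thread, and the model deliberately prices depth only, ignoring the per-ply quality gain of the wider tree, which is exactly the quantity under dispute.

```python
import math

def reachable_depth(nodes, ebf):
    """Depth d such that a tree of ~ebf**d nodes fits the node budget."""
    return math.log(nodes) / math.log(ebf)

serial_nodes = 2.0 ** 24  # a hypothetical 24-ply serial search at EBF 2.0

deep = reachable_depth(serial_nodes * 3.0, 2.0)  # pure depth, 3x the nodes
wide = reachable_depth(serial_nodes * 1.4, 2.1)  # wider tree, 1.4x the nodes
print(f"pure depth: {deep:.2f} plies, wider: {wide:.2f} plies")
```

On these numbers the pure-depth search reaches about 25.6 plies against roughly 22.9 for the wider one, so the wider search must extract enough extra strength per ply to cover a deficit of nearly three plies; whether it can is precisely the open question in the thread.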