I think that "widening" in context of LazySMP usually means accidental widening.TomKerrigan wrote:Sorry, I go for months (sometimes years) without reading this forum or posting to it. Searching it via Google seems to be impossible and sifting through the keyword search results is difficult. If you could give me a link to the discussion you're talking about, I would be interested to read it.lucasart wrote: ...
I've already posted my TTD speedup numbers in the other thread on ABDADA.
But you still don't get it. TTD is the wrong measure, because the result of Depth=N with Threads=T doesn't just depend on N. It also depends on T. This is the so-called "widening effect". And it gives incredible elo scaling to "lazy SMP" approch, provided well tuned (as shown in Stockfish, Cheng, Demolito, among other).
As for the idea of "widening," I've seen this mentioned a few times and infer that it has to do with late move reductions. The idea being that you search moves to different depths depending on move ordering. And some Lazy SMP implementations randomize move ordering to some degree, meaning that the late moves are pushed forward and searched deeper and that results in an improvement in playing strength. Is that right?
The fact that "widening" works pretty well seems to be proof that these depth reductions are unsound though, doesn't it? I mean, if it's causing the single-threaded search to miss a lot of good moves?
Which is to say, it is an unintended consequence of the method itself.
When you have a giant pile of threads analyzing a position (some possibly with minor offsets) and all reading from the hashes of the others you will get some extra "bushyness" to the tree. The outcome of this extra bushyness seems to be stronger play on the whole.
It is not hard to understand why this might be the case. All the threads are looking for pv nodes and each new bit of information written to the hash can make the searches a little bit smarter. So (for instance) we might find some clever new pv nodes as a result of this searching overlap.
There is a down side to the widening effect also. The search does not get as deep. You can easily verify this with the last non-Lazy Stockfish verses the Lazy version. The newer version was measurably stronger, and the non-lazy version got measurably deeper.
Other Lazy verses Non-Lazy experiments see the same thing. As for strength, there are some experiments that do not show Lazy to be superior, but I think that is really an open question.
What is not in dispute is that LazySMP works really well and is fairly simple to implement correctly.
There is something counter-intuitive about wider and bushier being stronger than deep, but LazySMP has a lot of counter-intuitive things going for it.
For instance, we should not be able to simply add up the nodes going from one thread to two. Yet two LazySMP threads seem to scale magically to nearly 2x both in nodes and Elo. Quite frankly, I don't get it and I would live to hear a truly sensible explanation as to why it should be like that.
Mental picture of 4 threads with no stagger trying to analyze the same root position:
Lots and lots of hash table hits, but these are not zero cost. Good that we avoided searching, but we spent time attempting work that was already completed.
Given enough threads, one would think we would have almost all hits except for the last ply. I would think that would be bad and not good, since there would be so many lookups getting compute energy instead of searching unsearched positions.
Unless we count hash hits as new nodes, I would also think that the NPS would plummet.
And if the hash hits are new nodes, then I think we are getting a false indication of nodes searched.