### back to the Komodo SMP issue

Posted: **Mon Jul 01, 2013 7:12 pm**

Here's the key point that nobody has even begun to address. Let's take a hypothetical position where a program takes exactly 10 minutes to search to depth D, and when it finishes we measure an EBF (effective branching factor) of exactly 2.0, to keep the numbers simple.

You run this with 4 cpus and discover that the search to depth D takes 7 minutes. Speedup = 10 / 7 ≈ 1.4x. SMP efficiency = 1.4 / 4.0 = 0.35 (or 35%).
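The speedup and efficiency arithmetic above, as a minimal Python sketch (the helper name `smp_stats` is hypothetical, used only for illustration):

```python
def smp_stats(serial_minutes: float, parallel_minutes: float, cpus: int) -> tuple[float, float]:
    """Return (speedup, efficiency) for a search to the same fixed depth."""
    speedup = serial_minutes / parallel_minutes  # how much faster the parallel search is
    efficiency = speedup / cpus                  # fraction of ideal linear speedup
    return speedup, efficiency

# Numbers from the post: 10 minutes on 1 cpu, 7 minutes on 4 cpus.
speedup, eff = smp_stats(10.0, 7.0, 4)
print(f"speedup = {speedup:.2f}x, efficiency = {eff:.0%}")
```

Without rounding, 10 / 7 ≈ 1.43x and the efficiency comes out at about 36%; rounding the speedup to 1.4x first gives the 35% figure.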

Now, since the SMP efficiency is so low (as is the speedup), let's relax the rules a bit to make the search wider. This reduces errors due to pruning/reductions, but it slows the search down, and now our serial search is obviously going to take much longer than 10 minutes. If you make the tree 2x larger, which is probably the minimum you would expect to produce a +130 Elo gain, you just doubled the search time for one cpu to 20 minutes. 20 / 1.4 ≈ 14.3 minutes for the parallel search. So we made the search wider, only to get a larger tree that we don't search very efficiently, and the parallel time is now longer than the original 10-minute serial search, which means it is not going to gain a thing.
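A sketch of the widening argument, assuming that search time scales linearly with tree size and that the 1.4x SMP speedup measured on the original tree carries over unchanged to the wider tree (both are assumptions, and `widened_times` is a hypothetical helper):

```python
def widened_times(serial_minutes: float, tree_growth: float, smp_speedup: float) -> tuple[float, float]:
    """Time to reach the same depth after the tree is made `tree_growth` times larger.

    Assumes time is proportional to tree size and that the SMP speedup
    measured on the narrower tree still applies to the wider one.
    """
    serial = serial_minutes * tree_growth  # 1 cpu now searches the bigger tree
    parallel = serial / smp_speedup        # 4 cpus, same (poor) speedup as before
    return serial, parallel

serial, parallel = widened_times(10.0, 2.0, 1.4)
print(f"1 cpu: {serial:.1f} min, 4 cpus: {parallel:.1f} min")
print("slower than the original serial search:", parallel > 10.0)
```

Using the unrounded speedup of 10 / 7 instead of 1.4 gives exactly 14 minutes; either way the wider parallel search takes longer than the original 10-minute serial one.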

All this is predicated on the serial search being well-tuned regarding reductions and pruning, so that there is little expected gain from further tweaking when using only one cpu. After thinking about this from every reasonable angle, I am STILL not convinced that this concept is valid. Suppose you could claim, "OK, with 2 cpus we get 1.4x, and with 4 cpus we STILL get 1.4x." That would clearly show the extra 2 cpus are not helping at all. But exactly HOW would you get them to do something helpful by making the tree wider, once you know that the FINAL tree result is still just 1.4x faster with 4 cores?

If one uses the usual rule of thumb that doubling the speed = +70 Elo, we need two doublings to reach that +130 Elo number that was discussed. It is pretty straightforward to figure out what kind of EBF increase one needs (given fixed time) to produce about 2/3 of that Elo gain (widening the tree, given an SMP speedup of 1.4x). I almost went down that road, but as soon as I started, the idea simply looked completely unsound for the reasons given above. As you go wider, you lose depth, since the SMP search is doing so poorly. I don't see any way around this.
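Under the +70 Elo per doubling rule of thumb, the effective speedup needed for an Elo gain is 2^(Elo / 70). A small sketch (helper names are hypothetical) of how much extra effective speed would have to come from somewhere other than the 1.4x SMP speedup:

```python
# Rule of thumb quoted in the post: each doubling of speed is worth about +70 Elo.
ELO_PER_DOUBLING = 70.0

def doublings_for_elo(elo_gain: float) -> float:
    """Number of speed doublings needed for a given Elo gain."""
    return elo_gain / ELO_PER_DOUBLING

def speedup_for_elo(elo_gain: float) -> float:
    """Effective speedup needed for a given Elo gain under the rule of thumb."""
    return 2.0 ** doublings_for_elo(elo_gain)

target_elo = 130.0
smp_speedup = 1.4
needed = speedup_for_elo(target_elo)  # total effective speedup required
remaining = needed / smp_speedup      # the part SMP does not provide
print(f"doublings needed: {doublings_for_elo(target_elo):.2f}")
print(f"effective speedup needed: {needed:.2f}x")
print(f"still missing after SMP: {remaining:.2f}x")
```

With these numbers, +130 Elo needs roughly a 3.6x effective speedup, so about a 2.6x factor would have to come from something other than the SMP speedup.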

So perhaps someone is willing to take part in a technical discussion, based on something beyond vague comments, to come up with a way to make this actually work.

I think a reasonable starting point is: depth=24, time=10 minutes, EBF=2.0, 1 cpu; versus depth=24, time=7 minutes, EBF=2.0, 4 cpus. Then figure out how we get to +130 Elo, given that we are only expecting maybe 15-20 Elo from the SMP search gain...

Of course, in a real game, that last depth=24 will increase somewhat, but it will not reach 25, since we need a 2x speedup to reach the next ply with an EBF of 2.0...
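With a constant EBF, each extra ply costs EBF times as much as the previous one, so a speedup S buys log(S) / log(EBF) extra plies in the same time. A sketch under that idealized assumption (helper names are hypothetical):

```python
import math

def extra_plies(speedup: float, ebf: float) -> float:
    """Extra depth reachable in the same time, assuming a constant EBF."""
    return math.log(speedup) / math.log(ebf)

def time_for_depth(base_minutes: float, base_depth: int, depth: int, ebf: float) -> float:
    """Time to finish `depth`, scaled from a measured (base_depth, base_minutes) point."""
    return base_minutes * ebf ** (depth - base_depth)

EBF = 2.0
print(f"plies gained from a 1.4x speedup: {extra_plies(1.4, EBF):.2f}")
print(f"plies gained from a 2.0x speedup: {extra_plies(2.0, EBF):.2f}")
print(f"4 cpus, depth 25: {time_for_depth(7.0, 24, 25, EBF):.1f} min")
```

With EBF = 2.0, a 1.4x speedup is worth just under half a ply, and the 4-cpu search would need 14 minutes to finish depth 25, more than the 10 minutes the serial search used for depth 24.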