There's a reason. It has been explained many times. Try this:Dann Corbit wrote:Crafty does not ponder the opponent's move after making a move. Instead, crafty ponders on the expected response. This is an example of speculation. You told me that you got a speedup from that. Don Homan does the same thing in his parallel search and he saw a speedup. But people do not do that in the normal alpha-beta serial search.bob wrote:IF it would work for a parallel algorithm, EVERYBODY would be doing it in the serial algorithm.Dann Corbit wrote:The fastest parallel sorting algorithms are different than the fastest serial sorting algorithms because you can do things in parallel that are not efficiently done serially.Michel wrote:I think this is essentially Bob Hyatt's argument. He has another argument which I actually like more. A SMP search with N threads can be emulated using virtual threads with a single threaded search which runs on a clock which is N times faster.Evert wrote:I've seen this argument before, but no one has ever given an explanation I can understand for this claim.BBauer wrote: NPS speedup may be an upper bound on parallel search speedup
but
it is *not* an upper bound on Elo improvement.
Suppose that we have a super-speedup Elo improvement from a parallel search. Where does this improvement come from? The only possibility I can see is that it comes from the parallel search examining nodes that the single thread search did not examine, but should have (counter argument: if it didn't have to examine them, then the parallel search would not have needed to examine them either). In other words, the parallel search has a super-speedup Elo improvement because it fixes a defect in the normal linear search. If you would fix that defect, the super-parallel improvement would go away.
So as far as I can see, yes, you can have super-speedup Elo gain from a parallel search, but only if the non-parallel search is sub-optimal. Can someone explain why that wouldn't be true?
Note that saying that the non-parallel search is sub-optimal and can be improved doesn't imply that it would be in any way easy to do. Relying on the non-determinism of a parallel search to repair defects in the normal search sounds unreliable though, and I would also qualify it as bad engineering.
So this shows that any (hypothetical) super speedup with an SMP search can be realized using a single threaded search as well.
Yes, you could do things serially like look ahead with your one thread and this is even done with pondering. But it is a risky strategy for a single threaded app and I doubt if anyone is doing it.The fastest serial comparison sorts are O(N*log(N)).Your sort statement is not exactly correct. The fastest sort algorithms are always going to be N log N. But those are hard to do in parallel due to the synchronization while sub-splitting the lists. But going to something that works better in parallel takes a big performance hit (for example, a simple bubble sort scales really well in terms of processors, but it starts off at N*N vs N log N. It therefore starts off significantly worse than the N log N sorts.
Bitonic Sort is a so-so sort on serial machines, but one of the fastest on parallel machines because of the difference in architecture.
Sample Sort is generally considered the fastest sort on SMP machines, but it is not the fastest sort on serial machines.I agree that minimax is the wrong algorithm for parallel search in chess.Alpha/Beta is sqrt(W^D) basically. Even 1024 processors doesn't make minimax overtake alpha/beta, and minimax is the most efficient parallel search one can do since everything has to be searched.
For example, given W^D of 10 billion (pretty easy to search today even on one CPU)
alpha/beta = 100K
minimax = 10B. How many processors needed to reduce 10B to 100K (the break-even point)? 100,000. Not doable in any affordable machine today or in 5 years. So alpha/beta rules the roost, and traditional parallel search rules alpha/beta. And note that 100K processors just breaks even with alpha/beta, it would be no faster or better. To even 2x faster would require 200K processors which is even more "out there". A good parallel search today can certainly produce 8x faster with 16 cores. Minimax would be utterly hopeless.
I just looked at a 120 minute game played on ICC (very long time control). In that game Crafty correctly predicted its opponent's move exactly 75% of the time.
Two choices:
(1) ponder just the expected move. 75% of the time you are correct, and 75% of the time you save ALL of the pondering time.
(2) Do a search over all of the opponent's moves. When he moves, you have been searching everything, so you can't move yet. You may have spent 1/4 of the time on the right move, 3/4 (or worse) of the time on the other moves. You get to keep right on searching, saving 1/4 of the time for each move you ponder.
If the game lasts 60 moves, option (1) will give you 45 moves for free (pondering correct move for correct amount of time). Option 2 will give you 1/4 of 60 moves in times saved. That is 1/3 the savings. So which is better? This is an old topic. Many have tried alternatives. Everyone ends up predicting a move and pondering that, because it is the most efficient/effective use of time...
For your sort issue, how do you measure speedup? Hopefully not by taking an inferior sort algorithm and then running on 1 and then 32 cores. Correct is an optimal serial sort for 1 cpu, then any sort you want for the parallel machine. For reasonable numbers of cores, it is very difficult to beat the n log n sorts.