Werewolf wrote:bob wrote:
That assumes that if you have 4 real cores, and you test with 4 threads, and then use 8 logical cores (HT on) and rul with 8 threads, then the tree will grow about 30% in size due to the parallel search overhead.
alpha/beta is a purely sequential algorithm as defined. You need to establish a bound at each node, by searching the best move first, then you use that bound to search the remaining nodes more efficiently. When you don't do this (and you can't in a parallel search) you search a larger tree to reach the same depth..
For (b) yes. It is not a "core" issue but a "number of threads" issue.
(c) think about it. Going from 4 to 8 threads makes the tree 30% larger. If you don't speed up enough with the extra 4 threads to offset that loss, you see a net decrease in performance. If the NPS increases by more than that amount, you see a (small) net gain.
Bob, yet again, very informative.
Please can you evaluate this statement which I've tried to write on the basis of what you've said:
i) Alpha-beta search is a sequential.
ii) When good moves are identified they can be prioritised which increases search efficiency as the search goes on.
iii) But parallel search prevents ii) to a certain extent, due to each thread potentially examining the same positions as other threads at the same time. This creates a degree of redundancy.
Not necessarily the "same position", but redundant work. For example, suppose I give you the following task:
There is a wall in front of you with three arm-sized holes in it. Your assignment is to insert your hand into each hole, for one minute, and at the end of the test, tell me which one was the most pleasant / least unpleasant, depending on what happens.
You insert your hand into hole one and you are greeted with warm water. Then warm air. After one minute you remove your hand and insert it into hole two. You are immediately stuck with a needle. No need to stick around any longer, that is already worse than hole # 1, and it might get even worse if you wait around. You move to hole three. Here you are greeted by frigid ice-water and you again immediately bail out as it could get worse. After 1 minute + maybe 1 second for each hole, you conclude "hole one was the most pleasant / least unpleasant."
Now do that with three people, which should go 3x faster, right? Each sticks his hand into his respective hole. The guy in hole 2 can't give up after the pin prick, because the other two don't know how good or bad their respective holes are. All three have to stay the entire minute. After a total of one minute spent, helper 1 reports on his warm/wet hand, helper two reports he lost two fingers, and helper three reports his hand is frozen solid. After comparing notes, they conclude, also, that hole 1 was better.
That's how alpha/beta works. If you search the best move first, and we typically get that right 90%+ of the time, then we can quickly dismiss the inferior moves. But if we don't know the value of the first when we start to work on the second and third (in parallel) we have to do extra work...
Hope that helps.
iv) Thus, for every doubling of threads the search tree grows by 30% without increasing depth, due to overlap.
Not necessarily "overlap". Just doing "unnecessary work" (work that the normal serial search would not do).
v) Therefore, in order to see an elo performance gain when doubling threads the NPS must >1.3 times what is was before.
In addition to that statement can I make one more?
a) HT works by dividing the core's time between two threads
b) Therefore for HT to be effective a thread cannot demand more than 50% of the Core's processing power.
No. The biggest HT gain comes from memory accesses. When you get a L1 cache miss and have to wait for 20 or so cycles or whatever for L2, or longer for L3, or MUCH longer for main memory, the other logical "core" can use the resources to continue since the first thread is "blocked" (much like what happens in a multiprogramming operating system when a process does I/O and others run while it is blocked.
c) Chess demands 100% processing power of each core
d) Therefore HT will simply decrease performance for chess by 30%, for reasons stated above.