Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Werner wrote:
Hi,
I took 10 positions from WM-Test and had following results:
1->2CPU: 142% and
2->4CPU: 146% average
What does this mean?
For example, if it takes 2 minutes to solve N positions, and it takes 6 minutes to solve the other N, then a 2-minute search with 2x the processors is going to have no effect on the results: the easy N are solved either way, and the hard N now need about 3 minutes, which a 2-minute search still never reaches, even though the search really is twice as fast.
The right way to measure SMP performance is to take a large set of positions, and search with 1, 2, 4, ..., N processors, and measure the time required to reach a specific depth for each position.
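To make the bookkeeping concrete, here is a minimal sketch of that speedup calculation, assuming you have already collected the per-position times; the array layout and names are hypothetical, not taken from any particular test harness:

Code:

#include <stdio.h>

#define NPOS  10    /* number of test positions          */
#define NCONF  3    /* CPU configurations tested: 1,2,4  */

int main(void) {
  int cpus[NCONF] = {1, 2, 4};
  /* times[i][c] = seconds position i needed to reach the fixed
     depth with cpus[c] processors; fill in from your test runs */
  double times[NPOS][NCONF] = {{0}};

  for (int c = 1; c < NCONF; c++) {
    double sum = 0.0;
    for (int i = 0; i < NPOS; i++)
      sum += times[i][0] / times[i][c];   /* speedup = T(1)/T(n) */
    printf("%d cpus: average speedup %.2f\n", cpus[c], sum / NPOS);
  }
  return 0;
}

Whether you average per-position ratios, as here, or divide total time by total time is a design choice; the two can differ noticeably when a few slow positions dominate the total.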
A _much_ less accurate way is to play a one cpu program against a bunch of opponents, then keep everything the same but use two cpus for that program and see if it performs better. It is much less accurate because the variance/randomness in a basic game of chess is extremely high. And when you factor in the randomness produced by a parallel search, it goes even higher. You need thousands of games (at a bare minimum) to get a reasonable estimate on improvement. Most can't pull that off.
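To put a rough number on "thousands of games": under the usual simplifying assumptions (independent games, the plain binomial error on the match score, draws not modeled separately), the 95% error bar on a measured Elo difference shrinks only with the square root of the number of games. A small back-of-the-envelope sketch:

Code:

#include <stdio.h>
#include <math.h>   /* link with -lm */

static double elo(double p) {          /* match score -> Elo diff */
  return -400.0 * log10(1.0 / p - 1.0);
}

int main(void) {
  double p = 0.55;                     /* observed score, e.g. 55% */
  int n;
  for (n = 100; n <= 100000; n *= 10) {
    double se = sqrt(p * (1.0 - p) / n);   /* std. error of score */
    printf("%6d games: %.1f Elo, 95%% interval [%.1f, %.1f]\n",
           n, elo(p), elo(p - 1.96 * se), elo(p + 1.96 * se));
  }
  return 0;
}

With only a few hundred games the interval spans dozens of Elo, so a modest speedup gain disappears in the noise; that is the point being made above.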
You assume that depth means the same thing with 1 processor as with more processors.
It is not obvious.
For example, it may be possible that the program does fewer late move reductions with more processors.
Uri
I don't see why. But if you search to the _same_ depth, why would there be any difference with respect to the number of processors used? That gives an accurate SMP speedup number, which is a different number than what you get if you play the same program against the same opponents but vary the number of processors. Now, rather than measuring the exact speedup, you are measuring the effect of searching faster with respect to Elo improvement. That is also an interesting number, but it is independent of the parallel speedup as it is generally defined.
My point is that a parallel search may change the tree, so the same depth may not be equivalent.
If your late move reductions have a rule not to reduce the first 3 moves, then with a parallel search different moves may end up being the first 3 moves (see the sketch below).
Uri
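As a hypothetical illustration of the kind of rule Uri describes (not taken from any real engine), a reduction test keyed to the move's index in the search order looks something like this; two searches that enumerate moves in different orders will reduce different moves at the same nominal depth:

Code:

/* hypothetical move-count-based LMR rule; order-sensitive because
   the decision depends on where the move sits in the search order */
int lmr_reduction(int moves_searched, int depth,
                  int is_capture, int gives_check) {
  if (moves_searched < 3)        /* never reduce the first 3 moves */
    return 0;
  if (is_capture || gives_check)
    return 0;
  return (depth >= 3) ? 1 : 0;   /* reduce everything else 1 ply   */
}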
If you implement a parallel algorithm like that, then those of us teaching this stuff would call that "badly flawed".
The first principle of parallelizing something is to ask the question "If I do these "things" in a different (or random) order, will it change the result?" If the answer is YES, then we can't do a parallel operation here. If someone implements reductions as you describe, then they have an algorithm flaw that is going to cause problems. Crafty does not behave like this at all. It doesn't consider the first N moves non-reducible; rather, it doesn't start reducing until the first few move-selection passes are completed, to remove the captures, killers, etc., and then it doesn't reduce checks and such. So the same move can't go unreduced when I search in one order but be reduced when I search in a different order. To do so would violate an important parallel programming principle.
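A sketch of that style of test, with illustrative names (this is not Crafty's actual code): reducibility depends only on the selection phase and on properties of the move itself, never on how many moves happened to be searched first, so any enumeration order gives the same answer for the same move:

Code:

/* order-independent reducibility test; names are illustrative */
enum phase { HASH_MOVE, CAPTURES, KILLERS, REMAINING };

int reducible(enum phase ph, int is_check, int is_capture) {
  if (ph != REMAINING)             /* hash move, captures, killers:   */
    return 0;                      /* never reduced                   */
  if (is_check || is_capture)
    return 0;
  return 1;                        /* same answer in any search order */
}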
Now, that said, all parallel chess searches violate the above in a way that can cause an issue here and there, because the order in which moves are searched can influence the final result through the transposition table. This is unsolvable unless the hash table is removed, which is an unreasonable solution, so we just accept the fact that the search will sometimes not return exactly the same value (or best move) if the same search is repeated several times.
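The mechanism is easy to demonstrate with a toy model (a deliberately stripped-down stand-in, not a real transposition table): the same node visited at a shallow depth returns a different score depending on whether a deeper visit reached the shared slot first, which is exactly what thread timing changes from run to run:

Code:

#include <stdio.h>
#include <string.h>

/* one shared slot standing in for a whole transposition table */
typedef struct { int depth, score, valid; } ENTRY;
static ENTRY tt;

/* toy "search" of one position: probe first, else compute & store */
static int search(int depth) {
  if (tt.valid && tt.depth >= depth)
    return tt.score;              /* hash cutoff from earlier work    */
  int score = 10 + depth;         /* stand-in for a real search value */
  tt.depth = depth; tt.score = score; tt.valid = 1;
  return score;
}

int main(void) {
  int r1, r2;

  memset(&tt, 0, sizeof tt);      /* interleaving 1: depth-8 visit first */
  search(8);
  r1 = search(6);                 /* gets the depth-8 score (18)         */

  memset(&tt, 0, sizeof tt);      /* interleaving 2: depth-6 visit first */
  r2 = search(6);                 /* computes its own score (16)         */
  search(8);

  printf("depth-6 result: %d vs %d, depending on order\n", r1, r2);
  return 0;
}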