Does anyone have experience with this simple form of SMP? If so, what kind of results did you get? My early results are below, and I would be interested to know if they are atypical.
Below are some preliminary results for Elo improvement with 1, 2, and 4 threads against a gauntlet of 10 opponents. For comparison, I've included the 1-thread version run with twice the search time, which gives a baseline for the improvement expected from an exact doubling of search speed. The versions are named as follows:
vsmp1, vsmp2, vsmp4 --> 1, 2, and 4 threads respectively
vsmp1x2 --> 1-thread version run with twice the search time
Rank Name              Elo    +    - games score oppo. draws 
   1 EXchess vsmp4     140   16   15  1456   69%     1   22% 
   3 EXchess vsmp1x2   105   18   18  1000   64%    -1   24% 
   5 EXchess vsmp2      70   18   18  1000   60%    -1   24% 
   7 EXchess vsmp1       0    7    7  6900   50%    -1   24% 
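One rough way to read this table is to convert each Elo gain into an equivalent search-speed multiplier, using the vsmp1x2 result as the yardstick. The sketch below assumes that Elo gain is linear in the logarithm of search time, so that every doubling is worth the same 105 Elo measured here. That is an assumption made for illustration, not something these results establish, and the error bars of roughly 15 to 18 Elo are ignored. The function name is hypothetical.

```python
# Elo gains relative to vsmp1, copied from the table above.
ELO_GAIN = {"vsmp2": 70, "vsmp4": 140}
ELO_PER_DOUBLING = 105  # vsmp1x2: one thread with twice the search time


def effective_speedup(elo_gain, elo_per_doubling=ELO_PER_DOUBLING):
    """Speed multiplier that would yield elo_gain.

    ASSUMPTION: Elo is linear in log2(search time).
    """
    return 2.0 ** (elo_gain / elo_per_doubling)


for name, gain in ELO_GAIN.items():
    print(f"{name}: ~{effective_speedup(gain):.2f}x effective speedup")
```

Under that assumption, 2 threads play like a single-threaded search that is roughly 1.6x faster, and 4 threads like one that is roughly 2.5x faster.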
One additional fact that surprised me is that the time-to-depth improvement was not nearly as good as the actual playing results. Below are the time-to-depth results for various numbers of cores.
Results for time to complete iteration depth 12 for 317 positions
------------------------------------------------------------------
EXchess_vsmp1: Total NPS = 845331  <depth> = 12 <time to depth> = 0.305s
EXchess_vsmp2: Total NPS = 1637837 <depth> = 12 <time to depth> = 0.233s
EXchess_vsmp4: Total NPS = 3037585 <depth> = 12 <time to depth> = 0.185s
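To make the gap explicit, the figures above can be turned into scaling ratios relative to the 1-thread run. This is just arithmetic on the reported numbers; the dictionary layout and function name are my own.

```python
# Figures copied from the time-to-depth results above.
RESULTS = {
    "vsmp1": {"threads": 1, "nps": 845331, "ttd": 0.305},
    "vsmp2": {"threads": 2, "nps": 1637837, "ttd": 0.233},
    "vsmp4": {"threads": 4, "nps": 3037585, "ttd": 0.185},
}


def scaling(results, baseline="vsmp1"):
    """Return NPS ratio and time-to-depth speedup relative to the baseline."""
    base = results[baseline]
    return {
        name: {
            "nps_ratio": r["nps"] / base["nps"],
            "ttd_speedup": base["ttd"] / r["ttd"],
        }
        for name, r in results.items()
    }


for name, s in scaling(RESULTS).items():
    print(f"{name}: NPS x{s['nps_ratio']:.2f}, time-to-depth x{s['ttd_speedup']:.2f}")
```

The node rate scales almost linearly (about 1.94x and 3.59x), while the time-to-depth speedup is only about 1.31x with 2 threads and 1.65x with 4.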
A couple of ideas that might explain this...
(1) My implementation actually has half of the threads searching at the specified iteration depth and the other half searching at iteration depth+1. The depths in the above table therefore underestimate the quality of the search in the vsmp2 and vsmp4 cases, where some threads return hash table results from 1 ply deeper in some positions.
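For anyone unfamiliar with the scheme, the depth split in (1) can be sketched as follows. This is not the EXchess code: the function names are hypothetical, and which threads take the deeper search (odd-numbered ones here) is an assumption for illustration.

```python
def thread_depth(thread_id, iteration_depth):
    """Depth searched by one thread.

    ASSUMPTION: even thread ids search at the nominal depth,
    odd thread ids search one ply deeper.
    """
    return iteration_depth + (thread_id % 2)


def depths_for(n_threads, iteration_depth):
    """Depths searched by all threads for one iteration."""
    return [thread_depth(t, iteration_depth) for t in range(n_threads)]


print(depths_for(1, 12))  # [12]
print(depths_for(2, 12))  # [12, 13]
print(depths_for(4, 12))  # [12, 13, 12, 13]
```

With 2 or 4 threads, half of the search effort at "depth 12" is really spent on depth 13, which a time-to-depth-12 measurement does not credit.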
(2) EXchess prunes extensively, and the pruning is influenced to a great extent by the history table, reply table, and killer moves. All of these are independent between the threads, so different threads can, in principle, prune quite differently. The results are communicated back through the various hash tables, including the combination move hash table that I described in a post a few months ago. So perhaps I am seeing some additional benefit from the threads searching different moves, which makes it less likely that all of them prune a key move from a position?
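As a toy illustration of idea (2): if each thread independently pruned a key move with some probability, the chance that every thread misses it would fall off quickly with thread count. The 10% figure below is invented purely for illustration, and real threads are not independent because they share hash table results, so this is at best an optimistic picture of the effect.

```python
def prob_all_prune(p_single, n_threads):
    """Chance that every thread prunes the key move.

    ASSUMPTION: pruning decisions are independent across threads.
    """
    return p_single ** n_threads


P_SINGLE = 0.10  # hypothetical per-thread chance of pruning the key move

for n in (1, 2, 4):
    print(f"{n} thread(s): {prob_all_prune(P_SINGLE, n):.4f}")
```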
- Dan