Hi,
does anyone have experience with what happens when, in an SMP search, all threads analyse the same position?
I've read that one engine uses this with success: because the other processes fill the shared hash table with evaluated positions, this is supposed to speed up the main process.
Has anyone tested this? How do you avoid different processes doing redundant work and analysing the same position at the same time, before it is stored in the hash?
What happens when the hash table gets full? Is there a slowdown or not?
How does it work with many (e.g. 16) CPUs? Does it scale better than normal SMP?
SMP: on same branch instead splitting?
Moderators: hgm, Rebel, chrisw
-
- Posts: 8
- Joined: Fri Jun 28, 2013 9:03 am
- Location: Germany
Re: SMP: on same branch instead splitting?
In the original it was written:
"So, it seems instead of parking threads when they have no work to do, it's better to spin them on the same search tree, since the hashtable was already mostly filled out."
This may improve SMP performance.
Perhaps this can be generalized further, in some way I don't know yet. Maybe somebody will find a clever trick to do SMP other than classical splitting. Maybe searching some kilonodes again is faster than the splitting overhead.
Komodo 8 has a very effective SMP implementation. Everyone is wondering how they do it.
Frank
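The idea quoted above can be sketched in code. This is a hypothetical illustration, not any engine's actual implementation: an otherwise idle helper searches the same position and fills a shared transposition table, so the main search then gets instant hash hits instead of re-searching subtrees. For determinism the "helper" pass runs before the "main" pass here; in a real engine both would run concurrently on separate threads, and the table would be keyed by Zobrist hashes rather than the toy keys used below.

```python
def negamax(node, depth, tt, counter):
    """Toy negamax over nested lists (ints are leaf scores) with a
    shared transposition table. Keys here are (str(node), depth) for
    simplicity; a real engine would use Zobrist hashes and a
    depth-aware replacement scheme."""
    counter[0] += 1                      # count visited nodes
    key = (str(node), depth)
    if key in tt:
        return tt[key]                   # hash hit: no re-search needed
    if isinstance(node, int):
        score = node                     # leaf score
    elif depth == 0:
        score = 0                        # static-eval placeholder at the horizon
    else:
        score = max(-negamax(c, depth - 1, tt, counter) for c in node)
    tt[key] = score
    return score

tt = {}                                  # shared transposition table
tree = [[3, 5], [2, 9]]                  # tiny 2-ply game tree

helper_nodes = [0]
negamax(tree, 3, tt, helper_nodes)       # "helper" fills the table (7 nodes)

main_nodes = [0]
negamax(tree, 3, tt, main_nodes)         # "main" hits the root entry (1 node)
```

The main pass visits only one node because the helper already stored the root, which illustrates the claimed speedup mechanism; it says nothing about the redundant work the helper itself did, which is exactly the question raised in the opening post.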
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: SMP: on same branch instead splitting?
Speedup is lousy for more than 2 processors. In fact, it is lousy for 2 as well... there are lots of old threads on this topic.

Highluder wrote:
Hi,
does anyone have experience with what happens when, in an SMP search, all threads analyse the same position?
I've read that one engine uses this with success: because the other processes fill the shared hash table with evaluated positions, this is supposed to speed up the main process.
Has anyone tested this? How do you avoid different processes doing redundant work and analysing the same position at the same time, before it is stored in the hash?
What happens when the hash table gets full? Is there a slowdown or not?
How does it work with many (e.g. 16) CPUs? Does it scale better than normal SMP?
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: SMP: on same branch instead splitting?
Hard to say. I'm using lazy SMP as described by Dan Homan, i.e. every helper thread starts crunching at depth+1.
Whenever one of the helpers (or slaves, if you wish) finishes the iteration, all the others are aborted immediately.
The good thing is that there is zero synchronization/copying overhead except at the start of each iteration, no need to specify a minimum split depth, and, most importantly, the implementation is trivial compared to YBW.
I don't have enough data on this, but judging from CCRL I get about 100 Elo for 4 cores vs. 1.
When I look at YBW engines, they get about the same.
Also, from what I've seen in TCEC, it had no problems competing with state-of-the-art SMP implementations (could have been luck, of course).
I also suspect that (some?) YBW engines don't scale at all above 8 cores, but I have no data on how lazy SMP scales (if at all) above 4 cores.
I certainly don't plan to switch to anything else.
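The scheme described above can be sketched roughly as follows. This is an assumed illustration of the general lazy-SMP idea (main thread at depth d, helpers at depth d+1, first finisher aborts the rest), not Martin's or Dan Homan's actual code, and it omits the shared transposition table that does the real work in an engine:

```python
import threading

ABORT = None  # negamax returns None when the stop flag cut it short

def negamax(node, depth, stop):
    """Toy negamax over nested lists (ints are leaf scores) that
    checks a shared stop flag so it can be aborted mid-search."""
    if stop.is_set():
        return ABORT
    if isinstance(node, int):
        return node                      # leaf score
    if depth == 0:
        return 0                         # static-eval placeholder
    best = None
    for child in node:
        v = negamax(child, depth - 1, stop)
        if v is ABORT:
            return ABORT
        best = -v if best is None else max(best, -v)
    return best

def lazy_smp_search(root, depth, n_helpers=3):
    """Main thread searches `depth`, each helper searches depth+1;
    the first thread to finish its iteration aborts all the others."""
    stop = threading.Event()
    results = []
    lock = threading.Lock()

    def worker(d):
        score = negamax(root, d, stop)
        with lock:
            if score is not ABORT and not results:
                results.append(score)    # keep the first finished result
        stop.set()                       # abort everyone else

    threads = [threading.Thread(target=worker,
                                args=(depth if i == 0 else depth + 1,))
               for i in range(1 + n_helpers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]
```

Note how little coordination there is: one event, one lock around the result, and nothing else. That matches the claim that the only overhead sits at iteration boundaries, which is what makes this so much simpler than YBW-style splitting.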