strategies for finding slowdowns in lazy smp

smatovic
Posts: 2640
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: strategies for finding slowdowns in lazy smp

Post by smatovic »

flok wrote: Wed Jun 05, 2019 1:27 pm
Currently my main thread is the master-thread. If that one decides the search is finished, then all others terminate as well.

smatovic wrote: Wed Jun 05, 2019 1:33 pm
And what happens if a helper finishes its search?

flok wrote: Wed Jun 05, 2019 1:36 pm
It goes on with the next iteration if applicable. Else it'll busy-loop :oops: until the main-thread catches up.

Hmm, not sure, but such a busy-loop could be what causes your NPS drop.
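
A minimal sketch of one alternative to the busy-loop, assuming C++11 threads: a helper that has run out of work sleeps on a condition variable until the main thread reports the next finished depth or stops the search. The names `IterationGate`, `wait_for_depth`, `publish_depth`, `request_stop` and `run_gate_demo` are hypothetical and not taken from flok's engine.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state between the main thread and its helpers.
struct IterationGate {
    std::mutex m;
    std::condition_variable cv;
    int main_depth = 0;  // deepest iteration the main thread has completed
    bool stop = false;   // set when the whole search should end

    // Helper side: block (without spinning) until the main thread has
    // completed `depth` or the search is stopped. Returns false on stop.
    bool wait_for_depth(int depth) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return stop || main_depth >= depth; });
        return !stop;
    }

    // Main-thread side: report a completed iteration and wake the helpers.
    void publish_depth(int depth) {
        { std::lock_guard<std::mutex> lock(m); main_depth = depth; }
        cv.notify_all();
    }

    // Main-thread side: end the search and wake the helpers.
    void request_stop() {
        { std::lock_guard<std::mutex> lock(m); stop = true; }
        cv.notify_all();
    }
};

// Demo: `helpers` threads wait for `target_depth`; returns how many were
// released by a depth update rather than by a stop request.
int run_gate_demo(int helpers, int target_depth) {
    assert(helpers > 0 && target_depth > 0);
    IterationGate gate;
    std::atomic<int> released{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < helpers; ++i)
        pool.emplace_back([&] {
            if (gate.wait_for_depth(target_depth)) ++released;
        });
    for (int d = 1; d <= target_depth; ++d) gate.publish_depth(d);
    for (auto& t : pool) t.join();
    return released.load();
}
```

A sleeping helper uses no CPU time, so it cannot compete with the main thread for a core the way a spinning helper can. Whether this explains the NPS drop in this particular engine is not established in the thread.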

--
Srdja
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

flok wrote: Wed Jun 05, 2019 1:27 pm
Currently my main thread is the master-thread. If that one decides the search is finished, then all others terminate as well.

smatovic wrote: Wed Jun 05, 2019 1:33 pm
And what happens if a helper finishes its search?

flok wrote: Wed Jun 05, 2019 1:36 pm
It goes on with the next iteration if applicable. Else it'll busy-loop :oops: until the main-thread catches up.

smatovic wrote: Wed Jun 05, 2019 1:43 pm
Hmm, not sure, but such a busy-loop could be what causes your NPS drop.

Could be, but this won't happen at the start position, only when a mate or draw is detected.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: strategies for finding slowdowns in lazy smp

Post by Dann Corbit »

I guess that if your threads stop working once they complete the depth, and the parent monitor waits for all of them to finish, you would lose NPS for that reason. It would also explain the greater loss with more threads.

Here is an idea:
When a thread finishes the required depth iteration and has not received a stop signal, have it run a "winfinder" zero-window search half a pawn above the current best score, and just keep going deeper and deeper.

It could still signal the main thread that it is done before it starts the winfinder search, and it would be doing something useful.
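
A rough sketch of the zero-window probe described above, run on a toy game tree rather than on real chess positions. Everything here is an assumption made for illustration: the synthetic `leaf_score`, the fixed branching factor of 3, scores in centipawns (so half a pawn is a margin of 50), and the names `fails_high` and `first_winning_depth`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int kInf = 100000;

// Synthetic leaf evaluation in centipawns (-200..200), derived from the node id.
static int leaf_score(std::uint32_t node) {
    node ^= node >> 16;
    node *= 0x45d9f3bU;
    node ^= node >> 16;
    return static_cast<int>(node % 401) - 200;
}

// Fail-soft negamax alpha-beta on a toy tree where every node has 3 children.
int search(std::uint32_t node, int depth, int alpha, int beta) {
    assert(depth >= 0);
    if (depth == 0) return leaf_score(node);
    int best = -kInf;
    for (std::uint32_t i = 1; i <= 3; ++i) {
        int score = -search(node * 3 + i, depth - 1, -beta, -alpha);
        best = std::max(best, score);
        alpha = std::max(alpha, score);
        if (alpha >= beta) break;  // beta cutoff
    }
    return best;
}

// Zero-window probe: true if the score at this depth is at least `threshold`.
bool fails_high(std::uint32_t node, int depth, int threshold) {
    return search(node, depth, threshold - 1, threshold) >= threshold;
}

// "Winfinder" loop: go deeper and deeper, looking for a depth at which the
// position scores at least best_score + margin. Returns that depth, or -1.
int first_winning_depth(std::uint32_t node, int best_score, int margin,
                        int start_depth, int max_depth) {
    for (int d = start_depth; d <= max_depth; ++d)
        if (fails_high(node, d, best_score + margin)) return d;
    return -1;
}
```

In a real engine the loop would also poll the stop signal on every iteration. A zero-window probe only answers "at least the threshold, or not", which is why it is usually much cheaper than a full-window search to the same depth.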

If my theory is correct, then with your current code base the NPS loss should get smaller and smaller as the time control gets longer.

Your code won't compile on Windows, even under msys2 + gcc, so I can only experiment on my UNIX boxes, which are not at my disposal right now.
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: strategies for finding slowdowns in lazy smp

Post by Sesse »

jdart wrote: Wed Jun 05, 2019 12:20 am
A profiler might also be helpful. You could try OProfile for Linux (http://oprofile.sourceforge.net/news/).

oprofile? Is this 1999? :-)

perf is really what you want these days; it has basically all the mindshare. If you want something more GUI-ish and commercial, there's stuff like VTune.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: strategies for finding slowdowns in lazy smp

Post by syzygy »

flok wrote: Tue Jun 04, 2019 9:06 pm
Now my question is: what are strategies for finding what causes this slow down?
The threads share no common variables apart from the transposition table. That tt has no locks, it uses the xor-trick.

Most likely the threads share common cache lines, which has the same effect as sharing common variables.
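
For readers unfamiliar with the "xor-trick" mentioned in the quoted post: it usually refers to lockless hashing, where the key is stored XORed with the data so that a torn entry (half written by one thread, half by another) fails verification on probe. The sketch below shows the general technique under an assumed layout of two 64-bit words per entry; it is not flok's actual table, and `LocklessTT` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry: two independently written 64-bit words.
struct TTEntry {
    std::atomic<std::uint64_t> key_xor_data{0};
    std::atomic<std::uint64_t> data{0};
};

class LocklessTT {
public:
    explicit LocklessTT(std::size_t entries) : table_(entries) {
        assert(entries > 0);
    }

    // Always-replace store: the key is never written in the clear.
    void store(std::uint64_t key, std::uint64_t data) {
        TTEntry& e = table_[key % table_.size()];
        e.key_xor_data.store(key ^ data, std::memory_order_relaxed);
        e.data.store(data, std::memory_order_relaxed);
    }

    // Returns true and fills `out` only if both words belong to `key`.
    // If two threads interleave their stores, the XOR check fails and the
    // entry is treated as a miss instead of returning mixed-up data.
    // Note: an untouched (all-zero) slot verifies for key 0 in this sketch.
    bool probe(std::uint64_t key, std::uint64_t& out) const {
        const TTEntry& e = table_[key % table_.size()];
        const std::uint64_t d = e.data.load(std::memory_order_relaxed);
        const std::uint64_t k = e.key_xor_data.load(std::memory_order_relaxed);
        if ((k ^ d) != key) return false;
        out = d;
        return true;
    }

private:
    std::vector<TTEntry> table_;
};
```

The trick protects the integrity of an entry without a lock, but it does nothing about cache-line traffic: threads that hit the same lines still pay for the coherence misses.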
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

flok wrote: Tue Jun 04, 2019 9:06 pm
Now my question is: what are strategies for finding what causes this slow down?
The threads share no common variables apart from the transposition table. That tt has no locks, it uses the xor-trick.

syzygy wrote: Thu Jun 13, 2019 11:57 pm
Most likely the threads share common cache lines, which has the same effect as sharing common variables.

But why doesn't Stockfish have this problem?
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: strategies for finding slowdowns in lazy smp

Post by syzygy »

flok wrote: Tue Jun 04, 2019 9:06 pm
Now my question is: what are strategies for finding what causes this slow down?
The threads share no common variables apart from the transposition table. That tt has no locks, it uses the xor-trick.

syzygy wrote: Thu Jun 13, 2019 11:57 pm
Most likely the threads share common cache lines, which has the same effect as sharing common variables.

flok wrote: Sat Jun 15, 2019 1:59 pm
But why doesn't Stockfish have this problem?

The Stockfish threads do not share common cache lines, apart from the TT, and shared TT cache lines are rare enough. The problem occurs when two or more threads keep reading and writing the same cache line.
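
As an illustration of the cache-line point, here is a minimal sketch of per-thread counters padded so that each one sits on its own line. The 64-byte line size is an assumption (typical for x86-64), `ThreadStats` and `count_nodes` are hypothetical names, and C++17 is assumed so that `std::vector` honours the over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size

// alignas pads and aligns each counter to a full cache line, so two threads
// never write to the same line just because their counters are neighbours.
struct alignas(kCacheLine) ThreadStats {
    std::uint64_t nodes = 0;
};
static_assert(sizeof(ThreadStats) == kCacheLine, "one counter per cache line");
static_assert(alignof(ThreadStats) == kCacheLine, "counter starts on a line");

// Each thread increments only its own counter; totals are summed afterwards.
std::uint64_t count_nodes(int n_threads, std::uint64_t per_thread) {
    assert(n_threads > 0);
    std::vector<ThreadStats> stats(static_cast<std::size_t>(n_threads));
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t)
        pool.emplace_back([&stats, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                ++stats[static_cast<std::size_t>(t)].nodes;
        });
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (const auto& s : stats) total += s.nodes;
    return total;
}
```

Without the `alignas`, eight such 8-byte counters would fit in a single 64-byte line, and every increment by one thread would invalidate that line in the other cores' caches even though no variable is logically shared.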

However, it seems that in your case the slowdown occurs only when there are (almost) as many search threads as processor threads. Maybe the cause is not false or true sharing of cache lines but background tasks that are consuming CPU.