The basic approach is Lazy SMP. I'll try to explain what I understand (and as always, explaining also helps a bit in understanding more). An SF developer could certainly do this much better, though.
"Threads" is a global instance of type "ThreadPool" that contains as many "Thread" objects as configured. In "uci.cpp", go() calls Threads.start_thinking() which wakes up the main thread (thread no. 0 in the pool), which in turn wakes up all other (helper) threads. The latter is done in MainThread::search(). Each thread, including the main thread, now runs its main function Thread::search() which contains the main iterative deepening loop.
After setting up all necessary data structures, the iterative deepening loop runs until there is a request to stop or the target search depth (if given) is reached. At each iteration every thread decides whether to search the current depth or skip it, so that different threads end up working on different depths at the same time. The main thread simply increments the depth by 1 each time, while helper threads skip some depths based on a clever formula (see SkipSize and SkipPhase in "search.cpp").
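Here is a small sketch of how such a skip rule can work. The table values and the exact condition are written down from my reading of the code and should be treated as illustrative; the real condition may contain further terms and differs between versions, so check "search.cpp" for the version you are looking at. The helper functions skips_depth() and depths_searched() are hypothetical names of mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative tables in the spirit of SkipSize / SkipPhase (values are an assumption).
constexpr int SkipSize[]  = { 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
constexpr int SkipPhase[] = { 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7 };

// Returns true if the thread with the given index should skip this depth.
// The main thread (index 0) never skips.
bool skips_depth(std::size_t threadIdx, int depth) {
    assert(depth >= 1);
    if (threadIdx == 0)
        return false;
    const std::size_t i = (threadIdx - 1) % 20;   // tables have 20 entries
    return ((depth + SkipPhase[i]) / SkipSize[i]) % 2 != 0;
}

// Lists the depths a thread would actually search up to maxDepth.
std::vector<int> depths_searched(std::size_t threadIdx, int maxDepth) {
    std::vector<int> depths;
    for (int d = 1; d <= maxDepth; ++d)
        if (!skips_depth(threadIdx, d))
            depths.push_back(d);
    return depths;
}
```

With these tables and a maximum depth of 8, thread 0 searches every depth, thread 1 searches 2, 4, 6, 8, thread 2 searches 1, 3, 5, 7, and thread 3 searches 1, 4, 5, 8. The point is that the threads spread out over neighbouring depths instead of all doing the same work.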
When the main thread has completed its search, it waits for all helper threads to finish as well (MainThread::search()). Afterwards the final PV is determined by "voting" for the best thread based on score and depth.
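A rough sketch of what such a vote can look like follows. The weighting (score above the worst thread, plus 1, times completed depth) is my own simplification and an assumption; the weights actually used in the engine differ and have changed between versions. "ThreadResult" and pick_best_thread() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of what one thread reports when the search ends.
struct ThreadResult {
    std::string bestMove;   // first move of that thread's PV
    int score;              // score of that PV
    int completedDepth;     // deepest fully completed iteration
};

// Returns the index of the thread whose best move collected the most votes.
// Ties keep the earliest thread, so the main thread (index 0) wins a tie.
std::size_t pick_best_thread(const std::vector<ThreadResult>& results) {
    assert(!results.empty());

    int minScore = results[0].score;
    for (const ThreadResult& r : results)
        minScore = std::min(minScore, r.score);

    // Each thread votes for its own best move; better score and deeper
    // search give the vote more weight (assumed weighting, see above).
    std::map<std::string, long long> votes;
    for (const ThreadResult& r : results)
        votes[r.bestMove] += static_cast<long long>(r.score - minScore + 1) * r.completedDepth;

    std::size_t best = 0;
    for (std::size_t i = 1; i < results.size(); ++i)
        if (votes.at(results[i].bestMove) > votes.at(results[best].bestMove))
            best = i;
    return best;
}
```

For example, if three threads agree on e2e4 and one thread with a slightly higher score prefers d2d4, the combined weight of the e2e4 votes can still win, which is the idea behind voting instead of just taking the single best score.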