Thread overhead in C++

JohnWoe
Posts: 491
Joined: Sat Mar 02, 2013 11:31 pm

Thread overhead in C++

Post by JohnWoe »

In Mayhem 4.1 I added an extra thread for checking the time.

Is there something wrong?
Here: https://github.com/SamuraiDangyo/mayhem ... .hpp#L2346

But there's a massive overhead w/ threads.

std::mutex was super slow.
std::atomic<bool> is faster, but still slow.

Both worked well, but super slow.

All this thread locking/mingling costs time.

Am I missing some obvious speedup?
Getting back to 1 thread.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Thread overhead in C++

Post by hgm »

Why would you need a mutex for checking time? Or even, why have an extra thread for this? I can see why it solves some problems when you have one for reading input. But reading the clock is not something that can block you. If you don't want to poll, you could set an alarm signal for the time you want to check for, and catch that with a handler that sets an outOfTime flag, which your search thread can test as often as it likes at almost no cost.
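
A minimal sketch of the alarm-signal idea, assuming a POSIX system (Linux or macOS). The names g_out_of_time, on_alarm, arm_deadline and out_of_time are hypothetical and not taken from Mayhem or any other engine. It uses setitimer() to deliver SIGALRM once, and the handler does nothing except set a flag:

```cpp
// POSIX-only sketch; every name below is illustrative.
#include <cassert>
#include <signal.h>     // sigaction, sig_atomic_t, SIGALRM
#include <sys/time.h>   // setitimer, itimerval

// Written by the signal handler, read by the search thread.
static volatile sig_atomic_t g_out_of_time = 0;

// Only async-signal-safe work is allowed in a handler, so it just sets the flag.
extern "C" void on_alarm(int) { g_out_of_time = 1; }

// Arm a one-shot wall-clock timer that fires after `ms` milliseconds.
// Returns false if the handler or the timer could not be installed.
bool arm_deadline(long ms) {
    assert(ms > 0);  // A zero value would disarm the timer instead of arming it
    g_out_of_time = 0;

    struct sigaction sa {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // Restart system calls interrupted by the signal
    if (sigaction(SIGALRM, &sa, nullptr) != 0) return false;

    struct itimerval tv {};    // it_interval stays zero, so the timer fires once
    tv.it_value.tv_sec = ms / 1000;
    tv.it_value.tv_usec = (ms % 1000) * 1000;
    return setitimer(ITIMER_REAL, &tv, nullptr) == 0;
}

// The search can call this at every node; it is a plain memory read.
inline bool out_of_time() { return g_out_of_time != 0; }
```

The search would call arm_deadline() once when it starts and test out_of_time() wherever it is convenient. Windows has no SIGALRM, so a port would need a different timer mechanism there.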
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Thread overhead in C++

Post by mvanthoor »

Why don't you just check if you exceeded your allotted time, within the search thread?

I check for all termination conditions at every 2047 nodes, which works well. A termination condition can be a received stop, quit, depth, moves, nodes reached, max-depth exceeded, or time up. You could have a different thread to check for those conditions, but as it is the search that needs to make sure that it adheres to these conditions, I don't really see why.
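
A rough C++ sketch of that kind of polling, assuming std::chrono::steady_clock as the time source. SearchState, start_search, should_stop and kCheckMask are hypothetical names, not Rustic's or Mayhem's. Note that with a bit mask of 2047 the clock is actually read once every 2048 nodes:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical search state; field names are illustrative only.
struct SearchState {
    Clock::time_point deadline;  // When the search must stop
    std::uint64_t nodes = 0;     // Nodes visited so far
    bool stop = false;           // Latched once the time is up
};

// With a mask of 2^k - 1, the clock is read once every 2^k nodes.
constexpr std::uint64_t kCheckMask = 2047;
static_assert((kCheckMask & (kCheckMask + 1)) == 0, "mask must be 2^k - 1");

// Create the state for a search that may run for `budget`.
inline SearchState start_search(std::chrono::milliseconds budget) {
    assert(budget.count() >= 0);
    SearchState s;
    s.deadline = Clock::now() + budget;
    return s;
}

// Call once per node. Returns true when the search should unwind.
inline bool should_stop(SearchState& s) {
    ++s.nodes;
    // The cheap mask test runs at every node; the clock read happens rarely.
    if ((s.nodes & kCheckMask) == 0 && Clock::now() >= s.deadline) {
        s.stop = true;
    }
    return s.stop;
}
```

Other termination conditions (stop command, node limit, depth limit) could be tested in the same branch as the clock read, so they cost nothing on the other 2047 nodes.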

I could, for example, have a different thread for generating moves. Instead of calling the move generator's functions, I could send messages to this thread, and then catch the move list. But why? It just makes search slower because inter-thread communication is slower than handling things in the thread itself.

Personally, I use threads for things that are distinctly separated:
- One "engine" thread (the main thread)
- One IO-thread that reads the input to the engine, and writes responses to the GUI
- One search thread, which runs iterative deepening/alpha beta. (In the future, either the engine or the search thread will spawn more search threads when implementing Lazy SMP. I haven't researched this yet so I don't yet know how I'm going to do it.)

In this engine, everything communicates with the engine thread:
engine <-> io
engine <-> search

So if the search has to send something to the GUI, it'll report that to the engine, which will then send it using the IO-thread. If the GUI wants something, the IO-thread reports that to the engine, which will then send a command to the search thread.
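
To make the routing concrete, here is a small C++ sketch of an engine thread that sits between IO and search. It is not Rustic's code (Rustic is written in Rust); Mailbox, Message, Source and engine_step are hypothetical names, and the mailbox is an ordinary mutex plus condition variable queue:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical blocking mailbox: any thread may send, one thread receives.
template <typename T>
class Mailbox {
public:
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Blocks until a message is available.
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> queue_;
};

enum class Source { Io, Search };

struct Message {
    Source from;       // Which thread sent this message
    std::string text;  // Payload, e.g. a command or a search report
};

// One iteration of the engine thread: IO and search never talk directly,
// everything is forwarded through the engine.
void engine_step(Mailbox<Message>& inbox,
                 Mailbox<std::string>& to_io,
                 Mailbox<std::string>& to_search) {
    Message m = inbox.receive();
    switch (m.from) {
        case Source::Io:     to_search.send(std::move(m.text)); return;
        case Source::Search: to_io.send(std::move(m.text));     return;
    }
    assert(false && "unknown message source");
}
```

Every message passes through two queues, each with a lock and a possible wakeup, which is the extra cost a separate timing thread would pay in this design.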

What I DON'T do, is make one thread dependent on the other, because it makes the architecture much more difficult to maintain, and it makes everything slower. So there will never be communication between IO and Search directly. In my architecture, I could add a "timing" thread, but it would then also have to communicate through the engine thread, which would make it even slower than your implementation.

If I were you, I'd just keep it simple and poll for the termination conditions. Time management is finicky enough to get right on its own, let alone if you have to take extra threading overhead into account.
Author of Rustic, an engine written in Rust.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Thread overhead in C++

Post by mar »

whaat? :)

atomics are super-fast if no contention

on x86, atomic load (no matter what ordering) is simply a mov reg, mem plus a compiler barrier before that!
how is that slow?!

since your thread only potentially stores each 5ms, no heavy contention is possible whatsoever
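
For illustration, a sketch of such a 5 ms timer thread with a std::atomic<bool> flag. Watchdog and spin_until_expired are hypothetical names and this is not Mayhem's actual code. Relaxed ordering is enough here because the flag carries no other data with it:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog: a helper thread wakes every 5 ms, reads the clock,
// and raises a flag once the deadline has passed.
class Watchdog {
public:
    explicit Watchdog(std::chrono::milliseconds budget)
        : deadline_(Clock::now() + budget) {
        assert(budget.count() >= 0);
        timer_ = std::thread([this] {
            while (!quit_.load(std::memory_order_relaxed)) {
                if (Clock::now() >= deadline_) {
                    stop_.store(true, std::memory_order_relaxed);
                    return;
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
            }
        });
    }

    // Ask the helper to exit (it notices within about 5 ms) and wait for it.
    ~Watchdog() {
        quit_.store(true, std::memory_order_relaxed);
        timer_.join();
    }

    Watchdog(const Watchdog&) = delete;
    Watchdog& operator=(const Watchdog&) = delete;

    // Called from the search thread; on x86 this compiles to a plain load.
    bool expired() const { return stop_.load(std::memory_order_relaxed); }

private:
    const Clock::time_point deadline_;
    std::atomic<bool> stop_{false};
    std::atomic<bool> quit_{false};
    std::thread timer_;
};

// Stand-in for a search loop: counts "nodes" until the flag is raised.
inline std::uint64_t spin_until_expired(const Watchdog& w) {
    std::uint64_t nodes = 0;
    while (!w.expired()) ++nodes;
    return nodes;
}
```

The search side only ever loads the flag, and the helper stores to it at most once, so the cache line is not bounced between cores during the search.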

I agree with marcel, just poll each n nodes from within search
Martin Sedlak
JohnWoe
Posts: 491
Joined: Sat Mar 02, 2013 11:31 pm

Re: Thread overhead in C++

Post by JohnWoe »

hgm wrote: Wed Apr 21, 2021 11:19 pm Why would you need a mutex for checking time? Or even, why have an extra thread for this? I can see why it solves some problems when you have one for reading input. But reading the clock is not something that can block you. If you don't want to poll, you could set an alarm signal for the time you want to check for, and catch that with a handler that sets an outOfTime flag, which your search thread can test as often as it likes at almost no cost.
Yes, I'm back to the simple method.

The concern was this: if I read the clock every N nodes (like mvanthoor, every 2047), how long do those N nodes actually take across thousands of different CPUs?
Mayhem may run out of time, and that's giving away free points. If I read the clock every 5 ms in a separate thread, I know that Mayhem will never run out of time.

But that comes with a cost. This thread implementation is as simple as possible (std::atomic, not a mutex). Still, there's some overhead.

7.2 MNPS -> 6.7 MNPS

Not that much but measurable.
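
As a back-of-the-envelope check on both numbers, a tiny sketch (check_gap_ms and slowdown_percent are hypothetical helper names) that converts a node interval into wall-clock time and the two speeds into a relative slowdown:

```cpp
#include <cassert>

// Milliseconds between two clock reads when the clock is read once every
// `interval` nodes at a speed of `nps` nodes per second.
inline double check_gap_ms(double nps, double interval) {
    assert(nps > 0.0 && interval > 0.0);
    return 1000.0 * interval / nps;
}

// Relative slowdown in percent when the speed drops from `before` to `after`.
inline double slowdown_percent(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

With the figures above, check_gap_ms(7.2e6, 2048) is about 0.28 ms, and even on a hypothetical machine 100 times slower (72,000 nodes per second) it is about 28 ms. slowdown_percent(7.2e6, 6.7e6) is about 6.9%.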
mvanthoor wrote: Thu Apr 22, 2021 12:10 am Why don't you just check if you exceeded your allotted time, within the search thread?

I check for all termination conditions at every 2047 nodes, which works well. A termination condition can be a received stop, quit, depth, moves, nodes reached, max-depth exceeded, or time up. You could have a different thread to check for those conditions, but as it is the search that needs to make sure that it adheres to these conditions, I don't really see why.

I could, for example, have a different thread for generating moves. Instead of calling the move generator's functions, I could send messages to this thread, and then catch the move list. But why? It just makes search slower because inter-thread communication is slower than handling things in the thread itself.

Personally, I use threads for things that are distinctly separated:
- One "engine" thread (the main thread)
- One IO-thread that reads the input to the engine, and writes responses to the GUI
- One search thread, which runs iterative deepening/alpha beta. (In the future, either the engine or the search thread will spawn more search threads when implementing Lazy SMP. I haven't researched this yet so I don't yet know how I'm going to do it.)

In this engine, everything communicates with the engine thread:
engine <-> io
engine <-> search

So if the search has to send something to the GUI, it'll report that to the engine, which will then send it using the IO-thread. If the GUI wants something, the IO-thread reports that to the engine, which will then send a command to the search thread.

What I DON'T do, is make one thread dependent on the other, because it makes the architecture much more difficult to maintain, and it makes everything slower. So there will never be communication between IO and Search directly. In my architecture, I could add a "timing" thread, but it would then also have to communicate through the engine thread, which would make it even slower than your implementation.

If I were you, I'd just keep it simple and poll for the termination conditions. Time management is finicky enough to get right on its own, let alone if you have to take extra threading overhead into account.
I agree. Except I'm not responding to any UCI input while searching.
This experiment was more of a simplification.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Thread overhead in C++

Post by Joost Buijs »

In that thread you don't only check the time but also whether there is console input; this could be why it takes a noticeable amount of time. Checking the time with GetTickCount64() or std::chrono::high_resolution_clock::now() is so fast that you won't notice it when you do it every 5 ms.

Starting and joining a thread takes several milliseconds. This happens only once per search and should have no influence; only in very fast hyper-bullet games could it have a detrimental effect, in which case it's better to use events or condition variables.
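
One way the condition-variable variant might look, as a sketch under assumptions: SearchTimer and its members are hypothetical names. The thread is created once at startup and then sleeps in wait_until(), so no thread is started or joined per search:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical long-lived timer thread, reused for every search.
class SearchTimer {
public:
    SearchTimer() : worker_([this] { run(); }) {}

    ~SearchTimer() {
        { std::lock_guard<std::mutex> lock(m_); quit_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // Begin timing a search with the given budget.
    void start(std::chrono::milliseconds budget) {
        assert(budget.count() >= 0);
        {
            std::lock_guard<std::mutex> lock(m_);
            deadline_ = Clock::now() + budget;
            armed_ = true;
            expired_.store(false, std::memory_order_relaxed);
        }
        cv_.notify_all();
    }

    // Disarm early, e.g. when the search finished within its budget.
    void cancel() {
        { std::lock_guard<std::mutex> lock(m_); armed_ = false; }
        cv_.notify_all();
    }

    // Polled by the search thread; a single relaxed atomic load.
    bool expired() const { return expired_.load(std::memory_order_relaxed); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        while (!quit_) {
            if (!armed_) { cv_.wait(lock); continue; }  // Idle until start()
            if (Clock::now() >= deadline_) {
                expired_.store(true, std::memory_order_relaxed);
                armed_ = false;
                continue;
            }
            const auto d = deadline_;  // Copy: start() may change deadline_
            cv_.wait_until(lock, d);   // State is re-checked after any wakeup
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    Clock::time_point deadline_{};
    bool armed_ = false;
    bool quit_ = false;
    std::atomic<bool> expired_{false};
    std::thread worker_;  // Declared last so it starts after the other members
};
```

The engine would create one SearchTimer at startup, call start() at the beginning of each search and cancel() when a search ends early.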
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Thread overhead in C++

Post by hgm »

This seems pretty much a wild goose chase. You will always have to use an ample safety margin, because no matter how often and how accurately you read the clock, there will be unknown and unpredictable communication delays with the GUI.

And, like I pointed out, polling the clock is not necessary anyway. You can just schedule an interrupt at the critical time.