Thread overhead in C++

JohnWoe · Post by **JohnWoe** » Wed Apr 21, 2021 11:04 pm

In Mayhem 4.1 I added an extra thread for checking the time.

Is there something wrong?
Here: https://github.com/SamuraiDangyo/mayhem ... .hpp#L2346

But there's a massive overhead w/ threads.

std::mutex was super slow.
atomic<bool> is faster. But still slow.

Both worked correctly, but were super slow.

All this thread locking/mingling costs time.

Am I missing some obvious speedup?
Getting back to 1 thread.

hgm · Post by **hgm** » Wed Apr 21, 2021 11:19 pm

Why would you need a mutex for checking time? Or even, why have an extra thread for this? I can see why it solves some problems when you have one for reading input. But reading the clock is not something that can block you. If you don't want to poll, you could set an alarm signal at the time you want to check for, and catch that with a handler that sets an outOfTime flag, which your search thread can test as often as it likes at almost no cost.
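A minimal sketch of the alarm-signal idea, assuming a POSIX system (`sigaction` and `setitimer` are not available on Windows). The names `g_out_of_time`, `on_alarm`, `arm_deadline` and `out_of_time` are hypothetical and not taken from Mayhem:

```cpp
#include <cassert>
#include <signal.h>     // sigaction, SIGALRM, sig_atomic_t (POSIX)
#include <sys/time.h>   // setitimer, itimerval (POSIX)

// Flag written by the signal handler and read by the search.
// volatile sig_atomic_t is the type that may safely be written from a handler.
volatile sig_atomic_t g_out_of_time = 0;

// Signal handler: does nothing except set the flag.
extern "C" void on_alarm(int) { g_out_of_time = 1; }

// Arms a one-shot real-time timer that fires after `milliseconds`.
// Returns false if the handler or the timer could not be installed.
bool arm_deadline(long milliseconds) {
    assert(milliseconds > 0);  // a zero value would disarm the timer instead
    g_out_of_time = 0;

    struct sigaction sa {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted system calls where possible
    if (sigaction(SIGALRM, &sa, nullptr) != 0) return false;

    itimerval timer{};         // it_interval stays zero, so the timer fires once
    timer.it_value.tv_sec = milliseconds / 1000;
    timer.it_value.tv_usec = (milliseconds % 1000) * 1000;
    return setitimer(ITIMER_REAL, &timer, nullptr) == 0;
}

// Cheap test for the search loop: a single read of the flag.
bool out_of_time() { return g_out_of_time != 0; }
```

The search would call `arm_deadline(budget_ms)` once before it starts and test `out_of_time()` wherever it is convenient, for example once per node.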

mvanthoor · Post by **mvanthoor** » Thu Apr 22, 2021 12:10 am

Why don't you just check if you exceeded your allotted time, within the search thread?

I check for all termination conditions at every 2047 nodes, which works well. A termination condition can be a received stop, quit, depth, moves, nodes reached, max-depth exceeded, or time up. You could have a different thread to check for those conditions, but as it is the search that needs to make sure that it adheres to these conditions, I don't really see why.
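A minimal sketch of polling inside the search thread, covering only the time condition. `StopChecker` and its members are hypothetical names, and the mask of 2047 is an assumption about how the interval is implemented; with it, the clock is read on every 2048th node:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: counts nodes and reads the clock once per 2048 nodes.
class StopChecker {
public:
    using Clock = std::chrono::steady_clock;

    explicit StopChecker(std::chrono::milliseconds budget)
        : deadline_(Clock::now() + budget) {
        assert(budget.count() >= 0);
    }

    // Call once per node. Returns true once the deadline has been seen to pass.
    bool should_stop() {
        ++nodes_;
        if (!stop_ && (nodes_ & kCheckMask) == 0) {
            stop_ = Clock::now() >= deadline_;
        }
        return stop_;
    }

    std::uint64_t nodes() const { return nodes_; }

private:
    static constexpr std::uint64_t kCheckMask = 2047;  // clock read every 2048th node
    Clock::time_point deadline_;
    std::uint64_t nodes_ = 0;
    bool stop_ = false;
};
```

On all other nodes the cost is one increment and one mask test, so the clock read is amortized over 2048 nodes. Other termination conditions (stop, quit, node or depth limits) could be tested in the same branch.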

I could, for example, have a different thread for generating moves. Instead of calling the move generator's functions, I could send messages to this thread, and then catch the move list. But why? It just makes search slower because inter-thread communication is slower than handling things in the thread itself.

Personally, I use threads for things that are distinctly separated:
- One "engine" thread (the main thread)
- One IO-thread that reads the input to the engine, and writes responses to the GUI
- One search thread, which runs iterative deepening/alpha beta. (In the future, either the engine or the search thread will spawn more search threads when implementing Lazy SMP. I haven't researched this yet so I don't yet know how I'm going to do it.)

In this engine, everything communicates with the engine thread:
engine <-> io
engine <-> search

So if the search has to send something to the GUI, it'll report that to the engine, which will then send it using the IO-thread. If the GUI wants something, the IO-thread reports that to the engine, which will then send a command to the search thread.
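A rough C++ sketch of this hub-and-spoke message passing, assuming a simple queue guarded by a mutex and a condition variable. `Channel`, `Source` and `Message` are hypothetical names and are not taken from any engine in this thread:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical blocking queue: any thread may send, one thread receives.
template <typename T>
class Channel {
public:
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, then returns it.
    T recv() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        assert(!queue_.empty());
        T msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Who sent the message to the engine thread.
enum class Source { Io, Search };

struct Message {
    Source from;
    std::string text;
};
```

In this sketch the engine thread would own one `Channel<Message>` as its inbox; the IO and search threads only ever call `send` on it, so they never talk to each other directly.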

What I DON'T do, is make one thread dependent on the other, because it makes the architecture much more difficult to maintain, and it makes everything slower. So there will never be communication between IO and Search directly. In my architecture, I could add a "timing" thread, but it would then also have to communicate through the engine thread, which would make it even slower than your implementation.

If I were you, I'd just keep it simple and poll for the termination conditions. Time management is finicky enough to get right on its own, let alone if you have to take extra threading overhead into account.

mar · Post by **mar** » Thu Apr 22, 2021 12:33 am

whaat?

atomics are super-fast if no contention

on x86, atomic load (no matter what ordering) is simply a mov reg, mem plus a compiler barrier before that!
how is that slow?!

since your thread only potentially stores each 5ms, no heavy contention is possible whatsoever
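A minimal sketch of such a low-contention setup, assuming a watcher thread that wakes every 5 ms and a search that only loads the flag. `DeadlineWatcher` is a hypothetical name, not Mayhem's code; relaxed ordering is used because the flag guards no other shared data:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical watcher: a helper thread sets a flag once the deadline passes.
class DeadlineWatcher {
public:
    explicit DeadlineWatcher(std::chrono::milliseconds budget)
        : deadline_(std::chrono::steady_clock::now() + budget),
          worker_([this] { run(); }) {
        assert(budget.count() >= 0);
    }

    ~DeadlineWatcher() {
        cancel_.store(true, std::memory_order_relaxed);
        worker_.join();  // returns within roughly one 5 ms sleep
    }

    DeadlineWatcher(const DeadlineWatcher&) = delete;
    DeadlineWatcher& operator=(const DeadlineWatcher&) = delete;

    // Called from the search: a plain load, no lock.
    bool out_of_time() const {
        return out_of_time_.load(std::memory_order_relaxed);
    }

private:
    void run() {
        while (!cancel_.load(std::memory_order_relaxed)) {
            if (std::chrono::steady_clock::now() >= deadline_) {
                out_of_time_.store(true, std::memory_order_relaxed);
                return;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    }

    std::chrono::steady_clock::time_point deadline_;
    std::atomic<bool> out_of_time_{false};
    std::atomic<bool> cancel_{false};
    std::thread worker_;  // declared last so the flags exist before the thread starts
};
```

The only write to `out_of_time_` happens once per search, so the search thread's loads should almost always hit its own cache.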

I agree with marcel, just poll each n nodes from within search

JohnWoe · Post by **JohnWoe** » Thu Apr 22, 2021 12:42 am

hgm wrote: ↑Wed Apr 21, 2021 11:19 pm Why would you need a mutex for checking time? Or even, why have an extra thread for this? I can see why it solves some problems when you have one for reading input. But reading the clock is not something that can block you. If you don't want to poll, you could set an alarm signal at the time you want to check for, and catch that with a handler that sets an outOfTime flag, which your search thread can test as often as it likes at almost no cost.

Yes, I'm back to the simple method.

The concern was this: if I read the clock every N nodes (like mvanthoor's every 2047), how long do those N nodes take on thousands of different CPUs?
Mayhem may run out of time, and that's giving away free points. If I read the clock every 5 ms in a separate thread, I know that Mayhem will never run out of time.

But that comes with a cost. This thread implementation is as simple as possible (std::atomic, not a mutex). Still, there's some overhead.

7.2 MNPS -> 6.7 MNPS

Not that much but measurable.

mvanthoor wrote: ↑Thu Apr 22, 2021 12:10 am Why don't you just check if you exceeded your allotted time, within the search thread?

I check for all termination conditions at every 2047 nodes, which works well. A termination condition can be a received stop, quit, depth, moves, nodes reached, max-depth exceeded, or time up. You could have a different thread to check for those conditions, but as it is the search that needs to make sure that it adheres to these conditions, I don't really see why.

I could, for example, have a different thread for generating moves. Instead of calling the move generator's functions, I could send messages to this thread, and then catch the move list. But why? It just makes search slower because inter-thread communication is slower than handling things in the thread itself.

Personally, I use threads for things that are distinctly separated:
- One "engine" thread (the main thread)
- One IO-thread that reads the input to the engine, and writes responses to the GUI
- One search thread, which runs iterative deepening/alpha beta. (In the future, either the engine or the search thread will spawn more search threads when implementing Lazy SMP. I haven't researched this yet so I don't yet know how I'm going to do it.)

In this engine, everything communicates with the engine thread:
engine <-> io
engine <-> search

So if the search has to send something to the GUI, it'll report that to the engine, which will then send it using the IO-thread. If the GUI wants something, the IO-thread reports that to the engine, which will then send a command to the search thread.

What I DON'T do, is make one thread dependent on the other, because it makes the architecture much more difficult to maintain, and it makes everything slower. So there will never be communication between IO and Search directly. In my architecture, I could add a "timing" thread, but it would then also have to communicate through the engine thread, which would make it even slower than your implementation.

If I were you, I'd just keep it simple and poll for the termination conditions. Time management is finicky enough to get right on its own, let alone if you have to take extra threading overhead into account.

I agree. Except I'm not responding to any UCI input while searching.
This experiment was more of a simplification.

Joost Buijs · Post by **Joost Buijs** » Thu Apr 22, 2021 6:54 am

In that thread you check not only the time but also whether there is console input; this could be why it takes a noticeable amount of time. Checking the time with GetTickCount64() or std::chrono::high_resolution_clock::now() is so fast that you won't notice it when you do it every 5 ms.

Starting and joining a thread takes several milliseconds. This happens only once per search and should have no influence; only in very fast hyper-bullet games could it have a detrimental effect, and in that case it's better to use events or condition variables.
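One way the condition-variable approach could look: a timer thread that is created once and re-armed for each search, so no thread is started or joined per move. This is a sketch under that assumption; `ReusableTimer` and its members are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical timer: one long-lived thread, armed anew for every search.
class ReusableTimer {
public:
    using Clock = std::chrono::steady_clock;

    ReusableTimer() : worker_([this] { run(); }) {}

    ~ReusableTimer() {
        { std::lock_guard<std::mutex> lock(mutex_); quit_ = true; }
        wake_.notify_all();
        worker_.join();
    }

    // Arm the timer at the start of a search.
    void start(std::chrono::milliseconds budget) {
        assert(budget.count() >= 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            deadline_ = Clock::now() + budget;
            armed_ = true;
            expired_.store(false, std::memory_order_relaxed);
        }
        wake_.notify_all();
    }

    // Disarm when the search finishes before the deadline.
    void cancel() {
        { std::lock_guard<std::mutex> lock(mutex_); armed_ = false; }
        wake_.notify_all();
    }

    // Called from the search: a plain atomic load.
    bool expired() const { return expired_.load(std::memory_order_relaxed); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!quit_) {
            if (!armed_) {
                // Sleep until start() or the destructor wakes us.
                wake_.wait(lock, [this] { return armed_ || quit_; });
            } else if (Clock::now() >= deadline_) {
                expired_.store(true, std::memory_order_relaxed);
                armed_ = false;
            } else {
                const Clock::time_point until = deadline_;  // copy while locked
                wake_.wait_until(lock, until);  // loop re-checks state on wakeup
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    Clock::time_point deadline_{};
    bool armed_ = false;
    bool quit_ = false;
    std::atomic<bool> expired_{false};
    std::thread worker_;  // declared last so all state exists before it runs
};
```

The mutex is only taken in `start`, `cancel` and the timer thread itself; the search path touches nothing but the atomic flag.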

hgm · Post by **hgm** » Thu Apr 22, 2021 8:29 am

This seems pretty much a wild goose chase. You will always have to use an ample safety margin, because no matter how often and how accurately you read the clock, there will be unknown and unpredictable communication delays with the GUI.

And, like I pointed out, polling the clock is not necessary anyway. You can just schedule an interrupt at the critical time.
