That's what I do (and what you have to do) for commonly counted things like nodes searched, time spent waiting, extensions done, reductions done) unless you choose to ignore the minimal race-condition issues that can make the counters slightly off.AlvaroBegue wrote:Not an expert in multi-threaded code, but... Can you store per-thread counters and add them just before you show them to the user? (Make sure the counters are separated enough that they live in separate cache lines, or what I am suggesting might not help at all.)
The cost of transposition table instrumentation
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
Shared counters already have a major issue with scaling. When a thread modifies the counter, the local cache has to broadcast a "invalidate" to all other caches since their values are now wrong. When they need the counter, they have to wait for another cache to forward the block with that counter to them. Too much of this and you smoke everything. And that ignores the spin lock which has to do exactly the same thing, almost certainly in a different cache block.Joost Buijs wrote:This is exactly what I'm doing in my engine, for each thread I have a struct which contains everything that belongs to that thread, including node counters etc.AlvaroBegue wrote:Not an expert in multi-threaded code, but... Can you store per-thread counters and add them just before you show them to the user? (Make sure the counters are separated enough that they live in separate cache lines, or what I am suggesting might not help at all.)
When I make sure with an alignment pragma that these structs are at least separated one cache-line apart from each other, I don't see any performance degradation at all.
When I want to show the counters to the user I just add the separate thread counters together, this works beautifully.
Sharing trivial bits of data doesn't scale well. Sometimes the cost is worth it (history counters or counter-moves) but most of the time, no.
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: The cost of transposition table instrumentation
Well, the only thing I share between threads is the transposition table and the history table, nothing else.bob wrote:Shared counters already have a major issue with scaling. When a thread modifies the counter, the local cache has to broadcast a "invalidate" to all other caches since their values are now wrong. When they need the counter, they have to wait for another cache to forward the block with that counter to them. Too much of this and you smoke everything. And that ignores the spin lock which has to do exactly the same thing, almost certainly in a different cache block.Joost Buijs wrote:This is exactly what I'm doing in my engine, for each thread I have a struct which contains everything that belongs to that thread, including node counters etc.AlvaroBegue wrote:Not an expert in multi-threaded code, but... Can you store per-thread counters and add them just before you show them to the user? (Make sure the counters are separated enough that they live in separate cache lines, or what I am suggesting might not help at all.)
When I make sure with an alignment pragma that these structs are at least separated one cache-line apart from each other, I don't see any performance degradation at all.
When I want to show the counters to the user I just add the separate thread counters together, this works beautifully.
Sharing trivial bits of data doesn't scale well. Sometimes the cost is worth it (history counters or counter-moves) but most of the time, no.
I don't use shared counters at all.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
Joost Buijs wrote:Well, the only thing I share between threads is the transposition table and the history table, nothing else.bob wrote:Shared counters already have a major issue with scaling. When a thread modifies the counter, the local cache has to broadcast a "invalidate" to all other caches since their values are now wrong. When they need the counter, they have to wait for another cache to forward the block with that counter to them. Too much of this and you smoke everything. And that ignores the spin lock which has to do exactly the same thing, almost certainly in a different cache block.Joost Buijs wrote:This is exactly what I'm doing in my engine, for each thread I have a struct which contains everything that belongs to that thread, including node counters etc.AlvaroBegue wrote:Not an expert in multi-threaded code, but... Can you store per-thread counters and add them just before you show them to the user? (Make sure the counters are separated enough that they live in separate cache lines, or what I am suggesting might not help at all.)
When I make sure with an alignment pragma that these structs are at least separated one cache-line apart from each other, I don't see any performance degradation at all.
When I want to show the counters to the user I just add the separate thread counters together, this works beautifully.
Sharing trivial bits of data doesn't scale well. Sometimes the cost is worth it (history counters or counter-moves) but most of the time, no.
I don't use shared counters at all.
Isn't your history table just an array of counters???
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: The cost of transposition table instrumentation
I would not call them counters.bob wrote:
Isn't your history table just an array of counters???
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
They certainly behave like one. Whether you do nodes_searched++ or history_counter[index]+=n*n you are doing exactly the same thing, and the occasional interleaved update is not exactly going to wreck things.Joost Buijs wrote:I would not call them counters.bob wrote:
Isn't your history table just an array of counters???
-
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
Re: The cost of transposition table instrumentation
What you call a 'counter' here is commonly known as a 'variable' in computer science. There are programs that adjust the value of the history table elements in either direction, for example.bob wrote:They certainly behave like one. Whether you do nodes_searched++ or history_counter[index]+=n*n you are doing exactly the same thing, and the occasional interleaved update is not exactly going to wreck things.Joost Buijs wrote:I would not call them counters.bob wrote:
Isn't your history table just an array of counters???
[Account deleted]
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
Sure. And counters go in either direction. IE "saturating counters" in branch prediction. I am using a saturating counter for my history data for the moment. I used the term "counter" for a variable that has something related to counting. And certainly history counters are just that. As opposed to a pointer, or a bit map, or a move (which has several related pieces of information like from, to, piece, etc) or whatever, where interleaved update might produce a more serious problem than just losing one counter update.mvk wrote:What you call a 'counter' here is commonly known as a 'variable' in computer science. There are programs that adjust the value of the history table elements in either direction, for example.bob wrote:They certainly behave like one. Whether you do nodes_searched++ or history_counter[index]+=n*n you are doing exactly the same thing, and the occasional interleaved update is not exactly going to wreck things.Joost Buijs wrote:I would not call them counters.bob wrote:
Isn't your history table just an array of counters???
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: The cost of transposition table instrumentation
If you are allowing writes to a variable from several threads without proper guards, you are invoking undefined behavior. You are probably right that the only likely consequence is missing the occasional update, but this could change in the future. I am afraid a few years down the line you are going to be screaming about how some compiler "broke" your code.bob wrote:[...] As opposed to a pointer, or a bit map, or a move (which has several related pieces of information like from, to, piece, etc) or whatever, where interleaved update might produce a more serious problem than just losing one counter update.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The cost of transposition table instrumentation
Sorry, but the only problem is a lost update. If the value is a counter, losing an addition or a subtraction is not going to break anything, and is often acceptable over the cost of a lock acquisition which is extremely expensive. No compiler or hardware is going to "break my code" in this instance. Interleaved update is only caused when two different threads read, update and write the same location. Since the orders of those operations can be interleaved. The only effect is that ONE of the updates is lost. For a statistical counter that has no effect on program behavior, missing an update is inconsequential. And is more than acceptable when compared to the cost of the lock. Since most of these are once-per-node-searched at best (and usually less often) the probability is minimal when spread over N counters.AlvaroBegue wrote:If you are allowing writes to a variable from several threads without proper guards, you are invoking undefined behavior. You are probably right that the only likely consequence is missing the occasional update, but this could change in the future. I am afraid a few years down the line you are going to be screaming about how some compiler "broke" your code.bob wrote:[...] As opposed to a pointer, or a bit map, or a move (which has several related pieces of information like from, to, piece, etc) or whatever, where interleaved update might produce a more serious problem than just losing one counter update.
If you were building a move, that might be different if you somehow forced everything to go to memory, and you first add the from, then the to, etc. But not with a counter. And no, I won't be screaming, unless apple decides to intentionally break my code as they did with strcpy(), something that got them raked over the coals by tens of thousands of complaints. If they do what I write, the results will be usable forever. If they take it upon themselves to make the program abort on a perfectly safe operation, yes I will complaint.