Lock-Free Programming on AMD Multi-Core Systems

Gerd Isenberg · Post by **Gerd Isenberg** » Thu Nov 08, 2007 2:17 pm

Michael Sherwin wrote: I know next to nothing about how to make a multiprocessor chess program. However, using hash tables as an example my first thought would be to do the following when reading and writing to the hash from different threads.

See also Bob's paper:
A lockless transposition table implementation for parallel search
http://www.cis.uab.edu/hyatt/hashing.html

Using 128-bit sse2-read/writes might be safe enough as well...

bob · Post by **bob** » Thu Nov 08, 2007 4:50 pm

what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.

Michael Sherwin · Post by **Michael Sherwin** » Thu Nov 08, 2007 5:34 pm

bob wrote:what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.

If two threads increment the counter at the same time would it not read back as two and invalidate both threads? If not then reading the counter twice should solve that.

edit: If two simultaneous increments only means a plus one result period then I suppose that it will not work.

bob · Post by **bob** » Thu Nov 08, 2007 5:51 pm

Michael Sherwin wrote:
bob wrote:what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.
If two threads increment the counter at the same time would it not read back as two and invalidate both threads? If not then reading the counter twice should solve that.

edit: If two simultaneous increments only means a plus one result period then I suppose that it will not work.

Nope.

Here is why

Code: Select all


                                 thread 1                               thread 2
time t&#58;                    mov eax, counter                mov eax, counter
time t+1                inc eax                                   inc eax
time t+2                mov counter, eax                 mov counter, eax

They both fetch the _same_ counter value, they both increment it by one, they both store the value incremented by 1, and the final result is incremented by one, _not_ by two.

That's the problem with non-atomic updates.

wgarvin · Post by **wgarvin** » Fri Nov 09, 2007 2:00 am

They both fetch the _same_ counter value, they both increment it by one, they both store the value incremented by 1, and the final result is incremented by one, _not_ by two.

That's the problem with non-atomic updates.

On some platforms it can even be worse than that. Writes issued in a certain order by one processor (e.g. write address A, then address B, then address C) might be observed by another processor in a different order, even if the second processor is not trying to write to any of that memory (i.e. 2nd processor reads B and gets new value, then reads A and gets old value, even though 1st processor thinks it wrote to A before writing to B). One platform I know of where this can happen is the Xbox360 (which has 3 dual-threaded PPC cores share the same no-allocate L2 cache).

The bottom line with synchronization is that you have to be very, very careful. The best approach is to use other people's primitive operations which are fully debugged and known to behave in predictable ways (such as the Interlocked* primitives on Windows). To go any deeper than that you need to have detailed info about the cache coherency and atomicity guarantees provided by your platform -- and more importantly, the ones NOT provided! =)

Lock-Free Programming on AMD Multi-Core Systems

Re: Lock-Free Programming on AMD Multi-Core Systems

Re: Lock-Free Programming on AMD Multi-Core Systems

Re: Lock-Free Programming on AMD Multi-Core Systems

Re: Lock-Free Programming on AMD Multi-Core Systems

Re: Lock-Free Programming on AMD Multi-Core Systems