Lock-Free Programming on AMD Multi-Core Systems

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Lock-Free Programming on AMD Multi-Core Systems

Post by Gerd Isenberg »

Michael Sherwin wrote: I know next to nothing about how to make a multiprocessor chess program. However, using hash tables as an example my first thought would be to do the following when reading and writing to the hash from different threads.
See also Bob's paper:
A lockless transposition table implementation for parallel search
http://www.cis.uab.edu/hyatt/hashing.html

Using 128-bit sse2-read/writes might be safe enough as well...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Lock-Free Programming on AMD Multi-Core Systems

Post by bob »

what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Lock-Free Programming on AMD Multi-Core Systems

Post by Michael Sherwin »

bob wrote:what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.
If two threads increment the counter at the same time would it not read back as two and invalidate both threads? If not then reading the counter twice should solve that.

edit: If two simultaneous increments only means a plus one result period then I suppose that it will not work.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Lock-Free Programming on AMD Multi-Core Systems

Post by bob »

Michael Sherwin wrote:
bob wrote:what do you do if _both_ threads read/modify the counter at the same instant? That's the problem you have to solve, and what you suggested will fail on simultaneous accesses.
If two threads increment the counter at the same time would it not read back as two and invalidate both threads? If not then reading the counter twice should solve that.

edit: If two simultaneous increments only means a plus one result period then I suppose that it will not work.
Nope. :)

Here is why

Code: Select all


                                 thread 1                               thread 2
time t:                    mov eax, counter                mov eax, counter
time t+1                inc eax                                   inc eax
time t+2                mov counter, eax                 mov counter, eax
They both fetch the _same_ counter value, they both increment it by one, they both store the value incremented by 1, and the final result is incremented by one, _not_ by two.

That's the problem with non-atomic updates.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Lock-Free Programming on AMD Multi-Core Systems

Post by wgarvin »

They both fetch the _same_ counter value, they both increment it by one, they both store the value incremented by 1, and the final result is incremented by one, _not_ by two.

That's the problem with non-atomic updates.
On some platforms it can even be worse than that. Writes issued in a certain order by one processor (e.g. write address A, then address B, then address C) might be observed by another processor in a different order, even if the second processor is not trying to write to any of that memory (i.e. 2nd processor reads B and gets new value, then reads A and gets old value, even though 1st processor thinks it wrote to A before writing to B). One platform I know of where this can happen is the Xbox360 (which has 3 dual-threaded PPC cores share the same no-allocate L2 cache).

The bottom line with synchronization is that you have to be very, very careful. The best approach is to use other people's primitive operations which are fully debugged and known to behave in predictable ways (such as the Interlocked* primitives on Windows). To go any deeper than that you need to have detailed info about the cache coherency and atomicity guarantees provided by your platform -- and more importantly, the ones NOT provided! =)