multithreading questions

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA
Contact:

Re: multithreading questions

Post by Pradu » Wed Aug 08, 2007 6:32 pm

Martin Fierz wrote:are the windows functions EnterCriticalSection etc. much slower than the assembly code i saw in crafty's source code?
Yes much slower (atleast 3x slower on my P4) because Critical selection does a spin-wait first then does a slow OS call to block. Crafty's spinning assembly locks are also a tad-bit slower when there is high contention because it does an interlocked exchange first then tests. Here's what I'm using in Buzz:

Code: Select all

#ifndef INLINE
	#ifdef _MSC_VER
		#define INLINE __forceinline
	#elif defined(__GNUC__)
		#define INLINE __inline__ __attribute__((always_inline))
	#else
		#define INLINE inline
	#endif
#endif

#if defined(__GNUC__)
	typedef volatile int SpinLock[1];
	typedef volatile int* const SpinLock_P;
	static INLINE int volatile LockedExchange(SpinLock_P Target, const int Value)
	{
		int ret = Value;
		__asm__
		(
			"xchgl %[ret], %[Target]"
			: [ret] "+r" (ret)
			: [Target] "m" (*Target)
			: "memory"
		);
		return ret;
	}
#elif defined(_MSC_VER)
	typedef volatile long SpinLock[1];
	typedef volatile long* const SpinLock_P;
	#include <intrin.h>
	#pragma intrinsic (_InterlockedExchange&#41; 
	#define LockedExchange&#40;Target,Value&#41; _InterlockedExchange&#40;Target,Value&#41;
#else
	#error Unspported Compiler
#endif

#define IsLocked&#40;s&#41; (&#40;s&#41;&#91;0&#93;)
#define SetLocked&#40;s,boolean&#41; (&#40;s&#41;&#91;0&#93;=&#40;boolean&#41;)
#define ResetSpinLock&#40;s&#41; SetLocked&#40;s,0&#41;
static INLINE void volatile Release&#40;SpinLock_P s&#41; &#123;SetLocked&#40;s,0&#41;;&#125;
static INLINE int volatile TryLock&#40;SpinLock_P s&#41; &#123;return !&#40;IsLocked&#40;s&#41; || LockedExchange&#40;s,1&#41;);&#125; 
static INLINE void volatile Lock&#40;SpinLock_P s&#41; &#123;while&#40;IsLocked&#40;s&#41; || LockedExchange&#40;s,1&#41;);&#125;
This does a test first before trying to do an interlocked exchange. However, this method is a tad bit slower when there's low contention but the overhead is relativly nothing in such cases. You can read up on high-performance locking here:
http://www.intel.com/cd/ids/developer/a ... 333935.htm
is there anything (paper, website etc) that describes some fundamentals of multi-threades game playing programs? i found a thread which said that the thesis by valavan manohararajah was a good starting point, so i read it - but i don't think it's a good starting point at all :-(
In my opinion, this is the best starting place:
http://www.netlib.org/utk/lsi/pcwLSI/text/node351.html
no pseudocode for the actual splitting operations, no discussion of sharing stuff etc. is there anything else, preferably on a lower level?
It's slow and boring but for this I read up online on Windows API and POSIX Threads API.
cheers
martin
Good luck with your parallel engine. 8-)

User avatar
Steve Maughan
Posts: 1025
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA
Contact:

Re: multithreading questions

Post by Steve Maughan » Thu Aug 09, 2007 2:04 pm

Martin,

I haven't written a multi-threaded chess program but I have thought of the issue you raise.

This is how I am planning to tackle the problem. Whenever a thread wants to probe or write to a hash entry it first increments a integer (call it "lock") which is part of the hash table entry. The thread then tests to see if lock = 1. If it is true it proceeds to use (i.e. read or write) the hash entry. After it is finished with the hash entry the thread decrements the lock flag. If on initially testing lock, it is > 1 it knows that the entry is being used by another thread and acts accordingly.

Regards,

Steve

Aleks Peshkov
Posts: 845
Joined: Sun Nov 19, 2006 8:16 pm
Location: Russia

Re: multithreading questions

Post by Aleks Peshkov » Thu Aug 09, 2007 2:18 pm

Integer flag decrement is not thread safe. If you make it safe it will be very slow.

User avatar
Steve Maughan
Posts: 1025
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA
Contact:

Re: multithreading questions

Post by Steve Maughan » Thu Aug 09, 2007 2:26 pm

Aleks,
Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I don't know that much about multi-core programming so I'm sure you're right. Nevertheless could you expand upon this? How is it not safe? What would happen if you implemented a raw integer decrement?

Steve

User avatar
hgm
Posts: 22274
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: multithreading questions

Post by hgm » Thu Aug 09, 2007 2:48 pm

Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I also never have done any multi-core programming, but what would be wrong in simply using the inc/dec instruction with a lock prefix (or using xchg, which is always locked)? Being only a single instruction, that cannot be disastrously slow, can it?

Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA
Contact:

Re: multithreading questions

Post by Pradu » Thu Aug 09, 2007 3:53 pm

hgm wrote:
Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I also never have done any multi-core programming, but what would be wrong in simply using the inc/dec instruction with a lock prefix (or using xchg, which is always locked)? Being only a single instruction, that cannot be disastrously slow, can it?
When you add the lock prefix to an instruction it becomes disastorously slow. :) You can test this out pretty quick in a sample program.

Gerd Isenberg
Posts: 2105
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: multithreading questions

Post by Gerd Isenberg » Thu Aug 09, 2007 3:55 pm

hgm wrote:
Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I also never have done any multi-core programming, but what would be wrong in simply using the inc/dec instruction with a lock prefix (or using xchg, which is always locked)? Being only a single instruction, that cannot be disastrously slow, can it?
The single instruction with lock prefix is not the problem - but you have to enter a critical section before you may modify shared data mutual exclusive. Entering a critical section is something like this...

Code: Select all

EnterCriticalSection&#58;
    mov   eax, 1
    xchg  &#91;sema&#93;, eax
    test  eax, eax
    jnz   EnterCritilaSection

    ; mutual exclusive access
    inc  &#91;someSharedAddress&#93;

LeaveCriticalSection&#58;
    xor   eax, eax
    xchg  &#91;sema&#93;, eax
... polling a semaphore, until it is false - it might become slow if other threads request the shared ressource at the same time.

Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA
Contact:

Re: multithreading questions

Post by Pradu » Thu Aug 09, 2007 3:59 pm

Gerd Isenberg wrote:
hgm wrote:
Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I also never have done any multi-core programming, but what would be wrong in simply using the inc/dec instruction with a lock prefix (or using xchg, which is always locked)? Being only a single instruction, that cannot be disastrously slow, can it?
The single instruction with lock prefix is not the problem - but you have to enter a critical section before you may modify shared data mutual exclusive. Entering a critical section is something like this...

Code: Select all

EnterCriticalSection&#58;
    mov   eax, 1
    xchg  &#91;sema&#93;, eax
    test  eax, eax
    jnz   EnterCritilaSection

    ; mutual exclusive access
    inc  &#91;someSharedAddress&#93;

LeaveCriticalSection&#58;
    xor   eax, eax
    xchg  &#91;sema&#93;, eax
... polling a semaphore, until it is false - it might become slow if other threads request the shared ressource at the same time.
With high contention, the above could would be faster with a test before the xchg.

Gerd Isenberg
Posts: 2105
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: multithreading questions

Post by Gerd Isenberg » Thu Aug 09, 2007 4:12 pm

Pradu wrote:

Code: Select all

EnterCriticalSection&#58;
    mov   eax, 1
    xchg  &#91;sema&#93;, eax
    test  eax, eax
    jnz   EnterCritilaSection

    ; mutual exclusive access
    inc  &#91;someSharedAddress&#93;

LeaveCriticalSection&#58;
    xor   eax, eax
    xchg  &#91;sema&#93;, eax
With high contention, the above could would be faster with a test before the xchg.
sorry, I don't get that. Faster but wrong imho...

Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA
Contact:

Re: multithreading questions

Post by Pradu » Thu Aug 09, 2007 4:25 pm

Gerd Isenberg wrote:
Pradu wrote:

Code: Select all

EnterCriticalSection&#58;
    mov   eax, 1
    xchg  &#91;sema&#93;, eax
    test  eax, eax
    jnz   EnterCritilaSection

    ; mutual exclusive access
    inc  &#91;someSharedAddress&#93;

LeaveCriticalSection&#58;
    xor   eax, eax
    xchg  &#91;sema&#93;, eax
With high contention, the above could would be faster with a test before the xchg.
sorry, I don't get that. Faster but wrong imho...
You do a test on the variable that you are using for a lock before the exchange, then you test the returned result after the exchange. An example of this posted here: http://www.intel.com/cd/ids/developer/a ... /20464.htm

Code: Select all

if &#40;GETLOCK&#40;lock&#41; != 0&#41;
&#123;
  While &#40;VARIOUS_CRITERIA&#41;
 &#123;
   _asm pause;  // pause instruction in spin-loop
   if (&#40;volatile int*&#41;lock&#41; == 0&#41; // spin on dirty read, not on lock
   &#123;
     if &#40;GETLOCK&#40;lock&#41;) == 0&#41;
     &#123;
       goto PROTECTED_CODE;
     &#125;
   &#125;
&#125;
 BACK_OFF_LOCK&#40;lock&#41;; // back off lock or do other work
&#125;
PROTECTED_CODE&#58;
Here the variable is always tested before the exchange to see if it is unlocked before an exchange is attempted.
One common mistake made by developers developing their own spin-wait loops is attempting to spin on an atomic instruction instead of spinning on a volatile read. Spinning on a dirty read instead of attempting to acquire a lock consumes less time and resources. This allows an application to only attempt to acquire a lock only when it is free.

Post Reply