multithreading questions

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Martin Fierz

multithreading questions

Post by Martin Fierz » Wed Aug 08, 2007 10:58 am

aloha!

i have a few stupid multithreading questions, since i have a dual core machine now and am starting to get interested...

i seem to have learned that you should have no global variables (except for stuff that you want to share). i also seem to have learned that static variables won't work properly. then there are some things that i don't quite understand about shared data: i would expect that you want to share the hashtable among threads that search the same root position. if this is the case, then i would also expect that you should not access the hashtable simultaneously with multiple threads, because something bad could happen. so i used EnterCriticalSection() and LeaveCriticalSection around my hashtable lookup code, and while that worked, it made my checkers program (i know, i know, the game is solved!) 10 times slower (only when searching two threads parallel, when searching one thread it got slower too, but much less so). so i looked at crafty's source code and could find no attempt to restrict access to the hashtable.

finally, some questions:
does access to the hashtable have to be restricted to one thread? if not, why not?

are the windows functions EnterCriticalSection etc. much slower than the assembly code i saw in crafty's source code?

is there anything (paper, website etc) that describes some fundamentals of multi-threades game playing programs? i found a thread which said that the thesis by valavan manohararajah was a good starting point, so i read it - but i don't think it's a good starting point at all :-(
no pseudocode for the actual splitting operations, no discussion of sharing stuff etc. is there anything else, preferably on a lower level?

cheers
martin

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 8:19 pm
Location: Oslo, Norway

Re: multithreading questions

Post by Tord Romstad » Wed Aug 08, 2007 11:31 am

Hello Martin!
Martin Fierz wrote:finally, some questions:
does access to the hashtable have to be restricted to one thread? if not, why not?
Because locking and unlocking are very expensive operations (as you have already noticed yourself), and because the hash table is so big that the probability of two threads writing to the same entry simultaneously is very small. What I (and, I think, most other programmers) do is to let all threads access the hash table freely without any locking, but to always check the hash move for legality before playing it. In theory, I will sometimes still get incorrect hash table cutoffs, but in practice this happens so rarely that it has zero impact on playing strength.

For pawn hash tables and material hash tables, I use separate tables for each thread.
are the windows functions EnterCriticalSection etc. much slower than the assembly code i saw in crafty's source code?
I don't know (I don't run Windows), but I doubt that they are sufficiently fast to make locking of the main hash table feasible.
is there anything (paper, website etc) that describes some fundamentals of multi-threades game playing programs? i found a thread which said that the thesis by valavan manohararajah was a good starting point, so i read it - but i don't think it's a good starting point at all :-(
no pseudocode for the actual splitting operations, no discussion of sharing stuff etc. is there anything else, preferably on a lower level?
Unfortunately, I am not aware of any better starting point than Valavan's thesis. If you want to look at some actual code, I have a lobotomized version of Glaurung 1 named Viper which is supposed to illustrate YBW in simple and straightforward code. It is still too complicated and poorly commented, but perhaps it can still be helpful.

Tord

Guetti

Re: multithreading questions

Post by Guetti » Wed Aug 08, 2007 12:06 pm

Crafty uses a lockless hashing which is discribed on Prof. Hyatt's website.

http://www.cis.uab.edu/hyatt/hashing.html

Maybe this helps to understand what Crafty does. Unfortunately, I didn't have the time yet to read it...

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: multithreading questions

Post by hgm » Wed Aug 08, 2007 12:21 pm

Well, the idea of lockless hashing is very simple, and based on the philosophy that you don't care so much if a hash entry gets lost, as long as it produces a recognizably invalid hash entry rather than an erroneous one that might send your engine off track.

To this end you XOR all independently stored words of the hash entry together, and store it in stead of the word containing the signature. On probing you have to recover the signature (and whatever other data there might be in this word) by repeating the XOR operation, before you can compare it with your hash key.

If not all words belong to the same entry, (because two threads messed it up by simultaneously trying to store something in the same entry, or one is replacing it while you were reading), you won't recover the original signature word through the XORing, and thus will discard the corrupted entry as a miss.

jswaff

Re: multithreading questions

Post by jswaff » Wed Aug 08, 2007 1:17 pm

Martin Fierz wrote:aloha!

i seem to have learned that you should have no global variables (except for stuff that you want to share).
That's only partially true. The truth is most chess programs have a lot of globals, mine included. However, most of them don't get modified at all during the search, so I don't worry about them. What you really want to avoid is having a global variable that is written to by multiple threads without using some sort of locking mechanism (mutex, semaphore, etc) to ensure there are no race conditions.

Here's a classic race condition:

Code: Select all

while (flag==1) {}

flag=1;
<some critical section of code>

flag=0;

So, what happens is, multiple threads run this code. It seems the critical section is protected because you don't allow entry unless flag==1. But, suppose thread1 gets past the "while" statement, then is stopped so thread2 can execute for a while. Thread2 then gets past the while loop, and both threads are heading straight into the critical section.
Martin Fierz wrote: i also seem to have learned that static variables won't work properly. then there are some things that i don't quite understand about shared data: i would expect that you want to share the hashtable among threads that search the same root position. if this is the case, then i would also expect that you should not access the hashtable simultaneously with multiple threads, because something bad could happen.
You can safely access the hash table in multiple threads, you just have to be careful to do so in a thread safe way. You could use locking mechanisms, or you could just ignore the problem. If you ignore the problem, you run the serious risk of fetching corrupt hash data. But, if you can handle that, it may be a better approach than locking, which can get to be expensive. (Not 10x though!)

I recommend you read Hyatt's paper on "Lockless Hashing" from here:
http://www.cis.uab.edu/hyatt/pubs.html .
I have implemented it in Prophet, it's not too bad.
Martin Fierz wrote: so i used EnterCriticalSection() and LeaveCriticalSection around my hashtable lookup code, and while that worked, it made my checkers program (i know, i know, the game is solved!) 10 times slower (only when searching two threads parallel, when searching one thread it got slower too, but much less so). so i looked at crafty's source code and could find no attempt to restrict access to the hashtable.

finally, some questions:
does access to the hashtable have to be restricted to one thread? if not, why not?

are the windows functions EnterCriticalSection etc. much slower than the assembly code i saw in crafty's source code?

is there anything (paper, website etc) that describes some fundamentals of multi-threades game playing programs? i found a thread which said that the thesis by valavan manohararajah was a good starting point, so i read it - but i don't think it's a good starting point at all :-(
no pseudocode for the actual splitting operations, no discussion of sharing stuff etc. is there anything else, preferably on a lower level?

cheers
martin

Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA
Contact:

Re: multithreading questions

Post by Pradu » Wed Aug 08, 2007 6:32 pm

Martin Fierz wrote:are the windows functions EnterCriticalSection etc. much slower than the assembly code i saw in crafty's source code?
Yes much slower (atleast 3x slower on my P4) because Critical selection does a spin-wait first then does a slow OS call to block. Crafty's spinning assembly locks are also a tad-bit slower when there is high contention because it does an interlocked exchange first then tests. Here's what I'm using in Buzz:

Code: Select all

#ifndef INLINE
	#ifdef _MSC_VER
		#define INLINE __forceinline
	#elif defined&#40;__GNUC__)
		#define INLINE __inline__ __attribute__(&#40;always_inline&#41;)
	#else
		#define INLINE inline
	#endif
#endif

#if defined&#40;__GNUC__)
	typedef volatile int SpinLock&#91;1&#93;;
	typedef volatile int* const SpinLock_P;
	static INLINE int volatile LockedExchange&#40;SpinLock_P Target, const int Value&#41;
	&#123;
		int ret = Value;
		__asm__
		(
			"xchgl %&#91;ret&#93;, %&#91;Target&#93;"
			&#58; &#91;ret&#93; "+r" &#40;ret&#41;
			&#58; &#91;Target&#93; "m" (*Target&#41;
			&#58; "memory"
		);
		return ret;
	&#125;
#elif defined&#40;_MSC_VER&#41;
	typedef volatile long SpinLock&#91;1&#93;;
	typedef volatile long* const SpinLock_P;
	#include <intrin.h>
	#pragma intrinsic (_InterlockedExchange&#41; 
	#define LockedExchange&#40;Target,Value&#41; _InterlockedExchange&#40;Target,Value&#41;
#else
	#error Unspported Compiler
#endif

#define IsLocked&#40;s&#41; (&#40;s&#41;&#91;0&#93;)
#define SetLocked&#40;s,boolean&#41; (&#40;s&#41;&#91;0&#93;=&#40;boolean&#41;)
#define ResetSpinLock&#40;s&#41; SetLocked&#40;s,0&#41;
static INLINE void volatile Release&#40;SpinLock_P s&#41; &#123;SetLocked&#40;s,0&#41;;&#125;
static INLINE int volatile TryLock&#40;SpinLock_P s&#41; &#123;return !&#40;IsLocked&#40;s&#41; || LockedExchange&#40;s,1&#41;);&#125; 
static INLINE void volatile Lock&#40;SpinLock_P s&#41; &#123;while&#40;IsLocked&#40;s&#41; || LockedExchange&#40;s,1&#41;);&#125;
This does a test first before trying to do an interlocked exchange. However, this method is a tad bit slower when there's low contention but the overhead is relativly nothing in such cases. You can read up on high-performance locking here:
http://www.intel.com/cd/ids/developer/a ... 333935.htm
is there anything (paper, website etc) that describes some fundamentals of multi-threades game playing programs? i found a thread which said that the thesis by valavan manohararajah was a good starting point, so i read it - but i don't think it's a good starting point at all :-(
In my opinion, this is the best starting place:
http://www.netlib.org/utk/lsi/pcwLSI/text/node351.html
no pseudocode for the actual splitting operations, no discussion of sharing stuff etc. is there anything else, preferably on a lower level?
It's slow and boring but for this I read up online on Windows API and POSIX Threads API.
cheers
martin
Good luck with your parallel engine. 8-)

User avatar
Steve Maughan
Posts: 1061
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA
Contact:

Re: multithreading questions

Post by Steve Maughan » Thu Aug 09, 2007 2:04 pm

Martin,

I haven't written a multi-threaded chess program but I have thought of the issue you raise.

This is how I am planning to tackle the problem. Whenever a thread wants to probe or write to a hash entry it first increments a integer (call it "lock") which is part of the hash table entry. The thread then tests to see if lock = 1. If it is true it proceeds to use (i.e. read or write) the hash entry. After it is finished with the hash entry the thread decrements the lock flag. If on initially testing lock, it is > 1 it knows that the entry is being used by another thread and acts accordingly.

Regards,

Steve

Aleks Peshkov
Posts: 870
Joined: Sun Nov 19, 2006 8:16 pm
Location: Russia

Re: multithreading questions

Post by Aleks Peshkov » Thu Aug 09, 2007 2:18 pm

Integer flag decrement is not thread safe. If you make it safe it will be very slow.

User avatar
Steve Maughan
Posts: 1061
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA
Contact:

Re: multithreading questions

Post by Steve Maughan » Thu Aug 09, 2007 2:26 pm

Aleks,
Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I don't know that much about multi-core programming so I'm sure you're right. Nevertheless could you expand upon this? How is it not safe? What would happen if you implemented a raw integer decrement?

Steve

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: multithreading questions

Post by hgm » Thu Aug 09, 2007 2:48 pm

Aleks Peshkov wrote:Integer flag decrement is not thread safe. If you make it safe it will be very slow.
I also never have done any multi-core programming, but what would be wrong in simply using the inc/dec instruction with a lock prefix (or using xchg, which is always locked)? Being only a single instruction, that cannot be disastrously slow, can it?

Post Reply