More Participants or More Number of Games?

Discussion of chess software programming and technical issues.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: More Participants or More Number of Games?

Post by Daniel Shawul »

Did I say it worked on 32bit linux? I said the posix spinlock implementation is broken. When Lance suggested sometime ago that I should use spinlocks instead of mutexes, I just grabbed one from zct for posix without testing it. Anyway it was just a silly mistake of passing the address instead of the value of the lock which is fixed now.

Daniel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: More Participants or More Number of Games?

Post by bob »

Daniel Shawul wrote:Red Hat linux 5. But the problem is the posix spinlock implementation is wrong so it won't work without the suggestion i made. Or if you still want to use the spinlocks just change the :"q" (x) to :"q" (&x) :)

Daniel
or just copy the one from crafty, which came from the linux kernel, as this works on every linux platform I have used...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: More Participants or More Number of Games?

Post by bob »

Daniel Shawul wrote:Did I say it worked on 32bit linux? I said the posix spinlock implementation is broken. When Lance suggested sometime ago that I should use spinlocks instead of mutexes, I just grabbed one from zct for posix without testing it. Anyway it was just a silly mistake of passing the address instead of the value of the lock which is fixed now.

Daniel
Do you mean passing the value rather than the address? The address is typically what you want to pass. Passing the value doesn't help at all, as you are trying to use the processor's clever cache / memory abiilty to make the xchg() "atomic" in operation...

Code: Select all

static void __inline__ LockX86(volatile int *lock) {
  int dummy;
  asm __volatile__(
      "1:          movl    $1, %0"   "\n\t"
      "            xchgl   (%1), %0" "\n\t"
      "            testl   %0, %0"   "\n\t"
      "            jz      3f"       "\n\t"
      "2:          pause"            "\n\t"
      "            movl    (%1), %0" "\n\t"
      "            testl   %0, %0"   "\n\t"
      "            jnz     2b"       "\n\t"
      "            jmp     1b"       "\n\t"
      "3:"                           "\n\t"
      :"=&q"(dummy)
      :"q"(lock)
      :"cc", "memory");
}
The above also addresses hyperthreading, where the "pause" will cause the currently spinning logical procesor to halt and switch to another thread so that I don't spin on the same physical core as another thread that actually holds the lock).
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: More Participants or More Number of Games?

Post by Daniel Shawul »

The asm spinlocks in linux don't work any better for me. I have been using mutexes and critical sections in the past and that is what I plan to keep on using . Spinlock in windows are slightly better than critical sections for me so that is why users have choice to make. The linux spinlocks are just added in v2.1 when you and Lance told me that I should try out spinlocks for linux as well, which I never got around to testing (and should have been removed before release). I did try in the past this other linux spinlock codes (again taken from Linus kernel) and ditched it. Tord (with the same spinlocks as yours) also found out that the spinlocks that you use are not that much faster in his system. For me too what the high level semaphores are good enough so far..

Code: Select all

/* spinlock.h - spinlocks for i386
                Files that include this file should be compiled with the flags
		  gcc -O2 -W -Wall ...
 */

#ifndef _SPINLOCK_H_
#define _SPINLOCK_H_

/*VVVVVVVVVVV This code was extracted from the Linux Kernel VVVVVVVVV*/

#ifdef __SMP__
#define LOCK_PREFIX "lock ; "
#else
#define LOCK_PREFIX ""
#endif
struct __dummy { unsigned long a[100]; };
#define ADDR (*(volatile struct __dummy *) addr)


extern __inline__ int test_and_set_bit(int nr, volatile void * addr)
{
         int oldbit;
 
         __asm__ __volatile__( LOCK_PREFIX
                 "btsl %2,%1\n\tsbbl %0,%0"
                 :"=r" (oldbit),"=m" (ADDR)
                 :"Ir" (nr));
         return oldbit;
 }

/*^^^^^^^^^^^ This code was extracted from the Linux Kernel ^^^^^^^^^*/

/* This implementation of spinlocks does not disable interrupts, so
   there is a danger of deadlocks. It uses busy waiting. It can be 
   used for mutual exclusion between processes. 
   The variable x below must be a pointer to a location in memory
   shared between the communicating processes
 */

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: More Participants or More Number of Games?

Post by Daniel Shawul »

Code: Select all

Do you mean passing the value rather than the address? 
Yes. it was a typo. For cache coherence to work it should have the address of the variable not the value.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: More Participants or More Number of Games?

Post by bob »

Daniel Shawul wrote:The asm spinlocks in linux don't work any better for me. I have been using mutexes and critical sections in the past and that is what I plan to keep on using . Spinlock in windows are slightly better than critical sections for me so that is why users have choice to make. The linux spinlocks are just added in v2.1 when you and Lance told me that I should try out spinlocks for linux as well, which I never got around to testing (and should have been removed before release). I did try in the past this other linux spinlock codes (again taken from Linus kernel) and ditched it. Tord (with the same spinlocks as yours) also found out that the spinlocks that you use are not that much faster in his system. For me too what the high level semaphores are good enough so far..

It depends on how frequently you acquire/release locks. If you don't split near the tips (I can split all the way out to the tips in Crafty) then the benefit drops proportional to the frequency of lock accesses. If you've paid attention to past threads, and don't lock hash table entries of any kind, don't split anywhere near the tips, etc, then mutexes are fine. Spinlocks will always be more efficient, but the improvement can be almost immeasurable of you don't acquire/release locks very frequently.

Code: Select all

/* spinlock.h - spinlocks for i386
                Files that include this file should be compiled with the flags
		  gcc -O2 -W -Wall ...
 */

#ifndef _SPINLOCK_H_
#define _SPINLOCK_H_

/*VVVVVVVVVVV This code was extracted from the Linux Kernel VVVVVVVVV*/

#ifdef __SMP__
#define LOCK_PREFIX "lock ; "
#else
#define LOCK_PREFIX ""
#endif
struct __dummy { unsigned long a[100]; };
#define ADDR (*(volatile struct __dummy *) addr)


extern __inline__ int test_and_set_bit(int nr, volatile void * addr)
{
         int oldbit;
 
         __asm__ __volatile__( LOCK_PREFIX
                 "btsl %2,%1\n\tsbbl %0,%0"
                 :"=r" (oldbit),"=m" (ADDR)
                 :"Ir" (nr));
         return oldbit;
 }

/*^^^^^^^^^^^ This code was extracted from the Linux Kernel ^^^^^^^^^*/

/* This implementation of spinlocks does not disable interrupts, so
   there is a danger of deadlocks. It uses busy waiting. It can be 
   used for mutual exclusion between processes. 
   The variable x below must be a pointer to a location in memory
   shared between the communicating processes
 */

I don't see anything wrong with that at all. The interrupt warning is irrelevant for us,we are not using this in the kernel.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: More Participants or More Number of Games?

Post by Daniel Shawul »

yes i don't lock hash tables and never split below 4 plies. So that was why IIRC that Lance's suggestion did not have any impact when I tested it. What brought me slight speed up was avoiding the master-slave relation( A thread who initiated the split doesn't have to be the one to return from it ). I am also trying out other unique featrues of DTS like choosing the split block yourself when you go idle, instead of some other thread attaching you whenever it has the chance. The explanation about this on the DTS paper is not clear enough for me now though , but i will try to grasp the idea..

Code: Select all

I don't see anything wrong with that at all. The interrupt warning is irrelevant for us,we are not using this in the kernel.
It worked but was slow for me last time I tried it. The busy wait could sometimes cause significant delays especially in cases where they are not used as frequently as they "should be".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: More Participants or More Number of Games?

Post by bob »

Daniel Shawul wrote:yes i don't lock hash tables and never split below 4 plies. So that was why IIRC that Lance's suggestion did not have any impact when I tested it. What brought me slight speed up was avoiding the master-slave relation( A thread who initiated the split doesn't have to be the one to return from it ). I am also trying out other unique featrues of DTS like choosing the split block yourself when you go idle, instead of some other thread attaching you whenever it has the chance. The explanation about this on the DTS paper is not clear enough for me now though , but i will try to grasp the idea..

Code: Select all

I don't see anything wrong with that at all. The interrupt warning is irrelevant for us,we are not using this in the kernel.
It worked but was slow for me last time I tried it. The busy wait could sometimes cause significant delays especially in cases where they are not used as frequently as they "should be".
I think I see the problem. I use what is known as a "shadow lock". I use the xchg() which is the "atomic instruction" that grabs the old value and replaces it with a new value. This does some complex (and very slow) cache synchronization. But once I notice that the lock is held by someone else, I go into a simple move/test (non-atomic) loop which does not do all the atomic locking stuff, the lock gets copied into my cache (a shadow of the lock) and I spin there until the cache snoops the fact that that word has been changed by someone else. I drop out of that fast loop and go back and try to do an xchg() again, which is again a slow operation...

That's what the 2 loops in my inlineasm are about...
krazyken

Re: More Participants or More Number of Games?

Post by krazyken »

Daniel Shawul wrote:I recently tried linux 64 bit and found a bug with lock implmentation. Spinlocks are broken there so you have to use mutexes. Just comment
#define USESPINLOCK in my_types.h and it should be work (i.e for the 2.1 version). I don't know if there are problems on Mac since i did not test on it as you pointed out. If there is a segfault there, it is most probably due to the locks.

DAniel
Thank you! It runs now. 2.1 claims to be 2.06 I'm only 4 hours into the first game, but no troubles so far.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: More Participants or More Number of Games?

Post by Don »

Ralf wrote:
bob wrote:Finding decent opponents is not easy. (1) not all run under linux. Most don't. (2) of those that do, many have trouble with fast time controls and lose too many games on time, which doesn't help me at all; (3) of those that are left, many are too weak. You don't learn anything playing programs that are more than 200 Elo weaker.
If these are your requirements for an opponent, you could also try the linux-version of Spike 1.2 Turin (http://spike.lazypics.de/downloads/Spik ... nux.tar.gz). At least we never heard of any problems regarding the time-management or stability. Its not open-source, though.

best wishes
Ralf
I use that version of Spike and I really like it. It give me no problems. And I like it because it is signficantly different than most programs.