Parallel search: System-level programmin details

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Parallel search: System-level programmin details

Post by AlvaroBegue »

I am about to try to write a parallel alpha-beta search for the first time. My first idea is to implement a fairly vanilla version of YBWC, and I think I understand the algorithm well enough, but I am sure getting the details of the implementation right is also very important.

In particular, I think I know how to do this using multiple processes in Linux, where an idle process uses signals to ask a random process for work. But I understand there are many other options: One could use threads and "publish" the available split nodes using some sort of lockless queue, so other threads can get work from it. Or perhaps something with MPI so the transition to a cluster is more seamless (although for now I'd be happy to get something working just for SMP machines).

I would also like this to work on Windows with minimal changes, but if I understand correctly, Windows doesn't have inter-process signals...

Is there a standard approach to this? How do you guys do it?

Thanks!
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Parallel search: System-level programmin details

Post by jdart »

Most engines use threads, not processes (since they run on one physical machine and not a cluster of machines). If you don't intend to run on a cluster, threads are more efficient.

My implementation is open source (http://www.arasanchess.org) and includes Linux, Windows and MacOS support.

C++ 11 includes builit-in support for threads and synchronization primitives (I don't use this because my code pre-dates C++ 11).

--Jon
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Parallel search: System-level programmin details

Post by AlvaroBegue »

Thank you, Jon. I'll read your code and try to learn. By the little I have perused it, it seems to be pretty well written.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Parallel search: System-level programmin details

Post by jdart »

Note there is actually a bug (Linux-specific) in one of my thread control primitives. The fixed version is below (will be in the next release):

Code: Select all

void ThreadControl::reset() {
#ifdef _WIN32
   if (!ResetEvent(hEvent1)) {
        cerr << "ResetEvent failed" << endl;
        ASSERT&#40;0&#41;;
   &#125;
#elif defined&#40;_MAC&#41;
   // use semctl, not semop, to mimic Windows behavior where                                              
   // multiple resets have no extra effect.                                                               
   union semun val;
   // set semaphore to 1                                                                                  
   val.val = 1;
   if &#40;semctl&#40;sem, 0, SETVAL, val&#41;) &#123;
      perror&#40;"semctl");
   &#125;
#else
   // lock semaphore                                                                                      
   while (!sem_trywait&#40;&sem&#41;) ;
   if &#40;errno == EAGAIN&#41; &#123;
      // expected, semaphore is locked                                                                    
   &#125; else &#123;
      perror&#40;"reset");
   &#125;
#endif
&#125;
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Parallel search: System-level programmin details

Post by zamar »

AlvaroBegue wrote:I am about to try to write a parallel alpha-beta search for the first time. My first idea is to implement a fairly vanilla version of YBWC, and I think I understand the algorithm well enough, but I am sure getting the details of the implementation right is also very important.

In particular, I think I know how to do this using multiple processes in Linux, where an idle process uses signals to ask a random process for work. But I understand there are many other options: One could use threads and "publish" the available split nodes using some sort of lockless queue, so other threads can get work from it. Or perhaps something with MPI so the transition to a cluster is more seamless (although for now I'd be happy to get something working just for SMP machines).

I would also like this to work on Windows with minimal changes, but if I understand correctly, Windows doesn't have inter-process signals...

Is there a standard approach to this? How do you guys do it?

Thanks!
Each thread has a boolean variable "idle". Every time after move is completed in high enough depth, check if there are threads with "idle = true". If so, create a split point, and assign all idle threads to it.

Not maybe the most efficient, but very simple method.
Joona Kiiski
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Parallel search: System-level programmin details

Post by AlvaroBegue »

zamar wrote:
AlvaroBegue wrote:I am about to try to write a parallel alpha-beta search for the first time. My first idea is to implement a fairly vanilla version of YBWC, and I think I understand the algorithm well enough, but I am sure getting the details of the implementation right is also very important.

In particular, I think I know how to do this using multiple processes in Linux, where an idle process uses signals to ask a random process for work. But I understand there are many other options: One could use threads and "publish" the available split nodes using some sort of lockless queue, so other threads can get work from it. Or perhaps something with MPI so the transition to a cluster is more seamless (although for now I'd be happy to get something working just for SMP machines).

I would also like this to work on Windows with minimal changes, but if I understand correctly, Windows doesn't have inter-process signals...

Is there a standard approach to this? How do you guys do it?

Thanks!
Each thread has a boolean variable "idle". Every time after move is completed in high enough depth, check if there are threads with "idle = true". If so, create a split point, and assign all idle threads to it.

Not maybe the most efficient, but very simple method.
I read a description of YBWC by Valavan Manohararajah in his thesis titled "Parallel Alpha-Beta Search on Shared Memory Multiprocessors". I quote:
A processor, P1, that is idle selects another processor, P2, at random and transmits a message requesting work. Processor P2 has work available if there is at least one node in the subtree it is examining that satisfies the weak YBWC criterion. That is, P2 has work available if it owns a node at which the eldest brother has been evaluated. The node that
satisfies the criterion becomes the split-point; if there are many nodes that satisfy the criterion, then the node that is the highest in the tree is selected as the split-point.
That's why I was thinking of an implementation using signals, since I would need to be able to interrupt what another worker is doing and ask for work. It seems like communicating the other way around is easier to implement, but perhaps you'll miss opportunities to split at a deeper node, if at the time when it became splittable there were no idle workers.
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Parallel search: System-level programmin details

Post by zamar »

AlvaroBegue wrote:
zamar wrote:
AlvaroBegue wrote:I am about to try to write a parallel alpha-beta search for the first time. My first idea is to implement a fairly vanilla version of YBWC, and I think I understand the algorithm well enough, but I am sure getting the details of the implementation right is also very important.

In particular, I think I know how to do this using multiple processes in Linux, where an idle process uses signals to ask a random process for work. But I understand there are many other options: One could use threads and "publish" the available split nodes using some sort of lockless queue, so other threads can get work from it. Or perhaps something with MPI so the transition to a cluster is more seamless (although for now I'd be happy to get something working just for SMP machines).

I would also like this to work on Windows with minimal changes, but if I understand correctly, Windows doesn't have inter-process signals...

Is there a standard approach to this? How do you guys do it?

Thanks!
Each thread has a boolean variable "idle". Every time after move is completed in high enough depth, check if there are threads with "idle = true". If so, create a split point, and assign all idle threads to it.

Not maybe the most efficient, but very simple method.
I read a description of YBWC by Valavan Manohararajah in his thesis titled "Parallel Alpha-Beta Search on Shared Memory Multiprocessors". I quote:
A processor, P1, that is idle selects another processor, P2, at random and transmits a message requesting work. Processor P2 has work available if there is at least one node in the subtree it is examining that satisfies the weak YBWC criterion. That is, P2 has work available if it owns a node at which the eldest brother has been evaluated. The node that
satisfies the criterion becomes the split-point; if there are many nodes that satisfy the criterion, then the node that is the highest in the tree is selected as the split-point.
That's why I was thinking of an implementation using signals, since I would need to be able to interrupt what another worker is doing and ask for work. It seems like communicating the other way around is easier to implement, but perhaps you'll miss opportunities to split at a deeper node, if at the time when it became splittable there were no idle workers.
That's definetily a downside, but in practice the effect is minor (at least when number of threads is at most 8). The most critical split happens when we are in PV-node and first move is completed. Then all threads are automatically available and start in parallel searching moves 2,3,4,5 (I assume number of threads is 4). This is the only split point until move n-3 is completed. Then other threads are searching moves n-2, n-1 and n. Thread which completed move n-3 is available as a helpful master.

In this case the schema I described doesn't behave optimally, but practice has shown that moves at the end of the move list are usually completed very quickly anyway, so even a split is a bit suboptimal in this case, the performance loss is minimalistic.

When computers with 16 or 32 threads become available, then we need to reconsider, maybe we'll even see a switch from YBW to DTS?
Joona Kiiski
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Parallel search: System-level programmin details

Post by AlvaroBegue »

I have one more question. I am using PVS, so I perform null-window searches for all the young brothers, and those are the searches that I would do in parallel. So what happens when one of those searches actually discovers that the move is better than its eldest brother? Of course we'll change the boundary to compare new nodes to. But, should the threads that are currently searching with the wrong boundary be stopped? Or is it OK to let them continue with the old boundary, and we'll re-search them if we have to?

At this point it would be great if someone could point me to a paper that goes into all these details. Or I risk asking too many questions and bothering everybody here...
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Parallel search: System-level programmin details

Post by zamar »

AlvaroBegue wrote:I have one more question. I am using PVS, so I perform null-window searches for all the young brothers, and those are the searches that I would do in parallel. So what happens when one of those searches actually discovers that the move is better than its eldest brother? Of course we'll change the boundary to compare new nodes to. But, should the threads that are currently searching with the wrong boundary be stopped?
I think this issue was discussed here some time ago and every body agreed that it didn't make any measurable difference.
Joona Kiiski
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Parallel search: System-level programmin details

Post by AlvaroBegue »

zamar wrote:
AlvaroBegue wrote:I have one more question. I am using PVS, so I perform null-window searches for all the young brothers, and those are the searches that I would do in parallel. So what happens when one of those searches actually discovers that the move is better than its eldest brother? Of course we'll change the boundary to compare new nodes to. But, should the threads that are currently searching with the wrong boundary be stopped?
I think this issue was discussed here some time ago and every body agreed that it didn't make any measurable difference.
Great. That makes things much easier, then. Thanks!