LIFO stack based parallel processing?

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

LIFO stack based parallel processing?

Post by smatovic » Thu Sep 22, 2011 1:27 pm

Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.

Does somebody have an idea or an hint?

--
Srdja

bob
Posts: 20639
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: LIFO stack based parallel processing?

Post by bob » Thu Sep 22, 2011 5:21 pm

smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.

Does somebody have an idea or an hint?

--
Srdja
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.

smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: LIFO stack based parallel processing?

Post by smatovic » Thu Sep 22, 2011 6:25 pm

First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack".
You are right,

i thought the builtin atomic memory operations of OpenCL could handle this,
but a simple testprogramm shows the race conditions.

--
Srdja

joca

Re: LIFO stack based parallel processing?

Post by joca » Fri Sep 23, 2011 7:55 pm

use an elimination-backoff stack (a stack with an elimination array and exponential backoff). if well implemented has low contention and is highly parallelizable

a version is described in the book "The Art of Multiprocessor Programming" (http://www.elsevier.com/wps/find/bookde ... escription). you can probably find some slides about it in some parallel programming course available on the web

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 6:43 pm

Re: LIFO stack based parallel processing?

Post by sje » Fri Sep 23, 2011 8:11 pm

bob wrote:
smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.
I've been thinking about a similar scheme. I think it's possible to avoid race, and the stack can be guarded by a spinlock. To reduce overall waiting, only stack entries with draft greater than N (N is say, 4 or so) are allowed.

bob
Posts: 20639
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: LIFO stack based parallel processing?

Post by bob » Fri Sep 23, 2011 9:49 pm

sje wrote:
bob wrote:
smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.
I've been thinking about a similar scheme. I think it's possible to avoid race, and the stack can be guarded by a spinlock. To reduce overall waiting, only stack entries with draft greater than N (N is say, 4 or so) are allowed.
Spinlocks ALWAYS solve races, but do you want a common data structure that has to be locked and unlocked frequently? Not to mention it doesn't really sound like a viable way to make it work cleanly...

User avatar
Zach Wegner
Posts: 1922
Joined: Wed Mar 08, 2006 11:51 pm
Location: Earth
Contact:

Re: LIFO stack based parallel processing?

Post by Zach Wegner » Fri Sep 23, 2011 9:54 pm

You could easily make a lockless queue and not have any spinlocks.

The problem is that with alpha-beta you need to delete elements from the middle of the queue. So I don't think it could be done in any efficient way...

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 6:43 pm

Re: LIFO stack based parallel processing?

Post by sje » Sat Sep 24, 2011 1:17 am

My experiments with using spinlocks on transposition table regions (1/256 segments) shows access degradation times of less than one percent.

A LIFO stack implemented as a two way linked list requires changing only a few pointers during a lock period. Queue elements can be dynamically allocated with unused ones getting attached to a free list for re-use.

smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: LIFO stack based parallel processing?

Post by smatovic » Sat Sep 24, 2011 6:32 pm

A LIFO stack implemented as a two way linked list requires changing only a few pointers during a lock period. Queue elements can be dynamically allocated with unused ones getting attached to a free list for re-use.
That sounds good, i will give it a try.

--
Srdja

smatovic
Posts: 931
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: LIFO stack based parallel processing?

Post by smatovic » Tue Sep 27, 2011 12:52 pm

That sounds good, i will give it a try.
...try and fail.

It is nearly impossible to implement a parallel LIFO stack on a GPU considering that threads inside a SIMD Unit are again divided into Warps/Wavefronts.

--
Srdja

Post Reply