Page 1 of 2

LIFO stack based parallel processing?

Posted: Thu Sep 22, 2011 3:27 pm
by smatovic
Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.

Does somebody have an idea or an hint?

--
Srdja

Re: LIFO stack based parallel processing?

Posted: Thu Sep 22, 2011 7:21 pm
by bob
smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.

Does somebody have an idea or an hint?

--
Srdja
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.

Re: LIFO stack based parallel processing?

Posted: Thu Sep 22, 2011 8:25 pm
by smatovic
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack".
You are right,

i thought the builtin atomic memory operations of OpenCL could handle this,
but a simple testprogramm shows the race conditions.

--
Srdja

Re: LIFO stack based parallel processing?

Posted: Fri Sep 23, 2011 9:55 pm
by joca
use an elimination-backoff stack (a stack with an elimination array and exponential backoff). if well implemented has low contention and is highly parallelizable

a version is described in the book "The Art of Multiprocessor Programming" (http://www.elsevier.com/wps/find/bookde ... escription). you can probably find some slides about it in some parallel programming course available on the web

Re: LIFO stack based parallel processing?

Posted: Fri Sep 23, 2011 10:11 pm
by sje
bob wrote:
smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.
I've been thinking about a similar scheme. I think it's possible to avoid race, and the stack can be guarded by a spinlock. To reduce overall waiting, only stack entries with draft greater than N (N is say, 4 or so) are allowed.

Re: LIFO stack based parallel processing?

Posted: Fri Sep 23, 2011 11:49 pm
by bob
sje wrote:
bob wrote:
smatovic wrote:Have some thoughts about a LIFO stack based parallel processing scheme,
so one thread gets one position from the stack and puts the generated childs back on the stack,

The point is that i dont have a clue how to handle the scores for such an process.
First thought is that it is a gigantic race condition that is going to end up spending a lot of time waiting for access to that "stack". I think the more traditional approach offers better performance and less scaling issues.
I've been thinking about a similar scheme. I think it's possible to avoid race, and the stack can be guarded by a spinlock. To reduce overall waiting, only stack entries with draft greater than N (N is say, 4 or so) are allowed.
Spinlocks ALWAYS solve races, but do you want a common data structure that has to be locked and unlocked frequently? Not to mention it doesn't really sound like a viable way to make it work cleanly...

Re: LIFO stack based parallel processing?

Posted: Fri Sep 23, 2011 11:54 pm
by Zach Wegner
You could easily make a lockless queue and not have any spinlocks.

The problem is that with alpha-beta you need to delete elements from the middle of the queue. So I don't think it could be done in any efficient way...

Re: LIFO stack based parallel processing?

Posted: Sat Sep 24, 2011 3:17 am
by sje
My experiments with using spinlocks on transposition table regions (1/256 segments) shows access degradation times of less than one percent.

A LIFO stack implemented as a two way linked list requires changing only a few pointers during a lock period. Queue elements can be dynamically allocated with unused ones getting attached to a free list for re-use.

Re: LIFO stack based parallel processing?

Posted: Sat Sep 24, 2011 8:32 pm
by smatovic
A LIFO stack implemented as a two way linked list requires changing only a few pointers during a lock period. Queue elements can be dynamically allocated with unused ones getting attached to a free list for re-use.
That sounds good, i will give it a try.

--
Srdja

Re: LIFO stack based parallel processing?

Posted: Tue Sep 27, 2011 2:52 pm
by smatovic
That sounds good, i will give it a try.
...try and fail.

It is nearly impossible to implement a parallel LIFO stack on a GPU considering that threads inside a SIMD Unit are again divided into Warps/Wavefronts.

--
Srdja