If making an SMP engine, do NOT use processes.

Post by **Zach Wegner** » Thu Feb 07, 2008 2:56 am

Well, not too long ago, I found a huge bug in my program's SMP mode that was causing illegal moves in the PV. It turns out that I was launching the child process(es) before I initialized the bitboard attack data. Thus, in the children, the sliders had no attacks, and so the king could never be in check...

Just now, I was tuning my split point selection code, and I was trying to make it based on the iteration. I got some strange behavior before I realized that current_depth is a global variable that only gets modified in the master process...
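The current_depth problem can be reproduced in a few lines. This is only a minimal sketch, not code from any engine: the function name depth_seen_by_child and the pipe used for ordering are assumptions made for illustration. It sets an ordinary global before fork(), changes it in the parent afterwards, and reports what the child still sees.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An ordinary global: after fork() each process has its own private copy. */
static int current_depth;

/*
 * Sets current_depth before fork(), changes it in the parent afterwards,
 * and returns the value the child still sees (or -1 on error).
 * The value travels back as an exit status, hence the 0..254 limit.
 */
int depth_seen_by_child(int before_fork, int after_fork)
{
    int go[2];              /* pipe used only to order the two processes */
    char token = 'x';
    int status, sent;
    pid_t pid;

    assert(before_fork >= 0 && before_fork < 255);
    if (pipe(go) != 0)
        return -1;

    current_depth = before_fork;
    pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: block until the parent has made its update. */
        close(go[1]);
        if (read(go[0], &token, 1) != 1)
            _exit(255);
        _exit(current_depth);       /* report the child's own copy */
    }

    /* Parent ("master"): this assignment never reaches the child. */
    close(go[0]);
    current_depth = after_fork;
    sent = (write(go[1], &token, 1) == 1);
    close(go[1]);

    if (waitpid(pid, &status, 0) < 0 || !sent ||
        !WIFEXITED(status) || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

Calling depth_seen_by_child(1, 7) returns 1: the child keeps the value from before the fork, no matter what the master writes later.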

Now I'm trying to decide whether to bite the bullet and convert my whole program to threads (mostly just find and replace board. to board->), or just deal with it for now. I know some people are waiting on my program.

JUST USE THREADS!!!

Post by **Edsel Apostol** » Thu Feb 07, 2008 3:13 am

Hi Zach,

I hope you could fix those bugs. Your open source release would be a good contribution to the community as it will be the first real DTS implementation.

Post by **Alessandro Scotti** » Thu Feb 07, 2008 6:33 pm

I have been using threads for years and I feel more comfortable with them, but the cases you mention look more like simple bugs to me, which can happen with both processes and threads (e.g. maybe now you have variables that work because they are NOT shared, and switching to threads will break this)... SMP will always be quite hard either way, and even more so with DTS!

Post by **bob** » Thu Feb 07, 2008 7:21 pm

Zach Wegner wrote:Well, not too long ago, I found a huge bug in my program's SMP mode that was causing illegal moves in the PV. It turns out that I was launching the child process(es) before I initialized the bitboard attack data. Thus, in the children, the sliders had no attacks, and so the king could never be in check...

Just now, I was tuning my split point selection code, and I was trying to make it based on the iteration. I got some strange behavior before I realized that current_depth is a global variable that only gets modified in the master process...

Now I'm trying to decide whether to bite the bullet and convert my whole program to threads (mostly just find and replace board. to board->), or just deal with it for now. I know some people are waiting on my program.

JUST USE THREADS!!!

Or, use processes correctly.

I used to use threads but they have some serious compatibility issues across a wide variety of platforms, whereas in unix, fork() works the same way every time.

You just have to learn how to use global memory. I use the AT&T SYSV stuff (shmget, shmat, etc) and it works on every unix platform I have run on, unlike threads where there is always an issue here and there to deal with.

The problem you had is one that is well-known, and you just need experience to not fall into that. It took me a couple of hours to remove posix thread support and change over to processes. I did this a few years back when using an 8-way AMD box that was misbehaving in the posix threads library somewhere.

Fortunately, on unix, the memory footprint doesn't change because of the copy-on-write way things get duplicated after a fork() call. Data initialized before the fork, and not modified after the fork, gets shared among all processes just as it would with threads. Ditto for the executable code... So there is no real benefit to doing it either way, except that processes are probably safer for the beginner because you only share what you explicitly share. With threads, _everything_ is shared, which leads to lots of unexpected bugs.
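For readers who have not used the SYSV calls mentioned above, here is a minimal sketch of the idea, assuming a hypothetical struct shared_state and helper names that are not taken from any real engine. The segment is created and attached before fork(), so the child inherits the attachment and writes to it are visible on both sides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical block of search state that every process must see. */
struct shared_state {
    int current_depth;
};

/*
 * Creates a System V segment and attaches it. Call this before fork()
 * so that children inherit the attachment. Returns NULL on failure.
 */
struct shared_state *shared_state_create(void)
{
    void *p;
    int id = shmget(IPC_PRIVATE, sizeof(struct shared_state),
                    IPC_CREAT | 0600);
    if (id < 0)
        return NULL;
    p = shmat(id, NULL, 0);
    /* Mark the segment for removal right away: it is destroyed once the
       last attached process detaches or exits, so a crash cannot leak it. */
    shmctl(id, IPC_RMID, NULL);
    if (p == (void *)-1)
        return NULL;
    return p;
}

/*
 * Forks a child that stores `depth` in the shared segment, waits for it,
 * and returns the value the parent then reads (or -1 on error).
 */
int child_write_seen_by_parent(struct shared_state *s, int depth)
{
    int status;
    pid_t pid;

    assert(s != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        s->current_depth = depth;   /* lands in shared memory */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return s->current_depth;
}
```

Unlike an ordinary global, the value written by one process is seen by the other. The sketch uses waitpid() as its only synchronization; a real search would need locks or some other protocol around data that several processes update concurrently.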

Post by **Zach Wegner** » Fri Feb 08, 2008 3:03 am

bob wrote:Or, use processes correctly. I used to use threads but they have some serious compatibility issues across a wide variety of platforms, whereas in unix, fork() works the same way every time.

You just have to learn how to use global memory. I use the AT&T SYSV stuff (shmget, shmat, etc) and it works on every unix platform I have run on, unlike threads where there is always an issue here and there to deal with.

The problem you had is one that is well-known, and you just need experience to not fall into that. It took me a couple of hours to remove posix thread support and change over to processes. I did this a few years back when using an 8-way AMD box that was misbehaving in the posix threads library somewhere.

Fortunately, on unix, the memory footprint doesn't change because of the copy-on-write way things get duplicated after a fork() call. Data initialized before the fork, and not modified after the fork, gets shared among all processes just as it would with threads. Ditto for the executable code... So there is no real benefit to doing it either way, except that processes are probably safer for the beginner because you only share what you explicitly share. With threads, _everything_ is shared, which leads to lots of unexpected bugs.

Yeah, it is an "in front of the monitor" error, but it can get irritating. Also note that these are the only two problems I've had, unless you count trying to get shared memory to work in OS X and scrapping it for mmap().

I think Alessandro is right in that it's a problem either way, processes or threads. Whenever I've used threads in the past though, it seemed easier to keep track of the variables that are going to need to be thread-safe. There are many issues with shared memory that are a bit harder to deal with, like, do I make history shared? If not, how do I clear it? I'd rather not send a message to each process telling it to clear it. If I do share it, I also have a variable history_counter that keeps track of the highest value in the table. Now I need to put that in a shared struct, or put a pointer to it, and that makes my single processor code messier.

Also, I've heard you mention portability problems both with pthreads and mmap(). What are they?

Oh, and Edsel, those bugs are now long gone. I just have to find all the other ones!

Post by **jswaff** » Fri Feb 08, 2008 3:32 pm

Edsel Apostol wrote:Hi Zach,

I hope you could fix those bugs. Your open source release would be a good contribution to the community as it will be the first real DTS implementation.

He will probably beat me, but he's not the only one working on DTS.

Prophet already uses YBWC, thanks in large part to Tord's Viper. I just finished writing an iterative version of my search and am working on DTS now. So, the 3.0 release of Prophet will be DTS.

--
James

Post by **Zach Wegner** » Fri Feb 08, 2008 3:48 pm

jswaff wrote:
Edsel Apostol wrote:Hi Zach,

I hope you could fix those bugs. Your open source release would be a good contribution to the community as it will be the first real DTS implementation.
He will probably beat me, but he's not the only one working on DTS.
Prophet already uses YBWC, thanks in large part to Tord's Viper. I just finished writing an iterative version of my search and am working on DTS now. So, the 3.0 release of Prophet will be DTS.

--
James

Be prepared to wait. I've used an iterative search for a few years now, and only very recently has my DTS search been completed. I still have a bit of work to do, and the speedup is still very bad. I think it is extremely hard because Bob's paper leaves out so many of the details. That's one reason I want to release ZCT: it would have been a huge help to me to have something like it before! The difference between reading about parallel search and seeing an implementation is huge. Looking at code, you can see exactly what needs to be done. Whenever I've found a non-obvious bug, I've put in a comment that explains why the code is there.

I hope it helps you guys!

Post by **bob** » Fri Feb 08, 2008 5:51 pm

Zach Wegner wrote:
bob wrote:Or, use processes correctly. I used to use threads but they have some serious compatibility issues across a wide variety of platforms, whereas in unix, fork() works the same way every time.

You just have to learn how to use global memory. I use the AT&T SYSV stuff (shmget, shmat, etc) and it works on every unix platform I have run on, unlike threads where there is always an issue here and there to deal with.

The problem you had is one that is well-known, and you just need experience to not fall into that. It took me a couple of hours to remove posix thread support and change over to processes. I did this a few years back when using an 8-way AMD box that was misbehaving in the posix threads library somewhere.

Fortunately, on unix, the memory footprint doesn't change because of the copy-on-write way things get duplicated after a fork() call. Data initialized before the fork, and not modified after the fork, gets shared among all processes just as it would with threads. Ditto for the executable code... So there is no real benefit to doing it either way, except that processes are probably safer for the beginner because you only share what you explicitly share. With threads, _everything_ is shared, which leads to lots of unexpected bugs.
Yeah, it is an "in front of the monitor" error, but it can get irritating. Also note that these are the only two problems I've had, unless you count trying to get shared memory to work in OS X and scrapping it for mmap().

I think Alessandro is right in that it's a problem either way, processes or threads. Whenever I've used threads in the past though, it seemed easier to keep track of the variables that are going to need to be thread-safe. There are many issues with shared memory that are a bit harder to deal with, like, do I make history shared? If not, how do I clear it? I'd rather not send a message to each process telling it to clear it. If I do share it, I also have a variable history_counter that keeps track of the highest value in the table. Now I need to put that in a shared struct, or put a pointer to it, and that makes my single processor code messier.

Also, I've heard you mention portability problems both with pthreads and mmap(). What are they?

Oh, and Edsel, those bugs are now long gone. I just have to find all the other ones!

The portability issues I have found are:

1. It is difficult to determine "CPU time" reliably. Some systems report it to you on a "per thread" basis, others report cumulative time. With processes you can always get per-process usage.

2. How many physical processors do you use when you start 4 threads? Linux gives 4. Sun's Solaris uses 1 unless you use _another_ threads library function to tell it to use one physical processor per thread. Other systems break when you use this call.

3. Even starting threads has some quirks in how the lone argument is passed, what can be passed, and some systems crash if you are not precise in what you do, and what you do can be different for some systems.

4. If you start and terminate threads, you can run into difficulties when the underlying thread library doesn't really terminate threads but you thought they were wiped out. So the next thread creation uses an old thread with uninitialized local memory and another zap happens.

5. Perhaps the biggest problem for new parallel programmers is that with threads, _everything_ is shared. Even "local memory". Which means any thread can blow out another thread with a simple programming bug, which can make debugging difficult.

6. Threads share everything, including file descriptors and such, which can complicate debugging if you try to write debugging output to a file because there is too much of it going to the screen.
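As an illustration of point 1, this is a small sketch of reading per-process CPU time with the standard getrusage() call. The wrapper names are hypothetical. In a process-based engine each searcher can call process_cpu_seconds() for itself, and the master can read the total for children it has already waited for.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Converts a timeval to seconds as a double. */
static double tv_seconds(struct timeval tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User + system CPU seconds for the given getrusage() target, -1.0 on error. */
static double cpu_seconds(int who)
{
    struct rusage ru;
    if (getrusage(who, &ru) != 0)
        return -1.0;
    return tv_seconds(ru.ru_utime) + tv_seconds(ru.ru_stime);
}

/* CPU time charged to the calling process so far. */
double process_cpu_seconds(void)
{
    return cpu_seconds(RUSAGE_SELF);
}

/* CPU time of children that have terminated and been waited for. */
double reaped_children_cpu_seconds(void)
{
    return cpu_seconds(RUSAGE_CHILDREN);
}
```

RUSAGE_SELF is defined in terms of the calling process, so in a threaded program a per-thread figure generally needs a platform-specific interface, which is the kind of variation the list describes.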

Using processes, the main point becomes carefully thinking about what is shared. Things that you initialize before the fork() don't need to be shared unless they later get modified. If this happens, they must be shared or each process will continue to use the old/original value, not the new one. But now, nobody can bother anybody else's local memory. So the "bugs" become inverted in a way. With threads you share too much, with processes you can share too little. Either way it won't work correctly.

I would not claim one approach is any better or worse than the other, except that I run on so many different platforms, threads were problematic enough that I decided to do away with them. Threads are simpler from a programming standpoint, but then they offer more opportunities for subtle bugs since _everything_ is visible in all threads. The main advantage I have seen for processes is that I have not had any case where they have not worked on a unix platform, whereas with threads I had problems here and there. Remember that I started out with threads, but after running on several different platforms for WCCC-type events, I decided to use something that I think is a bit more complicated (processes) over something that was easier to use but more problematic over a wide range of platforms.

It also greatly cut down my tech support type email queries.

Post by **Zach Wegner** » Fri Feb 08, 2008 6:48 pm

bob wrote:The portability issues I have found are:

1. It is difficult to determine "CPU time" reliably. Some systems report it to you on a "per thread" basis, others report cumulative time. With processes you can always get per-process usage.

2. How many physical processors do you use when you start 4 threads? Linux gives 4. Sun's Solaris uses 1 unless you use _another_ threads library function to tell it to use one physical processor per thread. Other systems break when you use this call.

3. Even starting threads has some quirks in how the lone argument is passed, what can be passed, and some systems crash if you are not precise in what you do, and what you do can be different for some systems.

4. If you start and terminate threads, you can run into difficulties when the underlying thread library doesn't really terminate threads but you thought they were wiped out. So the next thread creation uses an old thread with uninitialized local memory and another zap happens.

Yeah, those are pretty bad. I don't plan on running on Solaris any time soon though.

I suppose I'll stick with processes for now.

bob wrote:5. Perhaps the biggest problem for new parallel programmers is that with threads, _everything_ is shared. Even "local memory". Which means any thread can blow out another thread with a simple programming bug, which can make debugging difficult.

For me, debugging with processes is maybe the worst aspect. I use gdb, and the only way I know of to debug with processes is to have multiple windows with each running different copies of gdb. This can be very tedious.

bob wrote:Using processes, the main point becomes carefully thinking about what is shared. Things that you initialize before the fork() don't need to be shared unless they later get modified. If this happens, they must be shared or each process will continue to use the old/original value, not the new one. But now, nobody can bother anybody else's local memory. So the "bugs" become inverted in a way. With threads you share too much, with processes you can share too little. Either way it won't work correctly.

Indeed. I try to keep all of my "engine state" stuff in the master process, and only read/write it from there. The children can update their statistics in shared memory. Sometimes it just gets a little complicated.

Post by **mridul** » Tue Feb 12, 2008 5:26 pm

There are good reasons why some programs would consider an MT (multi-threaded) approach and others would go for an MP (multi-process) design. But not designing code properly is always going to get you bugs, irrespective of whether you use threads or processes.

It is easier to spot bugs if using MP than MT ... but you would have figured that out already.

- Mridul
