interesting SMP bug

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

interesting SMP bug

Post by bob » Tue Sep 24, 2013 6:47 pm

I just found a bug I have been searching for for years. Thought I would relay it here just in case it might affect anyone using the same sort of idea...

I have a typical killer move array, two per ply. For quite a while I have been first trying the two killers for the current ply, then the two killers for two plies previous to this ply. Made a significant difference in the size of the tree.

After I search the hash move, non-losing captures, and killers, I generate the rest of the moves, and for efficiency, I carefully excluded any previously searched killers from the current move list. I was doing a sanity test on a recent change, and noticed that I had missed WAC #266, which is a simple mate, taking 11 plies and basically no time. But the normal single-thread search version spotted it instantly. I started testing, and about 1 out of every 3-4 times, the parallel search version would miss it. The key line is Rxh2+ Bxh2 Ng3+ and on to mate. I was splitting at ply=1, and one thread followed the above to ply=3, where it needed to search everything. Meanwhile, back at the ranch, the other parallel thread changed the killers for ply=1, and just happened to insert the move Ng3 as a killer. The first thread was busy searching at ply=3, and when it got to the third move in the list to search, namely Ng3+, it concluded "already searched, it is a killer for ply=1, so skip searching it again."

The fix was simple: as I search killers, I save them in a local array, and when I start culling them from the full move list, I use this local array (up to 4 entries) to cull, rather than the killer list, which can be changed by other threads at a common split point.
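A minimal sketch of that fix, with hypothetical names (this is not Crafty's actual code): each killer is recorded in a thread-local array as it is searched, and the later cull test consults that private snapshot rather than the shared killer slots, which another thread can overwrite at a common split point.

```c
#include <assert.h>

/* Illustrative sketch only; names, sizes, and move encoding are
   assumptions, not taken from any real engine. */

#define MAX_SEARCHED 4   /* two killers for this ply + two from ply-2 */

typedef struct {
    int searched[MAX_SEARCHED];  /* killers already tried at this node */
    int count;
} SearchedKillers;

/* Record a killer into thread-local storage as it is searched. */
static void remember_searched(SearchedKillers *sk, int move) {
    if (sk->count < MAX_SEARCHED)
        sk->searched[sk->count++] = move;
}

/* The cull test uses the private snapshot, so a concurrent update of
   the shared killer slots cannot make us skip an unsearched move. */
static int already_searched(const SearchedKillers *sk, int move) {
    for (int i = 0; i < sk->count; i++)
        if (sk->searched[i] == move)
            return 1;
    return 0;
}
```

The key property is that `already_searched` never reads shared state, so its answer can only reflect moves this thread actually tried.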

ugh.

I went back and, sure enough, a couple of other odd positions I had saved no longer fail: they had been overlooking a forced repetition, or else finding what appeared to be a forced repetition when there was a key escaping move that was getting culled as above.

One less bug, who knows how many remain...

I will release 23.7 once I run a full test on it, as it also has a couple of fixes for the skill command to stop the time losses some were seeing with skill=1 or such.

mhull
Posts: 12496
Joined: Wed Mar 08, 2006 8:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: interesting SMP bug

Post by mhull » Tue Sep 24, 2013 7:08 pm

Have you determined the estimated ELO impact of this fix?
Matthew Hull

jdart
Posts: 3842
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: interesting SMP bug

Post by jdart » Tue Sep 24, 2013 9:05 pm

I have a very similar killer move scheme, except killers are per-thread, and the local killer list is cleared before a new search commences at a split point. This may be less efficient but it doesn't have the problem you mention.

--Jon

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: interesting SMP bug

Post by bob » Tue Sep 24, 2013 9:26 pm

mhull wrote:Have you determined the estimated ELO impact of this fix?
I'd suspect minor... no effect on single-thread execution of course.

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: interesting SMP bug

Post by bob » Tue Sep 24, 2013 9:30 pm

jdart wrote:I have a very similar killer move scheme, except killers are per-thread, and the local killer list is cleared before a new search commences at a split point. This may be less efficient but it doesn't have the problem you mention.

--Jon
I spent a lot of time fiddling with killers. I have per-thread killers, but at a split point, there is a common set of killers where any thread can update them. I try to "back up" the killers along with the best move and score, although I am not sure whether it is particularly helpful or not.

Certainly a non-obvious one, however, until WAC #266 came along. It is always nice to find a position that triggers a bug regularly, and that requires only a small amount of time to do so...

michiguel
Posts: 6389
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: interesting SMP bug

Post by michiguel » Tue Sep 24, 2013 10:24 pm

jdart wrote:I have a very similar killer move scheme, except killers are per-thread, and the local killer list is cleared before a new search commences at a split point. This may be less efficient but it doesn't have the problem you mention.

--Jon
At the point of parallelization, I copy the killers array for the new thread. This is very local information and I believe that the structure should be local.
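A sketch of this copy-at-split approach, with hypothetical names and sizes (not taken from any actual engine): the helper thread gets its own private copy of the killer table when it attaches at a split point, so later updates by either thread stay local.

```c
#include <string.h>

/* Illustrative only; MAX_PLY and the table layout are assumptions. */

#define MAX_PLY 64

typedef struct {
    int killer1[MAX_PLY];   /* first killer move per ply  */
    int killer2[MAX_PLY];   /* second killer move per ply */
} KillerTable;

/* Called once when a helper thread attaches at a split point.
   After this, neither thread's killer updates are visible to
   the other, which trades some move-ordering information for
   freedom from races. */
static void clone_killers(KillerTable *helper, const KillerTable *owner) {
    memcpy(helper, owner, sizeof *helper);
}
```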

Miguel

mvk
Posts: 589
Joined: Tue Jun 04, 2013 8:15 pm

Re: interesting SMP bug

Post by mvk » Tue Sep 24, 2013 11:22 pm

bob wrote:at a split point, there is a common set of killers where any thread can update them. I try to "back up" the killers along with the best move and score, although I am not sure whether it is particularly helpful or not.
Then what is the purpose of that?

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: interesting SMP bug

Post by bob » Tue Sep 24, 2013 11:57 pm

mvk wrote:
bob wrote:at a split point, there is a common set of killers where any thread can update them. I try to "back up" the killers along with the best move and score, although I am not sure whether it is particularly helpful or not.
Then what is the purpose of that?
To keep the killers from the largest of the parallel sub-trees, the one that produced the best move. It is nearly impossible to test whether this is better or worse, but since I don't zero/clear killers, for obvious reasons, it seemed reasonable to keep the best of the killers for the next search...

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: interesting SMP bug

Post by bob » Tue Sep 24, 2013 11:59 pm

michiguel wrote:
jdart wrote:I have a very similar killer move scheme, except killers are per-thread, and the local killer list is cleared before a new search commences at a split point. This may be less efficient but it doesn't have the problem you mention.

--Jon
At the point of parallelization, I copy the killers array for the new thread. This is very local information and I believe that the structure should be local.

Miguel
I do that as well. But at the actual SPLIT point, the killers are shared, as are the move list and everything else for that particular node/ply. While I agree it is local information, I believe that keeping something globally available is better than starting from zero...

I might run a few test positions 10,000 times each way to see which produces the smaller tree, if either is better.

rtitle
Posts: 33
Joined: Wed Aug 14, 2013 5:23 pm

Re: interesting SMP bug

Post by rtitle » Fri Sep 27, 2013 11:09 am

There is a broader question related to this that I've been wondering about as I write my (multithreaded) chess engine: (1) what needs to be per-thread and what needs to be global, and (2) for that which needs to be global, how does one protect it?

For (1), obviously the position being searched needs to be per-thread. And if you have a killer table, then it needs to be per-thread (that was the bug Bob described in the initial post: he had incorrectly shared it, then fixed it).

Now, let's talk about the transposition table: per-thread or global? That wants to be global, right? I mean, it doesn't seem like you'd want to clone the transposition table to each thread. But if it's global, how do you guys protect it so it doesn't get corrupted by multiple threads writing, and so that a read doesn't return corrupted results while another thread is writing? Take out read/write locks? That seems expensive. Lock-free hash algorithms? They exist but are complex to implement. Keep the entries small enough to allow for atomic reads and writes? That could work but constrains the entry size. Or do you just do unprotected reads and writes of TT entries from multiple threads, and pray for the best? :-)
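For the lock-free option, one well-known answer is the XOR trick from Hyatt and Mann's lockless hashing paper: store the key XORed with the data, so a torn read or a racing write shows up as a signature mismatch and is simply treated as a probe miss. A sketch, with an assumed two-word entry layout:

```c
#include <stdint.h>

/* Sketch of XOR "lockless hashing" (Hyatt & Mann). The field layout
   is an assumption for illustration; real engines pack score, depth,
   bound type, and move into the data word. */

typedef struct {
    uint64_t key;   /* stored as (zobrist key ^ data) */
    uint64_t data;  /* packed score/depth/move        */
} TTEntry;

static void tt_store(TTEntry *e, uint64_t key, uint64_t data) {
    e->key  = key ^ data;   /* if another thread races us, the   */
    e->data = data;         /* signature check below fails safely */
}

/* Returns 1 and fills *data on a valid hit; 0 on empty slot,
   different position, or an inconsistent (torn) entry. */
static int tt_probe(const TTEntry *e, uint64_t key, uint64_t *data) {
    uint64_t d = e->data;
    if ((e->key ^ d) != key)
        return 0;
    *data = d;
    return 1;
}
```

The cost is that a corrupted entry is indistinguishable from a miss, which is acceptable for a cache; a genuinely atomic 128-bit store would need more machinery than this sketch shows.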

Rich
