Lazy SMP and Work Sharing

dchoman · Post by **dchoman** » Wed Jul 03, 2013 1:53 pm

This topic is a follow-up to a question I got in the "General" forum when I released the latest version of EXchess. In the release notes, I mentioned an improvement to the Lazy SMP scheme, and I wanted to provide a little more information for those that might be interested.

I changed the work-sharing approach to Lazy SMP in EXchess to be closer to the ABDADA model after Daniel Shawul's posts about his tests on the subject. However, I didn't want to use a hash table to keep a counter for the threads working on a given position. My hash table already had a 16 byte long entry, so I didn't want to expand it, and I also didn't like the idea of having to make each move before seeing whether another thread was searching it.

So as an alternative to the hash table, I made a simple work sharing data structure. In the end, it was just a single hash key for a position which is OR'd with the move being searched and the depth of the search. I use this to keep track of the move being searched at each ply of the tree for a given thread. Then, before I search a move at a given ply, I can just check the same ply in the other threads to see that the move is not already being worked on at that depth. If it is, then the move get placed at the end of the move list to be searched last. This doesn't allow for transpositions, but I expected the most likely work collisions to be as the threads are walking the PV where they will initially all be the same. Indeed, this scheme only helps in the PV, and if I check for work sharing in non-PV nodes, it only slows things down a bit.

My previous Lazy SMP work sharing was just to alternate moves (odd-even) in the root node for the odd-even threads. The above approach is about 10% faster than my previous scheme, and I get a time-to-depth improvement of roughly 1.65 for 2 threads compared to 1 thread and roughly 2.5 for 4 threads compared to 1 thread. Maybe not quite as good as ABDADA with a hash table counter, but not too bad for the simplicity.

- Dan

dchoman · Post by **dchoman** » Wed Jul 03, 2013 2:26 pm

dchoman wrote:... In the end, it was just a single hash key for a position which is OR'd with the move being searched and the depth of the search.

Of course, I meant XOR'd, and alpha and beta are included in the signature as well to be sure those match.

I also just realized that a possible improvement would be to communicate cutoffs when they are found in threads searching moves in the same position. This way the thread that didn't find the cutoff doesn't have to wait until the end of its movelist to discover it.

- Dan

Joerg Oster · Post by **Joerg Oster** » Wed Jul 03, 2013 3:23 pm

Nice improvement and thanks for sharing.
Keep up the good work!

Michel · Post by **Michel** » Wed Jul 03, 2013 3:50 pm

However, I didn't want to use a hash table to keep a counter for the threads working on a given position. My hash table already had a 16 byte long entry, so I didn't want to expand it,

Well Daniel outlined an approach where a single flag in the TT is sufficient to indicate that a node is being searched exclusively. I was surprised by this but at that time it seemed to be correct.

http://talkchess.com/forum/viewtopic.ph ... t&start=19

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 03, 2013 6:04 pm

Nice! I haven't looked at it carefully but it looks like a good simplification if one doesn't want to traverse over the move list twice like I did. I use the existing score array to flag if a move was already searched, but you just move it to the end of the list so no going back. Admittedly the way I did is susceptible to bugs compared to using a separate data structure (which i gather is list of hashkeys not hashtable in your case). Also as Michel noted, there is no need to store counters and my hash table actually was as is before I implemented ABDADA.
Cheers

Michel · Post by **Michel** » Wed Jul 03, 2013 6:17 pm

Actually there is an annoying problem with using the main hash table for ABDADA. It relies on the search threads leaving the counter/flag in a consistent state.

However I use setjump/longjump to handle timeouts which seems to be incompatible with this. So a small separate data structure which can be reinitialized at the start of every search seems like an attractive proposition.

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 03, 2013 6:41 pm

Michel wrote:Actually there is an annoying problem with using the main hash table for ABDADA. It relies on the search threads leaving the counter/flag in a consistent state.

I don't think so. A hash table entry that is being used for work sharing can even be replaced in my case and everything goes well. The move is simply searched again by other processors. Basically every move gets searched by every processor unlike YBW where a processor searches only part of the move list. It you use counters, it is also the same as that is used only to allow speculative parallelization. A processor can disconnect and things work well as long as there is one worker (not necessarily the master). For failure tolerant YBW you have to pass through a lot of hoops and even after that the master has to be active. I have not analyzed DH's simplification to see if it preserves this property.

However I use setjump/longjump to handle timeouts which seems to be incompatible with this. So a small separate data structure which can be reinitialized at the start of every search seems like an attractive proposition.

I didn't need to do that but maybe it is needed for the case of separate data structures. The two step move list traversal is probably the biggest hurdle for most so this proposition is a good alternative.

Michel · Post by **Michel** » Wed Jul 03, 2013 6:47 pm

Actually there is an annoying problem with using the main hash table for ABDADA. It relies on the search threads leaving the counter/flag in a consistent state.

However I use setjump/longjump to handle timeouts which seems to be incompatible with this.

Actually I was too quick to post. I now realize that the age field may be used to recognize hash entries with a stale "exclusive flag". So this is probably not an issue after all.

Lazy SMP and Work Sharing

Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing

Re: Lazy SMP and Work Sharing