Page 1 of 1

An MPI perft program

Posted: Sun Apr 05, 2015 1:18 pm
by nkg114mc
Hi all,

Recently I am trying to write a MPI-based perft. I do this as an exercise of understanding the framework of the MPI-based parallel alpha-beta search. So I choose perft because it is relatively easier, and is also easy to verify the correctness.

My program starts from Viper. The implementation of my parallel perft framework is a mixture of Viper and Scorpio. I use the idea that Viper implements for the SMP based parallel search, but also borrow a lot of code from Scorpio, especially for the code that involved MPI. Really appreciate the nice works by Tord and Daniel :). I just publish the code on Github (https://github.com/nkg114mc/viper-cluster) for some one who might be interested. The most of my work is in cluster.cpp and main.cpp, and I only test it under Ubuntu Linux. But actually, it was just able to give some correct outputs sometimes (not always) in the recent days, and there is still a lot of problems. So it is still far away from complete.

I also once try to simplify Scorpio to get such a perft program, but I did not understand the code of Scorpio very well, so my modification always makes Scorpio out of work. Not sure whether there is some other prior works about this.

One of the most headache issue is that, sometimes the slave process is just stuck there without doing anything. I suspect that this was caused by MPI_Probe (not MPI_IProbe), because I use this to wait for the response of “OFFERHELP” message. Moreover, the “helpful master” never happens. I still have no idea what’s the reason of this.

Any advice or comments would be welcome. Thanks!

Re: An MPI perft program

Posted: Sun Apr 05, 2015 5:22 pm
by Daniel Shawul
Nice work! I can see you have put in a lot of effort into it. I am glad someone found my cluster implementation useful :)

The "helpful master" concept works in Scorpio's smp code but not in the cluster implementation. Once a node becomes a slave to another after sending a "HELP" message it remains so until it is relieved of duty with a "CANCEL" message.

I don't use MPI_Probe so I can't comment on that. How does the "OFFERHELP" message differ from "HELP"?

Something I just added recently is a "Distributed Transposition Table", in which the transposition tables entries are shared on multiple nodes instead of a global transposition table stored on one node, or local ones. Distributed TT avoids problems of the other two: global TT increases memory traffic on one node, local TT is inefficient because TT entries are not shared. Besides global TT's number of entries grows linearly with number of nodes. This maybe useful for cluster-perft, but probably not a priority.

Nebiyu has a cluster implementation of montecarlo perft estimator but not one for calculating exact perft values. I am sure I can modify the code to do a that but one can simply construct a list of jobs and submit multiple jobs. This is what Steven and others do after removing duplicates, but I understand your goal is to have a framework for alpha-beta.

regards,
Daniel

Re: An MPI perft program

Posted: Mon Apr 06, 2015 12:19 pm
by nkg114mc
Hi Daniel,

Thanks for you comment! I really learned a lot from Scorpio~ If you see the communication procedure of my program, it is actually almost the same as Scorpio. Firstly, a slave host randomly choose another host to ask “whether a help is needed” by sending OFFERHELP (which is just HELP msg in Scorpio). Then the destination host reply ACCHELP if it really needs help (or DECLINE if it doesn't). Then this slave host becomes an avaliable host worker for the master host. Master host could construct a split point and sent INIT msg to initialize the position in slave. The slave replies an empty MERGE message (without any useful information) to ask for moves. The master then replies with SPLIT (just the same as the SPLIT in Scorpio) wiith a move, or CANCEL when there is no more moves. The slave host will leave this parallel split when it got CANCEL, and run into idle loop and ask for other host about needing help or not. That’s just a breif description about my design. It might not be accurate enough to explain the whole framework with just one reply. I will write a document in the future for this, but before that, I need the fixed the bugs and make sure that this framework can run stably.

For the “helpful master”, I agree that it would be better to turn if off now, since currently it always makes troubles and never works. But I still hope that I can make it work at some day in the future.

The distributed TT is really a great idea. I think the non-shared TT is probably the most significant disadvantage of cluster-based distributed search compared to SMP-based. I once think about this problem, but did not got a princle solution. Glad to hear that you have started the work on this, and really curious to know how it benefits the distributed search :)

One of the problem that I found recently is that, the position in the INIT message recieved by a slave host is not exactly the same as the postion in the master host. I am still debuging about this. Maybe directly senting the position structure is not a good idea. I will consider the “initial postion + played moves” design in Scorpio if this bug can not be resolved.

Re: An MPI perft program

Posted: Wed Apr 08, 2015 1:06 pm
by nkg114mc
Hi Daniel,

Just want to ask one more question about Scorpio by the way: is it possible to turn the code about SMP search off in Scorpio, but still keep the cluster search code? These days I am aslo trying to make a Scorpio-perft version running on the clusters. Turning the SMP search off might help me to understand the cluster framework better.

Thanks!

Re: An MPI perft program

Posted: Wed Apr 08, 2015 8:10 pm
by Daniel Shawul
The slave replies an empty MERGE message (without any useful information) to ask for moves.
I am doing something similar now to avoid setting up the position again on a re-split, i.e when searching on a different move but same parent position.
For the “helpful master”, I agree that it would be better to turn if off now
Say you have 4-nodes, initially node-0 gets all the work and the other 3 become its slaves. Right now scorpio just keeps this hierarchy for the rest of the search! This is slightly more efficient when you have few nodes with multiple cores. To make node-1 be a slave to node-2 instead of node-0, you need to put it off duty with a "CANCEL", and hopefully it will ask for a different node than node-0 for work. I figured if I don't even have this working, implementing the helpful master concept would be mess and more importantly for not so much gain.
I will consider the “initial postion + played moves”
I think this approach is best on a cluster for two reasons. a) Smaller message size, especially for a piece-list datastructure b) Most of the time only few moves need to be made/unmade. Infact on a re-split, only one move is made.
Is it possible to turn the code about SMP search off in Scorpio
Yes, just search with one thread mt=1 or turn off the smp code. I don't know if some other code, thread-polling for cluster search, might need some smp code but I am sure at one time it worked with just mpi code.

regards,
Daniel

Re: An MPI perft program

Posted: Fri Apr 10, 2015 1:23 am
by nkg114mc
Thank you very much for your reply, Daniel!