lazy smp questions

lucasart · Post by **lucasart** » Thu Sep 24, 2015 1:16 am

I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (eg. with 8 threads):

info depth 1 score 21 nodes 102 pv g8f7 
info depth 1 score 21 nodes 218 pv g8f7 
info depth 1 score 21 nodes 323 pv g8f7 
info depth 1 score 21 nodes 1211 pv g8f7 
info depth 1 score 21 nodes 1398 pv g8f7 
info depth 1 score 21 nodes 1405 pv g8f7 
info depth 2 score 0 nodes 2414 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 2968 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 1 score 21 nodes 5031 pv g8f7 
info depth 2 score 0 nodes 10118 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 12118 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 12751 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 14997 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 21287 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 3 score 0 nodes 52185 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 79642 pv h7h6 b1a1 g7g6 
info depth 1 score 21 nodes 94147 pv g8f7 
info depth 3 score 0 nodes 101678 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 108359 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 108359 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 109628 pv h7h6 b1a1 g7g6 
info depth 2 score 0 nodes 110173 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 3 score 0 nodes 139132 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 198261 pv h7h6 b1a1 g7g6 
info depth 4 score 0 nodes 1454975 pv a3b4 d2b3 h7h6 b1a1 b4e1 e2e1 a6c4

Is this allowed by the UCI protocol ? Would it cause problems in some GUIs ?

I can write some code to clean this up, and make sure completeted iterations are shown only once in increasing order, but it's more work, and I'm lazy

Also, I kind of like it like that, because it shows what is really happening (ie. who's doing what, i may add a threadid to the info line, just for debugging purposes etc.).

Lanzo · Post by **Lanzo** » Thu Sep 24, 2015 1:40 am

lucasart wrote:I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (eg. with 8 threads):

Code: Select all

info depth 1 score 21 nodes 102 pv g8f7 
info depth 1 score 21 nodes 218 pv g8f7 
info depth 1 score 21 nodes 323 pv g8f7 
info depth 1 score 21 nodes 1211 pv g8f7 
info depth 1 score 21 nodes 1398 pv g8f7 
info depth 1 score 21 nodes 1405 pv g8f7 
info depth 2 score 0 nodes 2414 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 2968 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 1 score 21 nodes 5031 pv g8f7 
info depth 2 score 0 nodes 10118 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 12118 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 12751 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 14997 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 2 score 0 nodes 21287 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 3 score 0 nodes 52185 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 79642 pv h7h6 b1a1 g7g6 
info depth 1 score 21 nodes 94147 pv g8f7 
info depth 3 score 0 nodes 101678 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 108359 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 108359 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 109628 pv h7h6 b1a1 g7g6 
info depth 2 score 0 nodes 110173 pv a3b4 d2b3 b4e1 e2e1 a6c4 
info depth 3 score 0 nodes 139132 pv h7h6 b1a1 g7g6 
info depth 3 score 0 nodes 198261 pv h7h6 b1a1 g7g6 
info depth 4 score 0 nodes 1454975 pv a3b4 d2b3 h7h6 b1a1 b4e1 e2e1 a6c4

Is this allowed by the UCI protocol ? Would it cause problems in some GUIs ?

I can write some code to clean this up, and make sure completeted iterations are shown only once in increasing order, but it's more work, and I'm lazy

Also, I kind of like it like that, because it shows what is really happening (ie. who's doing what, i may add a threadid to the info line, just for debugging purposes etc.).

Are you still quite confident that if theory and practice were to clash, practice would prevail triumphantly always?

lucasart · Post by **lucasart** » Thu Sep 24, 2015 10:01 am

Another interesting problem is how to implemente "info currmove" with lazy smp. In a lazy smp framework, you different threads running different iterations, all with different movenumber.

I can't think of a simple way to implement it. Ideas ?

Dann Corbit · Post by **Dann Corbit** » Thu Sep 24, 2015 10:37 am

lucasart wrote:Another interesting problem is how to implemente "info currmove" with lazy smp. In a lazy smp framework, you different threads running different iterations, all with different movenumber.

I can't think of a simple way to implement it. Ideas ?

Among the collection with maximum depth, choose the move with the best score.

A little simplistic, but I like the smell of it.

lucasart · Post by **lucasart** » Thu Sep 24, 2015 11:00 am

Dann Corbit wrote:
lucasart wrote:Another interesting problem is how to implemente "info currmove" with lazy smp. In a lazy smp framework, you different threads running different iterations, all with different movenumber.

I can't think of a simple way to implement it. Ideas ?
Among the collection with maximum depth, choose the move with the best score.

A little simplistic, but I like the smell of it.

I'm not sure you understood the complexity of the problem. With alpha/beta, we only know the score of the best move so far, not the others. The other issue is that you have an intricate interleeving of several threads running the same iteration and doing different moves. And you can't wait for the iteration to be completed, it defeats the purpose of info currmove…

Dann Corbit · Post by **Dann Corbit** » Thu Sep 24, 2015 11:12 am

lucasart wrote:
Dann Corbit wrote:
lucasart wrote:Another interesting problem is how to implemente "info currmove" with lazy smp. In a lazy smp framework, you different threads running different iterations, all with different movenumber.

I can't think of a simple way to implement it. Ideas ?
Among the collection with maximum depth, choose the move with the best score.

A little simplistic, but I like the smell of it.
I'm not sure you understood the complexity of the problem. With alpha/beta, we only know the score of the best move so far, not the others. The other issue is that you have an intricate interleeving of several threads running the same iteration and doing different moves. And you can't wait for the iteration to be completed, it defeats the purpose of info currmove…

I am sure that I do not fully appreciate the intricacy of the problem.
Here is what I do see. For some node being analyzed, there will be at most K threads of maximum depth. There must be a way to know what node the threads examine. I need to examine possible answers for the root node and his direct children. For at least one of these K threads, the score will be better than the other scores. If I had to make a move when someone said "BOO!" I would choose that node.

I suppose that there is a much more beautiful way to do it.

Since you have a public hash table that is collecting the whole mess, you could always sample that for the answer. But like any other choice, it may change the instant you make the choice.

I think I would probably pull the value from the hash table, but that is not really any different from pulling it directly from the best thread result, assuming that the thread with a deeper search will replace the value in the hash table.
Another thing you could do is simply make the user wait until you finish the iteration. It would drive some users crazy but at least it would produce correct results.

mar · Post by **mar** » Thu Sep 24, 2015 11:34 am

lucasart wrote:I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (eg. with 8 threads):

I didn't bother with this, I simply don't output anything for helpers and (if a helper finished first) I echo score/pv again at the end of the iteration.

cdani · Post by **cdani** » Thu Sep 24, 2015 3:24 pm

mar wrote:
lucasart wrote:I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (eg. with 8 threads):
I didn't bother with this, I simply don't output anything for helpers and (if a helper finished first) I echo score/pv again at the end of the iteration.

As I ignore the threads > 0, I only show the info of the thread 0. I even do not collect the pv for other threads as it is not necessary.

lucasart · Post by **lucasart** » Fri Sep 25, 2015 12:44 am

cdani wrote:
mar wrote:
lucasart wrote:I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (eg. with 8 threads):
I didn't bother with this, I simply don't output anything for helpers and (if a helper finished first) I echo score/pv again at the end of the iteration.
As I ignore the threads > 0, I only show the info of the thread 0. I even do not collect the pv for other threads as it is not necessary.

I suppose you can do that. It's not very correct though, as it means you may show completed iterations later than they actually happen (what if thread #2 finished an iteration 10 sec before thread 0?).

I opted for a more proper solution, although it requires a bit more work:
* a PV per thread (backed-up recursively through the stack using VLA for compactness).
* a global structure that contains at any time the depth/score/nodes/PV of the highest depth completed so far (by at least one thread).
* only the timer thread (not search threads) writes to std::cout, which alleviates the need for locking std::cout, and allows me to print the "info depth score nodes pv" line in strictly increasing depth order.

Code: Select all

/* declaration &#40;uci.h&#41; */

class Info &#123;
	Move _pv&#91;MAX_PLY + 1&#93;;
	int _depth, _score, _nodes;
	mutable bool updated;
	mutable std&#58;&#58;mutex m;
public&#58;
	Info&#40;);
	void update&#40;int depth, int score, int nodes, Move *pv&#41;;
	void print&#40;) const;
&#125;;

/* implementation &#40;uci.cc&#41; */

Info&#58;&#58;Info&#40;) &#58; _depth&#40;0&#41;, updated&#40;false&#41;
&#123;
	_pv&#91;0&#93;.clear&#40;);
&#125;

void Info&#58;&#58;update&#40;int depth, int score, int nodes, Move *pv&#41;
&#123;
	std&#58;&#58;lock_guard<std&#58;&#58;mutex> lk&#40;m&#41;;
	if &#40;depth > _depth&#41; &#123;
		_depth = depth;
		_score = score;
		_nodes = nodes;
		for &#40;int i = 0; ; i++)
			if ((_pv&#91;i&#93; = pv&#91;i&#93;).null&#40;))
				break;
		updated = true;
	&#125;
&#125;

// print info line, only if it has been updated since last print&#40;)
void Info&#58;&#58;print&#40;) const
&#123;
	std&#58;&#58;lock_guard<std&#58;&#58;mutex> lk&#40;m&#41;;
	if &#40;updated&#41; &#123;
		std&#58;&#58;ostringstream os;
		os << "info depth " << _depth << " score " << _score
			<< " nodes " << _nodes << " pv";
		for &#40;int i = 0; !_pv&#91;i&#93;.null&#40;); i++)
			os << ' ' << _pv&#91;i&#93;.to_string&#40;);
		std&#58;&#58;cout << os.str&#40;) << std&#58;&#58;endl;
		updated = false;
	&#125;
&#125;

* Any search threads call Info::update() whenever it completes an iteration. Info will only be updated if the depth is strictly larger than the highest completed depth so far, and will set updated = true in that case.
* Timer thread (which loops in 5ms resolution) calls Info::print(), which only prints if updated = true, which means that a higher depth was completed (by at least one thread) since last Info::print() actually printed something.

lucasart · Post by **lucasart** » Fri Sep 25, 2015 1:10 am

I don't really have a concept of helper threads. All searching threads are equal. The master thread is the timer thread, which is responsible for:
* creating and joining the searching threads.
* checking search termination conditions (max time or nodes) every 5ms, and raise a global signal to stop searching (observed by searching threads who subsequently terminate by throwing an exception).
* writes to std::cout the info lines (based on highest depth completed at any point by at least one thread).

So, in that sense there is a master/slave relationship between the timer thread and the seraching threads.

But sometimes slaves need to boss each other around. For example, if thread #2 finished depth = 3, it looks (in a global structure) for who's doing what. Let's say it sees: thread #0 doing depth=3, thread #1 doing depth =4, thread #3 doing depth=3. In that case it will raise a signal to say: threads #0 and #3 should stop what they are doing and go back to the iterative deepening loop to find a useful job to do, because they are doing something that is now useless (depth <= highest depth completed). That part was not so trivial to code (subtle races had to be avoided, while locking is forbidden in search because doing it at every node would be too costly).

Here's my search code, if you're interested:
https://github.com/lucasart/Demolito/bl ... /search.cc

lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions

Re: lazy smp questions