Lazy SMP

Discussion of chess software programming and technical issues.

Moderator: Ras

Joost Buijs
Posts: 1641
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Lazy SMP

Post by Joost Buijs »

bob wrote: Your numbers seem a bit off. Your results imply that about 2/3 of your total search time is spent in the evaluation, which seems incredibly high (if you take your eval and divide by 2 (fully parallel eval), keeping the rest of your code serial) you end up with an eval that is really expensive. Do you REALLY spend that much time in your eval, so that only parallelizing that gives you a 1.7x speedup?
Maybe it does seem strange to you, but it was even worse.
At that time my engine spent about 90% of it's time in the evaluation function according to the profiler I used. It did a full positional evaluation at each node without any lazy stuff and no TT lookups in quiescence.
You always like to compare things with Crafty, but you have to realise that other engines can exhibit totally different behaviour.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Lazy SMP

Post by bob »

Joost Buijs wrote:
bob wrote: Your numbers seem a bit off. Your results imply that about 2/3 of your total search time is spent in the evaluation, which seems incredibly high (if you take your eval and divide by 2 (fully parallel eval), keeping the rest of your code serial) you end up with an eval that is really expensive. Do you REALLY spend that much time in your eval, so that only parallelizing that gives you a 1.7x speedup?
Maybe it does seem strange to you, but it was even worse.
At that time my engine spent about 90% of it's time in the evaluation function according to the profiler I used. It did a full positional evaluation at each node without any lazy stuff and no TT lookups in quiescence.
You always like to compare things with Crafty, but you have to realise that other engines can exhibit totally different behaviour.
I simply assume "reasonable implementation". 90% in eval is NOT "reasonable implementation".

My mistake...
Joerg Oster
Posts: 978
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Lazy SMP

Post by Joerg Oster »

JuLieN wrote:
Joost Buijs wrote:
JuLieN wrote:Has someone tried to split the move generator and the evaluation function, instead of splitting the search tree?

Likewise, you could compute all the terms of your evaluation function in parallel threads. If you have more terms than you have threads then once a term is finished computing you'd affect the thread a new term to compute and so on.

Has this ever been tried? Being a very lazy programmer I was contemplating the easiest way to implement SMP in my engine, and came with this "lazy SMP" idea.
This is something I tried in 2001 when I had a machine with only 2 cores available. My evaluation function was split in 2 parts, one for black and one for white. When the threads are already running and waiting there is hardly any overhead at all. It gave me a speedup of about 1.7. The nice thing is you can program this within less then one hour.
I'll definitely try it! :) Yes, there's very little to modify in the code to get such "threads" going, so it's definitely worth the... "effort".

As Bob says, there's still plenty of code left out of the threaded part (for instance list sortings and fast list evaluations), but if those simple tries give results then maybe other parts of the code could be added.

For SMP beginners another interest is that it's probably less prone to introduce some new logical bugs.
Very interesting idea.
I'm looking forward to seeing your results.

Modern engines do a lot of pruning based on a more or less lightweight eval. Most likely we could do much better pruning decisions with a more intelligent, heavier eval. Unfortunately, this heavier eval slows down the search, and thus our engine gets outsearched. Maybe this is one of the reasons why many eval ideas don't pay off when tested.
Jörg Oster
User avatar
JuLieN
Posts: 2949
Joined: Mon May 05, 2008 12:16 pm
Location: Bordeaux (France)
Full name: Julien Marcel

Re: Lazy SMP

Post by JuLieN »

Ok, I sadly have very bad results. I only modified the move generator to distribute it through 8 threads, and launched a perft. My MPS went from 22 millions MPS with the non-multithreaded version to ... 233 KMPS when running on height threads! So it was a hundred times slower on 8 threads than on one thread.

And the CPU monitor shows the task using only 113% of the CPU (when, if optimal, it should use 800%).

So it was a very bad idea after all: the overhead is much too important.
"The only good bug is a dead bug." (Don Dailey)
[Blog: http://tinyurl.com/predateur ] [Facebook: http://tinyurl.com/fbpredateur ] [MacEngines: http://tinyurl.com/macengines ]
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: Lazy SMP

Post by syzygy »

JuLieN wrote:And the CPU monitor shows the task using only 113% of the CPU (when, if optimal, it should use 800%).
So you're either spawning and destroying threads for each invocation of your move generator, or you're waking up and pausing threads. With such super tiny batches of work, this will indeed kill performance.

Your results would probably have been less bad if you had used spinlocks for synchronisation. Still, with such tiny batches of work it makes little sense to try to get it to work.
Joerg Oster
Posts: 978
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Lazy SMP

Post by Joerg Oster »

And how about splitting the eval? Will you try this, too?

Unfortunately, I'm still a beginner in programming (though 47 years of age :roll: ), and this is way beyond my poor skills.
Jörg Oster
User avatar
JuLieN
Posts: 2949
Joined: Mon May 05, 2008 12:16 pm
Location: Bordeaux (France)
Full name: Julien Marcel

Re: Lazy SMP

Post by JuLieN »

syzygy wrote:
JuLieN wrote:And the CPU monitor shows the task using only 113% of the CPU (when, if optimal, it should use 800%).
So you're either spawning and destroying threads for each invocation of your move generator, or you're waking up and pausing threads. With such super tiny batches of work, this will indeed kill performance.

Your results would probably have been less bad if you had used spinlocks for synchronisation. Still, with such tiny batches of work it makes little sense to try to get it to work.
Yes, I used a simple extension unit of the Object Pascal, mtprocs, that does just that: creating threads and destroying them when the task is done. It allows to create multithreaded applications with very little code modification, but it's (much) less efficient than creating a multithreaded app from the ground up. As they say in the doc, it is only interesting if the tasks last for at least a few milliseconds, which is of course not the case when creating the list of legal moves for a given position. Take my 22 millions MPS monothread engine: with an average of, say, 35 legal moves per position, it gives 628,000 such lists created every second, or one every 0.00159 seconds (so, every 0.00159 millisecond).

But for any task that takes at least a few ms, this unit is indeed very interesting for the lazy programmer. :)
Joerg Oster wrote:And how about splitting the eval? Will you try this, too?

Unfortunately, I'm still a beginner in programming (though 47 years of age :roll: ), and this is way beyond my poor skills.
I won't, Joerg, sorry. I see no reason why I would get better results with the eval function. Typically my engine rates 1.2 millions positions per second during a game, so you can make the same computation as I did above and will see that we're still far away from tasks lasting for a few milliseconds...
"The only good bug is a dead bug." (Don Dailey)
[Blog: http://tinyurl.com/predateur ] [Facebook: http://tinyurl.com/fbpredateur ] [MacEngines: http://tinyurl.com/macengines ]
Joerg Oster
Posts: 978
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Lazy SMP

Post by Joerg Oster »

Hi Julien,

I agree.
Still I think it is a possibility worth further investigations, to make a heavy eval more competitive.
Jörg Oster