Trying to improve lazy smp

cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Trying to improve lazy smp

Post by cdani »

Hi.
I got bad scaling with Andscacs beyond 4 cores, as you may have seen in the general section (Elo gains):
2 threads: +61
4 threads: +108
6 threads: +117
7 threads: +119
8 threads: +96

Initially I thought that maybe there was a bug, or that some hardware limitation such as some kind of cache collapse was kicking in, or that my careless disregard of locking and other such things was finally catching up with me.
Of course, any of these is still possible even with the change I explain here, but I hope that is not the case. I say "hope" because I have no experience with this, of course :-)

So I thought that it was not very logical to have so many threads searching at the same two depths:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1)

First I changed this to
NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 5)
and I obtained
8 threads: +115
Not bad.

I tried to be a little more aggressive:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 4) + (Depth > 6)
and I obtained
8 threads: +120

Now I'm trying something more aggressive. I will report.

I don't know if this has happened to any of you with lazy SMP, or if someone has tried something like what I'm trying now.

After these attempts, I will try to get access to a 12- or 16-core machine, to see how this must be modified to scale well on such machines. Maybe some of you have experience with a provider that offers such services. I will pay for just one month, because I suppose it will not be cheap, but I think that will be enough.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Trying to improve lazy smp

Post by Dann Corbit »

Didn't Dan Homan do something like this with ExChess?

Re: Trying to improve lazy smp

Post by cdani »

Dann Corbit wrote:Didn't Dan Homan do something like this with ExChess?
Yes. This:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1)

So I extended it.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Trying to improve lazy smp

Post by mar »

I don't have an 8-core machine here, but something seems odd.
Are you sure you tested on a real 8-core machine (not on a quad with HT on)?

Re: Trying to improve lazy smp

Post by cdani »

mar wrote: I don't have an 8-core machine here, but something seems odd.
Are you sure you tested on a real 8-core machine (not on a quad with HT on)?
Of course. AMD FX-8350.

If you want I can test your engine to see if there is similar behavior.

Re: Trying to improve lazy smp

Post by cdani »

New improvement:

NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)

4 threads: from +108 to +117

8 threads: from +120 to +134

So of course I'm trying something even more aggressive.

The updated version is here (64-bit with popcnt only):
http://www.andscacs.com/andscacs074024.zip

I tried to find information about how this compares to other engines, and I found this thread:

http://talkchess.com/forum/viewtopic.php?t=55563

It's important to note that my tests are against a gauntlet, not against Andscacs itself.

So with 4 threads Andscacs gains 117 Elo against a gauntlet, while Zappa Mexico II, "known to scale particularly well", better than Stockfish, gains 114, but in self-play.

I know that it is harder for a stronger engine to gain Elo against itself because of diminishing returns.

Do you think this holds up or compares well? Is there any chance that lazy SMP is better than other ways of doing SMP?

My intuition (and I rely on intuition because I have no experience with any of this) is that lazy SMP, since it needs no synchronization between threads, will at least be more lightweight, and that the more threads there are, the better the gains can be. Of course I will continue testing all this.
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: Trying to improve lazy smp

Post by elcabesa »

I'm having similar problems with lazy SMP in Vajolet.

I was thinking about letting some threads search the second-best move, something like a parallel MultiPV, but so far I have never tried it.
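
One way to picture the parallel MultiPV idea, purely as an untested sketch: each thread gets its own view of the root move list, and some helpers drop the current best move so they spend their time on the second-best line. Everything here is an assumption for illustration: the `Move` type, the function name `root_moves_for_thread`, the convention that thread 0 is the master, and the choice that odd-numbered helpers are the ones that skip the best move. It is not taken from Vajolet or any other engine.

```cpp
#include <cassert>
#include <vector>

// Hypothetical move type: any integer encoding of a move will do for the sketch.
using Move = int;

// Returns the root moves a given thread should search.
// Assumptions (not from any engine): root_moves is sorted best-first from the
// previous iteration, thread 0 is the master, and odd-numbered helpers skip
// the current best move so they work on the second-best line instead.
inline std::vector<Move> root_moves_for_thread(const std::vector<Move>& root_moves,
                                               int thread_id) {
    assert(thread_id >= 0);
    // Never skip the only legal move.
    const bool skip_best = (thread_id % 2 == 1) && root_moves.size() > 1;
    return std::vector<Move>(root_moves.begin() + (skip_best ? 1 : 0),
                             root_moves.end());
}
```

Whether this gains anything would have to be measured; the helpers that skip the best move still fill the shared hash table, which is the only way their work reaches the master.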

Re: Trying to improve lazy smp

Post by mar »

cdani wrote:If you want I can test your engine to see if there is similar behavior.
Thanks, you don't have to. Peter did some testing recently and I know that cheng scales beyond 4 cores (somehow).
As for your formula:

NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)

Where is the thread id? This would mean you have the same depth for each helper thread.
It seems to me that your problem might be that you don't terminate the iteration when a helper finishes before the "master", but I may be wrong.

Re: Trying to improve lazy smp

Post by cdani »

mar wrote: As for your formula:

NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)

Where is the thread id? This would mean you have the same depth for each helper thread.
It seems to me that your problem might be that you don't terminate the iteration when a helper finishes before the "master", but I may be wrong.
Ooops!
Because I translated it from Catalan to English, I made a mistake. The correct one is:

NewDepth = Depth + (((thread_id + 1) & 1) ^ 1) + (thread_id > 2) + (thread_id > 4) + (thread_id > 6)

The same applies to all the other code I have posted in this thread.
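
To make the corrected formula easier to read, here is a minimal C++ sketch of it. The function names `depth_offset` and `helper_depth` are made up for the example, and it assumes thread ids start at 0 with 0 being the master, which the posts do not state. Under that assumption, ids 0 to 7 get offsets 0, 1, 0, 2, 1, 3, 2, 4, so eight threads are spread over five different depths instead of two.

```cpp
#include <cassert>

// Extra plies a thread adds to the current iteration depth, following the
// corrected formula above. Assumption: ids start at 0 and 0 is the master.
inline int depth_offset(int thread_id) {
    assert(thread_id >= 0);
    return (((thread_id + 1) & 1) ^ 1)   // 1 for odd ids, 0 for even ids
         + (thread_id > 2)               // each comparison adds 1 when true
         + (thread_id > 4)
         + (thread_id > 6);
}

// Depth at which a given thread searches in this iteration.
inline int helper_depth(int depth, int thread_id) {
    return depth + depth_offset(thread_id);
}
```

With only the first term (the original ExChess-style formula), every thread would land on either Depth or Depth + 1, which is the "same two depths" situation described at the start of the thread.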

Re: Trying to improve lazy smp

Post by cdani »

mar wrote: It seems to me that your problem might be that you don't terminate iteration when a helper finishes before "master", but I may be wrong.
Yes, I have to try this also.
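
A rough sketch of what terminating the iteration could look like, under assumptions of my own: a small shared struct per iteration, an atomic stop flag that any thread sets when it completes its search, and a check that the other threads poll inside their search loop. The names `IterationState`, `report_finished` and `should_abort` are hypothetical and not taken from Andscacs or cheng.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state for one iteration of iterative deepening.
struct IterationState {
    std::atomic<bool> stop{false};      // set once any thread completes the iteration
    std::atomic<int>  finished_by{-1};  // id of the thread that finished first, -1 if none
};

// Called by a thread when it completes its search for this iteration.
// Returns true if this thread was the first one to finish, in which case
// its result is the one to keep.
inline bool report_finished(IterationState& st, int thread_id) {
    assert(thread_id >= 0);
    int expected = -1;
    const bool first = st.finished_by.compare_exchange_strong(expected, thread_id);
    st.stop.store(true);
    return first;
}

// Polled inside the search (for example every few thousand nodes) so that
// the remaining threads unwind and the next iteration can start.
inline bool should_abort(const IterationState& st) {
    return st.stop.load();
}
```

The point of the compare-and-exchange is only to decide which thread's result wins if two finish at nearly the same time; the state would be reset by the master before the next iteration starts.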