Trying to improve lazy smp

cdani · Post by **cdani** » Sat Apr 11, 2015 9:09 pm

Hi.
I had some bad scaling with Andscacs with more than 4 cores, as you may have seen in the general section:
2 threads: +61
4 threads: +108
6 threads: +117
7 threads: +119
8 threads: +96

Initially I thought that maybe there was a bug, or some hardware limitation such as cache thrashing kicking in, or that my careless disregard of locking and other details was finally taking its toll.
Of course, any of these may still be the case even with the change I describe here, but I hope not. Only a hope, since I have no experience with this, of course.

So I thought it was not very logical for so many threads to be searching the same two depths:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1)
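To illustrate the point, here is a small Python sketch of that scheme. It reads the `Depth` inside the parentheses as the thread id, as cdani corrects later in the thread. `exchess_offset` and `thread_depths` are hypothetical names, not code from ExChess or Andscacs. With 8 threads at iteration depth 10, half the threads search depth 10 and the other half search depth 11:

```python
from collections import Counter
from typing import List


def exchess_offset(thread_id: int) -> int:
    # (((t + 1) & 1) ^ 1): 0 for even thread ids, 1 for odd ones
    return ((thread_id + 1) & 1) ^ 1


def thread_depths(depth: int, n_threads: int) -> List[int]:
    # Depth each thread would search for a given iteration depth
    return [depth + exchess_offset(t) for t in range(n_threads)]


depths = thread_depths(10, 8)
print(depths)                          # [10, 11, 10, 11, 10, 11, 10, 11]
print(sorted(Counter(depths).items())) # [(10, 4), (11, 4)]
```

However many threads are added, they all pile onto the same two depths.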

Initially I changed this for
NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 5)
and I obtained
8 threads: +115
Not bad.

I tried to be a little more aggressive:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 4) + (Depth > 6)
and I obtained
8 threads: +120

Now I'm trying something more aggressive. I will report.

I don't know if this has happened to any of you with lazy SMP, or if anyone has tried something like what I'm trying now.

After those attempts, I will try to get access to a 12- or 16-core machine, to see how this needs to be modified to scale well on those machines. Maybe some of you have experience with an ISP that offers such services. I will pay for just one month, since I suppose it won't be cheap, but I think that will be enough.

Dann Corbit · Post by **Dann Corbit** » Sat Apr 11, 2015 9:30 pm

Didn't Dan Homan do something like this with ExChess?

cdani · Post by **cdani** » Sat Apr 11, 2015 9:40 pm

Dann Corbit wrote: Didn't Dan Homan do something like this with ExChess?

Yes. This:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1)

So I extended it.

mar · Post by **mar** » Sat Apr 11, 2015 10:15 pm

I don't have an 8-core machine here, but something seems odd.
Are you sure you tested on a real 8-core machine (not on a quad-core with HT on)?

cdani · Post by **cdani** » Sun Apr 12, 2015 12:41 am

mar wrote: I don't have an 8-core machine here, but something seems odd.
Are you sure you tested on a real 8-core machine (not on a quad-core with HT on)?

Of course. AMD FX-8350.

If you want, I can test your engine to see if it shows similar behavior.

cdani · Post by **cdani** » Sun Apr 12, 2015 10:27 am

New improvement:

NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)

4 threads: from +108 to +117

8 threads: from +120 to +134

So of course I'm trying something even more aggressive.

The updated version is here (64-bit popcnt build only):
http://www.andscacs.com/andscacs074024.zip

I tried to find information about how this compares to other engines, and I found this thread:

http://talkchess.com/forum/viewtopic.php?t=55563

It's important to note that my tests are against a gauntlet, not against Andscacs itself.

So with 4 threads Andscacs gains 117 Elo against a gauntlet, while Zappa Mexico II, "known to scale particularly well" (better than Stockfish), gains 114, but in self-play.

I know it is harder for a stronger engine to beat itself because of diminishing returns.

Do you think this holds up, or compares well? Is there any chance that lazy SMP is better than other ways of doing MP?

My intuition (which is all I have, since I have no experience with any of this) is that lazy SMP, because it needs no synchronization between threads, will at least be more lightweight, and the more threads there are, the better the gains may be. Of course I will keep testing all this.
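For readers new to the idea, here is a minimal Python sketch of what "no synchronization" means in lazy SMP. It is an illustration under stated assumptions, not Andscacs code. Every thread runs its own independent iterative-deepening loop, and the only thing they share is the transposition table. `toy_search` is a hypothetical stand-in for a real alpha-beta search: a negamax over a synthetic binary tree. `lazy_smp` and the `(key, depth)` table layout are also made up for the example. In CPython each individual dict read or write is effectively atomic because of the GIL. A C or C++ engine would instead typically use a lockless table and tolerate the occasional race.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

TT = Dict[Tuple[int, int], int]  # toy transposition table: (node key, depth) -> score


def toy_search(key: int, depth: int, tt: TT) -> int:
    """Negamax over a synthetic binary tree; a stand-in for a real alpha-beta search."""
    cached = tt.get((key, depth))
    if cached is not None:
        return cached  # another thread (or this one) already searched this node
    if depth == 0:
        score = (key * 7919) % 101 - 50  # arbitrary deterministic leaf score
    else:
        score = max(-toy_search(2 * key + 1, depth - 1, tt),
                    -toy_search(2 * key + 2, depth - 1, tt))
    tt[(key, depth)] = score
    return score


def lazy_smp(n_threads: int, max_depth: int,
             depth_offset: Callable[[int], int]) -> List[Optional[Tuple[int, int]]]:
    tt: TT = {}  # the only structure the threads share; no locks around it
    last: List[Optional[Tuple[int, int]]] = [None] * n_threads  # one slot per thread

    def worker(thread_id: int) -> None:
        # Each thread runs its own iterative deepening, shifted by its depth offset.
        for depth in range(1, max_depth + 1):
            d = depth + depth_offset(thread_id)
            last[thread_id] = (d, toy_search(0, d, tt))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return last


results = lazy_smp(4, 3, lambda t: t & 1)
print([d for d, _ in results])  # [3, 4, 3, 4]
```

The threads never wait on each other. They only benefit from each other's table entries.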

elcabesa · Post by **elcabesa** » Sun Apr 12, 2015 11:07 am

I'm having similar problems with lazy SMP in Vajolet.

I was thinking about letting some threads search the second-best move, something like a parallel MultiPV, but so far I have never tried it.
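One possible reading of this idea, sketched in Python under loudly stated assumptions: since the idea was never tried, this is not Vajolet's code or any engine's. The split rule is hypothetical. The main thread (id 0) and even-numbered helpers search all root moves. Odd-numbered helpers drop the current best move, so whatever they return is the best alternative, i.e. the second-best move. `root_moves_for_thread` is a made-up name.

```python
from typing import List, Sequence


def root_moves_for_thread(thread_id: int, root_moves: Sequence[str], best_move: str) -> List[str]:
    """Hypothetical 'parallel MultiPV' split of root moves between threads."""
    if thread_id % 2 == 1 and len(root_moves) > 1:
        # Odd helpers exclude the current best root move to look for the second best.
        return [m for m in root_moves if m != best_move]
    # Main thread and even helpers search every root move as usual.
    return list(root_moves)


moves = ["e2e4", "d2d4", "g1f3"]
print(root_moves_for_thread(0, moves, "e2e4"))  # ['e2e4', 'd2d4', 'g1f3']
print(root_moves_for_thread(1, moves, "e2e4"))  # ['d2d4', 'g1f3']
```

Which threads to dedicate to the alternative move, and how their result feeds back into the main thread, are open design choices this sketch does not settle.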

mar · Post by **mar** » Sun Apr 12, 2015 11:12 am

cdani wrote: If you want, I can test your engine to see if it shows similar behavior.

Thanks, you don't have to. Peter did some testing recently and I know that cheng scales beyond 4 cores (somehow).
As for your formula:

NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)

Where is the thread id? This would mean you have the same depth for every helper thread.
It seems to me that your problem might be that you don't terminate the iteration when a helper finishes before the "master", but I may be wrong.
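The suggestion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how Andscacs or Cheng implement it. `SharedBest`, `make_abort_check` and `main_iterative_deepening` are hypothetical names. The search is assumed to poll a `should_abort()` callback and to return `None` when it gives up. Any thread that finishes an iteration publishes its result. If some helper has already completed the main thread's current depth (or a deeper one), the main thread abandons its iteration, adopts that result and continues from the next unfinished depth.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Completed:
    depth: int  # deepest iteration finished by any thread
    move: str   # best move that iteration returned


class SharedBest:
    """Hypothetical shared record of the deepest fully completed iteration."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._best: Optional[Completed] = None

    def publish(self, depth: int, move: str) -> None:
        with self._lock:
            if self._best is None or depth > self._best.depth:
                self._best = Completed(depth, move)

    def deepest(self) -> Optional[Completed]:
        with self._lock:
            return self._best


def make_abort_check(shared: SharedBest, depth: int) -> Callable[[], bool]:
    """Returns a callback that is True once some thread has completed `depth` or deeper."""
    def should_abort() -> bool:
        done = shared.deepest()
        return done is not None and done.depth >= depth
    return should_abort


# search(depth, should_abort) -> best move, or None if it stopped because should_abort() fired
Search = Callable[[int, Callable[[], bool]], Optional[str]]


def main_iterative_deepening(max_depth: int, search: Search, shared: SharedBest) -> Optional[Completed]:
    depth = 1
    while depth <= max_depth:
        move = search(depth, make_abort_check(shared, depth))
        if move is None:
            # A helper finished this depth (or deeper) first: adopt its result, skip ahead.
            done = shared.deepest()
            depth = done.depth + 1 if done is not None else depth + 1
        else:
            shared.publish(depth, move)
            depth += 1
    return shared.deepest()
```

Helpers would publish through the same `publish` call. A fuller version would probably also tell the helpers to move on when the main thread completes a depth first. That part is omitted here.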

cdani · Post by **cdani** » Sun Apr 12, 2015 12:11 pm

mar wrote: As for your formula:
NewDepth = Depth + (((Depth + 1) & 1) ^ 1) + (Depth > 2) + (Depth > 4) + (Depth > 6)
Where is the thread id? This would mean you have the same depth for every helper thread.
It seems to me that your problem might be that you don't terminate the iteration when a helper finishes before the "master", but I may be wrong.

Oops!
Since I translated it from Catalan to English, I made a mistake. The correct one is:

NewDepth = Depth + (((thread_id + 1) & 1) ^ 1) + (thread_id > 2) + (thread_id > 4) + (thread_id > 6)
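The corrected formula can be checked with a few lines of Python. The C-style expression translates directly because Python booleans add as 0 and 1, just as C comparisons yield 0 or 1. The function names are hypothetical. Since `((t + 1) & 1) ^ 1` equals the parity bit `t & 1`, the offset simplifies. With 8 threads the search now spreads over five depths instead of two:

```python
from typing import List


def depth_offset(thread_id: int) -> int:
    # Corrected formula from the thread, transcribed as-is
    return (((thread_id + 1) & 1) ^ 1) + (thread_id > 2) + (thread_id > 4) + (thread_id > 6)


def depth_offset_simplified(thread_id: int) -> int:
    # ((t + 1) & 1) ^ 1 is just t & 1 (1 for odd thread ids, 0 for even ones)
    return (thread_id & 1) + (thread_id > 2) + (thread_id > 4) + (thread_id > 6)


def thread_depths(depth: int, n_threads: int) -> List[int]:
    return [depth + depth_offset(t) for t in range(n_threads)]


print([depth_offset(t) for t in range(8)])  # [0, 1, 0, 2, 1, 3, 2, 4]
print(thread_depths(10, 8))                 # [10, 11, 10, 12, 11, 13, 12, 14]
assert all(depth_offset(t) == depth_offset_simplified(t) for t in range(64))
```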

The same applies to all the other formulas I have posted in this thread.

cdani · Post by **cdani** » Sun Apr 12, 2015 12:13 pm

mar wrote: It seems to me that your problem might be that you don't terminate iteration when a helper finishes before "master", but I may be wrong.

Yes, I have to try this also.
