lazy smp questions

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

JVMerlino
Posts: 1003
Joined: Wed Mar 08, 2006 9:15 pm
Location: San Francisco, California

Re: lazy smp questions

Post by JVMerlino » Thu Sep 10, 2015 9:45 pm

hgm wrote:Are you using threads or processes? I was using processes. Although I cannot see why this would matter, I cannot exclude it either. When I made the hash mask that isolates the index from the key process-dependent so that each process used a separate part of the shared memory, the speed went back to normal. If both processes used the full table, the nps drops and time-to-depth increases.
I use processes which are all launched at main app launch and pointed to the shared memory.
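For readers following the hash-mask experiment hgm describes, here is a minimal Python sketch of the two indexing schemes. The table size, process count, and function names are hypothetical, and power-of-two sizes are assumed; this is not code from any of the engines discussed.

```python
# Hypothetical sizes: a shared table of 2**20 entries used by 4 processes.
TABLE_BITS = 20
NUM_PROCS = 4  # assumed to be a power of two

# Bits left for the index once the top bits select a process's slice.
SLICE_BITS = TABLE_BITS - (NUM_PROCS.bit_length() - 1)


def shared_index(key: int) -> int:
    """Full-table indexing: any process may read or write any slot."""
    return key & ((1 << TABLE_BITS) - 1)


def partitioned_index(key: int, proc_id: int) -> int:
    """Process-dependent mask: each process stays inside its own slice."""
    return (proc_id << SLICE_BITS) | (key & ((1 << SLICE_BITS) - 1))
```

With the partitioned scheme the processes never touch the same slot, so in this sketch they also stop seeing each other's entries. That makes it useful as a diagnostic for the slowdown rather than as a way to share search results.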

JVMerlino
Posts: 1003
Joined: Wed Mar 08, 2006 9:15 pm
Location: San Francisco, California

Re: lazy smp questions

Post by JVMerlino » Thu Sep 10, 2015 9:47 pm

Dann Corbit wrote:Do you count a hash hit as a node?
I do. Not sure about Crafty. Note that I do not probe the hash tables in qsearch, though.

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: lazy smp questions

Post by bob » Thu Sep 10, 2015 10:45 pm

Dann Corbit wrote:Do you count a hash hit as a node?
I count each new node reached as a "node". When I make a move, I increment the counter, as that is certainly a new node. What happens at the next recursive search level I don't know. It could be a repetition, a 50-move draw, a hash hit, or a search.
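The counting convention above can be sketched on a toy game tree. This is only an illustration in Python, not Crafty's code: the tree, the leaf scores, and the `Searcher` class are all hypothetical, and the search is plain negamax without pruning.

```python
# Toy game tree (hypothetical): position "y" is reachable from both "a" and "b".
TREE = {"root": ["a", "b"], "a": ["x", "y"], "b": ["y", "z"], "y": ["y1", "y2"]}
# Leaf scores from the point of view of the side to move at the leaf.
LEAF_SCORE = {"x": 1, "z": -2, "y1": 3, "y2": -1}


class Searcher:
    def __init__(self) -> None:
        self.nodes = 0       # incremented every time a move is made
        self.hash_hits = 0   # positions resolved from the table
        self.table = {}      # position -> score (stand-in for a hash table)

    def search(self, pos: str) -> int:
        if pos in self.table:
            # The move leading here was already counted by the caller.
            self.hash_hits += 1
            return self.table[pos]
        children = TREE.get(pos, [])
        if not children:
            score = LEAF_SCORE[pos]
        else:
            score = -10**9
            for child in children:
                self.nodes += 1  # a move was made, so this is a new node
                score = max(score, -self.search(child))
        self.table[pos] = score
        return score
```

Searching from "root" makes 8 moves and resolves one of them from the table, so the counter reads 8 either way: the hash hit is counted as a node because the increment happens before the probe.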

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: lazy smp questions

Post by bob » Thu Sep 10, 2015 10:51 pm

mar wrote:
cdani wrote:Cheng, I suppose Nirvana, and Andscacs use threads, and with shared hash between threads.
Honestly I don't see the point: "lazy smp" doesn't pretend to be the best way to do smp (hence LAZY - just to clarify to some individuals ;)
- even though 100+elo compared to mostly _crappy_ YBW implementations that do nothing but wait seems fine to me :)
Of course we have evangelists here who love to spread rumors.
"this doesn't work coz I did it 30 years ago".
Good riddance.
if lazy smp was so lousy we wouldn't get so many negative reactions from stars. who gives a damn. I don't.
In fact this "community" starts to annoy me.
What's with the hostility? (a) it is certainly known to be a poor algorithm. Those that choose to use it certainly have the right to do so and it doesn't matter to me one bit. (b) I simply asked a question about numbers that looked a bit odd, nothing more, nothing less. Sometimes such comments lead to a bug being discovered and fixed.

To date, there is only one optimal way, perhaps another one or two that are fairly close to optimal approaches, and then the rest fall farther down the performance ladder. There are certainly better approaches than this that were used 30 years ago, not that that means anything in particular.

this:
if lazy smp was so lousy we wouldn't get so many negative reactions from stars.
I don't begin to understand. I think you might mean the opposite, that you WOULD get negative reactions.

mar
Posts: 2015
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: lazy smp questions

Post by mar » Thu Sep 10, 2015 11:05 pm

bob wrote:(a) it is certainly known to be a poor algorithm.
Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
Until your 25.0 is out with alleged "linear speedup", I don't see a single engine that would perform significantly better on 4 cores doing smp "right".
Feel free to neglect it, say what you want, I really don't care.
Yet somehow independent tests show something else than what you claim...
Using capital letters won't change it a bit.

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: lazy smp questions

Post by bob » Fri Sep 11, 2015 12:10 am

mar wrote:
bob wrote:(a) it is certainly known to be a poor algorithm.
Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
Until your 25.0 is out with alleged "linear speedup", I don't see a single engine that would perform significantly better on 4 cores doing smp "right".
Feel free to neglect it, say what you want, I really don't care.
Yet somehow independent tests show something else than what you claim...
Using capital letters won't change it a bit.
Error bars are meaningless at that level. 136 Elo means 4x faster. I can tell you THAT is not happening. My 25.0 does NOT support linear speedup if you mean 16x faster on 16 cores. I have certainly been getting 13x or better on 20 cores, but linear would be 20x, which is not so likely.

I don't need to know "results" when I know the theory behind parallel search. 4x is NOT going to consistently happen with lazy SMP. 2x would be a remarkable result when tested in a reasonable way...
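The arithmetic behind "136 Elo means 4x faster" can be made explicit with a small Python sketch. It assumes a fixed Elo gain per doubling of speed, 68 here, which is simply what the 136-for-4x statement implies. That constant is an assumption for illustration: the real value depends on the engine and the time control, and the function names are hypothetical.

```python
import math

# Assumption: 136 Elo for 4x (two doublings) implies 68 Elo per doubling.
ELO_PER_DOUBLING = 68.0


def implied_speedup(elo_gain: float, per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Speedup that would be needed to explain a given Elo gain."""
    return 2.0 ** (elo_gain / per_doubling)


def implied_elo(speedup: float, per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Elo gain expected from a given time-to-depth speedup."""
    return per_doubling * math.log2(speedup)


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of the ideal linear speedup actually achieved."""
    return speedup / cores
```

Under that assumption a 2x speedup corresponds to 68 Elo, and 13x on 20 cores is an efficiency of 0.65.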

There are good papers to read that clue you in on the overhead, and the limitations on speedup for various approaches.

Michel
Posts: 2057
Joined: Sun Sep 28, 2008 11:50 pm

Re: lazy smp questions

Post by Michel » Fri Sep 11, 2015 2:38 am

Martin Sedlak wrote:Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
The error bars are not so high. Combining them I get +-26. So still 110 elo in worst case (with 95% confidence).
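One common way to combine two error bars is in quadrature, sketched below in Python. The post does not say which method was used, so treating the two ratings as independent is an assumption, and the individual margins in the example (10 and 24) are hypothetical values picked only to make the arithmetic round. They are not the CEGT figures.

```python
import math


def combined_error(err_a: float, err_b: float) -> float:
    """Error of a rating difference, assuming independent errors."""
    return math.hypot(err_a, err_b)


def worst_case_gain(diff: float, err_a: float, err_b: float) -> float:
    """Lower end of the interval around the measured difference."""
    return diff - combined_error(err_a, err_b)


# Hypothetical margins of 10 and 24 combine to 26, leaving 136 - 26 = 110.
lower_bound = worst_case_gain(136, 10, 24)
```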
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: lazy smp questions

Post by bob » Fri Sep 11, 2015 2:48 am

Michel wrote:
Martin Sedlak wrote:Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
The error bars are not so high. Combining them I get +-26. So still 110 elo in worst case (with 95% confidence).
You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. Cheng 1cpu played against opponents that averaged about 50 Elo stronger than those faced by cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...

I wouldn't guess at either the rating difference or the Elo difference given that data. After it has been running a while, perhaps. But until the average opponent ratings get closer, you already have a 50 Elo error potential.

I've said this MANY times. To measure parallel performance, or even programming changes, you need a really stable test environment. Same opponents, same everything except for the changes to your own program. Then you get some pretty accurate data. Here, there are so many degrees of freedom in the test that comparison is difficult to impossible.

Michel
Posts: 2057
Joined: Sun Sep 28, 2008 11:50 pm

Re: lazy smp questions

Post by Michel » Fri Sep 11, 2015 3:23 am

Robert Hyatt wrote:You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. Cheng 1cpu played against opponents that averaged about 50 Elo stronger than those faced by cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...
Why not? As long as the graph is connected, the comparison is fine.

If A plays B and B plays C and C plays D you can still compare A and D. The comparison via intermediate engines just blows up the error bars.
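How much the error bars blow up along such a chain can be sketched as follows. This Python snippet assumes the per-link errors are independent, and the margin of 15 Elo per link is a hypothetical number. Real rating tools fit all games jointly, so this is only a rough picture.

```python
import math
from typing import Sequence


def chain_error(link_errors: Sequence[float]) -> float:
    """Error of a rating difference inferred through a chain of pairings."""
    return math.sqrt(sum(e * e for e in link_errors))


# Hypothetical: A-B, B-C and C-D each measured to within 15 Elo.
direct = chain_error([15.0])               # A and D compared head to head
via_chain = chain_error([15.0, 15.0, 15.0])  # A and D compared through B and C
```

With equal links the error grows with the square root of the path length, so three links cost a factor of about 1.73 compared with a direct match of the same precision.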

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: lazy smp questions

Post by bob » Fri Sep 11, 2015 3:54 am

Michel wrote:
Robert Hyatt wrote:You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. Cheng 1cpu played against opponents that averaged about 50 Elo stronger than those faced by cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...
Why not? As long as the graph is connected, the comparison is fine.

If A plays B and B plays C and C plays D you can still compare A and D. The comparison via intermediate engines just blows up the error bars.
That was my point. If the average rating of player A's opponents is X, and the average rating of player B's opponents is X+50, it is going to be VERY difficult to compare their ratings with any accuracy and use the resulting Elo numbers to predict the outcome between the two versions. The two versions of the original program are different, the average opponents are different, so WHICH is responsible for the Elo gain or loss?

So in that specific CEGT comparison, the error bars are not +/-26. They are more like +/- 75...

In this case, saying A is +130 better than B is quite inaccurate. It is most likely better, to be sure. But how much better is much harder to determine without more data points.
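To see what such an interval means for predicting a head-to-head result, the standard logistic Elo formula can be applied to both ends of it. The Python sketch below uses the 136 Elo figure and the +/- 75 margin quoted in this thread. The function name is hypothetical, and rating lists may use a somewhat different model, so the numbers are indicative only.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


measured = 136.0
margin = 75.0  # the wider error bar suggested above

low = expected_score(measured - margin)   # pessimistic end, +61 Elo
mid = expected_score(measured)            # point estimate, +136 Elo
high = expected_score(measured + margin)  # optimistic end, +211 Elo
```

The point estimate corresponds to an expected score of roughly 0.69, while the two ends of the interval give noticeably different predictions, which is the practical cost of the wider error bar.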
