lazy smp questions

JVMerlino · Post by **JVMerlino** » Thu Sep 10, 2015 11:45 pm

hgm wrote:Are you using threads or processes? I was using processes. Although I cannot see why this would matter, I cannot exclude it either. When I made the hash mask that isolates the index from the key process-dependent so that each process used a separate part of the shared memory, the speed went back to normal. If both processes used the full table, the nps drops and time-to-depth increases.

I use processes, all launched when the main app starts and all pointed at the same shared memory.
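For reference, here is a minimal sketch of the two indexing schemes hgm describes. It assumes a power-of-two table and a power-of-two process count; the names (`TABLE_BITS`, `shared_index`, `partitioned_index`) are made up for illustration and are not taken from any engine.

```python
TABLE_BITS = 20                      # assumed size: 2**20 entries
TABLE_SIZE = 1 << TABLE_BITS


def shared_index(key: int) -> int:
    """Every process maps the key onto the full shared table."""
    return key & (TABLE_SIZE - 1)


def partitioned_index(key: int, proc_id: int, num_procs: int) -> int:
    """Process-dependent mask: each process only touches its own slice."""
    # The sketch only handles power-of-two process counts.
    assert num_procs > 0 and num_procs & (num_procs - 1) == 0
    slice_size = TABLE_SIZE // num_procs
    return proc_id * slice_size + (key & (slice_size - 1))


key = 0x123456789ABCDEF              # arbitrary 64-bit style hash key
idx_shared = shared_index(key)
idx_p0 = partitioned_index(key, 0, 2)
idx_p1 = partitioned_index(key, 1, 2)
```

With the partitioned version the two processes never read or write the same entry, which makes it useful as a diagnostic for memory contention, but it also gives up the shared hash information that lazy SMP depends on.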

JVMerlino · Post by **JVMerlino** » Thu Sep 10, 2015 11:47 pm

Dann Corbit wrote:Do you count a hash hit as a node?

I do. Not sure about Crafty. Note that I do not probe the hash tables in qsearch, though.

bob · Post by **bob** » Fri Sep 11, 2015 12:45 am

Dann Corbit wrote:Do you count a hash hit as a node?

I count each new node reached as a "node". When I make a move, I increment the counter, as that is certainly a new node. What happens at the next recursive search level I don't know. It could be a repetition, a 50-move draw, a hash hit, or a search.
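A toy sketch of that counting rule, under the assumption that a "position" is just a name in a small hand-made tree. It is not Crafty's code; `search`, `tree` and `counters` are hypothetical names used only to show where the increment sits relative to the hash probe.

```python
def search(tree, node, hash_table, counters):
    """Toy traversal; `tree` maps a position name to its child positions."""
    if node in hash_table:           # hash hit: nothing below is searched
        counters["hash_hits"] += 1
        return
    hash_table.add(node)
    for child in tree.get(node, []):
        # Counted when the move is made, before we know whether the
        # child turns out to be a hash hit, a draw, or a real search.
        counters["nodes"] += 1
        search(tree, child, hash_table, counters)


# "c" is reachable by two move orders, so the second visit is a hash hit.
tree = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
counters = {"nodes": 0, "hash_hits": 0}
search(tree, "root", set(), counters)
print(counters)  # {'nodes': 4, 'hash_hits': 1}
```

Here four moves are made and all four are counted, even though one of them lands on a position already in the table.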

bob · Post by **bob** » Fri Sep 11, 2015 12:51 am

mar wrote:
cdani wrote:Cheng, I suppose Nirvana, and Andscacs use threads, and with shared hash between threads.
Honestly I don't see the point: "lazy smp" doesn't pretend to be the best way to do smp (hence LAZY, just to clarify for some individuals), even though 100+ Elo compared to mostly _crappy_ YBW implementations that do nothing but wait seems fine to me.
Of course we have evangelists here who love to spread rumors.
"this doesn't work coz I did it 30 years ago".
Good riddance.
if lazy smp was so lousy we wouldn't get so many negative reactions from stars. who gives a damn. I don't.
In fact this "community" starts to annoy me.

What's with the hostility? (a) it is certainly known to be a poor algorithm. Those that choose to use it certainly have the right to do so and it doesn't matter to me one bit. (b) I simply asked a question about numbers that looked a bit odd, nothing more, nothing less. Sometimes such comments lead to a bug being discovered and fixed.

To date, there is only one optimal way, perhaps another one or two that are fairly close to optimal approaches, and then the rest fall farther down the performance ladder. There are certainly better approaches than this that were used 30 years ago, not that that means anything in particular.

this:

if lazy smp was so lousy we wouldn't get so many negative reactions from stars.

I don't begin to understand. I think you might mean the opposite, that you WOULD get negative reactions.

mar · Post by **mar** » Fri Sep 11, 2015 1:05 am

bob wrote:(a) it is certainly known to be a poor algorithm.

Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
Until your 25.0 is out with alleged "linear speedup", I don't see a single engine that would perform significantly better on 4 cores doing smp "right".
Feel free to neglect it, say what you want, I really don't care.
Yet somehow independent tests show something else than what you claim...
Using capital letters won't change it a bit.

bob · Post by **bob** » Fri Sep 11, 2015 2:10 am

mar wrote:
bob wrote:(a) it is certainly known to be a poor algorithm.
Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
Until your 25.0 is out with alleged "linear speedup", I don't see a single engine that would perform significantly better on 4 cores doing smp "right".
Feel free to neglect it, say what you want, I really don't care.
Yet somehow independent tests show something else than what you claim...
Using capital letters won't change it a bit.

Error bars are meaningless at that level. 136 Elo means 4x faster. I can tell you THAT is not happening. My 25.0 does NOT support linear speedup if you mean 16x faster on 16 cores. I have certainly been getting 13x on 20 cores or better, but linear would be 20x which is not so likely.
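To make the arithmetic behind "136 Elo means 4x faster" explicit: the statement presupposes a fixed Elo gain per doubling of speed. The post gives no such number, but 4x is two doublings, so it implies roughly 68 Elo per doubling. The sketch below treats that constant as an assumption (`elo_per_doubling` is a hypothetical parameter, and the real value is commonly reported to vary with engine and time control).

```python
import math


def implied_speedup(elo_gain: float, elo_per_doubling: float) -> float:
    """Effective speedup implied by an Elo gain, for an assumed Elo/doubling."""
    return 2.0 ** (elo_gain / elo_per_doubling)


def elo_for_speedup(speedup: float, elo_per_doubling: float) -> float:
    """Elo gain expected from a given speedup, for an assumed Elo/doubling."""
    return elo_per_doubling * math.log2(speedup)


# Assumption: 68 Elo per doubling, the value implied by "136 Elo = 4x".
four_core_claim = implied_speedup(136, 68)     # two doublings
two_x_gain = elo_for_speedup(2, 68)            # one doubling
```

Under the same assumption, the "2x would be a remarkable result" figure corresponds to about 68 Elo, so the disagreement in the thread is really about which of those two numbers the measured +136 supports.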

I don't need to know "results" when I know the theory behind parallel search. 4x is NOT going to consistently happen with lazy SMP. 2x would be a remarkable result when tested in a reasonable way...

There are good papers to read that clue you in on the overhead, and the limitations on speedup for various approaches.

Michel · Post by **Michel** » Fri Sep 11, 2015 4:38 am

Martin Sedlak wrote:Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.

The error bars are not so high. Combining them I get +-26. So still 110 elo in worst case (with 95% confidence).
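A sketch of how two error bars are usually combined for a rating difference, assuming the two errors are independent so they add in quadrature. The individual values below (±18 and ±19) are hypothetical, picked only because they give a combined margin close to the ±26 quoted; they are not the actual CEGT figures.

```python
import math


def combined_error(*errors: float) -> float:
    """Combine independent error margins in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


# Hypothetical margins for the 4-core and 1-core entries.
err_4cpu = 18.0
err_1cpu = 19.0

margin = combined_error(err_4cpu, err_1cpu)
worst_case = 136 - margin
print(f"difference: +136 +/- {margin:.1f}, worst case about {worst_case:.0f}")
```

This only captures statistical noise in each rating; it says nothing about systematic effects such as the two versions having faced different opponents.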

bob · Post by **bob** » Fri Sep 11, 2015 4:48 am

Michel wrote:
Martin Sedlak wrote:Well, I get +136 on CEGT 40/4, 4 cores vs 1. Error bars are very high, sure.
The error bars are not so high. Combining them I get +-26. So still 110 elo in worst case (with 95% confidence).

You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. 1cpu played against an opponent average about 50 Elo stronger than cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...

I wouldn't guess at either the rating difference or the Elo difference given that data. After it has been running a while, perhaps. But until the average opponent ratings get closer, you already have a 50 Elo error potential.

I've said this MANY times. To measure parallel performance, or even programming changes, you need a really stable test environment. Same opponents, same everything except for the changes to your own program. Then you get some pretty accurate data. Here, there are so many degrees of freedom in the test that comparison is difficult to impossible.

Michel · Post by **Michel** » Fri Sep 11, 2015 5:23 am

Robert Hyatt wrote:You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. 1cpu played against an opponent average about 50 Elo stronger than cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...

Why not? As long as the graph is connected the comparison is fine.

If A plays B and B plays C and C plays D you can still compare A and D. The comparison via intermediate engines just blows up the error bars.
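A small sketch of the chained comparison, assuming each link in the chain is an independent measurement: the rating differences add along the path, while the error margins add in quadrature. The link values are hypothetical and only illustrate how the margin grows with the number of intermediate engines.

```python
import math


def compare_via_chain(links):
    """links: list of (elo_difference, error_margin) along the path A..D."""
    total_diff = sum(diff for diff, _ in links)
    total_err = math.sqrt(sum(err * err for _, err in links))
    return total_diff, total_err


# Hypothetical results: A vs B, B vs C, C vs D, each measured to +/-10.
chain = [(30, 10.0), (20, 10.0), (10, 10.0)]
diff_ad, err_ad = compare_via_chain(chain)
print(f"A vs D: {diff_ad:+d} +/- {err_ad:.1f}")
```

With three links of ±10 each, the indirect A vs D estimate ends up at about ±17.3, compared with ±10 for a direct match of the same size.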

bob · Post by **bob** » Fri Sep 11, 2015 5:54 am

Michel wrote:
Robert Hyatt wrote:You have to look at all the data. For example, look at the average opponent rating for cheng 4cpu vs cheng 1cpu. 1cpu played against an opponent average about 50 Elo stronger than cheng 4cpu. What would you expect that to cause? Make cheng 1cpu look weaker? You can't compare Elo numbers between partially or fully disjoint sets of opponents...
Why not? As long as the graph is connected the comparison is fine.

If A plays B and B plays C and C plays D you can still compare A and D. The comparison via intermediate engines just blows up the error bars.

That was my point. If the average rating of player A's opponents is X, and the average rating of player B's opponents is X+50, it is going to be VERY difficult to compare their ratings with any accuracy and use the resulting Elo numbers to predict the outcome between the two versions. The two versions of the original program are different, the average opponents are different, so WHICH is responsible for the Elo gain or loss?

So in that specific CEGT comparison, the error bars are not +/-26. They are more like +/- 75...

In this case, saying A is +130 better than B is quite inaccurate. It is most likely better, to be sure. But how much better is much harder to determine without more data points.
