Threads test incl. Stockfish 5 and Komodo 8

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

bob wrote:
I find it amusing you demand high accuracy from an approximation. Pretty irrational thinking, in fact.
1/ An approximation is only as good as its accuracy. If you don't require _some_ accuracy (higher is better), that's not approximation, it's shamanism.

2/ You miss the asymptotic behavior of the approximation, which is pretty unscientific. My advice for the effective SMP speed-up: 1 + a*log(Ncpus), or, with more data so as not to overfit, a + b*log(Ncpus + c). The main lesson is that the asymptotic behavior is logarithmic in the number of cores, not linear as you seem to show.
If you want an exact number, the only way to produce it is to actually run the tests. An approximation is used to avoid the work.
"Exact" doesn't exist in SMP; it's time dependent, engine dependent, hardware dependent, etc.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

Laskos wrote:
bob wrote:
I find it amusing you demand high accuracy from an approximation. Pretty irrational thinking, in fact.
1/ An approximation is only as good as its accuracy. If you don't require _some_ accuracy (higher is better), that's not approximation, it's shamanism.

2/ You miss the asymptotic behavior of the approximation, which is pretty unscientific. My advice for the effective SMP speed-up: 1 + a*log(Ncpus), or, with more data to not overfit, a + b*log(Ncpus + c). The main lesson is that the asymptotic behavior is logarithmic in the number of cores.
If you want an exact number, the only way to produce it is to actually run the tests. An approximation is used to avoid the work.
"Exact" doesn't exist in SMP; it's time dependent, engine dependent, hardware dependent, etc.
Feel free to do exactly what I have done in the past, namely run several hundred positions, several dozen times each, with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 processors, and then compute the average speedup for each test. Then show me the gross inaccuracy in my formula. It should be easy to do, if you are correct. Or if others are correct, you might find that that formula is a little optimistic, or a little pessimistic. You won't find it grossly inaccurate.

I couldn't care less about how the formula extrapolates out to an infinite number of processors. I have NEVER claimed it is accurate beyond 16, in fact, although it looked reasonably accurate in a few 32-CPU tests, and a very small number of tests beyond that.

Grow up. Given my past statements, the formula works well. Just because you see something illogical when you take the limit as cores -> infinity doesn't mean a thing. It was not intended to be used with cores > 16, and I have always clearly stated so. Ergo, it does EXACTLY what was intended, using the simplest calculation possible so that it can be done in your head.

I actually use

speedup = N - (N-1)*0.3, which is quite easy to compute and gives a pretty good ballpark number.

For my formula, with 8 cores, one gets

1 + (N-1)*0.7 ==
1 + 7 * 0.7 ==
1 + 4.9 ~=
6.0

For yours, you gave

a + b*log(Ncpus + c)

What are a, b, and c?

What is log(Ncpus + c) (in your head, since this is supposed to be a simple approximation)? Etc.

I'll take mine any day...
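For anyone who wants to see how the two proposed forms actually diverge, here is a quick numerical sketch. Bob's linear formula is as stated above; the coefficients for the logarithmic form are illustrative placeholders, since none were given in the thread:

```python
import math

def linear_speedup(n: int) -> float:
    """Bob's rule of thumb: 1 + (N-1)*0.7, equivalently N - (N-1)*0.3."""
    return 1 + (n - 1) * 0.7

def log_speedup(n: int, a: float = 1.0, b: float = 3.0, c: float = 0.0) -> float:
    """Laskos's proposed form a + b*log(Ncpus + c).
    The coefficients here are illustrative guesses, not fitted values."""
    return a + b * math.log(n + c)

# The two forms roughly agree at small core counts, but diverge as N grows:
# the linear form keeps climbing while the logarithmic one flattens out.
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} cores: linear {linear_speedup(n):6.2f}  log {log_speedup(n):6.2f}")
```

At 8 cores the linear form gives about 5.9 and, with these made-up coefficients, the logarithmic form about 7.2; the interesting disagreement is asymptotic, which is exactly the point in dispute.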
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

bob wrote:
Laskos wrote:
bob wrote:
I find it amusing you demand high accuracy from an approximation. Pretty irrational thinking, in fact.
1/ An approximation is only as good as its accuracy. If you don't require _some_ accuracy (higher is better), that's not approximation, it's shamanism.

2/ You miss the asymptotic behavior of the approximation, which is pretty unscientific. My advice for the effective SMP speed-up: 1 + a*log(Ncpus), or, with more data to not overfit, a + b*log(Ncpus + c). The main lesson is that the asymptotic behavior is logarithmic in the number of cores.
If you want an exact number, the only way to produce it is to actually run the tests. An approximation is used to avoid the work.
"Exact" doesn't exist in SMP; it's time dependent, engine dependent, hardware dependent, etc.
Feel free to do exactly what I have done in the past, namely run several hundred positions, several dozen times each, with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 processors, and then compute the average speedup for each test. Then show me the gross inaccuracy in my formula. It should be easy to do, if you are correct. Or if others are correct, you might find that that formula is a little optimistic, or a little pessimistic. You won't find it grossly inaccurate.

I couldn't care less about how the formula extrapolates out to an infinite number of processors. I have NEVER claimed it is accurate beyond 16, in fact, although it looked reasonably accurate in a few 32-CPU tests, and a very small number of tests beyond that.

Grow up. Given my past statements, the formula works well. Just because you see something illogical when you take the limit as cores -> infinity doesn't mean a thing. It was not intended to be used with cores > 16, and I have always clearly stated so. Ergo, it does EXACTLY what was intended, using the simplest calculation possible so that it can be done in your head.

I actually use

speedup = N - (N-1)*0.3, which is quite easy to compute and gives a pretty good ballpark number.

For my formula, with 8 cores, one gets

1 + (N-1)*0.7 ==
1 + 7 * 0.7 ==
1 + 4.9 ~=
6.0

For yours, you gave

a + b*log(Ncpus + c)

What are a, b, and c?

What is log(Ncpus + c) (in your head, since this is supposed to be a simple approximation)? Etc.

I'll take mine any day...
I repeat: take a look at Andreas's results at the top of this thread. The empirical data are there for several engines, on reasonable hardware, with small error margins. Your habit of overlooking and denying others' hard empirical results is already almost legendary among those who still read your posts.

How about the Komodo "bug"? Could you improve non-SMP Komodo 8 by 70 Elo points without hurting Komodo's SMP? Basically, that is what you CLAIM.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

Laskos wrote:
bob wrote:
Laskos wrote:
bob wrote:
I find it amusing you demand high accuracy from an approximation. Pretty irrational thinking, in fact.
1/ An approximation is only as good as its accuracy. If you don't require _some_ accuracy (higher is better), that's not approximation, it's shamanism.

2/ You miss the asymptotic behavior of the approximation, which is pretty unscientific. My advice for the effective SMP speed-up: 1 + a*log(Ncpus), or, with more data to not overfit, a + b*log(Ncpus + c). The main lesson is that the asymptotic behavior is logarithmic in the number of cores.
If you want an exact number, the only way to produce it is to actually run the tests. An approximation is used to avoid the work.
"Exact" doesn't exist in SMP; it's time dependent, engine dependent, hardware dependent, etc.
Feel free to do exactly what I have done in the past, namely run several hundred positions, several dozen times each, with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 processors, and then compute the average speedup for each test. Then show me the gross inaccuracy in my formula. It should be easy to do, if you are correct. Or if others are correct, you might find that that formula is a little optimistic, or a little pessimistic. You won't find it grossly inaccurate.

I couldn't care less about how the formula extrapolates out to an infinite number of processors. I have NEVER claimed it is accurate beyond 16, in fact, although it looked reasonably accurate in a few 32-CPU tests, and a very small number of tests beyond that.

Grow up. Given my past statements, the formula works well. Just because you see something illogical when you take the limit as cores -> infinity doesn't mean a thing. It was not intended to be used with cores > 16, and I have always clearly stated so. Ergo, it does EXACTLY what was intended, using the simplest calculation possible so that it can be done in your head.

I actually use

speedup = N - (N-1)*0.3, which is quite easy to compute and gives a pretty good ballpark number.

For my formula, with 8 cores, one gets

1 + (N-1)*0.7 ==
1 + 7 * 0.7 ==
1 + 4.9 ~=
6.0

For yours, you gave

a + b*log(Ncpus + c)

What are a, b, and c?

What is log(Ncpus + c) (in your head, since this is supposed to be a simple approximation)? Etc.

I'll take mine any day...
I repeat: take a look at Andreas's results at the top of this thread. The empirical data are there for several engines, on reasonable hardware, with small error margins. Your habit of overlooking and denying others' hard empirical results is already almost legendary among those who still read your posts.

How about the Komodo "bug"? Could you improve non-SMP Komodo 8 by 70 Elo points without hurting Komodo's SMP? Basically, that is what you CLAIM.
1. Where is there ANY mention about Crafty's speedup or Elo gain in the original post?

2. Based on the data, yes, Komodo 8 can be improved (the 1-CPU results), OR the SMP results can be improved, or BOTH. This is not magic. This is well understood. There is a ton of theory on parallel alpha/beta search if you just look around. This is in the same realm as the "super-linear speedup" that comes up from time to time. It happens on occasion, but it does NOT happen in the majority of positions, only in a small minority. If you pick those samples, you can prove anything. But it doesn't stand up to inspection.

3. Parallel search brings more speed, NOT more "knowledge" or anything else. That is ALL it brings. When a program gains Elo at the same depth solely by adding more threads, something is wrong, and it can be improved. ANYONE who has been working with parallel search for a while will tell you the same thing. Contact Murray Campbell. Or Ken Thompson. Or Jonathan Schaeffer. Or Monty Newborn. Or anyone else you care to who has done this for a while. Alpha/beta is well understood and has been since the Knuth/Moore paper. It hasn't changed.

Now if someone wants to use something other than alpha/beta, all bets are off. But for alpha/beta, this is a known and understood issue. Alpha/beta is the name, searching the minimal tree is the game. The ONLY game.
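The "minimal tree" invoked here is the Knuth/Moore result: with perfect move ordering, alpha/beta on a uniform tree of branching factor W and depth d visits W^ceil(d/2) + W^floor(d/2) - 1 leaf nodes. A sketch of that bound (the uniform-tree assumption is an idealization that real games only approximate):

```python
import math

def minimal_tree_leaves(w: int, d: int) -> int:
    """Knuth/Moore lower bound on the leaf nodes alpha/beta must visit,
    assuming perfect move ordering, branching factor w, and depth d."""
    return w ** math.ceil(d / 2) + w ** math.floor(d / 2) - 1

# With w=35 (a commonly quoted average branching factor for chess),
# the alpha/beta minimum is tiny next to the full minimax tree of w**d leaves.
full = 35 ** 6
minimal = minimal_tree_leaves(35, 6)
print(f"minimax leaves: {full}, alpha/beta minimum: {minimal}")
```

Roughly speaking, perfect alpha/beta takes the square root of the minimax work, which is why every deviation from the minimal tree, including "widening", is paid for in reachable depth.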
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by syzygy »

bob wrote:
syzygy wrote:
Adam Hair wrote:
bob wrote: Why would you need to post ANY links? I clearly wrote, ONCE AGAIN, that if a program is stronger searching to the same depth but using SMP, it has a bug that can be fixed to IMPROVE the non-SMP search. This is really not worth discussing. You either understand it or you don't. In this case, you don't.
Is that true for every type of SMP implementation?
1. If a program does exactly what the programmer wants it to do, it is not buggy. Whatever Bob might think is irrelevant.

2. Bob's argument that the non-bug can be "fixed" to improve Elo is simply confused and false. This has been hashed out to death already in one or two threads.
Whether the author wants to "fix" the problem or not is up to him. It is not "confused and false." Just because YOU don't understand the basics of the parallel alpha/beta search issues doesn't mean EVERYONE doesn't understand them. NOBODY wants their search to be larger than the "minimal tree" as defined by Knuth/Moore. Absolutely NOBODY. To claim otherwise is simply naive and uninformed. Reducing the size of the tree is anything but trivial, but that does NOT mean it is not the ultimate goal of any parallel search, because that is how you get the maximal depth gain, which is where the real Elo gains exist.

Try again, if you want. But posting nonsense is not going to make it so.
How come absolutely nobody here with some knowledge of parallel programming agrees with you...
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by ernest »

syzygy wrote:How come absolutely nobody here with some knowledge of parallel programming agrees with you...
Well, at least on the theoretical side, what Bob says here doesn't sound so stupid... 8-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
Adam Hair wrote:
bob wrote: Why would you need to post ANY links? I clearly wrote, ONCE AGAIN, that if a program is stronger searching to the same depth but using SMP, it has a bug that can be fixed to IMPROVE the non-SMP search. This is really not worth discussing. You either understand it or you don't. In this case, you don't.
Is that true for every type of SMP implementation?
1. If a program does exactly what the programmer wants it to do, it is not buggy. Whatever Bob might think is irrelevant.

2. Bob's argument that the non-bug can be "fixed" to improve Elo is simply confused and false. This has been hashed out to death already in one or two threads.
Whether the author wants to "fix" the problem or not is up to him. It is not "confused and false." Just because YOU don't understand the basics of the parallel alpha/beta search issues doesn't mean EVERYONE doesn't understand them. NOBODY wants their search to be larger than the "minimal tree" as defined by Knuth/Moore. Absolutely NOBODY. To claim otherwise is simply naive and uninformed. Reducing the size of the tree is anything but trivial, but that does NOT mean it is not the ultimate goal of any parallel search, because that is how you get the maximal depth gain, which is where the real Elo gains exist.

Try again, if you want. But posting nonsense is not going to make it so.
How come absolutely nobody here with some knowledge of parallel programming agrees with you...
WHO with any knowledge of parallel programming has actually weighed in on the discussion on EITHER side? Think about it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

ernest wrote:
syzygy wrote:How come absolutely nobody here with some knowledge of parallel programming agrees with you...
Well, at least on the theoretical side, what Bob says here doesn't sound so stupid... 8-)
There's a reason for that. :) This is not exactly new ground, an esoteric topic, etc. It is well understood. Has been since the '70s. Lots of papers. Lots of research. Lots of theoretical analysis. Parallel search gains speed, nothing else. Speed translates to depth, nothing else.

Here's another angle on this somewhat idiotic discussion. Suppose you take program A, and turn null-move off in A1 and on in A2. And play them in a fixed-depth match. Who wins? :) Null move OFF. Because fixed depth is not the right way to measure performance issues. Turning null-move off "widens" the search drastically. And makes it play better so long as depth isn't changed. So I suppose we should all be turning null-move off, since that will produce a 100 Elo gain. Until you use measured time to limit the search rather than depth. Then, suddenly, you are down a hundred Elo or so with the null-disabled program.

NOBODY wants to "widen the search". That is the antithesis of the alpha/beta algorithm, whose sole purpose is to narrow the search toward the impossible-to-reach minimal tree Knuth/Moore talked about. Why we have these ridiculous discussions, primarily with people who have not studied the problem at any level yet are quite happy to weigh in with an argument that simply does not hold water, is beyond me.

But that is a theme here at times, obviously. "Widen" the search all you want. But it is a STUPID idea. May as well slow it down too.
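The null-move thought experiment above is easy to reproduce on a toy game. The sketch below is not any real engine's code: positions are just integers, the evaluation is an arbitrary deterministic hash, and the "moves" add 1, 2, or 3. It only illustrates the mechanism of null-move pruning behind a flag, so the on/off node counts at a fixed depth can be compared:

```python
R = 2       # null-move depth reduction; 2 is a common choice, not universal
nodes = 0   # global node counter for comparing the two configurations

def evaluate(pos):
    """Toy deterministic 'static eval', roughly in [-1000, 1000]."""
    return (pos * 2654435761) % 2001 - 1000

def negamax(pos, depth, alpha, beta, use_null):
    global nodes
    nodes += 1
    if depth == 0:
        return evaluate(pos)
    if use_null and depth > R:
        # Null move: give the opponent a free turn. If a reduced-depth,
        # zero-window search still beats beta, prune this node.
        score = -negamax(pos, depth - 1 - R, -beta, -beta + 1, use_null)
        if score >= beta:
            return beta
    best = -10**9
    for move in (1, 2, 3):          # toy move generator
        score = -negamax(pos + move, depth - 1, -beta, -alpha, use_null)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:           # ordinary beta cutoff
            break
    return best

for use_null in (True, False):
    nodes = 0
    negamax(0, 8, -10**9, 10**9, use_null)
    print(f"null move {'on ' if use_null else 'off'}: {nodes} nodes")
```

Whether "on" beats "off" in this toy is beside the point; in a real engine the pruning cuts the tree sharply at the same nominal depth, which is exactly why fixed-depth comparisons flatter the null-move-off version.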
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by michiguel »

ernest wrote:
syzygy wrote:How come absolutely nobody here with some knowledge of parallel programming agrees with you...
Well, at least on the theoretical side, what Bob says here doesn't sound so stupid... 8-)
On paper. Only on paper.

But in practice, linearizing a parallel algorithm would be a mess of gigantic proportions, and I cannot even imagine the problems and the unreadability. So saying that Komodo could be "fixed" on a single core is silly. Even on paper the "fix" may not give any improvement, as I demonstrated in another post. It could come out equal.

The fact is that whatever Komodo is doing, it is scaling consistently better than the others. Whether widening the search is a cause or merely a correlate is unknown, but clearly the whole approach is good.

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

michiguel wrote:
ernest wrote:
syzygy wrote:How come absolutely nobody here with some knowledge of parallel programming agrees with you...
Well, at least on the theoretical side, what Bob says here doesn't sound so stupid... 8-)
On paper. Only on paper.

But in practice, linearizing a parallel algorithm would be a mess of gigantic proportions, and I cannot even imagine the problems and the unreadability. So saying that Komodo could be "fixed" on a single core is silly. Even on paper the "fix" may not give any improvement, as I demonstrated in another post. It could come out equal.

The fact is that whatever Komodo is doing, it is scaling consistently better than the others. Whether widening the search is a cause or merely a correlate is unknown, but clearly the whole approach is good.

Miguel
Sorry, but more than one program already has this capability. I think I saw it in Stockfish, where they do a "pseudo-parallel search" to debug, by doing the usual multiplexing operation. That is NOT a "messy" approach. When a program can be "fixed" by multiplexing, it can be fixed without multiplexing. The most common reason multiplexing will help is poor move ordering, where multiplexing lets you get to the better move while the first (inferior) move is still in progress. Of course, there are solutions that address that without multiplexing. But to date, NO algorithm has been found that improves the search above and beyond the basic idea of splitting the tree into parallel searches. How you split has lots of choices, but splitting to gain additional depth is the goal of parallel searching.

It is possible to write a program that gets stronger as it goes deeper, in a sometimes non-linear way. Measuring Elo only between two versions doesn't say why or where the improvement comes from, just that it is there. One can measure absolutely anything, but quite often the measurement doesn't reveal anything useful...
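One way around the fixed-depth pitfalls argued over in this thread is to express a measured Elo difference as an equivalent time-odds factor, i.e. an "effective speedup". The sketch below assumes a constant Elo gain per doubling of thinking time; the 70 Elo/doubling figure is a rough, time-control-dependent assumption, not a measured constant:

```python
def effective_speedup(elo_gain: float, elo_per_doubling: float = 70.0) -> float:
    """Translate a measured Elo gain into the time-odds factor that would
    produce it, assuming a fixed Elo gain per doubling of search time.
    The default of 70 Elo per doubling is an illustrative assumption."""
    return 2.0 ** (elo_gain / elo_per_doubling)

# Example: an SMP version measured +140 Elo over the single-core version
# behaves like a 4x effective speedup under the 70 Elo/doubling assumption.
print(effective_speedup(140.0))
```

This is how "effective speedup" numbers like those discussed earlier in the thread are usually derived from game results rather than from time-to-depth measurements.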