Threads test incl. Stockfish 5 and Komodo 8

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
syzygy wrote:
bob wrote:if that kind of super-linear speedup happens consistently.
For the Nth time, nobody is claiming a super-linear speedup. Your whole argument is a non-starter to begin with.
super-linear and this "widening" are members of the same set, "urban legend".
Do you realize your logic is totally flawed? It is a gigantic straw man. You bring in something unrelated and say the two are the same, just to bring the other one down.

And I am sure that Komodo is not an urban legend.

Miguel
No, but intentional widening rather than searching deeper certainly is a legend.
It has never been discussed, so it is not a legend. By the way, it is irrelevant whether it is intentional or not.

Miguel
How so? If it is unintentional, one works to reduce or eliminate it. If it is intentional, one does not.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
syzygy wrote:
bob wrote:if that kind of super-linear speedup happens consistently.
For the Nth time, nobody is claiming a super-linear speedup. Your whole argument is a non-starter to begin with.
super-linear and this "widening" are members of the same set, "urban legend".
Do you realize your logic is totally flawed? It is a gigantic straw man. You bring in something unrelated and say the two are the same, just to bring the other one down.

And I am sure that Komodo is not an urban legend.

Miguel
No, but intentional widening rather than searching deeper certainly is a legend.
It has never been discussed, so it is not a legend. By the way, it is irrelevant whether it is intentional or not.

Miguel
How so? If it is unintentional, one works to reduce or eliminate it. If it is intentional, one does not.
No, if it is a side effect that helps, you do not want to remove it.

Miguel
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by michiguel »

An image at a decent size (I could not edit the post after 15 min, arghh).

Image

The equations are

speed up = n / (f * (n^2 - n) + 1)

log2 (W/L) = k * log2 (Speedup)

k is a proportionality constant and f is a factor representing the "apparent" amount of extra (non-parallelizable) work that each extra CPU introduces.

For
K8, f = 0.0011 (0.11%) k = 1.82
K7, f = 0.0016 (0.16%) k = 1.62
SF, f = 0.0087 (0.87%) k = 2.09 (starts great)
Za, f = 0.0100 (1.00%) k = 1.84
Cr, f = 0.0160 (1.60%) k = 1.82

Miguel

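In code, the model above reduces to a few lines. A minimal sketch in Python, assuming only the two formulas and the fitted constants from the table (the printed thread counts are arbitrary):

Code: Select all

# Sketch: evaluate the model above for the fitted constants.
# speed up = n / (f * (n^2 - n) + 1)
# log2(W/L) = k * log2(speedup)  =>  W/L = speedup^k

def speedup(n, f):
    return n / (f * (n * n - n) + 1.0)

def wl_ratio(n, f, k):
    return speedup(n, f) ** k

fits = {  # engine: (f, k), from the table above
    "K8": (0.0011, 1.82),
    "K7": (0.0016, 1.62),
    "SF": (0.0087, 2.09),
    "Za": (0.0100, 1.84),
    "Cr": (0.0160, 1.82),
}

for name, (f, k) in fits.items():
    cells = "  ".join(f"n={n}: {speedup(n, f):.2f}/{wl_ratio(n, f, k):.2f}"
                      for n in (2, 4, 8, 16))
    print(f"{name}  speedup / (W/L):  {cells}")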
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:Explain HOW you are going to pull that off. You are already searching 2x the minimal tree for this position. Most of which is not helping you at all.
Note that if some of it is helping, we're already done. The part of the extra nodes that is helping is exactly what is referred to as "widening".

This is the thing, nothing more, nothing less.

Widening == not all "extra" nodes are worthless.
But you CAN reduce the total with effort, and you can NOT reduce just the ones that are no good.
With effort you can make Crafty beat Stockfish...
And IF those extra nodes help, search 'em in the regular search also, they will help there too.
Nope. Firstly, you cannot duplicate the 4-core tree with a single-threaded search without a time penalty. Secondly, even if you could, it would only help in the case of a super-linear speedup, which nobody is claiming.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
syzygy wrote:
bob wrote:if that kind of super-linear speedup happens consistently.
For the Nth time, nobody is claiming a super-linear speedup. Your whole argument is a non-starter to begin with.
super-linear and this "widening" are members of the same set, "urban legend".
Do you realize your logic is totally flawed? It is a gigantic straw man. You bring in something unrelated and say the two are the same, just to bring the other one down.

And I am sure that Komodo is not an urban legend.

Miguel
No, but intentional widening rather than searching deeper certainly is a legend.
It has never been discussed, so it is not a legend. By the way, it is irrelevant whether it is intentional or not.

Miguel
How so? If it is unintentional, one works to reduce or eliminate it. If it is intentional, one does not.
No, if it is a side effect that helps, you do not want to remove it.

Miguel
OK, how about stopping to think about this for a minute? You KNOW that not all extra nodes are "helpful". You have NO CONTROL over where those extra nodes show up in the tree. Why on earth would you choose to do that? There is some percentage of those extra nodes that can help by including an important move that normal selectivity would exclude. So what percentage of those overhead nodes are good? Even if it were 50%, which is beyond unlikely, you are wasting half of that time. I'd be more likely to believe 10% help, 90% do not. Isn't there a BETTER way to include that 10% and exclude the 90% that are just raw overhead? Of course there must be.

I don't go along with this stuff. Before one widens the search intentionally, one has to give up on improving the parallel search by reducing the overhead. So you have overhead that is not helping very much since you supposedly want to intentionally "widen" the search to help offset selectivity errors. And for each of those nodes you add, you get even more overhead nodes due to the parallel search. That is not a reasonable way to do things. In fact, it is the antithesis of alpha/beta and selectivity. You exclude what is unimportant, only to add things back in that are unimportant. And if that is a good idea in a parallel search, why is it not a good idea in the basic serial search?

NOBODY has addressed that issue; everyone just relies on handwaving and "but I have seen this, and it shows that the widening happens." But just because it happens does NOT mean that (a) it was intentional or that (b) it was the best use of computational resources.
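The arithmetic behind this objection can be made concrete. A toy sketch, where the core count n, the overhead fraction o, and the helpful fraction h are illustrative assumptions, not measurements:

Code: Select all

# Toy model of the objection above: n cores search (1 + o) times the
# minimal tree, and a fraction h of the extra nodes turn out to be
# useful "widening". Time-to-depth speedup writes off every extra node;
# the effective speedup credits the helpful fraction back.

def time_to_depth(n, o):
    return n / (1.0 + o)

def effective(n, o, h):
    return n * (1.0 + h * o) / (1.0 + o)

n, o = 4, 1.0  # 4 cores searching 2x the minimal tree, as in the thread
for h in (0.0, 0.1, 0.5):
    print(f"h={h:.1f}: time-to-depth {time_to_depth(n, o):.2f}, "
          f"effective {effective(n, o, h):.2f}")

With h = 0.1 on 4 cores this gives an effective speedup of 2.2 versus a time-to-depth speedup of 2.0: most of the doubled tree buys nothing, which is the shape of the complaint above.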
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:Explain HOW you are going to pull that off. You are already searching 2x the minimal tree for this position. Most of which is not helping you at all.
Note that if some of it is helping, we're already done. The part of the extra nodes that is helping is exactly what is referred to as "widening".

This is the thing, nothing more, nothing less.

Widening == not all "extra" nodes are worthless.
But you CAN reduce the total with effort, and you can NOT reduce just the ones that are no good.
With effort you can make Crafty beat Stockfish...
And IF those extra nodes help, search 'em in the regular search also, they will help there too.
Nope. Firstly, you cannot duplicate the 4-core tree with a single-threaded search without a time penalty. Secondly, even if you could, it would only help in the case of a super-linear speedup, which nobody is claiming.
Come back to the discussion when you have some basic idea about what is being discussed. As of right now, you are arguing without a clue about what is happening inside a tree.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Adam Hair »

bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
syzygy wrote:
bob wrote:if that kind of super-linear speedup happens consistently.
For the Nth time, nobody is claiming a super-linear speedup. Your whole argument is a non-starter to begin with.
super-linear and this "widening" are members of the same set, "urban legend".
Do you realize your logic is totally flawed? It is a gigantic straw man. You bring in something unrelated and say the two are the same, just to bring the other one down.

And I am sure that Komodo is not an urban legend.

Miguel
No, but intentional widening rather than searching deeper certainly is a legend.
It has never been discussed, so it is not a legend. By the way, it is irrelevant whether it is intentional or not.

Miguel
How so? If it is unintentional, one works to reduce or eliminate it. If it is intentional, one does not.
No, if it is a side effect that helps, you do not want to remove it.

Miguel
OK, how about stopping to think about this for a minute? You KNOW that not all extra nodes are "helpful". You have NO CONTROL over where those extra nodes show up in the tree. Why on earth would you choose to do that? There is some percentage of those extra nodes that can help by including an important move that normal selectivity would exclude. So what percentage of those overhead nodes are good? Even if it were 50%, which is beyond unlikely, you are wasting half of that time. I'd be more likely to believe 10% help, 90% do not. Isn't there a BETTER way to include that 10% and exclude the 90% that are just raw overhead? Of course there must be.

I don't go along with this stuff. Before one widens the search intentionally, one has to give up on improving the parallel search by reducing the overhead. So you have overhead that is not helping very much since you supposedly want to intentionally "widen" the search to help offset selectivity errors. And for each of those nodes you add, you get even more overhead nodes due to the parallel search. That is not a reasonable way to do things. In fact, it is the antithesis of alpha/beta and selectivity. You exclude what is unimportant, only to add things back in that are unimportant. And if that is a good idea in a parallel search, why is it not a good idea in the basic serial search?

NOBODY has addressed that issue; everyone just relies on handwaving and "but I have seen this, and it shows that the widening happens." But just because it happens does NOT mean that (a) it was intentional or that (b) it was the best use of computational resources.
Best use of computational resources? Perhaps not. However, by 4 cores Komodo 8 is using the computational resources more effectively than Zappa and Crafty, and by 8 cores more effectively than Stockfish.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

michiguel wrote:An image at a decent size (I could not edit the post after 15 min, arghh).

Image

The equations are

speed up = n / (f * (n^2 - n) + 1)

log2 (W/L) = k * log2 (Speedup)

k is a proportionality constant and f is a factor representing the "apparent" amount of extra (non-parallelizable) work that each extra CPU introduces.

For
K8, f = 0.0011 (0.11%) k = 1.82
K7, f = 0.0016 (0.16%) k = 1.62
SF, f = 0.0087 (0.87%) k = 2.09 (starts great)
Za, f = 0.0100 (1.00%) k = 1.84
Cr, f = 0.0160 (1.60%) k = 1.82

Miguel
Very interesting. Can you hint at how you derived f?
Did you assume an Amdahl-like behavior,
S = (R + 1) * n / (R + n), as in this model?
http://www.talkchess.com/forum/viewtopi ... 29&t=48780

About Wilos: several years ago, on another forum, I speculated that the Win/Loss ratio remains pretty constant beyond ultra-fast controls. More than a year ago, I speculated the same here too:
http://www.talkchess.com/forum/viewtopi ... 27&t=48733
With some data. Working back from the draw rate and the Elo gain, I even saw a very slow increase of the Win/Loss ratio with time control, but it was almost flat. Wilos get rid of all the strength, time control, hardware, etc. issues present in Elos. From your graphs and interpolation, it seems that Zappa's and Crafty's apparently good Elo gains up to 8 cores occur just because they are weaker and have fewer draws. In Wilos, their gain is worse than that of the top engines, and at 16 cores Komodo 8 stands out.
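A small sketch of the draw-rate point, with made-up game counts; the Elo formula is the standard logistic one, and "Wilo" here just means the raw W/L ratio:

Code: Select all

import math

# Elo difference from the score fraction: sensitive to the draw rate.
def elo_diff(wins, losses, draws):
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return -400.0 * math.log10(1.0 / score - 1.0)

# W/L ratio: unchanged when decisive games are diluted by draws.
def wl(wins, losses):
    return wins / losses

# Same W/L ratio = 3.0, very different draw rates:
print(elo_diff(300, 100, 100), wl(300, 100))  # ~147 Elo, W/L = 3.0
print(elo_diff(150, 50, 300), wl(150, 50))    # ~70 Elo,  W/L = 3.0

The Elo difference roughly halves as draws pile up, while the W/L ratio stays at 3.0, which is why the weaker, low-draw engines look better in Elos than in Wilos.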
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:Explain HOW you are going to pull that off. You are already searching 2x the minimal tree for this position. Most of which is not helping you at all.
Note that if some of it is helping, we're already done. The part of the extra nodes that is helping is exactly what is referred to as "widening".

This is the thing, nothing more, nothing less.

Widening == not all "extra" nodes are worthless.
But you CAN reduce the total with effort, and you can NOT reduce just the ones that are no good.
With effort you can make Crafty beat Stockfish...
And IF those extra nodes help, search 'em in the regular search also, they will help there too.
Nope. Firstly, you cannot duplicate the 4-core tree with a single-threaded search without a time penalty. Secondly, even if you could, it would only help in the case of a super-linear speedup, which nobody is claiming.
Come back to the discussion when you have some basic idea about what is being discussed. As of right now, you are arguing without a clue about what is happening inside a tree.
You seem not to know how a single-core tree grows. An example is this three-year-old thread:
http://www.talkchess.com/forum/viewtopi ... 83&t=38808
The pictures are gone, but it can still be understood; Marco Costalba surely understood it:
Laskos wrote:
Tree shape is a completely different matter, and as you can see, the tree widens going to target depths 5, 10, 15, 20, 25. I can clearly separate EBF into two components: EBF_widening, which is ~1.4, and EBF_deepening, which is ~1.6, for a total EBF of 1.4*1.6 ~ 2.2. As I already wrote, going from target N to target N+1 requires a comparable effort in both deepening and widening.
mcostalba wrote:
This is very interesting. If I have understood correctly, you say that the extra effort needed to move the search to depth N+1 is due, for a good 45% (1.4 vs 1.6), to additional search at the inner nodes, and only for 55% to additional search at the new, deeper leaf nodes.
And Zach Wegner had some plots of the tree behavior:
Zach Wegner wrote:
Laskos wrote:
Try telling that to Bob; he doesn't even know how to read a plot, where the slopes (first derivatives) give EBF(target depth, ply) on the tree, not the absolute values, and those slopes are both larger and smaller at different plies than the overall EBF(target depth), giving a total, pretty constant EBF(target depth), i.e. from target depth N to N+1 (different trees). I am really eager to see someone plot the tree shape of Stockfish for different target depths; there is no way it is not widening compared to the purely exponential tree. His "widening" is not a widening at all (he seems to be confused about what widening of the tree is). Bob's theoretical tree shape is the purely exponential, non-widening W^D; how silly is that? I don't think he knows the tree shape even of his own Crafty.

Kai
I did plot some Stockfish trees and, as expected since it uses logarithmic reductions, the trees have a huge amount of widening. In fact, the widening accounts for far more of the tree growth than deepening.
You, Bob, missed the whole point, IIRC, and you have no idea how the tree grows, iteration after iteration, even in single-core Crafty.
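A sketch of the decomposition Kai describes, with invented per-ply node counts (real numbers would come from instrumenting an engine, so the split below will not match the ~1.4 x ~1.6 quoted above):

Code: Select all

# For two consecutive target depths, nodes[ply] gives the node count at
# each ply. Growth restricted to the plies both trees share is the
# "widening" component; the rest of the total growth is "deepening".

nodes_d10 = [1, 30, 250, 1500, 7000, 25000, 70000,
             150000, 230000, 250000, 180000]
nodes_d11 = [1, 32, 300, 2000, 10000, 38000, 110000,
             250000, 400000, 460000, 350000, 240000]

total = sum(nodes_d11) / sum(nodes_d10)
widening = sum(nodes_d11[:len(nodes_d10)]) / sum(nodes_d10)
deepening = total / widening
print(f"EBF {total:.2f} = widening {widening:.2f} * deepening {deepening:.2f}")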
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by michiguel »

Laskos wrote:
michiguel wrote:An image at a decent size (I could not edit the post after 15 min, arghh).

Image

The equations are

speed up = n / (f * (n^2 - n) + 1)

log2 (W/L) = k * log2 (Speedup)

k is a proportionality constant and f is a factor representing the "apparent" amount of extra (non-parallelizable) work that each extra CPU introduces.

For
K8, f = 0.0011 (0.11%) k = 1.82
K7, f = 0.0016 (0.16%) k = 1.62
SF, f = 0.0087 (0.87%) k = 2.09 (starts great)
Za, f = 0.0100 (1.00%) k = 1.84
Cr, f = 0.0160 (1.60%) k = 1.82

Miguel
Very interesting. Can you hint at how you derived f?
Did you assume an Amdahl-like behavior,
S = (R + 1) * n / (R + n), as in this model?
http://www.talkchess.com/forum/viewtopi ... 29&t=48780

About Wilos: several years ago, on another forum, I speculated that the Win/Loss ratio remains pretty constant beyond ultra-fast controls. More than a year ago, I speculated the same here too:
http://www.talkchess.com/forum/viewtopi ... 27&t=48733
With some data. Working back from the draw rate and the Elo gain, I even saw a very slow increase of the Win/Loss ratio with time control, but it was almost flat. Wilos get rid of all the strength, time control, hardware, etc. issues present in Elos. From your graphs and interpolation, it seems that Zappa's and Crafty's apparently good Elo gains up to 8 cores occur just because they are weaker and have fewer draws. In Wilos, their gain is worse than that of the top engines, and at 16 cores Komodo 8 stands out.
That was a good observation, Kai. We were brainstorming about how to introduce draws and deal with the fact that draw rates change with increasing strength. If you look at the physical-chemical "model" of Ordo, it becomes obvious that TRUE strength is related to the ratio W/L (which is related to the difference in "energetic" levels between W and L).

As you point out, Z and C seem to have a better scaling at the beginning because they are weaker than SF and K. Exactly.

One thing that Andreas could do is to test Engine_X vs Engine_X with twice the time (not twice the cores). That is the theoretical 100% speedup for two cores. If we then measure the W/L ratio, we can derive an "effective" speedup from it.

Sorry, I was going to give the derivation but got trapped with work.

Code: Select all

Typical Amdahl's law can be written as

      S + P 
SU = ---------
      S + P/n

Numerator is the work with one thread, denominator is the work with "n" threads
Where
SU = speed up
S = serial fraction of the work that cannot be parallelized
P = work that can be parallelized
n = threads
Of course, S + P = 1

But Amdahl's law cannot explain why the speedup may go down beyond a certain number of threads.

Then I assumed that the work with n threads starts to accumulate spurious work for each thread added, and that this is work that cannot be parallelized. That is, for each thread added, every thread slows down a bit, spending time in areas doing nothing. So I add a term f*(n-1) to the denominator and place 1 in the numerator, since S + P = 1.

Code: Select all

             1
SU = -------------------
      S + P/n + f (n-1)

I could stop here, but I assumed that S is near zero, because the fitting hinted that this was reasonable (that is, the Amdahl term S is negligible compared to the new term f*(n-1)). Then P = 1 and the term S disappears:

Code: Select all

             1
SU = -------------------
        1/n + f (n-1)
Multiplying numerator and denominator by n:

Code: Select all

             n
SU = -------------------
        1 + f (n^2-n)

Miguel
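The post stops at the final formula; the fitting step Kai asked about could look like this, a sketch with hypothetical measured speedups and a standard least-squares fit:

Code: Select all

import numpy as np
from scipy.optimize import curve_fit

# The model derived above: SU(n) = n / (1 + f * (n^2 - n))
def model(n, f):
    return n / (1.0 + f * (n * n - n))

threads = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
measured = np.array([1.0, 1.98, 3.85, 7.2, 11.5])  # hypothetical data

(f_fit,), _ = curve_fit(model, threads, measured, p0=[0.001])
print(f"fitted f = {f_fit:.4f}")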