Threads test incl. Stockfish 5 and Komodo 8

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

bob wrote:
Laskos wrote:
bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote: Explain HOW you are going to pull that off. You are already searching 2x the minimal tree for this position. Most of which is not helping you at all.
Note that if some of it is helping, we're already done. That some part of the extra nodes that are helping is referred to as "widening".

This is the thing, nothing more, nothing less.

Widening == not all "extra" nodes are worthless.
But you CAN reduce the total with effort, and you can NOT reduce just the ones that are no good.
With effort you can make Crafty beat Stockfish...
And IF those extra nodes help, search 'em in the regular search also, they will help there too.
Nope. Firstly, you cannot duplicate the 4-core tree with a single threaded search without time penalty. Secondly, even if you could, it would only help in case of super-linear speedup which nobody is claiming.
Come back to the discussion when you have some basic idea about what is being discussed. As of right now, you are arguing without a clue about what is happening inside a tree.
You don't seem to know how even a single-core tree grows. An example is that 3-year-old thread:
http://www.talkchess.com/forum/viewtopi ... 83&t=38808
The pictures are gone, but it can still be understood, and Marco Costalba surely understood it:
Laskos wrote:
Tree shape is a completely different matter, and as you can see, it is widening going to target depths 5, 10, 15, 20, 25. I can clearly separate EBF into two components: EBF_widening, which is ~1.4, and EBF_deepening, which is ~1.6, for a total EBF of 1.4*1.6 ~ 2.2. As I already wrote, going from target depth N to target depth N+1 requires a comparable effort in both deepening and widening.
mcostalba wrote:
This is very interesting. If I have understood correctly, you say that the extra effort needed to move the search to depth N+1 is due for a good 45% (1.4 vs 1.6) to additional search at the inner nodes, and only for 55% to additional search at the new, deeper leaf nodes.
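Spelled out, the decomposition those two posts describe is simply (numbers as reported there, nothing new added):

EBF = EBF_widening * EBF_deepening ≈ 1.4 * 1.6 ≈ 2.2

so each extra ply of target depth multiplies the node count by about 2.2, with the widening factor (~1.4, re-searching inner nodes more widely) and the deepening factor (~1.6, the new deeper leaves) contributing comparable shares of the work.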
Zach Wegner also posted plots of the tree behavior:
Zach Wegner wrote:
Laskos wrote:
Try telling that to Bob; he doesn't even know how to read a plot. The slopes (first derivatives) give EBF(target depth, ply) on the tree, not the absolute values, and those slopes are larger than the overall EBF(target depth) at some plies and smaller at others, combining into a fairly constant EBF(target depth), i.e. from target depth N to N+1 (different trees). I am really eager to see someone plot the tree shape of Stockfish for different target depths; there is no way it is not widening compared to the purely exponential tree. His "widening" is not a widening at all (he seems to be confused about what widening of the tree means). Bob's theoretical tree shape is the purely exponential, non-widening W^D; how silly is that? I don't think he knows the tree shape even of his own Crafty.

Kai
I did plot some Stockfish trees and, as expected since it uses logarithmic reductions, the trees have a huge amount of widening. In fact, the widening accounts for far more of the tree growth than deepening.
You, Bob, missed the whole point, IIRC, and you have no idea how the tree grows, iteration after iteration, even in single-core Crafty.

The term "widening" was simply wrong. ANY tree is wider near the tips than near the root. Big surprise. But it doesn't grow EXTRA wider as it goes deeper, just a fairly constant proportion.

BTW I knew how an alpha/beta tree grows before you knew how to spell the same.
I think you are stuck on alpha/beta, and that's about all you know seriously. In that thread you had no clue what a tree shape with reductions looks like. You proposed something from the '60s as the tree shape: the purely exponential, non-widening W^Depth.

With logarithmic reductions a la Stockfish, I plotted tree shapes as leaf-node density: first Bob Hyatt's clueless theorized tree shape, probably not even valid for Crafty, then the Stockfish, Marco Costalba and Zach Wegner tree shapes.

1/ This is Bob Hyatt's theory of the tree shape (from the '60s?):
[plots of leaf-node density per ply, images missing]

On iteration 20, the nodes down to depth 5 are identical in number to the nodes down to depth 5 on iteration 5.

2/ Stockfish model (logarithmic reductions):
[plots of leaf-node density per ply, images missing]

As one can see, the shape of the tree down to depth 5 is very different on iteration 20 versus iteration 5. On iteration 20, Stockfish searches depth 5 almost full width, while the deeper plies are heavily pruned and sparse. On iteration 5, depth 5 itself is sparse.

So, Stockfish gains two things over Bob's exponential tree:
1) It goes deeper, by heavy pruning.
2) The tree thickens with each iteration, becoming almost full width at the lower depths after a high number of iterations.

And as Zach Wegner said, Stockfish spends more on inner nodes than on deepening.
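To make the two pictures concrete, here is a minimal toy simulation (not Crafty or Stockfish code; the branching factor and the reduction rule are made up purely for illustration) that counts nodes per ply for a fixed-width exponential tree and for a tree with LMR-style logarithmic reductions:

Code: Select all

// Toy model: nodes per ply for a fixed-width tree vs. a tree with
// LMR-style logarithmic reductions. Purely illustrative numbers.
#include <cmath>
#include <cstdio>
#include <vector>

const int W = 6;  // toy branching factor

// Fixed-width model: ply p always holds W^p nodes, independent of the iteration.
std::vector<double> fullWidth(int target) {
    std::vector<double> perPly(target + 1);
    for (int p = 0; p <= target; ++p)
        perPly[p] = std::pow(double(W), p);
    return perPly;
}

// Reduced model: move i at a node with 'remaining' plies left is searched with
// remaining - 1 - log2(1 + i) plies; when nothing is left, the child is a leaf.
void reduced(int ply, int remaining, std::vector<double>& perPly) {
    perPly[ply] += 1.0;
    if (remaining <= 0)
        return;
    for (int i = 0; i < W; ++i) {
        int childRemaining = remaining - 1 - int(std::log2(1.0 + i));
        if (childRemaining >= 0)
            reduced(ply + 1, childRemaining, perPly);
    }
}

int main() {
    for (int target : {5, 20}) {
        std::vector<double> fw = fullWidth(target);
        std::vector<double> rd(target + 1, 0.0);
        reduced(0, target, rd);
        std::printf("target depth %d\n", target);
        std::printf("  ply             ");
        for (int p = 0; p <= 5; ++p) std::printf("%8d", p);
        std::printf("\n  full width      ");
        for (int p = 0; p <= 5; ++p) std::printf("%8.0f", fw[p]);
        std::printf("\n  with reductions ");
        for (int p = 0; p <= 5; ++p) std::printf("%8.0f", rd[p]);
        std::printf("\n\n");
    }
    return 0;
}

In this toy, the fixed-width tree has exactly the same node counts at plies 0-5 on both iterations, while the reduced tree has ply 5 almost empty on iteration 5 and completely full on iteration 20 - which is the widening being discussed.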

As for Bob, if he cannot visualize the tree shape on 1 core, how can he see the parallelization?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by bob »

Adam Hair wrote:
bob wrote:
Adam Hair wrote:
bob wrote:
syzygy wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
ernest wrote:
syzygy wrote: How come absolutely nobody here with some knowledge of parallel programming agrees with you...
Well, at least on the theoretical side, what Bob says here doesn't sound so stupid... 8-)
On paper. Only on paper.

But in practice, linearizing a parallel algorithm would be a mess of gigantic proportions; I cannot even imagine the problems and the unreadability. So, saying that Komodo could be "fixed" on a single core is silly. Even on paper the "fix" may not give any improvement, as I demonstrated in another post. It could be equal.

The fact is that whatever Komodo is doing, it is scaling consistently better than the others. Whether widening the search is the cause or just a correlation is unknown, but clearly the whole approach is good.

Miguel
Sorry, but more than one program already has this capability. I think I saw it in Stockfish, where they do a "pseudo-parallel search" to debug, by doing the usual multiplexing operation.
Never heard of this, and it does not make sense for debugging. You lose all the randomness. Maybe an SF developer can clarify, but I do not believe it. Even if this exists, does it not have any overhead?
SF certainly does not have anything remotely comparable to a single-threaded simulation of a multithreaded search.

But if it had, it would obviously have a major overhead. Bob surely understands this.

Even if it could be done without overhead, it would have nothing to do with the topic of this thread. Nobody is claiming that Komodo's 4-core search is as strong as Komodo's single-threaded search with 4x the time.
Go look at what this code triggers; this is directly from Stockfish's search.cpp:


// Set to true to force running with one thread. Used for debugging
const bool FakeSplit = false;
FakeSplit is no longer in Stockfish. The following link shows the code that contained FakeSplit and the changes made.

https://github.com/zamar/Stockfish/comm ... a50004f810

This is the code that contained FakeSplit in Stockfish 4:

Code: Select all

      // Step 19. Check for splitting the search
      if (   !SpNode
          &&  depth >= Threads.minimumSplitDepth
          &&  Threads.available_slave(thisThread)
          &&  thisThread->splitPointsSize < MAX_SPLITPOINTS_PER_THREAD)
      {
          assert(bestValue < beta);

          thisThread->split<FakeSplit>(pos, ss, alpha, beta, &bestValue, &bestMove,
                                       depth, threatMove, moveCount, &mp, NT, cutNode);
          if (bestValue >= beta)
              break;
      }
    }

    if (SpNode)
        return bestValue;

What other programs are capable of multiplexing?
I don't peruse other programs, so I have no idea. A couple of versions of Crafty did, back when I first started on the parallel search code, although I don't think any were ever released. The 1983 version of Cray Blitz did, as we had to simulate testing while waiting on Cray to get the SMP libraries and Fortran compiler changes completed.
I asked the question because you stated that "more than one already has this capability".
The only sensible reason for multiplexing is debugging when you don't have a decent debugger that can handle threads. gdb does this quite nicely today, making multiplexing unnecessary. The multiplexing concept, used to shoot down consistent super-linear speedups and such, is a simple idea. It is NOT the way a rational person would "fix" the problem. It is mainly used as a simple and easy-to-understand methodology to debunk the occasional bogus claims that surface here and there.


bob wrote: I didn't look at it at all, just noticed it in passing. If you don't understand the multiplexing issue and why it was brought up, then please stay out of the discussion. It is DIRECTLY related to the "widening" issue that keeps getting broached here. If someone gains from this "widening" they could get the same benefit from multiplexing the search.
Obviously I thought Stockfish did. Two of my programs did, and could have it again, although there is no point since gdb works well with threads. When parallel search first sparked everyone's interest, it was a relatively common thing to do, since access to parallel machines was quite limited and we would rather use that time where it was more helpful, if we could do basic debugging on a normal machine (like a DEC VAX or Sun workstation).
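For what it's worth, the idea is simple enough to sketch. The following is only a toy (not code from Crafty, Cray Blitz, or Stockfish; the "search" is a bare tree expansion) showing several logical searchers advanced round-robin on a single OS thread, which is all that multiplexing means here:

Code: Select all

// Toy sketch of multiplexing: several logical "threads" are stepped round-robin
// on one OS thread. The search is reduced to a bare tree expansion so the
// scheduling idea stays visible; this is not code from any real engine.
#include <cstdio>
#include <vector>

struct Node { int ply; };

struct Searcher {
    std::vector<Node> stack;   // explicit DFS stack instead of recursion
    long nodes = 0;

    bool step(int width, int depth) {     // expand exactly one node, then yield
        if (stack.empty())
            return false;                 // this logical thread is finished
        Node n = stack.back();
        stack.pop_back();
        ++nodes;
        if (n.ply < depth)
            for (int i = 0; i < width; ++i)
                stack.push_back({n.ply + 1});
        return true;
    }
};

int main() {
    const int THREADS = 4, WIDTH = 3, DEPTH = 8;
    std::vector<Searcher> searchers(THREADS);
    for (Searcher& s : searchers)
        s.stack.push_back({0});           // every logical thread gets a root

    // The multiplexing loop: one node per logical thread per pass, no real
    // parallelism, so the combined work is exactly the sum of the parts.
    bool anyBusy = true;
    while (anyBusy) {
        anyBusy = false;
        for (Searcher& s : searchers)
            anyBusy |= s.step(WIDTH, DEPTH);
    }
    for (int i = 0; i < THREADS; ++i)
        std::printf("logical thread %d expanded %ld nodes\n", i, searchers[i].nodes);
    return 0;
}

Doing this with a real alpha/beta search is of course far messier, since each logical thread needs its own complete search state, which is exactly why it only ever made sense as a debugging or debunking device.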
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by mvk »

Adam Hair wrote: So, I recomputed the ratings differences into Wilo differences (wins and losses only) and plotted them:
I missed this, or maybe it is obvious, but how are these "wilos" calculated exactly? Just the Elo formula, but removing all draws from the results first?
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by michiguel »

mvk wrote:
Adam Hair wrote: So, I recomputed the ratings differences into Wilo differences (wins and losses only) and plotted them:
I missed this, or maybe it is obvious, but how are these "wilos" calculated exactly? Just the elo formula, but removing all draws from the results first?
Adam is just removing the draws here, for simplicity. That is why I plot log(W/L) on the y axis, to make it explicit.
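For the record, assuming it really is just the usual logistic Elo formula with the draws discarded (which is what I describe above), the difference works out to:

Wilo_diff = -400 * log10(1/s - 1), with s = W/(W + L), which simplifies to Wilo_diff = 400 * log10(W/L)

so plotting log(W/L) on the y axis shows the same quantity up to a constant factor.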

In a real rating program we would have to deal with draws differently, so as not to throw away that information. We would have to assume that draw rates vary in a certain way across the strength scale and do some global maximum-likelihood calculation.

We have found good experimental correlation with the theoretical assumptions we make (i.e. the equations of draw rate vs. strength fit the data nicely).

Miguel
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads test incl. Stockfish 5 and Komodo 8

Post by Laskos »

michiguel wrote:
One thing that Andreas could do is to test Engine_X vs Engine_X (with twice the time, not twice the cores). That would be the theoretical 100% speed-up for two cores. From the ratio we would get an "effective" speed-up.


Miguel
That would take some additional effort from Andreas. For now I hope he tests Komodo 8 on 32 cores versus Komodo 8 on 16 cores.
As for your proposal, we do have some numbers that are fairly universal for top engines: on one core at a 60''+0.05'' time control, a doubling in time gives 90-100 Elo points. On 16 cores, with an effective speed-up of ~8, and each core being some 1.5 times slower than an i7 core, a doubling there would give a gain of ~70 Elo points. So it's up to us to interpret what Andreas gets from Komodo on 32 cores against Komodo on 16 cores. Later, if he is kind enough, we can ask for the pure doubling in time.
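A sketch of the arithmetic behind this (assuming, as above, that the Elo gain is roughly proportional to the number of effective time doublings at a given time control):

S_eff ≈ 2^(Elo_gain_from_doubling_cores / Elo_per_time_doubling)

where Elo_per_time_doubling is the value of a doubling in time (90-100 points on one core at this time control). So whatever Andreas measures for 32 versus 16 cores can be converted into an effective speed-up factor, and Miguel's Engine_X self-test with doubled time would pin down the denominator directly.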