Time to depth concerns

kgburcham
Posts: 2016
Joined: Sun Feb 17, 2008 4:19 pm

Re: Time to depth concerns

Post by kgburcham »

Compare my overclocked 8-core to Werewolf's 24-core.
I was using the Shredder interface; Werewolf was using the Fritz interface.


[D] rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -

Engine: Stockfish 130816 64 BMI2 (4096 MB)
by T. Romstad, M. Costalba, J. Kiiski, G.

1 core @ 4.2, HT off, hash 4096, Windows 8.1, 1200 seconds to depth 34

34/49 20:00 +0.13-- 1.e4 e6 (2.171.062.522) 1808
(1200 seconds)



8 cores @ 4.2, HT off, hash 4096, Windows 8.1, 484 seconds to depth 34

34/46 8:04 +0.10 1.d4 Nf6 2.c4 e6 3.Nf3 b6 4.g3 Ba6 (6.779.258.907) 14005
(484 seconds)
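
As a rough check on the two runs above, here is a minimal Python sketch that turns the posted figures into a time-to-depth speedup, an nps scaling factor and a node inflation factor. It assumes the last column of each output line is speed in kN/s (which matches nodes divided by seconds); the names `runs` and `compare` are made up for illustration.

```python
# Figures copied from the two Stockfish runs above (start position, depth 34).
# Assumption: the trailing number on each output line is speed in kN/s.
runs = {
    1: {"seconds": 1200, "nodes": 2_171_062_522, "knps": 1808},
    8: {"seconds": 484, "nodes": 6_779_258_907, "knps": 14005},
}


def compare(base, multi):
    """Return time-to-depth speedup, nps scaling and node inflation."""
    return {
        "ttd_speedup": base["seconds"] / multi["seconds"],
        "nps_scaling": multi["knps"] / base["knps"],
        "node_inflation": multi["nodes"] / base["nodes"],
    }


result = compare(runs[1], runs[8])
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the 8-core run reaches depth 34 about 2.5 times sooner, while searching about 3.1 times as many nodes at about 7.7 times the speed.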
no chess program was born totally from one mind. all chess programs have many ideas from many minds.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Time to depth concerns

Post by MikeB »

Werewolf wrote:Recently I've been doing some research into real speedup on many threads. My results are alarming. Perhaps people can comment on them.

I'm running a dual E5-2697 v2 (24 physical cores) with the following settings:
...

Thanks

Carl

Code:

HT off, with 512 RAM, 12 Phys Cores, go depth 30 ( less patience than you 8-) )
speedup about 2.1:1

SF081416-2Y
one core:
info depth 30 seldepth 44 multipv 1 score cp 15 nodes 675914031 nps 1358634 hashfull 999 tbhits 0 time 497495 pv d2d4 g8f6 g1f3 e7e6 c1f4 c7c5 e2e3 b8c6 b1c3 a7a6 d4d5 e6d5 c3d5 f6d5 d1d5 c6b4 d5d2 d7d5 c2c3 b4c6 f1d3 f8e7 e1g1 e8g8 d3c2 g7g6 f4h6 f8e8 e3e4 d5e4 c2e4 d8d2 h6d2
bestmove d2d4 ponder g8f6

twelve cores:
info depth 30 seldepth 42 multipv 1 score cp 10 nodes 3609793168 nps 15642994 hashfull 999 tbhits 0 time 230761 pv d2d4 g8f6 g1f3 e7e6 c1f4 d7d5 e2e3 f8d6 f4d6 d8d6 c2c4 b7b6 b1c3 e8g8 f1e2 c7c5 e1g1 b8d7 c4d5 e6d5 d1c2 c8b7 f1d1 f8e8 e2b5 a8d8 a1c1 h7h6 c2a4 d6b8 a4a3 c5c4
bestmove d2d4 ponder g8f6 

Code:

crafty-25.1-081216
sd 30 - single core 512 hash

         30->   6:35          0.18   1. e4 e6 2. Nf3 d5 3. Nc3 Nf6 4. exd5 exd5
                                     5. d4 Nc6 6. Bb5 Bb4 7. O-O O-O 8. Bg5
                                     Bxc3 9. bxc3 h6 10. Bh4 g5 11. Bg3 Bf5
                                     12. Ne5 Nxe5 13. Bxe5 Ng4 14. Re1 Nxe5
                                     15. Rxe5 Be4
        time=6:35(100%)  nodes=2101825003(2.1B)  fh1=91%  pred=0  nps=5.3M
        chk=21.1M  qchk=33.2M  fp=728.6M  mcp=418.2M  50move=1
        LMReductions:  1/63.0M  2/45.4M  3/35.0M  4/14.6M  5/1.3M  6/21.8K
              7/14
        null-move (R):  3/86.3M  4/6.8M  5/265.3K  6/11.0K  7/274


sd 30 - twelve cores 512 hash

         30->   1:17          0.14   1. Nf3 Nf6 2. e3 c5 3. c4 e6 4. d4 Be7
                                     5. Bd3 d5 6. Nc3 dxc4 7. Bxc4 O-O 8. O-O
                                     Nc6 9. dxc5 Bxc5 10. Bd2 Qe7 11. Rc1 Rd8
                                     12. a3 Bd7 13. b4 Bd6 14. Nb5 Rac8
                                     15. Nxd6 Qxd6 (s=3)
        time=1:17(98%)  nodes=4067935324(4.1B)  fh1=90%  pred=0  nps=52.8M
        chk=39.6M  qchk=61.0M  fp=1.4B  mcp=834.8M  50move=1
        LMReductions:  1/119.7M  2/85.0M  3/64.7M  4/27.6M  5/2.6M  6/40.7K
              7/2
        null-move (R):  3/163.3M  4/12.4M  5/443.8K  6/16.5K  7/341
        splits=534.2K(402.4K)  aborts=58.7K  joins=727.6K  data=20%(20%)
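
To put the Stockfish and Crafty runs above on the same footing, here is a small Python sketch that pulls the numeric fields out of the two UCI `info` lines and compares them with the Crafty times. The helper names `parse_uci_info` and `mmss_to_seconds` are hypothetical, and the pv is shortened in the sample strings; the UCI `time` field is in milliseconds.

```python
def parse_uci_info(line):
    """Extract selected integer fields from a UCI 'info' line; the pv is ignored."""
    wanted = {"depth", "seldepth", "nodes", "nps", "time"}
    tokens = line.split()
    fields = {}
    for i, tok in enumerate(tokens[:-1]):
        if tok == "pv":
            break  # everything after 'pv' is a move list
        if tok in wanted:
            fields[tok] = int(tokens[i + 1])
    return fields


def mmss_to_seconds(text):
    """Convert a Crafty-style 'm:ss' time such as '6:35' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Stockfish lines from the post, with the pv shortened for readability.
sf_one = parse_uci_info(
    "info depth 30 seldepth 44 multipv 1 score cp 15 nodes 675914031 "
    "nps 1358634 hashfull 999 tbhits 0 time 497495 pv d2d4 g8f6"
)
sf_twelve = parse_uci_info(
    "info depth 30 seldepth 42 multipv 1 score cp 10 nodes 3609793168 "
    "nps 15642994 hashfull 999 tbhits 0 time 230761 pv d2d4 g8f6"
)

sf_speedup = sf_one["time"] / sf_twelve["time"]
sf_node_ratio = sf_twelve["nodes"] / sf_one["nodes"]

# Crafty figures from the post: time to depth 30 and total nodes.
crafty_speedup = mmss_to_seconds("6:35") / mmss_to_seconds("1:17")
crafty_node_ratio = 4_067_935_324 / 2_101_825_003

print(f"Stockfish: {sf_speedup:.2f}x to depth, {sf_node_ratio:.2f}x the nodes")
print(f"Crafty:    {crafty_speedup:.2f}x to depth, {crafty_node_ratio:.2f}x the nodes")
```

On these single runs Stockfish gets to depth 30 about 2.2 times sooner on twelve cores while searching about 5.3 times the nodes, whereas Crafty gets there about 5.1 times sooner with about 1.9 times the nodes. One position is of course a very small sample.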
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Time to depth concerns

Post by Joerg Oster »

APassionForCriminalJustic wrote:
Werewolf wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.
Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.
No, this is just a well known property of this implementation, not the goal.
The goal is playing strength.
Jörg Oster
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Time to depth concerns

Post by Joerg Oster »

As others have already pointed out, time-to-depth is not a good measure for Lazy SMP in general.

Besides that, for fixed-depth searches I disabled picking the move from another thread that searched deeper and has a better score, as we do in game play.
So the displayed search info can be misleading or incomplete, because we simply don't know how deep each helper thread searched and how much it contributed.

I'm afraid the only reliable measure is playing strength.
Jörg Oster
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Time to depth concerns

Post by Werewolf »

bob wrote:
For the record, there are two ways to use processing power through multiple cores.

(2) search MORE of the tree at the same depth. This is almost always a purely accidental side-effect of a search that has excessive overhead. Which describes lazy-SMP pretty accurately.

If you prune too much, then widening the tree is of some benefit. But I always point out that you could widen the tree without a parallel search, just as easily, if that is really a goal.

time-to-depth is by far the easiest way to analyze parallel speedup. Of course, playing games would be even better,
I'm struggling with the logic of this for the following reasons:

a) Your point (2) seems reasonable, you're describing a "thicker" search tree. If the nps on 24 threads is 20x greater than one thread, yet it takes on average 1/4 of the time to reach the same depth, it therefore follows that to reach depth x, it takes 24 threads 20/4 = 5 times as many nodes as one thread.

This seems excessive & wasteful.

b) However, Lazy SMP works!

c) Therefore, I'm confused.

(And as a separate point, both nps and time to depth seem useless for measuring speedup in Lazy SMP algorithms. And as you point out, Elo would take ages as an alternative.)
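
The arithmetic in (a) follows from nodes = nps × time. As a worked check, write N for nodes searched, R for nps and T for time to reach depth x, with the subscript giving the thread count (symbols introduced here only for illustration):

```latex
\[
\frac{N_{24}}{N_{1}}
  = \frac{R_{24}\,T_{24}}{R_{1}\,T_{1}}
  = \frac{R_{24}}{R_{1}} \cdot \frac{T_{24}}{T_{1}}
  = 20 \cdot \frac{1}{4}
  = 5
\]
```

So under those example figures, four fifths of the nodes searched on 24 threads are extra work relative to the single-thread tree at the same nominal depth.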
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Time to depth concerns

Post by cdani »

Werewolf wrote: b) However, Lazy SMP works!
Because you can increase strength either by going deeper or by examining more different moves, and Lazy SMP leans more toward the second. It also goes quite deep anyway, but engines do not show it.
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Time to depth concerns

Post by petero2 »

Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time to depth concerns

Post by bob »

petero2 wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
That certainly sounds like a YBW type approach. The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology. Doesn't mean threads wait or anything...
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Time to depth concerns

Post by petero2 »

bob wrote:
petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type. A quote from the algorithm description:
petero2 wrote:No young brothers wait concept is used. A helper thread will search the move with the highest estimated probability of being useful, even if this is for example the second move at an expected cut node, and the first move is currently being searched. The idea is that it is better to do something than nothing, and the mechanism that aborts a helper thread when a move with a significantly higher probability becomes available makes sure that there is no long term penalty for starting to search a move with a low probability.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time to depth concerns

Post by bob »

petero2 wrote:
bob wrote:
petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type. A quote from the algorithm description:
petero2 wrote:No young brothers wait concept is used. A helper thread will search the move with the highest estimated probability of being useful, even if this is for example the second move at an expected cut node, and the first move is currently being searched. The idea is that it is better to do something than nothing, and the mechanism that aborts a helper thread when a move with a significantly higher probability becomes available makes sure that there is no long term penalty for starting to search a move with a low probability.
If you don't use the basic YBW idea, this is not going to work well. But I will bet that it does. DTS certainly did; it just didn't do the "search one and then split" any more than Crafty does today. Crafty also "pre-splits" to leave work lying around to be picked up when a thread goes idle. But it doesn't pre-split before searching one node, as that absolutely invites high search overhead: you NEVER want to split at a CUT node, it is a total waste of time.

In Cray Blitz, the decision was to split after a move is searched when possible, and otherwise to split with some intelligence, trying to pick only ALL/PV nodes and not CUT nodes...

You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodes since they are 100% overhead and contribute zero to the speedup you are chasing...
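
The overhead argument can be illustrated with a toy model in Python. This is only a sketch under strong assumptions, not Crafty's, Cray Blitz's or Texel's actual code: every move costs one unit, moves are searched in fixed batches of `threads`, a batch is always started in full, and there is no abort mechanism of the kind petero2 describes. The function name `moves_started` and the sample scores are invented.

```python
def moves_started(scores, beta, threads, wait_for_first):
    """Count moves whose search is started at one node in a toy model.

    Moves are taken in order. Each batch of `threads` moves runs in
    parallel and is started in full, even if one of its moves fails high.
    With wait_for_first=True the first move is searched alone (the YBW rule).
    """
    started = 0
    i = 0
    while i < len(scores):
        width = 1 if (wait_for_first and i == 0) else threads
        batch = scores[i:i + width]
        started += len(batch)
        if any(score >= beta for score in batch):
            break  # beta cutoff: the rest of the move list is skipped
        i += width
    return started


beta = 50
cut_node = [80, 10, 5, 0, -5, -20]   # first move already fails high
all_node = [10, 5, 0, -5, -20, -30]  # no move reaches beta

cut_ybw = moves_started(cut_node, beta, threads=4, wait_for_first=True)
cut_eager = moves_started(cut_node, beta, threads=4, wait_for_first=False)
all_ybw = moves_started(all_node, beta, threads=4, wait_for_first=True)
all_eager = moves_started(all_node, beta, threads=4, wait_for_first=False)

print("CUT node:", cut_ybw, "move with YBW,", cut_eager, "moves when splitting at once")
print("ALL node:", all_ybw, "moves with YBW,", all_eager, "moves when splitting at once")
```

In this model the well-ordered CUT node needs one move with the YBW rule but starts four when splitting immediately, so three searches are pure overhead, while at the ALL node every move has to be searched either way.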