Comparing my overclocked 8-core to Werewolf's 24-core. I was using the Shredder interface; Werewolf was using the Fritz interface.
[D] rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
Engine: Stockfish 130816 64 BMI2 (4096 MB)
by T. Romstad, M. Costalba, J. Kiiski, G.
1 core @ 4.2, HT off, hash 4096, Windows 8.1, 1200 seconds to depth 34
34/49 20:00 +0.13-- 1.e4 e6 (2.171.062.522) 1808
(1200 seconds)
8 cores @ 4.2, HT off, hash 4096, Windows 8.1, 484 seconds to depth 34
34/46 8:04 +0.10 1.d4 Nf6 2.c4 e6 3.Nf3 b6 4.g3 Ba6 (6.779.258.907) 14005
(484 seconds)
Time to depth concerns
- Posts: 2016
- Joined: Sun Feb 17, 2008 4:19 pm
Re: Time to depth concerns
No chess program was born totally from one mind. All chess programs have many ideas from many minds.
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Time to depth concerns
Werewolf wrote: Recently I've been doing some research into real speedup on many threads. My results are alarming. Perhaps people can comment on them.
I'm running a dual E5-2697 v2 (24 physical cores) with the following settings:
...
Thanks
Carl
Code:
HT off, with 512 RAM, 12 Phys Cores, go depth 30 ( less patience than you 8-) )
about 2.1/1
SF081416-2Y
one core:
info depth 30 seldepth 44 multipv 1 score cp 15 nodes 675914031 nps 1358634 hashfull 999 tbhits 0 time 497495 pv d2d4 g8f6 g1f3 e7e6 c1f4 c7c5 e2e3 b8c6 b1c3 a7a6 d4d5 e6d5 c3d5 f6d5 d1d5 c6b4 d5d2 d7d5 c2c3 b4c6 f1d3 f8e7 e1g1 e8g8 d3c2 g7g6 f4h6 f8e8 e3e4 d5e4 c2e4 d8d2 h6d2
bestmove d2d4 ponder g8f6
twelve cores:
info depth 30 seldepth 42 multipv 1 score cp 10 nodes 3609793168 nps 15642994 hashfull 999 tbhits 0 time 230761 pv d2d4 g8f6 g1f3 e7e6 c1f4 d7d5 e2e3 f8d6 f4d6 d8d6 c2c4 b7b6 b1c3 e8g8 f1e2 c7c5 e1g1 b8d7 c4d5 e6d5 d1c2 c8b7 f1d1 f8e8 e2b5 a8d8 a1c1 h7h6 c2a4 d6b8 a4a3 c5c4
bestmove d2d4 ponder g8f6
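As a sanity check, the two "info depth 30" lines above already contain everything needed to estimate scaling. A rough back-of-envelope (times in ms and node counts taken straight from the logs quoted above):

```python
# Rough metrics from the two Stockfish "info depth 30" lines above
# (time in ms, nodes as reported).
t1, n1 = 497495, 675914031        # one core
t12, n12 = 230761, 3609793168     # twelve cores

speedup = t1 / t12                      # time-to-depth speedup
nps_ratio = (n12 / t12) / (n1 / t1)     # raw NPS scaling
overhead = n12 / n1                     # extra nodes to reach depth 30

print(f"time-to-depth speedup: {speedup:.2f}x")   # ~2.16x
print(f"NPS ratio: {nps_ratio:.1f}x")             # ~11.5x
print(f"node overhead: {overhead:.1f}x")          # ~5.3x
```

So on these numbers, twelve threads run the nodes ~11.5x faster but need ~5.3x more of them, leaving roughly the 2.1x time-to-depth quoted above.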
Code:
crafty-25.1-081216
sd 30 - single core 512 hash
30-> 6:35 0.18 1. e4 e6 2. Nf3 d5 3. Nc3 Nf6 4. exd5 exd5
5. d4 Nc6 6. Bb5 Bb4 7. O-O O-O 8. Bg5
Bxc3 9. bxc3 h6 10. Bh4 g5 11. Bg3 Bf5
12. Ne5 Nxe5 13. Bxe5 Ng4 14. Re1 Nxe5
15. Rxe5 Be4
time=6:35(100%) nodes=2101825003(2.1B) fh1=91% pred=0 nps=5.3M
chk=21.1M qchk=33.2M fp=728.6M mcp=418.2M 50move=1
LMReductions: 1/63.0M 2/45.4M 3/35.0M 4/14.6M 5/1.3M 6/21.8K
7/14
null-move (R): 3/86.3M 4/6.8M 5/265.3K 6/11.0K 7/274
sd 30 - twelve cores 512 hash
30-> 1:17 0.14 1. Nf3 Nf6 2. e3 c5 3. c4 e6 4. d4 Be7
5. Bd3 d5 6. Nc3 dxc4 7. Bxc4 O-O 8. O-O
Nc6 9. dxc5 Bxc5 10. Bd2 Qe7 11. Rc1 Rd8
12. a3 Bd7 13. b4 Bd6 14. Nb5 Rac8
15. Nxd6 Qxd6 (s=3)
time=1:17(98%) nodes=4067935324(4.1B) fh1=90% pred=0 nps=52.8M
chk=39.6M qchk=61.0M fp=1.4B mcp=834.8M 50move=1
LMReductions: 1/119.7M 2/85.0M 3/64.7M 4/27.6M 5/2.6M 6/40.7K
7/2
null-move (R): 3/163.3M 4/12.4M 5/443.8K 6/16.5K 7/341
splits=534.2K(402.4K) aborts=58.7K joins=727.6K data=20%(20%)
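The same back-of-envelope applied to the Crafty (YBW) logs above makes the contrast clear — times and node counts again taken from the quoted output:

```python
# Same exercise for the Crafty (YBW) logs above.
t1, n1 = 6 * 60 + 35, 2101825003     # 6:35, single core (seconds, nodes)
t12, n12 = 1 * 60 + 17, 4067935324   # 1:17, twelve cores

speedup = t1 / t12    # time-to-depth speedup
overhead = n12 / n1   # extra nodes to reach depth 30

print(f"time-to-depth speedup: {speedup:.2f}x")  # ~5.13x
print(f"node overhead: {overhead:.2f}x")         # ~1.94x
```

YBW shows far less search overhead (~1.9x vs ~5.3x) and correspondingly better time to depth — which is exactly the property the rest of the thread argues about.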
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Time to depth concerns
Edsel Apostol wrote: Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
Werewolf wrote: So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter. In which case, how do we measure speed with Lazy SMP? What about Komodo, is that lazy SMP or YBWC? Thanks.
APassionForCriminalJustic wrote: Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.
No, this is just a well known property of this implementation, not the goal. The goal is playing strength.
Jörg Oster
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Time to depth concerns
Like others have already pointed out, time-to-depth is not a good measure for Lazy SMP in general.
Besides that, when running a fixed-depth search, I disabled picking the move from another thread that searched deeper and has a better score, as we do in game play.
So the displayed search info can be misleading, or not tell the whole story, because we simply don't know how deep each of the helper threads searched and how much they contributed.
I'm afraid the only reliable measure is playing strength.
Jörg Oster
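The game-play selection Jörg describes — report the result of whichever thread searched deepest, tie-breaking on score — can be sketched like this (hypothetical names; a deliberate simplification, not Stockfish's actual code):

```python
from dataclasses import dataclass

@dataclass
class ThreadResult:        # hypothetical container for one helper's result
    depth: int
    score: int             # centipawns
    best_move: str

def pick_reported_result(results):
    """Prefer the thread that searched deepest; among equal depths,
    prefer the better score. This is the step that gets disabled in
    fixed-depth tests, which is why the displayed info can mislead."""
    return max(results, key=lambda r: (r.depth, r.score))

# Example: one helper reached depth 32, so its move is what gets reported.
results = [ThreadResult(30, 15, "d2d4"),
           ThreadResult(32, 10, "g1f3"),
           ThreadResult(30, 12, "d2d4")]
print(pick_reported_result(results).best_move)  # g1f3
```

With this step disabled, the main thread's printout says nothing about how deep the helpers actually got — hence "not telling the whole story."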
- Posts: 1796
- Joined: Thu Sep 18, 2008 10:24 pm
Re: Time to depth concerns
bob wrote: For the record, there are two ways to use processing power through multiple cores.
(2) search MORE of the tree at the same depth. This is almost always a purely accidental side-effect of a search that has excessive overhead. Which describes lazy-SMP pretty accurately.
If you prune too much, then widening the tree is of some benefit. But I always point out that you could widen the tree without a parallel search, just as easily, if that is really a goal.
time-to-depth is by far the easiest way to analyze parallel speedup. Of course, playing games would be even better,
I'm struggling with the logic of this for the following reasons:
a) Your point (2) seems reasonable: you're describing a "thicker" search tree. But if the nps on 24 threads is 20x greater than on one thread, yet it takes on average 1/4 of the time to reach the same depth, it follows that to reach depth x, 24 threads search 20/4 = 5 times as many nodes as one thread. This seems excessive and wasteful.
b) However, Lazy SMP works!
c) Therefore, I'm confused.
(And as a separate point, both nps and time to depth seem useless for measuring speedup in Lazy SMP algorithms. And as you point out, Elo would take ages as an alternative.)
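The arithmetic in (a) is just the identity nodes = nps × time applied to both runs (hypothetical helper name, illustrating the quoted 20x / 1:4 figures):

```python
def node_overhead(nps_ratio, time_fraction):
    """nodes_smp / nodes_1 = (nps_smp * t_smp) / (nps_1 * t_1),
    i.e. the NPS ratio times the fraction of time still needed
    to reach the same depth."""
    return nps_ratio * time_fraction

# Werewolf's numbers: 20x the NPS, 1/4 the time to the same depth.
print(node_overhead(20, 1 / 4))  # 5.0
```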
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Time to depth concerns
Werewolf wrote: b) However, Lazy SMP works!
Because you can increase strength by going deeper or by reviewing more different moves, and lazy SMP leans more toward the second. It still goes quite deep as well, but engines do not show it.
Daniel José - http://www.andscacs.com
- Posts: 688
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Time to depth concerns
Edsel Apostol wrote: Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Time to depth concerns
petero2 wrote: Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
That certainly sounds like a YBW type approach. The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology. Doesn't mean threads wait or anything...
- Posts: 688
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Time to depth concerns
bob wrote: The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know, and Texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type. A quote from the algorithm description:
No young brothers wait concept is used. A helper thread will search the move with the highest estimated probability of being useful, even if this is for example the second move at an expected cut node, and the first move is currently being searched. The idea is that it is better to do something than nothing, and the mechanism that aborts a helper thread when a move with a significantly higher probability becomes available makes sure that there is no long term penalty for starting to search a move with a low probability.
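Under stated assumptions (hypothetical names and a crude probability margin — a sketch of the quoted description, not Texel's actual code), the no-YBW helper rule might look like:

```python
def next_helper_task(candidates, current=None, switch_margin=0.2):
    """candidates: (usefulness_probability, move) pairs.
    A helper simply takes the most probably-useful move -- even the
    second move at an expected cut node -- rather than waiting (no YBW).
    It abandons its current move only when a significantly more probable
    one appears, so a bad initial pick carries no long-term penalty."""
    best = max(candidates)  # tuple comparison: highest probability wins
    if current is None or best[0] > current[0] + switch_margin:
        return best     # abort current search, switch to this move
    return current      # keep searching what we already started

# Idle helper: just take the most promising candidate.
print(next_helper_task([(0.3, "d2d4"), (0.6, "e2e4")]))  # (0.6, 'e2e4')
```

The `switch_margin` stands in for "significantly higher probability": a marginally better candidate does not abort work already in progress.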
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Time to depth concerns
petero2 wrote: I know, and Texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.
If you don't use the basic YBW idea, this is not going to work well. But I will bet that it does. DTS certainly did; it just didn't do the "search one and then split" any more than Crafty does today. Crafty also "pre-splits" to leave work lying around to be picked up when a thread goes idle. But it doesn't pre-split before searching one node, as that absolutely invites high search overhead: you NEVER want to split at a CUT node, it is a total waste of time.
In Cray Blitz, the decision was to split after a move is searched when possible, otherwise split with some intelligence, trying to pick only ALL/PV nodes and not CUT nodes...
You HAVE to use some sort of YBW derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodes since they are 100% overhead and contribute zero to the speedup you are chasing...