Time to depth concerns

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Time to depth concerns

Post by Werewolf »

Recently I've been doing some research into real speedup on many threads. My results are alarming. Perhaps people can comment on them.

I'm running a dual E5-2697 v2 (24 physical cores) with the following settings:
- 64 GB RAM
- HT OFF
- clock speed is ALWAYS 3.00 GHz, measured with CPUZ.
- GUI Fritz 15
- Hash always 4 GB
- Windows 10, 64 bit.
- No other programs running.

1 THREAD
I took Stockfish Dev 10082016 and timed how long it took to reach depth 34 on the start position, running on 1 physical core (1 thread). To make sure Intel's turbo didn't raise the clock-speed I ran Aquarium in the background and had Komodo ticking over on 20 threads. I then checked the clockspeed to make sure it was 3.00 GHz.

RUN 1:
34 / 46 = 2175 seconds
34 / 52 = 2461 seconds

RUN 2:
34 / 46 = 2187 seconds
34 / 52 = 2475 seconds

Nothing surprising here. N/S were around 1.1 MNs. The two runs show what we all know - SF is deterministic on 1 thread and the results were what I expected.

24 THREADS
Next, I repeated the exercise but now I ran SF on 24 cores. No other programs were running. Clock speed was again confirmed at 3.00 GHz and N/S were around 25 million - about what was expected.

RUN 1:
34/49 = 714 s

RUN 2:
34/44 = 1052 s
34/47 = 1194 s

RUN 3:
34/45 = 537 s

RUN 4:
34/45 = 717 s
34/45 = 745 s
34/45 = 782 s

RUN 5:
34/43 = 507 s

RUN 6:
34/51 = 937 s

RUN 7:
34/49 = 671 s

RUN 8:
34/45 = 337 s
34/50 = 557 s

RUN 9:
34/43 = 497 s

RUN 10:
34/45 = 724 s

I realise it would be good to have more data. However, even with this limited sample it is fairly clear that if time-to-depth is a true indication of real speed up: the gain going from 1 thread to 24 is somewhere in the range of 3-4x quicker.

This is shocking, at least to me. I expected more like a 10-fold increase in search speed. Can anyone explain this?

Thanks

Carl
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Time to depth concerns

Post by Edsel Apostol »

Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: Time to depth concerns

Post by gordonr »

I can't explain your figures but in the past I've been interested in how best to conduct "time to depth" testing.

http://www.talkchess.com/forum/viewtopi ... highlight=

I ended up with an automated test framework that: used many various positions; did hundreds of tests runs; and used the median time. I believe this was necessary to make the figures as meaningful as possible.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Time to depth concerns

Post by zullil »

Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
I agree----this seems like typical LazySMP behavior.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: Time to depth concerns

Post by Werewolf »

Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.
APassionForCriminalJustic
Posts: 417
Joined: Sat May 24, 2014 9:16 am

Re: Time to depth concerns

Post by APassionForCriminalJustic »

Werewolf wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.
Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.
User avatar
Eelco de Groot
Posts: 4563
Joined: Sun Mar 12, 2006 2:40 am
Full name:   

Re: Time to depth concerns

Post by Eelco de Groot »

APassionForCriminalJustic wrote:
Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.
I don't know much more than you but those statements are both rather far from the truth, really. Back when Komodo did not have a working SMP implementation, unlike other programs from Don in the past, he did say he was working with an simpler form of YBW. Well something to that effect at least. I understood that as a form of Lazy SMP also because that was easier to implement. So it stands to reason that this could still be so if it worked well, and winning TCEC later qualifies as a success.

Has nothing to do with nodes per second. More with iterative deepening. The more threads, the fuzzier exactly what depth they are on, and what I have seen from SF they are all on different depths? But really that is about all I know of the implementation in SF. I just picture it as trying to get the same effect as restarting the search from scratch somewhere midtime, and do this many times over. You profit from the already partially ordered tree and start over. Some threads are at higher depths instead, but do not get the chance to finish in their iteration, but they could profit from a lucky transposition for instance or do some groundbreaking work for later iterations. Also, I think I saw the thread with the best result is chosen at the end in Stockfish, not just the highest completed depth. That has nothing to do with alpha beta anymore :P
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Time to depth concerns

Post by Dann Corbit »

LazySMP does not get as deep as YBW, but it plays better, according to tests.

It also seems to perform shockingly well with hyperthreading turned on, at least according to some experiments.

I don't think anyone can tell you the why on either of those.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time to depth concerns

Post by bob »

Werewolf wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.
For the record, there are two ways to use processing power through multiple cores.

(1) search the SAME tree faster, so that you get deeper into the tree in the same time;

(2) sure MORE of the tree at the same depth. This is almost always a purely accidental side-effect of a search that has excessive overhead. Which describes lazy-amp pretty accurately.

If you prune too much, then widening the tree is of some benefit. But I always point out that you could widen the tree without a parallel search, just as easily, if that is really a goal.

time-do-depth is by far the easiest way to analyze parallel speedup. Of course, playing games would be even better, but the computational cost is astronomical as in normal testing, you run one game per core to get enough games played to drive the error bar to your target. But if you test a parallel search, you use all cores on one game, which on a 20 core machine would make the testing take 20x longer. I don't think Elo will ever be a viable measuring approach. A couple of years ago we had a 70 node cluster with 8 cores per node. Giving me the option of playing 70 games at once, or 560 games at once. I always chose the latter. :)
APassionForCriminalJustic
Posts: 417
Joined: Sat May 24, 2014 9:16 am

Re: Time to depth concerns

Post by APassionForCriminalJustic »

Eelco de Groot wrote:
APassionForCriminalJustic wrote:
Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.
I don't know much more than you but those statements are both rather far from the truth, really. Back when Komodo did not have a working SMP implementation, unlike other programs from Don in the past, he did say he was working with an simpler form of YBW. Well something to that effect at least. I understood that as a form of Lazy SMP also because that was easier to implement. So it stands to reason that this could still be so if it worked well, and winning TCEC later qualifies as a success.

Has nothing to do with nodes per second. More with iterative deepening. The more threads, the fuzzier exactly what depth they are on, and what I have seen from SF they are all on different depths? But really that is about all I know of the implementation in SF. I just picture it as trying to get the same effect as restarting the search from scratch somewhere midtime, and do this many times over. You profit from the already partially ordered tree and start over. Some threads are at higher depths instead, but do not get the chance to finish in their iteration, but they could profit from a lucky transposition for instance or do some groundbreaking work for later iterations. Also, I think I saw the thread with the best result is chosen at the end in Stockfish, not just the highest completed depth. That has nothing to do with alpha beta anymore :P
I'll go with what you say man. :) You know way more than me.