question on performance of DTS

Discussion of chess software programming and technical issues.

Moderator: Ras

liuzy

question on performance of DTS

Post by liuzy »

I found this table on Bob's website.
+--------------+-----+-----+-----+-----+------+
| # processors |   1 |   2 |   4 |   8 |   16 |
+--------------+-----+-----+-----+-----+------+
| speedup      | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+--------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish and recent Crafty?
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: question on performance of DTS

Post by mcostalba »

liuzy wrote: I found this table on Bob's website.
+--------------+-----+-----+-----+-----+------+
| # processors |   1 |   2 |   4 |   8 |   16 |
+--------------+-----+-----+-----+-----+------+
| speedup      | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+--------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish?
Nowhere :-)

Nobody has ever built up such a table for SF, as far as I know.


BTW, although I concede that such a table, based on nodes/sec on given hardware, has some validity, I also think it could be misleading, because it does not reflect a corresponding Elo increase.

Indeed, Elo is much more complex than counting nodes, because what matters is not only how many nodes you calculate but also which nodes you calculate.

For instance, you could have an SMP implementation with a very good speedup but a slow mechanism for stopping the running threads when one of them finds a cut-off. In that case, it remains to be demonstrated that such an implementation is better than one with a smaller speedup but more advanced inter-thread synchronization.

This is just one example; there are others. They are why I came to think that such tables are very misleading and naive, just like comparing CPUs of different architectures only by their clock frequency. :-)
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: question on performance of DTS

Post by Daniel Shawul »

I think you got it wrong. The numbers mentioned are based on the time to complete a task (e.g. a fixed depth of 20), not on NPS scaling.
To the OP: Neither Stockfish nor Crafty uses DTS. I know ZCT uses it.
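
To make the time-to-depth definition concrete, here is a minimal Python sketch. It takes the speedup figures from the table quoted in this thread and derives the parallel efficiency (speedup divided by processor count). The function names are made up for illustration and are not taken from any engine.

```python
# Speedup figures from the table quoted in this thread. According to the
# posts, they are time-to-depth based, not NPS based.
speedups = {1: 1.0, 2: 2.0, 4: 3.7, 8: 6.6, 16: 11.1}


def time_to_depth_speedup(t_serial, t_parallel):
    """Speedup = time on 1 processor / time on N processors, same depth."""
    return t_serial / t_parallel


def efficiency(speedup, processors):
    """Fraction of the ideal linear speedup that is actually achieved."""
    return speedup / processors


for n, s in speedups.items():
    print(f"{n:2d} processors: speedup {s:4.1f}, efficiency {efficiency(s, n):.1%}")
```

For example, a fixed-depth search that takes 100 seconds on one processor and 25 seconds on four would give `time_to_depth_speedup(100.0, 25.0) == 4.0`.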
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: question on performance of DTS

Post by mcostalba »

Daniel Shawul wrote: I think you got it wrong. The numbers mentioned are based on the time to complete a task (e.g. a fixed depth of 20), not on NPS scaling.
To the OP: Neither Stockfish nor Crafty uses DTS. I know ZCT uses it.
Ok. Sorry for the noise then ;-)
liuzy

Re: question on performance of DTS

Post by liuzy »

Marco Costalba, can you run some tests for Stockfish?
I cannot test it myself, because I don't have a 16-core CPU.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: question on performance of DTS

Post by mcostalba »

liuzy wrote: Marco Costalba, can you run some tests for Stockfish?
I cannot test it myself, because I don't have a 16-core CPU.
I don't have one either.
liuzy

Re: question on performance of DTS

Post by liuzy »

Why don't Stockfish and Crafty use DTS, since its performance is very good?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question on performance of DTS

Post by bob »

liuzy wrote: I found this table on Bob's website.
+--------------+-----+-----+-----+-----+------+
| # processors |   1 |   2 |   4 |   8 |   16 |
+--------------+-----+-----+-----+-----+------+
| speedup      | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+--------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish and recent Crafty?
Crafty data has been posted on CCC several times. Unfortunately, most of the testing has been with 8 cores, although I have posted some 16-core data. In general, the current Crafty data is worse for smaller numbers of processors, but is reasonably close for 16 (the last run I had on 16 cores was actually 11.5x). The numbers are hard to compare, though, because CB was doing 10-ply searches and Crafty is well beyond 20, which helps parallel search (deeper = better, particularly if you split at the root).

I hardly ever see anyone else post parallel speedup data. I think there is some raw 8-CPU data available (just large log files for various numbers of processors, all tested on exactly the same set of positions that came from the CB DTS paper).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question on performance of DTS

Post by bob »

mcostalba wrote:
liuzy wrote: I found this table on Bob's website.
+--------------+-----+-----+-----+-----+------+
| # processors |   1 |   2 |   4 |   8 |   16 |
+--------------+-----+-----+-----+-----+------+
| speedup      | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+--------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish?
Nowhere :-)

Nobody has ever built up such a table for SF, as far as I know.


BTW, although I concede that such a table, based on nodes/sec on given hardware, has some validity, I also think it could be misleading, because it does not reflect a corresponding Elo increase.
So if a program runs 1.7x faster on 2 processors, that won't affect the Elo the same way as running on a single CPU that is 1.7x faster? None of my speedup data is about NPS. It is all about time to a specific depth, which is a real performance measurement that does predict Elo accurately.


Indeed, Elo is much more complex than counting nodes, because what matters is not only how many nodes you calculate but also which nodes you calculate.
Now if only the data he gave was counting nodes. But it wasn't. :)



For instance, you could have an SMP implementation with a very good speedup but a slow mechanism for stopping the running threads when one of them finds a cut-off. In that case, it remains to be demonstrated that such an implementation is better than one with a smaller speedup but more advanced inter-thread synchronization.
That makes absolutely no sense to anyone familiar with parallel search. Time-to-depth is comparable to time-to-depth, whether the search is done in parallel or on faster hardware. We are _not_ measuring raw NPS and using that. If I did, both CB and Crafty would weigh in with a 16x speedup on 16 processors. But we don't measure parallel performance in such a flawed way. Never have, in fact. It is useful to compare NPS to see how much performance is lost to pure parallel issues (such as cache coherency, memory conflicts, etc) but we don't consider any of that when reporting parallel speed-up. At least no one I know of does.

This is just one example; there are others. They are why I came to think that such tables are very misleading and naive, just like comparing CPUs of different architectures only by their clock frequency. :-)
You are _very_ naive to make that statement about something being naive. :)
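
The difference between NPS scaling and time-to-depth speedup discussed in this post can be shown with a little arithmetic. Since time to depth is nodes searched divided by nodes per second, the speedup equals the NPS scaling divided by the growth in nodes searched. The Python sketch below uses invented node counts and speeds purely for illustration; they are not measurements from Cray Blitz, Crafty, or any other engine.

```python
def time_to_depth(nodes, nps):
    """Seconds needed to finish a fixed-depth search."""
    return nodes / nps


def speedup(nodes_1, nps_1, nodes_n, nps_n):
    """Time-to-depth speedup of the N-processor run over the 1-processor run."""
    return time_to_depth(nodes_1, nps_1) / time_to_depth(nodes_n, nps_n)


# Hypothetical numbers, chosen only to illustrate the effect.
nodes_1 = 100_000_000    # nodes searched to reach the depth on 1 processor
nps_1 = 1_000_000        # nodes per second on 1 processor
nps_16 = 16 * nps_1      # assume perfect 16x NPS scaling on 16 processors
nodes_16 = 144_000_000   # assume the parallel search visits 44% more nodes

nps_scaling = nps_16 / nps_1
real_speedup = speedup(nodes_1, nps_1, nodes_16, nps_16)

print(f"NPS scaling:           {nps_scaling:.1f}x")
print(f"time-to-depth speedup: {real_speedup:.1f}x")
```

Under these assumptions the NPS figure says 16x while the search only finishes about 11x sooner, which is why the two measures should not be confused.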
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question on performance of DTS

Post by bob »

liuzy wrote: Why don't Stockfish and Crafty use DTS, since its performance is very good?
Current Crafty is pretty good as well. DTS eliminates recursive search, which I didn't want to give up. So I implemented YBW in a way that is fairly close to DTS, but without having to go to a pure iterative (non-recursive) search. I may do this one day, but the recursive search certainly is cleaner and easier to understand.
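
To illustrate the recursive versus iterative trade-off mentioned above, here is a toy Python sketch. It is not DTS, YBW, or code from Crafty: it is plain negamax without alpha-beta on a hand-made tree, written once recursively and once with an explicit stack. The tree layout (nested lists with integer leaves) and the function names are assumptions made for this example. The point is only that the iterative version keeps the state of every ply in an ordinary data structure, at the cost of more bookkeeping.

```python
def negamax_recursive(node):
    """Recursive negamax: the state of each ply lives on the call stack."""
    if isinstance(node, int):   # leaf: score for the side to move
        return node
    return max(-negamax_recursive(child) for child in node)


def negamax_iterative(root):
    """The same search with an explicit stack instead of recursion."""
    if isinstance(root, int):
        return root
    # Each frame is [node, index of next child to visit, best score so far].
    stack = [[root, 0, float("-inf")]]
    while True:
        frame = stack[-1]
        node, idx, best = frame
        if idx < len(node):
            child = node[idx]
            frame[1] += 1
            if isinstance(child, int):
                frame[2] = max(best, -child)
            else:
                stack.append([child, 0, float("-inf")])
        else:
            # All children searched: hand the score back to the parent frame.
            stack.pop()
            if not stack:
                return best
            parent = stack[-1]
            parent[2] = max(parent[2], -best)


# Hypothetical toy tree: inner nodes are non-empty lists, leaves are scores.
tree = [[3, 5], [2, [9, 1]], [4]]
print(negamax_recursive(tree), negamax_iterative(tree))
```

Both functions return the same value for the same tree. The recursive one is shorter and easier to read, which matches the reason given above for keeping a recursive search.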