How to measure Parallel Search Speedup?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mfournier

How to measure Parallel Search Speedup?

Post by mfournier »

I'm working on the parallel search of my engine even though my engine is not very strong and I should perhaps first concentrate on improving evaluation and search functions but, anyway, I find parallel search an interesting topic.

I would like to know how do you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeat it 3 times and then make the average of the times needed to complete it.
Could this be a reasonable approach?

Of course, the choice of the good positions would be crucial. Which positions do you suggest?
User avatar
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: How to measure parallel search speedup?

Post by Ajedrecista »

Hello Marcel:
mfournier wrote:I'm working on the parallel search of my engine even though my engine is not very strong and I should perhaps first concentrate on improving evaluation and search functions but, anyway, I find parallel search an interesting topic.

I would like to know how do you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeat it 3 times and then make the average of the times needed to complete it.
Could this be a reasonable approach?

Of course, the choice of the good positions would be crucial. Which positions do you suggest?
Welcome to TalkChess! I am not a programmer, so disgracefully I can not answer your question about measuring parallel search speedup.

Houdini is currently the number 1 rated engine in lists where it appears: it has a command called 'autotune' for adjusting a parameter called 'split depth'. This command chooses four positions that can be of your interest:

Re: Houdini 1.5a Autotune

Code: Select all

r2qk2r/1b1n1ppp/1P1p1b2/pBpP4/P5Q1/2N5/1P3PPP/R3K1NR b KQkq - 0 1
5rk1/p2qr1b1/1Pn3pp/2pn4/2Np1p2/1N1P2P1/1P1BPP1P/R2QR1K1 b - - 0 1
rqb2rk1/p4p1p/2pb1np1/3pn3/NP1B1N2/P3P3/2Q1BPPP/R4RK1 w - - 0 1
r4rk1/p1qbnppp/1pn1p3/2ppP3/P2P4/2PB1N2/2P2PPP/R1BQ1RK1 w - - 6 1
Robert Houdart is the author of Houdini and a member of this forum; he can give an explanation of why he choosed these four positions. I hope that you will find those positions useful. By the way, can you give a rough estimate of the strength of your engine? Thanks in advance. I wish you very good luck.

Regards from Spain.

Ajedrecista.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: How to measure Parallel Search Speedup?

Post by Daniel Shawul »

In many other parallel applications speed up is measured the amount of work done by p processors divided by that done by p processors in a given time. But that will not be incitefull for chess because parallel search incurs lots of overheads. One usually takes ratios of time to complete a given work instead.

Nps Scaling = (NPS of p processors) / (NPS of 1 processor)

"Real" Time Scaling = (Time by 1 processor) / (Time by p processors)

You can sometimes get a super linear speed up mostly due to cache effects. That is usually how super linear speed up is explained in literature. For <=4 porcessors I usually get superlinear nps scaling.

Super linear time scaling on the other hand involves some chance. When you do parallel alpha-beta, a parallel search could sometimes find cutoffs quickly. So the algorithm is modified in a way. So depending on the alpha beta behaviour you can get super linear speed up there as well. Also somtimes you may find p processors taking more time than 1 processor due to locking and other factors.

Daniel
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: How to measure Parallel Search Speedup?

Post by jdart »

I have some sampling code that runs during the search and periodically tests how many threads are occupied and not in a wait state. It prints out an average % usage figure at the end of the search. So for 4 threads this should be close to 400% but it usually is 375 or something like that, partly because at low depths the search won't be split.

(This should also closely approximate the system thread usage report available from Windows or from Linux via the w command).

--Jon
mfournier

Re: How to measure Parallel Search Speedup?

Post by mfournier »

Thanks everyone for your answers.

But what I wanted to know was how to measure the efficiency (in time-to-depth) of a certain parallel search algorithm minimizing the effects of parallel search randomness.

Is running a benchmark for eg 3 times and then taking the average time enough?

I plan to experiment with ABDADA, PVS, YBW (though I already know YBW is the best among those three).

Thanks.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: How to measure Parallel Search Speedup?

Post by Joerg Oster »

Well, from my experiences with Stockfish I would say 3 runs are an absolute minimum. Better do 5 runs (or more) and then delete the best and the worst run and take the average of the remaining ones.
Reason: you will most likely get an exceptional good run and a worse with different settings.
Jörg Oster
mfournier

Re: How to measure Parallel Search Speedup?

Post by mfournier »

Joerg Oster wrote:Well, from my experiences with Stockfish I would say 3 runs are an absolute minimum. Better do 5 runs (or more) and then delete the best and the worst run and take the average of the remaining ones.
Reason: you will most likely get an exceptional good run and a worse with different settings.
I will do something like that. Thanks.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: How to measure Parallel Search Speedup?

Post by diep »

mfournier wrote:I'm working on the parallel search of my engine even though my engine is not very strong and I should perhaps first concentrate on improving evaluation and search functions but, anyway, I find parallel search an interesting topic.

I would like to know how do you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeat it 3 times and then make the average of the times needed to complete it.
Could this be a reasonable approach?

Of course, the choice of the good positions would be crucial. Which positions do you suggest?
Using exactly same settings at 1 core and at n cores,
Run 1000 positions to depth X, which takes a while to reach, say a minute or 2.

Then average and calculate the error you had in your speedup for n cores.

Simple.