
How to measure Parallel Search Speedup?

Posted: Thu Aug 23, 2012 5:53 pm
by mfournier
I'm working on the parallel search of my engine. It is not very strong yet, and perhaps I should first concentrate on improving the evaluation and search functions, but I find parallel search an interesting topic anyway.

I would like to know how you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeating it 3 times and then averaging the times needed to complete it.
Could this be a reasonable approach?
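
For concreteness, here is a minimal sketch (in C++) of what such a harness might look like; search_to_depth is a hypothetical stand-in for the engine's own search entry point, stubbed out here so the example compiles:

Code:

#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical hook into the engine: search `fen` to a fixed depth.
// Stubbed out here; wire it to your own search entry point.
void search_to_depth(const std::string& fen, int depth) {
    (void)fen; (void)depth;
}

int main() {
    const std::vector<std::string> positions = {
        // the 5-6 chosen test FENs go here
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    };
    const int depth = 14, repeats = 3;

    double total = 0.0;
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        for (const auto& fen : positions)
            search_to_depth(fen, depth);
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        total += elapsed.count();
        std::printf("run %d: %.2f s\n", r + 1, elapsed.count());
    }
    std::printf("average over %d runs: %.2f s\n", repeats, total / repeats);
    return 0;
}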

Of course, the choice of good positions would be crucial. Which positions do you suggest?

Re: How to measure parallel search speedup?

Posted: Thu Aug 23, 2012 7:05 pm
by Ajedrecista
Hello Marcel:
mfournier wrote: I'm working on the parallel search of my engine. It is not very strong yet, and perhaps I should first concentrate on improving the evaluation and search functions, but I find parallel search an interesting topic anyway.

I would like to know how you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeating it 3 times and then averaging the times needed to complete it.
Could this be a reasonable approach?

Of course, the choice of good positions would be crucial. Which positions do you suggest?
Welcome to TalkChess! I am not a programmer, so unfortunately I cannot answer your question about measuring parallel search speedup.

Houdini is currently the number 1 rated engine in the lists where it appears; it has a command called 'autotune' for adjusting a parameter called 'split depth'. This command uses four positions that may be of interest to you:

Re: Houdini 1.5a Autotune

Code:

r2qk2r/1b1n1ppp/1P1p1b2/pBpP4/P5Q1/2N5/1P3PPP/R3K1NR b KQkq - 0 1
5rk1/p2qr1b1/1Pn3pp/2pn4/2Np1p2/1N1P2P1/1P1BPP1P/R2QR1K1 b - - 0 1
rqb2rk1/p4p1p/2pb1np1/3pn3/NP1B1N2/P3P3/2Q1BPPP/R4RK1 w - - 0 1
r4rk1/p1qbnppp/1pn1p3/2ppP3/P2P4/2PB1N2/2P2PPP/R1BQ1RK1 w - - 6 1
Robert Houdart is the author of Houdini and a member of this forum; he could explain why he chose these four positions. I hope that you will find them useful. By the way, can you give a rough estimate of the strength of your engine? Thanks in advance. I wish you very good luck.

Regards from Spain.

Ajedrecista.

Re: How to measure Parallel Search Speedup?

Posted: Thu Aug 23, 2012 8:31 pm
by Daniel Shawul
In many other parallel applications, speedup is measured as the amount of work done by p processors divided by that done by 1 processor in a given time. But that is not very insightful for chess, because parallel search incurs a lot of overhead. One usually takes the ratio of the times needed to complete a given amount of work instead.

NPS scaling = (NPS with p processors) / (NPS with 1 processor)

"Real" time scaling = (time with 1 processor) / (time with p processors)

You can sometimes get a super-linear speedup, mostly due to cache effects; that is usually how super-linear speedup is explained in the literature. For <= 4 processors I usually get super-linear NPS scaling.

Super-linear time scaling, on the other hand, involves some chance. In parallel alpha-beta, one of the parallel searches can sometimes stumble on a cutoff sooner than a sequential search would, so the parallel search effectively behaves like a modified algorithm. Depending on the alpha-beta behaviour you can get super-linear speedup there as well. Sometimes you may also find p processors taking more time than 1 processor, due to locking and other factors.

Daniel

Re: How to measure Parallel Search Speedup?

Posted: Tue Aug 28, 2012 4:20 am
by jdart
I have some sampling code that runs during the search and periodically tests how many threads are occupied and not in a wait state. It prints out an average % usage figure at the end of the search. So for 4 threads this should be close to 400%, but it is usually 375% or so, partly because at low depths the search won't be split.

(This should also closely approximate the system thread usage report available from Windows or from Linux via the w command).
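
This is not Jon's actual code, but a sketch of how such a sampler might work, assuming the engine keeps an atomic count of busy threads; the worker here is a fake stand-in for a search thread:

Code:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int>  busy{0};          // number of threads currently searching
std::atomic<bool> searching{true};

// Periodically sample how many threads are occupied; report average % usage.
void sampler(int num_threads) {
    long long samples = 0, busy_sum = 0;
    while (searching.load()) {
        busy_sum += busy.load();
        ++samples;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    std::printf("average usage: %.0f%% of %d%%\n",
                samples ? 100.0 * busy_sum / samples : 0.0,
                num_threads * 100);
}

// Fake worker standing in for a search thread: mostly busy, briefly waiting.
void worker() {
    for (int i = 0; i < 50; ++i) {
        ++busy;                                              // "searching"
        std::this_thread::sleep_for(std::chrono::milliseconds(40));
        --busy;                                              // "waiting"
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}

int main() {
    const int n = 4;
    std::thread s(sampler, n);
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    searching = false;
    s.join();
    return 0;
}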

--Jon

Re: How to measure Parallel Search Speedup?

Posted: Tue Aug 28, 2012 1:24 pm
by mfournier
Thanks everyone for your answers.

But what I wanted to know is how to measure the efficiency (in time-to-depth) of a given parallel search algorithm while minimizing the effects of parallel search randomness.

Is running a benchmark, say, 3 times and then taking the average time enough?

I plan to experiment with ABDADA, PVS, YBW (though I already know YBW is the best among those three).

Thanks.

Re: How to measure Parallel Search Speedup?

Posted: Tue Aug 28, 2012 3:10 pm
by Joerg Oster
Well, from my experience with Stockfish, I would say 3 runs are an absolute minimum. Better to do 5 runs (or more), then delete the best and the worst run and take the average of the remaining ones.
Reason: you will most likely get one exceptionally good run and one worse run, even with identical settings.
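
A minimal sketch of that trimmed average, with illustrative run times:

Code:

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Trimmed average as suggested above: sort, drop the best and worst run,
// average the rest.
int main() {
    std::vector<double> runs = {18.4, 17.9, 22.1, 18.2, 15.6};  // seconds
    std::sort(runs.begin(), runs.end());
    double sum = std::accumulate(runs.begin() + 1, runs.end() - 1, 0.0);
    std::printf("trimmed average: %.2f s\n", sum / (runs.size() - 2));
    return 0;
}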

Re: How to measure Parallel Search Speedup?

Posted: Wed Aug 29, 2012 9:33 am
by mfournier
Joerg Oster wrote: Well, from my experience with Stockfish, I would say 3 runs are an absolute minimum. Better to do 5 runs (or more), then delete the best and the worst run and take the average of the remaining ones.
Reason: you will most likely get one exceptionally good run and one worse run, even with identical settings.
I will do something like that. Thanks.

Re: How to measure Parallel Search Speedup?

Posted: Wed Aug 29, 2012 12:06 pm
by diep
mfournier wrote: I'm working on the parallel search of my engine. It is not very strong yet, and perhaps I should first concentrate on improving the evaluation and search functions, but I find parallel search an interesting topic anyway.

I would like to know how you measure the speedup of your parallel search.

I was thinking about running a fixed-depth benchmark on 5-6 positions, repeating it 3 times and then averaging the times needed to complete it.
Could this be a reasonable approach?

Of course, the choice of good positions would be crucial. Which positions do you suggest?
Using exactly the same settings on 1 core and on n cores, run 1000 positions to depth X, where X is deep enough that each search takes a while, say a minute or two.

Then average the per-position speedups and calculate the error of your speedup estimate for n cores.

Simple.
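
For illustration, a sketch of the averaging and error calculation diep describes; a handful of made-up times stands in for the 1000 measured positions:

Code:

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // 1-core and n-core times (seconds) for the same positions at depth X.
    std::vector<double> t1 = {118.0, 95.0, 130.0, 102.0};
    std::vector<double> tn = { 36.0, 31.0,  38.0,  35.0};

    // Per-position speedups t1/tn.
    std::vector<double> speedup;
    for (std::size_t i = 0; i < t1.size(); ++i)
        speedup.push_back(t1[i] / tn[i]);

    // Mean speedup.
    double mean = 0.0;
    for (double s : speedup) mean += s;
    mean /= speedup.size();

    // Sample variance, then standard error of the mean.
    double var = 0.0;
    for (double s : speedup) var += (s - mean) * (s - mean);
    var /= speedup.size() - 1;
    double se = std::sqrt(var / speedup.size());

    std::printf("speedup: %.2f +/- %.2f\n", mean, se);
    return 0;
}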