parallel search speed measurement

bob · Post by **bob** » Wed May 25, 2016 12:08 am

I have been working on this a while, and now almost have a full set of 1/2/4/8/16/20 core runs on the IBM power-8 box. Here's what I have decided to do:

First, I ran a 20 thread test 4 times, and looked to see how deep the search went in 10 minutes. I then ran the 1 cpu test with that depth limit set.

My first set of 20 cpu data is broken up into three parts. The first has an upper limit of 180 seconds per move. I take the last iteration completed before 180 seconds were burned, and then compute the speedup by taking the same iteration depth from the 1 cpu log. If you are interested, the four runs produced speedups of 16.4, 17.4, 16.0, 16.1.

Next I did the same but limiting the 20 core test to 30 seconds per move. I again took the matching iteration time from the 1 thread data and computed the speedups, which were 10.8, 10.9, 10.8, 10.3.

Finally I did the same but limiting the test to 5 seconds per move. These numbers were 11.1, 10.8, 10.9, 12.4.

There is an obvious trend where longer searches produce better speedups, but I was quite happy to see that the 5 second searches are STILL doing pretty well, averaging around 11x compared to 16x for the long searches. I figured this was going to be "the good, the bad and the ugly" but in reality it turned into "the good and the fair" (I consider a speedup of 11x not so good on a 20 core box, of course.)

I spent the day writing a C program to extract all this data from the logs, so when I finish with this, I will have the same numbers for 2, 4, 8 and 16 cores as well. More latter.

BTW, for the other tests, I am going to use the same methodology, namely to set time limits of 5, 30 and 180 seconds for the parallel version, then compare those to the equivalent 1 thread times. This will measure actual usable speedup with fewer cores, rather than making ALL the tests search to the same extreme depth that the 20 core runs reach...

Dann Corbit · Post by **Dann Corbit** » Wed May 25, 2016 12:19 am

I would be interested to see a problem solving test done in parallel with time allotted inversely proportional to threads.

E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)

Give 100 problems 32 seconds each with 1 thread
Give 100 problems 16 seconds each with 2 threads
Give 100 problems 8 seconds each with 4 threads
Give 100 problems 4 seconds each with 8 threads
Give 100 problems 2 seconds each with 16 threads
Give 100 problems 1 seconds each with 32 threads (or whatever you have available)

About one hour per test for the 1 thread and 32 seconds for the 32 threads.

In theory, the smaller thread count would do better because of having no SMP loss. But I suspect that the heavy thread counts would do better in the same time.

The reason is that I suspect parallel searching may actually find the answers faster due to SMP uncertainty, but it is only a guess.

Vinvin · Post by **Vinvin** » Wed May 25, 2016 12:48 am

Dann Corbit wrote:E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)

Solutions are probably too hard to find in the set.
Arasan Testsuite (version 19) is better : http://www.arasanchess.org/testsuite.shtml

Dann Corbit · Post by **Dann Corbit** » Wed May 25, 2016 12:56 am

The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0

lauriet · Post by **lauriet** » Wed May 25, 2016 1:13 am

Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie

lauriet · Post by **lauriet** » Wed May 25, 2016 1:13 am

Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie

bob · Post by **bob** » Wed May 25, 2016 1:22 am

lauriet wrote:Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie

Maybe in a few million years...

Vinvin · Post by **Vinvin** » Thu May 26, 2016 5:54 am

Dann Corbit wrote:The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0

A lot of positions are not easy in the Arasan Testsuite (version 19) : http://www.talkchess.com/forum/viewtopi ... 859#670859
In the Der brillante Schachzug 100 problems, a lot of positions are way too hard for a non top engine.

Dann Corbit · Post by **Dann Corbit** » Thu May 26, 2016 10:05 am

Vinvin wrote:
Dann Corbit wrote:The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0
A lot of positions are not easy in the Arasan Testsuite (version 19) : http://www.talkchess.com/forum/viewtopi ... 859#670859
In the Der brillante Schachzug 100 problems, a lot of positions are way too hard for a non top engine.

That's the whole idea, all of them are terrifically hard.
Given about an hour of compute time, an engine like crafty will solve a significant number of them.
I would be curious to see how solve rate changes when the energy is split in a parallel manner.

But lacking that sort of hardware, it would take a lot of effort to perform the test on my kind of hardware.

The Arasan test does have ten or so very tough problems, but it also has a share of problems that are good and easy.
I would prefer uniform difficulty (and in the case of DBS100, it is uniformly horrendously difficult).

Laskos · Post by **Laskos** » Thu May 26, 2016 5:27 pm

Dann Corbit wrote:I would be interested to see a problem solving test done in parallel with time allotted inversely proportional to threads.

E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)

Give 100 problems 32 seconds each with 1 thread
Give 100 problems 16 seconds each with 2 threads
Give 100 problems 8 seconds each with 4 threads
Give 100 problems 4 seconds each with 8 threads
Give 100 problems 2 seconds each with 16 threads
Give 100 problems 1 seconds each with 32 threads (or whatever you have available)

About one hour per test for the 1 thread and 32 seconds for the 32 threads.

In theory, the smaller thread count would do better because of having no SMP loss. But I suspect that the heavy thread counts would do better in the same time.

The reason is that I suspect parallel searching may actually find the answers faster due to SMP uncertainty, but it is only a guess.

Bob is testing time-to-depth, not time-to-solution or number of solutions. If the engine doesn't widen or narrowing its search, like Crafty 25, then time-to-depth is the best and easiest measure. If it uses, for example widening Lazy-SMP, like Komodo or Stockfish, then your proposal is interesting. I considered it since awhile, because it's easier to use a testsuite than to run many multithreaded games at long time control. Based on Arasan 19 testsuite with 1 minute per position on 1 thread, I concluded a rather low speedup 1 -> 4 cores of 2.6 for Komodo and 2.4 for Stockfish dev. I am not sure why the numbers are so low. Even Arasan 19 seems too hard for Crafty at say 1 minute 1 thread test, it solves only about 20% of positions. In fact, I am amazed at how weak Crafty 25, a year 2016 formerly top engine, tactically is. On WM testusite is solves about 50% of positions n 1 minute 1-thread search, worse than Shredder 8 of 2003. But the parallel search in Crafty 25 seems to work well, on time-to-depth I am getting about 3.2 speedup 1 -> 4 cores, better than Komodo with 2.6 and Stockfish dev. with 2.4 on Arasan 19.

parallel search speed measurement

parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement

Re: parallel search speed measurement