parallel search speed measurement

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

parallel search speed measurement

Post by bob »

I have been working on this a while, and now almost have a full set of 1/2/4/8/16/20 core runs on the IBM power-8 box. Here's what I have decided to do:

First, I ran a 20 thread test 4 times, and looked to see how deep the search went in 10 minutes. I then ran the 1 cpu test with that depth limit set.

My first set of 20 cpu data is broken up into three parts. The first has an upper limit of 180 seconds per move. I take the last iteration completed before 180 seconds were burned, and then compute the speedup by taking the same iteration depth from the 1 cpu log. If you are interested, the four runs produced speedups of 16.4, 17.4, 16.0, 16.1.

Next I did the same but limiting the 20 core test to 30 seconds per move. I again took the matching iteration time from the 1 thread data and computed the speedups, which were 10.8, 10.9, 10.8, 10.3.

Finally I did the same but limiting the test to 5 seconds per move. These numbers were 11.1, 10.8, 10.9, 12.4.

There is an obvious trend where longer searches produce better speedups, but I was quite happy to see that the 5 second searches are STILL doing pretty well, averaging around 11x compared to 16x for the long searches. I figured this was going to be "the good, the bad and the ugly" but in reality it turned into "the good and the fair" (I consider a speedup of 11x not so good on a 20 core box, of course.)

I spent the day writing a C program to extract all this data from the logs, so when I finish with this, I will have the same numbers for 2, 4, 8 and 16 cores as well. More latter.

BTW, for the other tests, I am going to use the same methodology, namely to set time limits of 5, 30 and 180 seconds for the parallel version, then compare those to the equivalent 1 thread times. This will measure actual usable speedup with fewer cores, rather than making ALL the tests search to the same extreme depth that the 20 core runs reach...
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: parallel search speed measurement

Post by Dann Corbit »

I would be interested to see a problem solving test done in parallel with time allotted inversely proportional to threads.

E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)

Give 100 problems 32 seconds each with 1 thread
Give 100 problems 16 seconds each with 2 threads
Give 100 problems 8 seconds each with 4 threads
Give 100 problems 4 seconds each with 8 threads
Give 100 problems 2 seconds each with 16 threads
Give 100 problems 1 seconds each with 32 threads (or whatever you have available)

About one hour per test for the 1 thread and 32 seconds for the 32 threads.

In theory, the smaller thread count would do better because of having no SMP loss. But I suspect that the heavy thread counts would do better in the same time.

The reason is that I suspect parallel searching may actually find the answers faster due to SMP uncertainty, but it is only a guess.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: parallel search speed measurement

Post by Vinvin »

Dann Corbit wrote:E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)
Solutions are probably too hard to find in the set.
Arasan Testsuite (version 19) is better : http://www.arasanchess.org/testsuite.shtml
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: parallel search speed measurement

Post by Dann Corbit »

The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
lauriet
Posts: 199
Joined: Sun Nov 03, 2013 9:32 am

Re: parallel search speed measurement

Post by lauriet »

Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie
lauriet
Posts: 199
Joined: Sun Nov 03, 2013 9:32 am

Re: parallel search speed measurement

Post by lauriet »

Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: parallel search speed measurement

Post by bob »

lauriet wrote:Arent you afraid that with all this processing power you may solve chess one day soon ?
Then you will have nothing left to do.

Laurie
Maybe in a few million years...
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: parallel search speed measurement

Post by Vinvin »

Dann Corbit wrote:The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0
A lot of positions are not easy in the Arasan Testsuite (version 19) : http://www.talkchess.com/forum/viewtopi ... 859#670859
In the Der brillante Schachzug 100 problems, a lot of positions are way too hard for a non top engine.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: parallel search speed measurement

Post by Dann Corbit »

Vinvin wrote:
Dann Corbit wrote:The difficulty was intentional.
If one thread finds the solution in 10 seconds, then there will be nothing to learn. I chose a set intentionally where well under 50% will be solved in the allotted time.
https://www.dropbox.com/s/4ghf4nimymx3b ... l.epd?dl=0
A lot of positions are not easy in the Arasan Testsuite (version 19) : http://www.talkchess.com/forum/viewtopi ... 859#670859
In the Der brillante Schachzug 100 problems, a lot of positions are way too hard for a non top engine.
That's the whole idea, all of them are terrifically hard.
Given about an hour of compute time, an engine like crafty will solve a significant number of them.
I would be curious to see how solve rate changes when the energy is split in a parallel manner.

But lacking that sort of hardware, it would take a lot of effort to perform the test on my kind of hardware.

The Arasan test does have ten or so very tough problems, but it also has a share of problems that are good and easy.
I would prefer uniform difficulty (and in the case of DBS100, it is uniformly horrendously difficult).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: parallel search speed measurement

Post by Laskos »

Dann Corbit wrote:I would be interested to see a problem solving test done in parallel with time allotted inversely proportional to threads.

E.g. given 100 very hard problems (e.g. Der brillante Schachzug 100 problems)

Give 100 problems 32 seconds each with 1 thread
Give 100 problems 16 seconds each with 2 threads
Give 100 problems 8 seconds each with 4 threads
Give 100 problems 4 seconds each with 8 threads
Give 100 problems 2 seconds each with 16 threads
Give 100 problems 1 seconds each with 32 threads (or whatever you have available)

About one hour per test for the 1 thread and 32 seconds for the 32 threads.

In theory, the smaller thread count would do better because of having no SMP loss. But I suspect that the heavy thread counts would do better in the same time.

The reason is that I suspect parallel searching may actually find the answers faster due to SMP uncertainty, but it is only a guess.
Bob is testing time-to-depth, not time-to-solution or number of solutions. If the engine doesn't widen or narrowing its search, like Crafty 25, then time-to-depth is the best and easiest measure. If it uses, for example widening Lazy-SMP, like Komodo or Stockfish, then your proposal is interesting. I considered it since awhile, because it's easier to use a testsuite than to run many multithreaded games at long time control. Based on Arasan 19 testsuite with 1 minute per position on 1 thread, I concluded a rather low speedup 1 -> 4 cores of 2.6 for Komodo and 2.4 for Stockfish dev. I am not sure why the numbers are so low. Even Arasan 19 seems too hard for Crafty at say 1 minute 1 thread test, it solves only about 20% of positions. In fact, I am amazed at how weak Crafty 25, a year 2016 formerly top engine, tactically is. On WM testusite is solves about 50% of positions n 1 minute 1-thread search, worse than Shredder 8 of 2003. But the parallel search in Crafty 25 seems to work well, on time-to-depth I am getting about 3.2 speedup 1 -> 4 cores, better than Komodo with 2.6 and Stockfish dev. with 2.4 on Arasan 19.