Stockfish and optimizing options for SMP search

Joerg Oster · Post by **Joerg Oster** » Sun May 08, 2011 11:32 am

Several times I tried to find better settings for Stockfish's UCI options "Maximum Number of Threads per Split Point" (MNTpSP) and "Minimum Split Depth" (MSD) for a Quadcore CPU. Assuming the default values may not be the best, which may be a wrong assumption btw. Unfortunately, I was unable to find a convincing better setting so far.

Just as a reminder: Louis Zulli found some better setting for an 8core system (see here http://www.talkchess.com/forum/viewtopi ... 16&t=31388), and after the release of 2.1, I thought why not give it another try.

My first test is to find the optimal setting for MNTpSP. Stockfish bench command (stockfish bench 512 4 22) over 50! different opening positions, 5 runs, 512 MB Hash, depth 22. MNTpSP from 2 to 8, MSD=4 (fix)
My system: Q6600@3,16GHz, 8 GB Ram, OS LinuxMint 10 64bit

Code: Select all

MNTpSP/MSD     2_4            3_4            4_4            5_4            6_4            7_4            8_4

sec            922            895           1089            962           1018            888           1034
nodes/sec    3732950        3828857        3878249        3869189        3880205        3867726        3868531

               957            951           1013            936           1038           1153            899
             3724098        3842708        3872926        3865703        3885754        3895055        3865837

               884           1024           1037           1015            965           1079           1075
             3716748        3838886        3869848        3884031        3869045        3879746        3872916

               982           1052           1055            904           1004            957           1033
             3740794        3858635        3879112        3868528        3891238        3883996        3878028

               891            979            945            940            996           1073            998
             3717818        3836214        3862928        3868798        3879267        3879151        3875439

------------------------------------------------------------------------------------------------------------------

Average        927            980           1029            951           1004           1030           1008
             3726482        3841060        3872613        3871250        3881102        3881135        3872150

MNTpSP=2 is the fastest overall, but we also see a slight drop in nps. And though quite fast, I have some doubts that this is the best setting for MNTpSP.
Second best is MNTpSP=5.
Should I refine the results by doing some 2 or 3 additional runs?

My next test will try to determine the best value for MSD.
Same testing scheme, with 50 opening positions and 50 endgame positions. Of course, with a different search depth for the endgame positions.
But which value(s) for MNTpSP should I test? Which one would you choose? Only 5 or 2 as well? Or 3 and 5?

Next question is what are reasonable numbers for MSD?
Which leads me to another question.
In thread.cpp line 112, MSD is muliplied with ONE_PLY. So I get MSD=10 when I set MSD=5 in uci option. Is that right? (Well, multiplying with 1 wouldn't make much sense here, right?)
Hence we only get even nunbers. Is there a reason for doing this? Maybe MSD=7 or 9 would do better. Who knows?

Any advices/comments are welcome.

mcostalba · Post by **mcostalba** » Sun May 08, 2011 1:07 pm

Joerg Oster wrote: Any advices/comments are welcome.

Because you have tested with 4 threads (stockfish bench 512 4 22) any MNTpSP >= 4 is equivalent because in any case all the available threads will be used, and this says a lot on the noise level of this tests that admittedly is very difficult.

At first glance I would say that MNTpSP is not such a sensible parameter. IMHO MSD is more important and could yield to a larger performance span. I would suggest to test with MSD 4, 6, 7 and with MNTpSP 2 and 4. If you want to tweak the sources you can increase MSD to say 12 or 16 (in ucioption.cpp overwrite following line with:

Code: Select all

o&#91;"Minimum Split Depth"&#93; = UCIOption&#40;4, 4, 16&#41;;

So to increase settable max split depth limit to 16.

bob · Post by **bob** » Sun May 08, 2011 5:21 pm

Joerg Oster wrote:Several times I tried to find better settings for Stockfish's UCI options "Maximum Number of Threads per Split Point" (MNTpSP) and "Minimum Split Depth" (MSD) for a Quadcore CPU. Assuming the default values may not be the best, which may be a wrong assumption btw. Unfortunately, I was unable to find a convincing better setting so far.

Just as a reminder: Louis Zulli found some better setting for an 8core system (see here http://www.talkchess.com/forum/viewtopi ... 16&t=31388), and after the release of 2.1, I thought why not give it another try.

My first test is to find the optimal setting for MNTpSP. Stockfish bench command (stockfish bench 512 4 22) over 50! different opening positions, 5 runs, 512 MB Hash, depth 22. MNTpSP from 2 to 8, MSD=4 (fix)
My system: Q6600@3,16GHz, 8 GB Ram, OS LinuxMint 10 64bit
Code: Select all
MNTpSP/MSD     2_4            3_4            4_4            5_4            6_4            7_4            8_4

sec            922            895           1089            962           1018            888           1034
nodes/sec    3732950        3828857        3878249        3869189        3880205        3867726        3868531

               957            951           1013            936           1038           1153            899
             3724098        3842708        3872926        3865703        3885754        3895055        3865837

               884           1024           1037           1015            965           1079           1075
             3716748        3838886        3869848        3884031        3869045        3879746        3872916

               982           1052           1055            904           1004            957           1033
             3740794        3858635        3879112        3868528        3891238        3883996        3878028

               891            979            945            940            996           1073            998
             3717818        3836214        3862928        3868798        3879267        3879151        3875439

------------------------------------------------------------------------------------------------------------------

Average        927            980           1029            951           1004           1030           1008
             3726482        3841060        3872613        3871250        3881102        3881135        3872150
MNTpSP=2 is the fastest overall, but we also see a slight drop in nps. And though quite fast, I have some doubts that this is the best setting for MNTpSP.
Second best is MNTpSP=5.
Should I refine the results by doing some 2 or 3 additional runs?

My next test will try to determine the best value for MSD.
Same testing scheme, with 50 opening positions and 50 endgame positions. Of course, with a different search depth for the endgame positions.
But which value(s) for MNTpSP should I test? Which one would you choose? Only 5 or 2 as well? Or 3 and 5?

Next question is what are reasonable numbers for MSD?
Which leads me to another question.
In thread.cpp line 112, MSD is muliplied with ONE_PLY. So I get MSD=10 when I set MSD=5 in uci option. Is that right? (Well, multiplying with 1 wouldn't make much sense here, right?)
Hence we only get even nunbers. Is there a reason for doing this? Maybe MSD=7 or 9 would do better. Who knows?

Any advices/comments are welcome.

First, completely ignore NPS. That's irrelevant in this context. You want the shortest time.

Second, 50 positions, 1000 seconds average? 20 seconds per position? You probably need to search longer. Shallow searches are full of noise. Get up into the 60 seconds per position minimum.

Third, make sure each position takes about the same time. Searching them all to the same fixed depth is no good. What if 10 of them are searched in 1 second, and one takes 100? If you treat those all as equal, you let that long search get lost when averaged equally with the 10 short ones. The short ones are probably not very reliable, yet they count more due to averaging.

Fourth, this number is probably not best represented as a constant (threads per split). This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem. A fixed value of 4 is not so good in endgames where the EBF drops way down. It is probably not so good in complicated middlegame positions where the moves per node is higher. I've been playing around with a better way.

The minimum split depth idea is also one that was removed from Crafty years ago, and even what I do today gets played with from time to time... I don't believe it makes sense to say "I can't split within N plies of the tips" for constant N, because a tree of depth N represents a wildly varying amount of work, depending on whether this is a complicated middlegame position, or a simple endgame position. N becomes a compromise that either hurts MG or EG searches, yet we want both to be as efficient and fast as possible...

Finally, these numbers are _very much_ configuration dependent. Change the CPU speed, memory speed, cache sizes, etc, and the optimal numbers can and will change.

Joerg Oster · Post by **Joerg Oster** » Sun May 08, 2011 6:53 pm

Bob, many thanks for your comment/advice.

bob wrote: First, completely ignore NPS. That's irrelevant in this context. You want the shortest time.

Yes, I know. But if the nps would drop significantly, we'd know there is something going wrong.

bob wrote: Second, 50 positions, 1000 seconds average? 20 seconds per position? You probably need to search longer. Shallow searches are full of noise. Get up into the 60 seconds per position minimum.

I was afraid that someone would come up with this...

This will take a lot more time. Sigh.

bob wrote: Third, make sure each position takes about the same time. Searching them all to the same fixed depth is no good. What if 10 of them are searched in 1 second, and one takes 100? If you treat those all as equal, you let that long search get lost when averaged equally with the 10 short ones. The short ones are probably not very reliable, yet they count more due to averaging.

Good point. I will try to take care of this and throw out those positions which are solved within a few seconds.

bob wrote: Fourth, this number is probably not best represented as a constant (threads per split). This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem. A fixed value of 4 is not so good in endgames where the EBF drops way down. It is probably not so good in complicated middlegame positions where the moves per node is higher. I've been playing around with a better way.

The minimum split depth idea is also one that was removed from Crafty years ago, and even what I do today gets played with from time to time... I don't believe it makes sense to say "I can't split within N plies of the tips" for constant N, because a tree of depth N represents a wildly varying amount of work, depending on whether this is a complicated middlegame position, or a simple endgame position. N becomes a compromise that either hurts MG or EG searches, yet we want both to be as efficient and fast as possible...

Yes, that's another goal of my test to show, that different MSD numbers for endgame and midgame positions might be helpful. I am thinking of an adaptive solution depending on the game phase.

bob wrote: Finally, these numbers are _very much_ configuration dependent. Change the CPU speed, memory speed, cache sizes, etc, and the optimal numbers can and will change.

rvida · Post by **rvida** » Mon May 09, 2011 12:29 am

bob wrote:<snip>
This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem.
<snip>

Sorry Bob, this is an over-stretch and it sounds kinda narcistic....

<sarcasm on>
Of course... No innovation is possible without academic projects like Crafty. All original ideas must come from Crafty because all other chess programmers (especially commercial ones) are idiots with no brains.
<sarcasm off>

Have you ever considered that some ideas are in the category of 'common knowledge'? Especially the simpler ones.
If I dare to order captures in MVV/LVA order, you would certainly claim that I copied Crafty... sure...

Please excuse my whinings, I had too much beer tonight... Usually I am much better at controlling my tone...

Richard

bob · Post by **bob** » Mon May 09, 2011 12:34 am

Joerg Oster wrote:Bob, many thanks for your comment/advice.

bob wrote: First, completely ignore NPS. That's irrelevant in this context. You want the shortest time.
Yes, I know. But if the nps would drop significantly, we'd know there is something going wrong.

That's why I posted. That is wrong. Properly tuned, Crafty's NPS is not as high as can be produced. But speedup is what we care about, not NPS...

bob wrote: Second, 50 positions, 1000 seconds average? 20 seconds per position? You probably need to search longer. Shallow searches are full of noise. Get up into the 60 seconds per position minimum.
I was afraid that someone would come up with this...
This will take a lot more time. Sigh.

bob wrote: Third, make sure each position takes about the same time. Searching them all to the same fixed depth is no good. What if 10 of them are searched in 1 second, and one takes 100? If you treat those all as equal, you let that long search get lost when averaged equally with the 10 short ones. The short ones are probably not very reliable, yet they count more due to averaging.
Good point. I will try to take care of this and throw out those positions which are solved within a few seconds.

bob wrote: Fourth, this number is probably not best represented as a constant (threads per split). This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem. A fixed value of 4 is not so good in endgames where the EBF drops way down. It is probably not so good in complicated middlegame positions where the moves per node is higher. I've been playing around with a better way.

The minimum split depth idea is also one that was removed from Crafty years ago, and even what I do today gets played with from time to time... I don't believe it makes sense to say "I can't split within N plies of the tips" for constant N, because a tree of depth N represents a wildly varying amount of work, depending on whether this is a complicated middlegame position, or a simple endgame position. N becomes a compromise that either hurts MG or EG searches, yet we want both to be as efficient and fast as possible...
Yes, that's another goal of my test to show, that different MSD numbers for endgame and midgame positions might be helpful. I am thinking of an adaptive solution depending on the game phase.

bob wrote: Finally, these numbers are _very much_ configuration dependent. Change the CPU speed, memory speed, cache sizes, etc, and the optimal numbers can and will change.

bob · Post by **bob** » Mon May 09, 2011 12:38 am

rvida wrote:
bob wrote:<snip>
This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem.
<snip>
Sorry Bob, this is an over-stretch and it sounds kinda narcistic....

I don't follow. It is a simple statement of fact.

<sarcasm on>
Of course... No innovation is possible without academic projects like Crafty. All original ideas must come from Crafty because all other chess programmers (especially commercial ones) are idiots with no brains.
<sarcasm off>

Sarcasm is not making you a single point here, so what's the point of posting. Anyone can confirm where this idea came from. Nothing wrong with copying ideas. Unless the idea is not a particularly good one. This idea works, but not as well as I would like. The minimum split depth was another idea that didn't work very well. It was replaced by an idea that worked a little better, but there is a ton of room for improvement.

Have you ever considered that some ideas are in the category of 'common knowledge'? Especially the simpler ones.
If I dare to order captures in MVV/LVA order, you would certainly claim that I copied Crafty... sure...

Nope. I would say that you took that idea from Belle, and Ken Thompson, which is where it originated. But then I have some idea about who did what, and when. Perhaps you don't???

Please excuse my whinings, I had too much beer tonight... Usually I am much better at controlling my tone...

Richard

rvida · Post by **rvida** » Mon May 09, 2011 12:52 am

Hi Marco,

Just a tip.

Most losing captures are refuted very quickly. Splitting after all good captures/killers and all of the quiet moves were already searched is IMO not worth the overhead. It might be beneficial to restrict splits only when the move picker object is in phase < LOSING_CAPTURES.

Richard

mcostalba · Post by **mcostalba** » Mon May 09, 2011 7:40 am

rvida wrote:Hi Marco,

Just a tip.

Most losing captures are refuted very quickly. Splitting after all good captures/killers and all of the quiet moves were already searched is IMO not worth the overhead. It might be beneficial to restrict splits only when the move picker object is in phase < LOSING_CAPTURES.

Richard

This is a good idea ! Thanks Richard !

I will test it on our main testing framework: this is the typical case where self-testing is the best choice.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon May 09, 2011 9:37 am

rvida wrote:Hi Marco,

Just a tip.

Most losing captures are refuted very quickly. Splitting after all good captures/killers and all of the quiet moves were already searched is IMO not worth the overhead. It might be beneficial to restrict splits only when the move picker object is in phase < LOSING_CAPTURES.

Hmm, you gave me a good related idea: do not split if there is singular move.

Stockfish and optimizing options for SMP search

Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search