Stockfish "Use Sleeping Threads" Test

mcostalba · Post by **mcostalba** » Thu Jan 06, 2011 12:38 pm

Houdini wrote:
mcostalba wrote: 2) Verify what really happens by running a test on real games under proper conditions; well, yes, maybe the TC should be increased a bit, say to 30"+0.1"
Even 30"+0.1" is way too fast for this test.
For an 8-thread vs 4-thread match I recommend an average move time of at least 3 to 5 seconds. For example, 2'+2" would probably be fine.

Robert

The match is 8 vs 16 threads, and if there is a difference it should already be visible at 30"+0.1". Perhaps at 2'+2" the difference is bigger (I really don't understand the +2" anyhow: the +0.1" is there to avoid losing on time due to GUI lag, but +2" makes no sense to me), but you would need a week of testing, which is not feasible. So to me it is a kind of way of saying "don't do it", and it is not nice to try to prevent possible useful tests of other engines.
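To see what the increment contributes, here is a back-of-the-envelope sketch in Python (not taken from any engine or GUI). It assumes the engine spreads its base time evenly over a hypothetical 60 moves per side and also spends every increment. Under that assumption 30"+0.1" gives roughly 0.6 s per move, while 2'+2" gives roughly 4 s, inside the 3 to 5 second range Houdini recommends; half of that 4 s budget comes from the increment.

```python
def average_move_time(base_s: float, increment_s: float,
                      moves_per_side: int = 60) -> float:
    """Rough average thinking time per move (seconds) for one side.

    Assumes the base time is spread evenly over `moves_per_side` moves
    (a hypothetical game length) and that every increment is used.
    """
    return base_s / moves_per_side + increment_s


if __name__ == "__main__":
    for label, base, inc in (('30"+0.1"', 30.0, 0.1), ("2'+2\"", 120.0, 2.0)):
        print(f"{label}: about {average_move_time(base, inc):.1f} s per move")
```

Real engines do not divide their time this evenly, so treat the result as an order-of-magnitude check when picking a TC.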

Houdini · Post by **Houdini** » Thu Jan 06, 2011 1:10 pm

mcostalba wrote: The match is 8 vs 16 threads, and if there is a difference it should already be visible at 30"+0.1". Perhaps at 2'+2" the difference is bigger (I really don't understand the +2" anyhow: the +0.1" is there to avoid losing on time due to GUI lag, but +2" makes no sense to me), but you would need a week of testing, which is not feasible. So to me it is a kind of way of saying "don't do it", and it is not nice to try to prevent possible useful tests of other engines.

My recommendation is to have an average move time of 3 to 5 seconds; you can use any TC that achieves this.
One probably needs at least 1,000 games, so this test could indeed take about 5 days. That's the price one has to pay for obtaining meaningful data.
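A rough Python sketch of where figures like "1,000 games" and "about 5 days" come from. It uses the normal approximation to the match score with the worst-case per-game standard deviation of 0.5 (no draws), converts the 95% bound to Elo with the usual logistic formula, and assumes a hypothetical 60 moves per side with both engines using all of their time. Under these assumptions 1,000 games pin the result down only to about ±22 Elo, and 1,000 games at 2'+2" take about 5.6 days of wall-clock time when played one at a time.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(n_games: int, per_game_sd: float = 0.5) -> float:
    """Approximate 95% error margin (Elo) around an even 50% score.

    per_game_sd = 0.5 is the worst case (no draws); draws lower it,
    so real margins are somewhat smaller.
    """
    se = per_game_sd / math.sqrt(n_games)  # standard error of the mean score
    return elo_from_score(0.5 + 1.96 * se)


def match_days(n_games: int, base_s: float, increment_s: float,
               moves_per_side: int = 60) -> float:
    """Wall-clock days for a sequential match if both sides use all their time."""
    per_game_s = 2 * (base_s + increment_s * moves_per_side)
    return n_games * per_game_s / 86400.0


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        print(f"{n:5d} games: about +/-{elo_margin_95(n):.0f} Elo (95%)")
    print(f"1000 games at 2'+2\": about {match_days(1000, 120.0, 2.0):.1f} days")
```

The margin shrinks only with the square root of the number of games, which is why a small SMP gain needs so many games to show up reliably.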

Your suggestion that I would be "trying to prevent possible useful tests of other engines" sounds a bit paranoid. You should appreciate that I'm sharing some of my experience in the field, and you may have noticed that I'm saying exactly the same things in this thread as Bob Hyatt.
So please stop reading my posts as cunning attempts to sabotage Stockfish.

Robert

bob · Post by **bob** » Thu Jan 06, 2011 5:43 pm

ernest wrote:
zullil wrote: So my mistake was to focus on NPS rather than total time taken to search to a fixed depth.
Yes, but my experience is that MP non-reproducibility is much worse for time (and nodes searched) to a fixed depth than for NPS...

Not sure how to interpret that. Crafty's NPS is certainly stable for a given position and number of threads. But so is the total number of lines of code. Neither means anything with respect to SMP performance and what you gain by using it...

There are lots of examples where you can use 8 threads, see the nps go up by 8x, and yet the time to depth is actually longer than when just using one thread. A net loss, yet the NPS looks great.
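A small Python sketch with hypothetical numbers makes the point concrete. NPS is total nodes divided by time, while the speedup that matters is single-thread time to depth divided by multi-thread time to depth. If 8 threads search 8x as many nodes per second but need 10x as many nodes to finish the same depth (the extra nodes being parallel search overhead), the time-to-depth speedup is 8/10 = 0.8, a net loss.

```python
from dataclasses import dataclass


@dataclass
class DepthRun:
    """One search to a fixed depth (the values used below are hypothetical)."""
    threads: int
    nodes: int      # total nodes searched by all threads
    seconds: float  # wall-clock time to complete the target depth

    @property
    def nps(self) -> float:
        return self.nodes / self.seconds


def nps_ratio(base: DepthRun, smp: DepthRun) -> float:
    """How much faster the SMP run searches nodes."""
    return smp.nps / base.nps


def time_to_depth_speedup(base: DepthRun, smp: DepthRun) -> float:
    """Above 1.0 means the SMP run reached the depth sooner."""
    return base.seconds / smp.seconds


if __name__ == "__main__":
    one = DepthRun(threads=1, nodes=10_000_000, seconds=10.0)
    # 8x the NPS, but 10x the nodes because of parallel search overhead.
    eight = DepthRun(threads=8, nodes=100_000_000, seconds=12.5)
    print(f"NPS ratio:             {nps_ratio(one, eight):.2f}")
    print(f"time-to-depth speedup: {time_to_depth_speedup(one, eight):.2f}")
```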

bob · Post by **bob** » Thu Jan 06, 2011 5:47 pm

Houdini wrote:
mcostalba wrote: The match is 8 vs 16 threads, and if there is a difference it should already be visible at 30"+0.1". Perhaps at 2'+2" the difference is bigger (I really don't understand the +2" anyhow: the +0.1" is there to avoid losing on time due to GUI lag, but +2" makes no sense to me), but you would need a week of testing, which is not feasible. So to me it is a kind of way of saying "don't do it", and it is not nice to try to prevent possible useful tests of other engines.
My recommendation is to have an average move time of 3 to 5 seconds; you can use any TC that achieves this.
One probably needs at least 1,000 games, so this test could indeed take about 5 days. That's the price one has to pay for obtaining meaningful data.

Your suggestion that I would be "trying to prevent possible useful tests of other engines" sounds a bit paranoid. You should appreciate that I'm sharing some of my experience in the field, and you may have noticed that I'm saying exactly the same things in this thread as Bob Hyatt.
So please stop reading my posts as cunning attempts to sabotage Stockfish.

Robert

I agree. Longer is better. SMP is wildly variable already. Using extremely short search times only serves to magnify that variability, which is not a good thing. Also, as I mentioned previously, as you search deeper, ordering near the root becomes more accurate because of hashing, etc. That serves to improve SMP performance. Of course, there is little point in testing at one hour per move where SMP will be very effective, since we are not going to play real games that slowly... But something reasonable would be better...

I don't particularly like the games idea, because it takes a ton of games to average out the SMP variability. I'd prefer to test over a representative set of positions, repeat the test many times, and average the speedups for comparison, rather than committing to that many games.
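A minimal Python sketch of that procedure, assuming you already have wall-clock times to a fixed depth from repeated runs of each position (the position names and timings below are placeholders, not measurements). Each position's speedup is the mean single-thread time over the mean N-thread time, and the per-position speedups are then averaged. Averaging the times per position first keeps one unlucky run from producing an extreme ratio.

```python
from __future__ import annotations

from statistics import mean


def position_speedup(times_1: list[float], times_n: list[float]) -> float:
    """Speedup for one position from repeated runs to the same depth.

    times_1: wall-clock seconds of the single-thread runs
    times_n: wall-clock seconds of the N-thread runs
    """
    return mean(times_1) / mean(times_n)


def average_speedup(results: dict[str, tuple[list[float], list[float]]]) -> float:
    """Arithmetic mean of the per-position speedups over the whole test set."""
    return mean(position_speedup(t1, tn) for t1, tn in results.values())


if __name__ == "__main__":
    # Placeholder positions and timings, for illustration only.
    results = {
        "pos1": ([20.0, 21.0, 19.0], [6.0, 7.5, 5.5]),
        "pos2": ([30.0, 30.0, 30.0], [10.0, 9.0, 11.0]),
    }
    for name, (t1, tn) in results.items():
        print(f"{name}: {position_speedup(t1, tn):.2f}")
    print(f"average: {average_speedup(results):.2f}")
```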

zullil · Post by **zullil** » Thu Jan 06, 2011 5:48 pm

mcostalba wrote:
zullil wrote: I've learned (or been reminded about) a lot in this thread. Thanks to all who responded.
My very trivial and rude opinion here is the following.

We have two ways of proceeding:

1) Continue "learning" by discussing interesting theories in the forum

2) Verify what really happens by running a test on real games under proper conditions; well, yes, maybe the TC should be increased a bit, say to 30"+0.1"

Sorry to be so pragmatic.

Hi Marco,

When I use "bench 1024 [8,16] 3 default time", I get only a 3% gain in nps when going from 8 to 16 threads. With "bench 1024 [8,16] 5 default time", the gain is still just 7%.

Thus, as Robert Houdart and Bob Hyatt have suggested, test games at short time controls are not likely to demonstrate anything, and that is also assuming that gains in nps result in better chess performance.

Both Tord and Bob have convinced me that increasing nps is not likely to increase chess performance in any meaningful way, unless time/depth is also being decreased (which doesn't seem to be happening).

Since testing at anything but the fastest time controls would completely occupy my (work) computer for an extended period, I won't be able to run this test. Perhaps someone else with more hardware can run the test for you.

Thanks again for your work with SF.

Louis

bob · Post by **bob** » Thu Jan 06, 2011 5:52 pm

IQ wrote:
bob wrote:
zullil wrote: Is the following statement reasonable?

Suppose that for each position and each fixed amount of search time, 16 threads reaches a higher depth than 8 threads. Then 16 threads is likely to perform at least as well as 8 threads, as measured by winning chess games.
Absolutely. But incredibly unlikely. To the point of "winning-the-lottery" type probability.

And the same rule still applies. Not just one run but several. But if you can improve the depth, then the thing will be stronger. Just watch out for flying pigs while doing this test.
I disagree here. Even a higher displayed depth in a fixed time means nothing. Because of the non-deterministic nature of SMP, hash table interactions, and the high selectivity of modern programs, it could very well be that a higher depth is reached without stronger play. The best test, in my mind, would be the TIME to SOLUTION of positions with known best moves (or, as an approximation, the MOVE that a reasonably large sample of engines agrees on as depth goes to infinity). If you average time to solution over a reasonable number of positions (whose estimates should themselves be averages of multiple runs), you should be fine. Don't let yourself be fooled by the depth and node counts programs display; in a parallel world and with modern selective programs, their informative value is relative.
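One way to turn "time to solution" into a number, sketched in Python. The log format is hypothetical: a time-ordered list of (seconds, best move) pairs recorded whenever the root best move changes during a run. The solution time is taken as the moment the engine switched to the known best move for the last time without leaving it again. As a further assumption, a run that never settles on the solution is charged the full time limit before averaging over repeated runs.

```python
from __future__ import annotations

from statistics import mean


def time_to_solution(updates: list[tuple[float, str]], solution: str) -> float | None:
    """When the engine settled on `solution` for good in one run.

    `updates` is a time-ordered list of (seconds, best_move) pairs, one per
    change of the root best move (hypothetical log format). Returns None
    if the run ended on some other move.
    """
    settled_at = None
    for t, move in updates:
        if move == solution:
            if settled_at is None:
                settled_at = t
        else:
            settled_at = None  # left the solution again: start over
    return settled_at


def average_time_to_solution(runs: list[list[tuple[float, str]]],
                             solution: str, time_limit: float) -> float:
    """Mean over repeated runs; an unsolved run is charged `time_limit` (an assumption)."""
    times = []
    for updates in runs:
        t = time_to_solution(updates, solution)
        times.append(time_limit if t is None else t)
    return mean(times)


if __name__ == "__main__":
    runs = [
        [(0.2, "d4"), (1.5, "Nf3"), (3.1, "d4")],
        [(0.2, "d4"), (0.9, "e4")],  # never settles on d4
    ]
    print(average_time_to_solution(runs, "d4", time_limit=60.0))
```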

While I agree in principle, I don't agree in practice. If we were diddling around with extensions and reductions and modifying them, I would not use time-to-depth for anything. But we are not modifying the search or pruning/reduction rules. There are not very many cases where you find the answer at a different depth when using threads vs single search. There are a few, which is why I always advocate using a significant number of positions, and then weeding the oddballs out. I have several positions where 2 threads is way more than 2x faster to get the right answer. I try to make sure that I don't depend on such positions to compute speedups. If you pick a set of positions, some will show super-linear speedup, and those should count. But those should not be the _only_ ones that count...

Time to fixed depth is a good SMP test. There will be occasional oddities. You just have to repeat the tests enough times that they don't skew the results...

At least 95% of the time, time to depth and time to solution will be comparable when computing speedups.
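A Python sketch of how the oddballs can stay visible without deciding the result on their own. The rule used here to flag a position as super-linear (speedup greater than the thread count) is an illustrative choice, not a standard. Those positions still count in the plain mean, but the median and a mean without them are reported alongside, so a large gap shows that a handful of positions are carrying the average. The speedup values in the usage example are placeholders.

```python
from __future__ import annotations

from statistics import mean, median


def summarize_speedups(speedups: dict[str, float], threads: int) -> dict:
    """Summaries of per-position speedups that keep outliers visible.

    Positions with speedup > threads are flagged as super-linear (an
    illustrative rule). They stay in the plain mean, but the median and
    the mean without them are reported too.
    """
    superlinear = sorted(p for p, s in speedups.items() if s > threads)
    others = [s for p, s in speedups.items() if p not in superlinear]
    return {
        "mean": mean(speedups.values()),
        "median": median(speedups.values()),
        "mean_without_superlinear": mean(others) if others else None,
        "superlinear": superlinear,
    }


if __name__ == "__main__":
    # Placeholder 2-thread speedups; "pos3" plays the super-linear oddball.
    example = {"pos1": 1.7, "pos2": 1.8, "pos3": 6.0, "pos4": 1.6}
    for key, value in summarize_speedups(example, threads=2).items():
        print(key, value)
```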

bob · Post by **bob** » Thu Jan 06, 2011 5:53 pm

MikeB wrote:
zullil wrote: ...
while having both hyperthreading and Use Sleeping Threads enabled gives a speedup of about 10% compared to having no hyperthreading. (I have checked that my machine runs the 8 threads on 8 distinct physical cores, i.e., no hyperthreading.)

...
That's what I saw too: it works best with both enabled, a ~10% gain on an i3 (two physical cores, 4 logical cores, Windows 7, 64-bit).

Mike

The only problem is that NPS is not important; time to a fixed depth is what matters when chess is actually played. If your NPS goes up _and_ your time to fixed depth goes up, you are not gaining anything; the program is weaker.

ernest · Post by **ernest** » Thu Jan 06, 2011 6:16 pm

bob wrote:...MP non-reproducibility...
Not sure how to interpret that.

I just meant (but it's no news to you...) that on a dual or a quad, if you repeatedly test "go depth 20" from a given position, you will get a broad span of "time" and "nodes", but kN/s will remain pretty much the same (±10% span).
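To quantify the observation that time and nodes vary far more between repeated runs than NPS does, one can compare coefficients of variation (standard deviation divided by mean) across repeated "go depth 20" searches. The sketch below is Python; the (nodes, seconds) pairs are assumed to be collected by hand or by a separate script from the engine's output, and the values in the usage example are placeholders.

```python
from __future__ import annotations

from statistics import mean, pstdev


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def run_spread(runs: list[tuple[int, float]]) -> dict[str, float]:
    """Relative spread of nodes, time and NPS over repeated runs.

    `runs` holds (nodes, seconds) pairs from repeating the same
    "go depth 20" search on the same position.
    """
    nodes = [float(n) for n, _ in runs]
    seconds = [s for _, s in runs]
    nps = [n / s for n, s in runs]
    return {
        "nodes": coefficient_of_variation(nodes),
        "time": coefficient_of_variation(seconds),
        "nps": coefficient_of_variation(nps),
    }


if __name__ == "__main__":
    # Placeholder measurements from three repeats of the same search.
    runs = [(41_000_000, 9.8), (55_000_000, 13.1), (38_000_000, 9.5)]
    for name, cv in run_spread(runs).items():
        print(f"{name:5s}: {cv:.1%}")
```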

bob · Post by **bob** » Thu Jan 06, 2011 9:03 pm

ernest wrote:
bob wrote:...MP non-reproducibility...
Not sure how to interpret that.
I just meant (but it's no news to you...) that on a dual or a quad, if you repeatedly test "go depth 20" from a given position, you will get a broad span of "time" and "nodes", but kN/s will remain pretty much the same (±10% span).

Yes, I would agree with that. But it doesn't say much about performance. Parallel search introduces significant overhead in terms of extra nodes searched. The NPS gain has to at least offset that, and 10-15% improvement is nowhere near enough.

mcostalba · Post by **mcostalba** » Thu Jan 06, 2011 11:17 pm

zullil wrote: Hi Marco,

When I use "bench 1024 [8,16] 3 default time", I get only a 3% gain in nps when going from 8 to 16 threads. With "bench 1024 [8,16] 5 default time", the gain is still just 7%.

Thus, as Robert Houdart and Bob Hyatt have suggested, test games at short time controls are not likely to demonstrate anything, and that is also assuming that gains in nps result in better chess performance.

Both Tord and Bob have convinced me that increasing nps is not likely to increase chess performance in any meaningful way, unless time/depth is also being decreased (which doesn't seem to be happening).

Since testing at anything but the fastest time controls would completely occupy my (work) computer for an extended period, I won't be able to run this test. Perhaps someone else with more hardware can run the test for you.

Thanks again for your work with SF.

Louis

Hi Louis,

Thanks to you for your very interesting tests. You are the first to report an NPS improvement from switching on HT. Of course this does not mean SF is stronger, but it is something that until yesterday was not even thinkable for SF, because the way threads were managed did not allow it.

Thanks
Marco
