Stockfish "Use Sleeping Threads" Test

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

mcostalba wrote:
bob wrote: That's a strange comment. Every SMP program I have tried shows a modest NPS improvement
In SF, when a thread has finished searching, it keeps busy polling for new work to be done. This could be acceptable if each thread runs on a different CPU, but with HT, when one thread keeps polling in a tight loop, it drains resources from the sibling logical thread on the same CPU for no reason.

With the new "Sleeping thread" feature, the threads that have finished searching and return to the split point root are put to sleep. For this to work you need a fast locking and condition variable scheme, such as the newly introduced SRWLocks and Condition Variables under Windows, so that the sleeping thread can be signaled very quickly to wake up when there is new work to do.

To properly verify that the newly introduced feature works, it is important to rely on modern hardware and on fast locking (so not the locks delivered in the official JA builds, which instead aim at backward compatibility).
There is a fix for this. I think it is in my spinlock code. You need a "pause" instruction in the spin loop. That causes that thread to give up the physical cpu and let the other (useful) thread work. That way you don't spin and burn cpu resources significantly, assuming the other thread is always ready to run. When it has a memory/pipeline/etc stall, the spin loop will reactivate, but if you do the pause after each cycle, it will give up the physical core as soon as the other thread is ready to go again...

I did this back when the PIV came along. Intel had a note about this on their web site several years ago... Had forgotten about the "pause" issue until your comment above...
bob

Re: Stockfish "Use Sleeping Threads" Test (Crafty

Post by bob »

IQ wrote: But if we agree in principle, wouldn't time to solution then be a much better test than time to depth? I understand that time to depth is probably still OK, but in your reasoning you mention examples where it might fail and emphasize the need for a reasonable number of different positions to test. So why not use time to solution in the first place? In both cases you have to choose positions carefully and use multiple runs, but time to solution should be less error prone. Maybe somebody can do an experiment analyzing the SD of both approaches?
bob wrote:
IQ wrote:
bob wrote:
zullil wrote:Is the following statement reasonable?

Suppose that for each position and each fixed amount of search time, 16 threads reaches a higher depth than 8 threads. Then 16 threads is likely to perform at least as well as 8 threads, as measured by winning chess games.
Absolutely. But incredibly unlikely. To the point of "winning-the-lottery" type probability. :)

And the same rule still applies. Not just one run but several. But if you can improve the depth, then the thing will be stronger. Just watch out for flying pigs while doing this test. :)
I disagree here. Even a higher displayed depth in a fixed time means nothing. It could very well be that, through the non-deterministic nature of SMP, hash table interaction, and the high selectivity of modern programs, a higher depth is reached without playing stronger. The best test in my mind would be the TIME to SOLUTION of positions with known best moves (or, as an approximation, the MOVE that a reasonably large sample of engines agree on as depth goes to infinity). If you average time to solution over a reasonable number of positions (whose estimates themselves should be averages of multiple runs) you should be fine. Don't let yourself be fooled by the depth and nodes that programs display; in a parallel world and with modern selective programs their informative value is relative.
While I agree in principle, I don't agree in practice. If we were diddling around with extensions and reductions and modifying them, I would not use time-to-depth for anything. But we are not modifying the search or pruning/reduction rules. There are not very many cases where you find the answer at a different depth when using threads vs single search. There are a few, which is why I always advocate using a significant number of positions, and then weeding the oddballs out. I have several positions where 2 threads is way more than 2x faster to get the right answer. I try to make sure that I don't depend on such positions to compute speedups. If you pick a set of positions, some will show super-linear speedup, and those should count. But those should not be the _only_ ones that count...

Time to fixed depth is a good SMP test. There will be occasional oddities. You just have to repeat the tests enough times that they don't skew the results...

At least 95% of the time, time to depth and time to solution will be comparable when computing speedups.
The problem will show up in either case. With non-deterministic behaviour, if you find the solution quicker, it will likely take you longer to reach that depth than on the runs when you don't find it quicker. In my testing, I don't see that very often, although it does happen. That data point is going to be an outlier whether you time to the correct move being found or time to the specified depth being completed. You need enough runs over enough positions, so that those odd cases don't skew the final numbers...
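To make the averaging concrete, here is a minimal Python sketch of how time-to-depth speedups could be tabulated over several positions and repeated runs. The position names and timings are invented purely for illustration (they are not measurements from this thread), and the helper name `per_position_speedup` is hypothetical. It shows how a single super-linear position pulls the mean well above the median, which is the kind of skew described above.

```python
from statistics import mean, median

def per_position_speedup(times_1t, times_nt):
    """Speedup per position: mean 1-thread time / mean N-thread time."""
    return {pos: mean(times_1t[pos]) / mean(times_nt[pos]) for pos in times_1t}

# Hypothetical time-to-depth measurements in seconds (two runs per position).
one_thread = {"pos1": [40.0, 40.0], "pos2": [30.0, 30.0], "pos3": [60.0, 60.0]}
two_threads = {"pos1": [24.0, 26.0], "pos2": [19.0, 21.0], "pos3": [10.0, 14.0]}

speedups = per_position_speedup(one_thread, two_threads)
mean_speedup = mean(list(speedups.values()))
median_speedup = median(list(speedups.values()))

for pos, s in speedups.items():
    print(f"{pos}: {s:.2f}x")
print(f"mean {mean_speedup:.2f}x, median {median_speedup:.2f}x")
```

With only three positions the odd one dominates the mean; with many positions and several runs each, its influence shrinks, which is the argument for a large test set.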
bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

MikeB wrote:
bob wrote:
MikeB wrote:
zullil wrote: ...
while having both hyperthreading and Use Sleeping Threads enabled gives a speedup of about 10% compared to having no hyperthreading. (I have checked that my machine runs the 8 threads on 8 distinct physical cores. i.e., no hyperthreading.)

...
That's what I saw too - it works best with both enabled, ~10% gain. i3, two physical cores, 4 logical cores, Windows 7, 64-bit.

Mike
Only problem is NPS is not important, time to a fixed depth is how chess is actually played. If your NPS goes up, _and_ your time to fixed depth goes up, you are not gaining anything, the program is weaker.
My comment was referring to fixed depth: for me, the time to reach a fixed depth is 10% or more faster. The NPS is about 30-40% higher with HT on my Windows 7 laptop. HT is much different from how it was 5 years ago.

===========================
MT=2 on a 2 Physical Core Machine

Total time (ms) : 53352
Nodes searched : 59118660
Nodes/second : 1108087

==============================
Same machine, same program, same bench test now using all 4 Logical Cores

Total time (ms) : 41542
Nodes searched : 61921860
Nodes/second : 1490584

I have tried both long (60 seconds) and short (4-5 seconds) searches. With HT enabled and used, my laptop consistently reaches a fixed depth faster than it does without HT enabled.
How many positions? If that is over hundreds of positions, and supported by a significant number of runs, then this would be a wonderfully efficient parallel search you are testing... Mine has always been better than most, and it can't do that...
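For reference, the two bench runs quoted above can be compared with a few lines of Python. This is only arithmetic on the posted figures, assuming both runs searched the same positions to the same depth; the variable and function names are made up for the example. It separates the wall-clock speedup from the NPS gain and shows how many extra nodes the 4-thread run searched.

```python
# Figures copied from the two bench runs quoted above.
two_threads = {"time_ms": 53352, "nodes": 59118660}
four_threads = {"time_ms": 41542, "nodes": 61921860}

def nps(run):
    """Nodes per second for one bench run."""
    return run["nodes"] / (run["time_ms"] / 1000.0)

time_speedup = two_threads["time_ms"] / four_threads["time_ms"]
nps_ratio = nps(four_threads) / nps(two_threads)
node_ratio = four_threads["nodes"] / two_threads["nodes"]

print(f"time speedup : {time_speedup:.2f}x")
print(f"NPS ratio    : {nps_ratio:.2f}x")
print(f"extra nodes  : {100 * (node_ratio - 1):.1f}%")
```

The NPS ratio is the product of the time speedup and the node ratio, so part of the NPS gain is spent on a larger tree rather than on reaching the depth sooner.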
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Stockfish "Use Sleeping Threads" Test

Post by MikeB »

bob wrote:
How many positions? If that is over hundreds of positions, and supported by a significant number of runs, then this would be a wonderfully efficient parallel search you are testing... Mine has always been better than most, and it can't do that...
No, it's just the 16 built-in test positions in the Stockfish bench.

After 335 games - ponder off , 2 physical core, 4 logical core, laptop, one version with mt=2 and the other mt =4, it is dead even. No worse, no better.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish "Use Sleeping Threads" Test

Post by mcostalba »

MikeB wrote: After 335 games - ponder off , 2 physical core, 4 logical core, laptop, one version with mt=2 and the other mt =4, it is dead even. No worse, no better.
Thanks for testing this.

Can I ask what time control you used?

Thanks
Marco
bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

MikeB wrote:
No, it's just the 16 built-in test positions in the Stockfish bench.

After 335 games - ponder off , 2 physical core, 4 logical core, laptop, one version with mt=2 and the other mt =4, it is dead even. No worse, no better.
Even worse, as that suggests that something is really biased in the test. I can play 335 games between two identical copies and not get "dead even" after just 335 games... This should be a 10-20 Elo change at least (-10 to -20) which means you need several thousand more games before the results begin to mean something.
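As a rough check on the "several thousand games" figure, the sketch below estimates how many games are needed before a 95% margin on the match score shrinks to the size of the expected edge. It assumes the standard logistic Elo model and the worst case of no draws (per-game standard deviation of 0.5); draws reduce the variance and so the real requirement would be somewhat lower. The function names are hypothetical.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96, per_game_sd=0.5):
    """Games until the z-sigma margin on the score equals the edge itself.

    per_game_sd=0.5 is the worst case (no draws); draws lower it.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * per_game_sd / edge) ** 2)

for diff in (10, 20):
    print(f"{diff} Elo: about {games_needed(diff)} games")
```

Under these assumptions a 20 Elo difference needs on the order of a thousand games and a 10 Elo difference several thousand, so 335 games cannot separate the two settings.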
MikeB

Re: Stockfish "Use Sleeping Threads" Test

Post by MikeB »

MikeB wrote:
After 335 games - ponder off , 2 physical core, 4 logical core, laptop, one version with mt=2 and the other mt =4, it is dead even. No worse, no better.
After 792 games (nst = no sleeping threads), using 4 threads with "Use Sleeping Threads" = true, there is a slight advantage. TC = 10"/game plus 1" increment.

1: Stockfish-201-64-mb 403.0/792
2: Stockfish-201-64-mb nst 389.0/792
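To see how slight that advantage is, the 403.0/792 score can be converted to an Elo estimate with an error range. This is a sketch under stated assumptions: the logistic Elo model, and a worst-case per-game standard deviation of 0.5 (no draws), since the win/draw/loss split was not posted. The function names are invented for the example.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_margin(games, z=1.96, per_game_sd=0.5):
    """Worst-case (no draws) z-sigma margin on the score fraction."""
    return z * per_game_sd / math.sqrt(games)

points, games = 403.0, 792
score = points / games
margin = score_margin(games)

elo = elo_from_score(score)
elo_low = elo_from_score(score - margin)
elo_high = elo_from_score(score + margin)

print(f"score {score:.3f}, Elo {elo:+.1f} (range {elo_low:+.1f} to {elo_high:+.1f})")
```

The point estimate is a few Elo in favour of sleeping threads, but the range comfortably includes zero, so the result is consistent with no difference at all.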
pawnslinger
Posts: 42
Joined: Thu Jan 06, 2011 9:10 pm
Location: Mesa, AZ USA

Re: Stockfish "Use Sleeping Threads" Test

Post by pawnslinger »

I use Stockfish in conjunction with Aquarium. I analyze ongoing games that I am playing thru ICCF. With "sleeping threads" I have noticed that I am able to run more threads at the same core temperature. Even when not hyper-threading.

I run an OC'ed rig based on an i7-920 using air cooling, so I am very sensitive to my core temps. I frequently monitor them using "RealTemp" and try to maintain an average in the mid-60s C.

Before "Sleeping Threads", I was able to run Stockfish 1.9.1 with 4 cores and I would see average core temps run around 65C, depending on a lot of extraneous factors... like ambient temp and dust clogging the air filter on my computer case.

So with "Sleeping Threads" enabled under Stockfish 2.0.1 I have been able to increase my threads to 6 and yet the core temps are still slightly lower than previously observed running 4 threads under Stockfish 1.9.1.

Also, it seems to me that now the size of the cache has an effect on core temps. I noticed the other night that a cache size of 2.4gb would cause higher core temps than a 1gb cache. Don't know why this is happening, but since this observation, I have kept my cache size on the smaller end of things.

Overall, I think that "sleeping threads" should be enabled by default. In my Aquarium setup, I have enabled them now, on a permanent basis.
bob

Re: Stockfish "Use Sleeping Threads" Test

Post by bob »

pawnslinger wrote:I use Stockfish in conjunction with Aquarium. I analyze ongoing games that I am playing thru ICCF. With "sleeping threads" I have noticed that I am able to run more threads at the same core temperature. Even when not hyper-threading.

Also, it seems to me that now the size of the cache has an effect on core temps. I noticed the other night that a cache size of 2.4gb would cause higher core temps than a 1gb cache. Don't know why this is happening, but since this observation, I have kept my cache size on the smaller end of things.

I would think that any increase in cache size would increase temp, for a simple reason. If the cpu stalls waiting on memory, temp will drop since it is not burning as much power. More cache = fewer memory stalls = more useful processor cycles that generate heat.

Are you talking about L3 cache here (since you mentioned gb it would have to be)?
mcostalba

Re: Stockfish "Use Sleeping Threads" Test

Post by mcostalba »

pawnslinger wrote:I use Stockfish in conjunction with Aquarium. I analyze ongoing games that I am playing thru ICCF. With "sleeping threads" I have noticed that I am able to run more threads at the same core temperature. Even when not hyper-threading.

When "sleeping threads" is on, there is less pressure on the CPU because a thread that has finished its search job simply stops running (sleeps) instead of continuing to ask for CPU resources, as it does with the default setting. When a new job becomes ready, the thread is awakened and starts another search.

We have also started to use "sleeping threads" for our development tests, not because it is stronger (we still don't know) but because it is "lighter" on the CPU. For instance, my QUAD now runs much quieter and cooler (and is much more responsive to mouse, keyboard, and other external events), and the result we want to achieve, i.e. understanding whether a modification is good or bad, can be obtained in the same way as before.
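The sleep-and-wake behaviour described above can be sketched with a condition variable. This is a toy Python illustration and not Stockfish's actual code (Stockfish is written in C++ and uses the platform's native locks); the class name `SleepingWorker` and the squared-number "jobs" are made up for the example. The worker blocks in `wait()` while idle, so it uses no CPU until `submit()` notifies it.

```python
import threading
from collections import deque

class SleepingWorker:
    """A worker thread that sleeps on a condition variable while idle."""

    def __init__(self):
        self._cond = threading.Condition()
        self._jobs = deque()
        self._stop = False
        self.results = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def submit(self, job):
        """Queue a job and wake the worker if it is sleeping."""
        with self._cond:
            self._jobs.append(job)
            self._cond.notify()

    def shutdown(self):
        """Ask the worker to exit once the queue is drained, then join it."""
        with self._cond:
            self._stop = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                # Sleep (no busy polling) until there is work or a stop request.
                while not self._jobs and not self._stop:
                    self._cond.wait()
                if not self._jobs:  # stop requested and nothing left to do
                    return
                job = self._jobs.popleft()
            self.results.append(job())  # run the job outside the lock

worker = SleepingWorker()
for depth in (1, 2, 3):
    worker.submit(lambda d=depth: d * d)
worker.shutdown()
print(worker.results)
```

The polling alternative would replace the `wait()` call with a loop that repeatedly checks the queue, which keeps the core busy even when there is nothing to search.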