New Stockfish with Lazy_SMP, but what about the TC bug ?


syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by syzygy »

Milos wrote:Just read the whole discussion on node polling. Why the hell do you need to poll on all the threads???
Why the hell not? Doesn't really seem worth it to let each search thread first check whether it is the main thread or not before checking whether it's time to poll.

Is checking the time really slow on Windows? (At least on Linux it's not.)
Then check_time() could be modified to check whether it is being called from the main thread or from some other thread.

I agree each thread should use its own counter. Having a global counter will either slow down if it's updated atomically or mess up once in a while if it's updated non-atomically.
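
A minimal sketch of the idea above, not Stockfish's actual code: every thread counts its own nodes, every thread reaches check_time() at the same interval, and check_time() itself returns early unless it is called from the main thread. The names ThreadState, SearchShared, on_node and the interval of 4096 nodes are assumptions made up for this example.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// State shared by all search threads (hypothetical layout).
struct SearchShared {
    std::atomic<bool> stop{false};   // set once, read by every thread
    Clock::time_point deadline{};    // when the main thread should stop the search
};

// State owned by exactly one search thread, so the counter needs no atomics.
struct ThreadState {
    bool is_main = false;
    std::uint64_t nodes = 0;         // per-thread node counter
    std::uint64_t polls = 0;         // number of clock reads, for illustration
};

constexpr std::uint64_t kPollInterval = 4096;  // assumed value, must be a power of two

// Only the main thread reads the clock; helpers return immediately.
inline void check_time(ThreadState& t, SearchShared& s) {
    if (!t.is_main)
        return;
    ++t.polls;
    if (Clock::now() >= s.deadline)
        s.stop.store(true, std::memory_order_relaxed);
}

// Called once per searched node. Returns true when the search should stop.
inline bool on_node(ThreadState& t, SearchShared& s) {
    if ((++t.nodes & (kPollInterval - 1)) == 0)
        check_time(t, s);
    return s.stop.load(std::memory_order_relaxed);
}
```

The per-node cost is one increment and one mask test on thread-local data; the main-thread test is paid only once every kPollInterval nodes.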
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by Dann Corbit »

200 ms was an example.
I don't know what the worst case lag is.
But yes, that is the general idea.
Do not try to use every last microsecond.

At tournament time control, the last little gasp of time will add no measurable Elo.

At tournament time control, exceeding our time allotment will subtract enormous Elo.

Of course, there is no way to eliminate all loss on time. Windows could essentially freeze for 10 minutes and force a loss on time in some drastic scenario.

We could also lose on time in a scenario that is our fault. Suppose we are almost out of time and we see a massive fail low (lose a queen, mate on the horizon or some such). We may lose on time trying to resolve it due to our very low time left. But we would have lost anyway so that example would not matter.

But if the sleep patterns are studied, we could (for instance) take the mean overshoot plus 2 standard deviations as the time slice to toss out, or whatever other value testing shows to be best.
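
As a rough sketch of that calculation, assuming the lag has already been measured as a list of overshoot samples in milliseconds (the function names and the choice of the population standard deviation are assumptions for this example, not anything taken from an engine):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Safety margin = mean + k * standard deviation of the measured overshoots (ms).
// k = 2 is the example value from the post; testing may suggest another.
inline double safety_margin_ms(const std::vector<double>& overshoot_ms, double k = 2.0) {
    assert(!overshoot_ms.empty());
    const double n = static_cast<double>(overshoot_ms.size());

    double mean = 0.0;
    for (double x : overshoot_ms)
        mean += x;
    mean /= n;

    double variance = 0.0;   // population variance
    for (double x : overshoot_ms)
        variance += (x - mean) * (x - mean);
    variance /= n;

    return mean + k * std::sqrt(variance);
}

// Time the engine allows itself after tossing out the margin, never negative.
inline double usable_time_ms(double allotted_ms, double margin_ms) {
    return std::max(0.0, allotted_ms - margin_ms);
}
```

With samples of 1 ms and 3 ms the mean is 2 ms and the standard deviation is 1 ms, so the margin comes out at 4 ms.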
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by Dann Corbit »

syzygy wrote: {snip}
Note that this can be done no matter whether you sleep once for 5 minutes, 60,000 times for 5ms, or log2(60,000) times.
True, but so far as I can see, {currently} Stockfish is trying to use it all.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by syzygy »

Dann Corbit wrote:
syzygy wrote: {snip}
Note that this can be done no matter whether you sleep once for 5 minutes, 60,000 times for 5ms, or log2(60,000) times.
True, but so far as I can see, {currently} Stockfish is trying to use it all.
So? That SF has/had a problem is clear, otherwise it would not have lost on time.

Original point was that your "binary search" is not the solution to SF's problem.

The main cause of SF's problem is the way it relies on sleep(). What you proposed has exactly the same problem. The solution of using a safety margin is easy and obvious, and applies just as well to what you propose as to how SF currently functions.

(A better solution is to not rely on a separate sleep()ing thread and simply do what other engines do.)
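
To make the two points concrete, here is a small sketch under my own assumptions (the helper names are hypothetical and this is not SF's timer code). The first function measures how much later than requested a sleep actually returns; the second applies a safety margin to the deadline, which works the same way however the waiting is organised.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

// How much later than requested did sleep_for() return?
// sleep_for() only promises to block for at least the requested duration,
// so the overshoot depends on the OS scheduler and has no fixed upper bound.
inline Ms sleep_overshoot(std::chrono::milliseconds requested) {
    const auto start = Clock::now();
    std::this_thread::sleep_for(requested);
    return Ms(Clock::now() - start) - Ms(requested);
}

// Deadline the engine should act on: the allotted time minus a safety margin.
// If the margin exceeds the allotment, the deadline collapses to the start time.
inline Clock::time_point soft_deadline(Clock::time_point start,
                                       std::chrono::milliseconds allotted,
                                       std::chrono::milliseconds margin) {
    if (margin > allotted)
        margin = allotted;
    return start + (allotted - margin);
}
```

Calling sleep_overshoot() repeatedly on the target machine gives a feel for the lag that the margin has to cover.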
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by Dann Corbit »

syzygy wrote:
Dann Corbit wrote:
syzygy wrote: {snip}
Note that this can be done no matter whether you sleep once for 5 minutes, 60,000 times for 5ms, or log2(60,000) times.
True, but so far as I can see, {currently} Stockfish is trying to use it all.
So? That SF has/had a problem is clear, otherwise it would not have lost on time.

Original point was that your "binary search" is not the solution to SF's problem.

The main cause of SF's problem is the way it relies on sleep(). What you proposed has exactly the same problem. The solution of using a safety margin is easy and obvious, and applies just as well to what you propose as to how SF currently functions.

(A better solution is to not rely on a separate sleep()ing thread and simply do what other engines do.)
What I proposed solves both issues (thousands of timer calls in one search and running over time).

The fact that it is easy and obvious is irrelevant.

I do not think that doing it the way other people do it is better.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by syzygy »

Dann Corbit wrote:What I proposed solves both issues (thousands of timer calls in one search and running over time).
Sigh...

The thousands of timer calls was not the problem. Something that could be improved maybe, but not the problem.

The running out of time is simply not solved by what you proposed. That problem can be solved independently in an easy way that does not need your "binary search" idea.

But we keep repeating ourselves. Not useful.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by bob »

lucasart wrote:
bob wrote:
Jesse Gersenson wrote:Lucas, I had a hunch (edited my message as you were responding, asking whether the timer function was at issue).

Ok, so it's the sleeper function. How do programs in the financial sector solve this problem?
They don't, they don't depend on ms or usec process scheduling delays.
Ever heard of high-frequency algorithmic trading ?
Just like computer chess, banks have evolved in the last 30 years.
They are NOT going to do ms-level transactions over any network other than perhaps InfiniBand, which is not part of any backbone anywhere except in a single computer room. Network lag/jitter is a well-known problem, one that has not been solved, and one that won't be solved so long as the network is shared.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by bob »

syzygy wrote:
Milos wrote:Just read the whole discussion on node polling. Why the hell do you need to poll on all the threads???
Why the hell not? Doesn't really seem worth it to let each search thread first check whether it is the main thread or not before checking whether it's time to poll.

Is checking the time really slow on Windows? (At least on Linux it's not.)
Then check_time() could be modified to check whether it is being called from the main thread or from some other thread.

I agree each thread should use its own counter. Having a global counter will either slow down if it's updated atomically or mess up once in a while if it's updated non-atomically.
And don't forget about the cache costs whether it is atomically accessed or not. Global counters are an absolute no-no nowadays, thanks to cache and NUMA issues.
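
One common way to keep per-thread counters from sharing a cache line is to pad each one to a full line, roughly like the sketch below. The 64-byte line size is an assumption (typical for x86), and PaddedCounter and total_nodes are names invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// One counter per search thread. alignas pads the struct to a full cache
// line, so a write by one thread does not invalidate its neighbours' lines.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t nodes = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");

// Sum the counters only when a total is needed (e.g. for reporting),
// instead of having every thread update one global counter per node.
// Assumes the threads are stopped; a live read would need relaxed atomics.
template <std::size_t N>
std::uint64_t total_nodes(const std::array<PaddedCounter, N>& counters) {
    std::uint64_t sum = 0;
    for (const auto& c : counters)
        sum += c.nodes;
    return sum;
}
```

Padding does nothing about NUMA placement; there each thread would also want its counter allocated on its own node.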
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by Dann Corbit »

The suggestions I made would be trivial to add, and while they would not mathematically solve the problem, they would pragmatically solve it (e.g. one failure in a million games versus one failure in 10 games).

Since they are so simple, it would have been possible to incorporate them into the code for the TCEC contest, but that won't happen.

A complete rewrite to use node count modulus or something similar won't happen in time for the contest and it would not work any better anyway.

I guess that Stockfish will lose TCEC because nobody bothered to address the bug that caused it to lose 3 games.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: New Stockfish with Lazy_SMP, but what about the TC bug ?

Post by mcostalba »

Dann Corbit wrote: What I proposed solves both issues (thousands of timer calls in one search and running over time).

The fact that it is easy and obvious is irrelevant.

I do not think that doing it the way other people do it is better.
Do you realize that your solution to fix time losses is exactly equivalent (but uglier) to increasing move overhead to 200 msecs?

Do you realize that your solution to reduce check time overhead fixes a non-issue?