The upcoming Y2038 catastrophe

syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: The upcoming Y2038 catastrophe

Post by syzygy »

Wow, you've realised that sysadmins setting the time will cause problems for gettimeofday().
bob wrote:Sorry, go read up on ntpd. It GUARANTEES that the clock adjustments are applied fractionally so that there is NEVER any step backward. And as it learns how time slips on a specific machine, it gets even better in that it slowly adjusts the clock throughout the day, even between samples.
Show us where it says that...
https://www.ietf.org/rfc/rfc5905.txt
It does not.

Show us where it requires an OS capable of adjusting time monotonically.
It does not.

POSIX supports adjtime() which can be used for small monotonic adjustments. Even on a POSIX system the reference implementation in the RFC is not monotonic:

Code:

/*
 * step_time() - step system time to given offset value
 */
void
step_time(
        double  offset          /* clock offset */
        )
{
        struct timeval unix_time;
        tstamp  ntp_time;

        /*
         * Convert from double to native format (signed) and add to the
         * current time.  Note the addition is done in native format to
         * avoid overflow or loss of precision.
         */
        gettimeofday(&unix_time, NULL);
        ntp_time = D2LFP(offset) + U2LFP(unix_time);
        unix_time.tv_sec = ntp_time >> 32;
        unix_time.tv_usec = (long)(((ntp_time - unix_time.tv_sec) <<
            32) / FRAC * 1e6);
        settimeofday(&unix_time, NULL);
}

...
        /*
         * Clock state machine transition function.  This is where the
         * action is and defines how the system reacts to large time
         * and frequency errors.  There are two main regimes: when the
         * offset exceeds the step threshold and when it does not.
         */
        rval = SLEW;
        mu = p->t - s.t;
        freq = 0;
        if (fabs(offset) > STEPT) {
                switch (c.state) {
...
                default:

                        /*
                         * This is the kernel set time function, usually
                         * implemented by the Unix settimeofday() system
                         * call.
                         */
                        step_time(offset);
                        c.count = 0;
                        s.poll = MINPOLL;
                        rval = STEP;
                        if (state == NSET) {
                                rstclock(FREQ, p->t, 0);
                                return (rval);
                        }
                        break;
                }
                rstclock(SYNC, p->t, 0);
        } else {

                /*
                 * Compute the clock jitter as the RMS of exponentially
                 * weighted offset differences.  This is used by the
                 * poll-adjust code.
                 */
...
If the absolute adjustment to be made exceeds a threshold, it simply calls settimeofday() which certainly does not lead to monotonic behaviour.

And for your convenience, as I realise that reading and somewhat understanding the RFC is too much to ask of you, here is a relevant part:

Code:

11.2.3.  Combine Algorithm

...

   Each time an update is received from the system peer, the clock
   update routine is called.  By rule, an update is discarded if its
   time of arrival p.t is not strictly later than the last update used
   s.t.  The labels IGNOR, PANIC, ADJ, and STEP refer to return codes
   from the local clock routine described in the next section.

   IGNORE means the update has been ignored as an outlier.  PANIC means
   the offset is greater than the panic threshold PANICT (1000 s) and
   SHOULD cause the program to exit with a diagnostic message to the
   system log.  STEP means the offset is less than the panic threshold,
   but greater than the step threshold STEPT (125 ms).  In this case,
   the clock is stepped to the correct offset, but since this means all
   peer data have been invalidated, all associations MUST be reset and
   the client begins as at initial start.
If the update is more than 125 ms, the clock is stepped (forward or backward).
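The decision described in the quoted RFC text can be sketched roughly as follows. This is only an illustration, not the RFC's code or ntpd's: the two thresholds are the values quoted above, while classify_offset() and the ACTION_* names are made up for this sketch.

```c
#include <assert.h>

/* Thresholds as quoted from the RFC text above, in seconds. */
#define STEPT   0.125   /* step threshold (125 ms) */
#define PANICT  1000.0  /* panic threshold (1000 s) */

/* Hypothetical names for this sketch; these are not the RFC's return codes. */
enum clock_action { ACTION_SLEW, ACTION_STEP, ACTION_PANIC };

/*
 * Classify a measured clock offset (seconds, may be negative).
 * Only the magnitude matters, so a step can go forward or backward.
 */
enum clock_action classify_offset(double offset)
{
    double magnitude = offset < 0 ? -offset : offset;

    if (magnitude > PANICT)
        return ACTION_PANIC;   /* too far off: give up and log */
    if (magnitude > STEPT)
        return ACTION_STEP;    /* set the clock outright */
    return ACTION_SLEW;        /* small error: adjust gradually */
}
```

With these assumptions, classify_offset(-0.2) yields ACTION_STEP: a clock that is 200 ms fast gets set back, which is exactly the non-monotonic case.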
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

syzygy wrote:Wow, you've realised that sysadmins setting the time will cause problems for gettimeofday().
Wow. I realized that 30 years ago. I found a solution that eliminates ALL the issues. How cool is that? And I DID read the RFC. Nothing you wrote above contradicts anything I have written, nor Steven. I have no idea what you quoted any of that for. Once it is set up and running, it runs automatically, by itself, with no problems. Most set this up to use multiple servers so that if one goes away or misbehaves, there are others to keep time synchronized...

Of course you can actually set it up and watch the update log if you are interested in what is going on. Once set up there are no sharp jumps, period. Unless you want to claim the rest of the world suddenly jumps 20+ minutes ahead and confuses things.

It DOES work. 99.9% of all unix installations come with this already installed because it does work so well.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: The upcoming Y2038 catastrophe

Post by syzygy »

bob wrote:I have no idea what you quoted any of that for.
Short-term memory problems, right?
bob wrote:It GUARANTEES that the clock adjustments are applied fractionally so that there is NEVER any step backward.
I've just spelt out for you how wrong you are.

Of course you go on saying you are right.

The intellectual dishonesty must be dripping from your keyboard. And that is why you're a waste of time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The upcoming Y2038 catastrophe

Post by bob »

syzygy wrote:
bob wrote:I have no idea what you quoted any of that for.
Short-term memory problems, right?
bob wrote:It GUARANTEES that the clock adjustments are applied fractionally so that there is NEVER any step backward.
I've just spelt out for you how wrong you are.

Of course you go on saying you are right.

The intellectual dishonesty must be dripping from your keyboard. And that is why you're a waste of time.
Correct. And what I stated is EXACTLY what happens. If you want to talk about "what about when you boot up and the time is set wrong" that is a different issue. Once it is set, it is monotonic. And it remains that way until the next reboot with an invalid time set from somewhere.

So what exactly do you believe happens? Of course the "date" command in unix breaks the monotonic property. It is a bad idea to use it as a human, for that reason. With ntpd properly set up, you don't see the time jumping backward. And after it runs for a while, there is very little updating whatsoever since it "trains" the clock to reduce the error corrections. Fortunately for computers, there is no "forward clock slip" only backward, caused by missed timer interrupts. ntpd spends all of its time adding in tiny increments to keep it up to date, never over-correcting so that it has to back up and break the monotonic property.

What exactly is your issue here? It works. It has ALWAYS worked. Perhaps YOU couldn't get it to work right, I don't know. There's even test software around you can run to watch for "jumps" or even worse, "backward jumps". We've run this on our local ntp server on occasion with zero failures, as expected.
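For anyone who wants to watch for backward jumps themselves, a check of that kind could look roughly like the sketch below. It is not the test software mentioned above (which is not named in this thread); the function names are hypothetical, and it only assumes the standard gettimeofday() call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds since the epoch for one timeval sample. */
long long timeval_to_us(struct timeval tv)
{
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Count how often a sample is earlier than the one taken just before it. */
int count_backward_steps(const struct timeval *samples, size_t n)
{
    int backward = 0;

    for (size_t i = 1; i < n; i++)
        if (timeval_to_us(samples[i]) < timeval_to_us(samples[i - 1]))
            backward++;
    return backward;
}

/* Fill samples[] with n consecutive gettimeofday() readings; 0 on success. */
int sample_wall_clock(struct timeval *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (gettimeofday(&samples[i], NULL) != 0)
            return -1;
    return 0;
}
```

A zero count over a short run only shows that no step happened while sampling. It says nothing about what happens at boot, after a resume, or after a reconnect.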
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: The upcoming Y2038 catastrophe

Post by AlvaroBegue »

Perhaps gettimeofday has been good enough for the purpose of measuring thinking times in crafty, but it is true that it is not the right tool for the job of measuring elapsed time between events in general.

I understand Mr. Hyatt is unlikely to change his mind, because it's not his M.O., but perhaps others are following this conversation and want to know who is right. For those people, here's a link: http://blog.habets.pp.se/2010/09/gettim ... asure-time
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: The upcoming Y2038 catastrophe

Post by mvk »

syzygy wrote:
bob wrote:I have no idea what you quoted any of that for.
Short-term memory problems, right?
bob wrote:It GUARANTEES that the clock adjustments are applied fractionally so that there is NEVER any step backward.
I've just spelt out for you how wrong you are.

Of course you go on saying you are right.
When you boot up there can be negative steps.
Or when you go offline for a while and reconnect.
Or when you close the lid and open it after some time.

There are no negative steps, except when there are.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The upcoming Y2038 catastrophe

Post by abulmo »

sje wrote:
abulmo wrote:
sje wrote:

Code:

gettimeofday(&tval, 0);
gettimeofday should not be used to measure elapsed time. You should use clock_gettime(CLOCK_MONOTONIC) or an equivalent function of your OS instead. If the time of your system is adjusted (by a user or automatically), gettimeofday can return a wrong value.
On my Linux machines, I run the network time daemon ntpd so there's never any need for a manual adjustment of the time. The Mac OS/X machines are the same. There are automatic adjustments; these are done with utmost subtlety so that interval measurements using gettimeofday() can be trusted.

The people who maintain and improve ntpd and the network time system have an almost unnatural affection for accuracy and precision, so I have faith in their work. I'd need a personal atomic time clock on my desk to get better results.
Well, ntpd works well with clock_gettime(CLOCK_MONOTONIC,...); it does not with gettimeofday(). First, gettimeofday is marked obsolescent in the 2008 POSIX standard and may disappear in a future POSIX standard. The clock_gettime function is the function that replaces it. If you want the behaviour of gettimeofday, you can call clock_gettime(CLOCK_REALTIME,...). If you want to measure the elapsed time of an event, it is better to use clock_gettime(CLOCK_MONOTONIC,...). That's what POSIX 2008 says.
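A minimal sketch of measuring an interval this way, assuming a POSIX system that provides CLOCK_MONOTONIC. The helper names monotonic_now() and timespec_diff_seconds() are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two timespec samples (end minus start). */
double timespec_diff_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Read the monotonic clock; returns 0 on success, -1 on failure. */
int monotonic_now(struct timespec *out)
{
    return clock_gettime(CLOCK_MONOTONIC, out);
}
```

Typical use: call monotonic_now() before and after the search and pass both samples to timespec_diff_seconds(). The result is not affected when the wall clock is set, whether by a user or by a time daemon.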
Richard
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

On POSIX compliance

Post by sje »

On POSIX compliance

The creat() C library call has been around since the early 1970s and has been deprecated for some thirty years. But it hasn't gone away, regardless of how much official disapproval it may have gotten. It will be around thirty years hence, along with gettimeofday().

This is not to say that POSIX has no merit. But my experience has been that when it came time to send a company engineer to a standards conference, the guy who got picked to go was always the guy least needed to do real design and development work.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The upcoming Y2038 catastrophe

Post by abulmo »

bob wrote:
By far, a properly configured ntpd is the most commonly used and least problematic time fix that doesn't wreck things.

And for the record, ANYONE running with root access is absolutely a security risk. I don't think you can find a single person on the planet that has not broken something while running as root. Which is why most of us with any experience do not use root as our working account.
Not everybody is running a cluster of servers. On tablets, laptops, etc., with intermittent network connections, ntpd does not work as you expect. If the time deviation is big enough (1000 s), ntpd is not going to correct anything, and you have to change the time manually. For smaller time deviations, ntpd can call settimeofday and produce a big time jump, which you will see if you are using the obsolescent gettimeofday function. It is only for small time deviations that ntpd works as you describe. On a permanently connected computer, only small deviations may occur, but in real life, not all computers are permanently connected.
Richard
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The upcoming Y2038 catastrophe

Post by abulmo »

mar wrote:
abulmo wrote:gettimeofday should not be used to measure elapsed time. You should use clock_gettime(CLOCK_MONOTONIC) or an equivalent function of your OS instead.
Yes, this is something that bothers me. Unfortunately CLOCK_MONOTONIC is not available on OSX
On OSX, Apple recommends using mach_absolute_time() to measure elapsed time.
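A rough sketch of a wrapper that covers both cases, assuming mach_absolute_time() with mach_timebase_info() on OS X and clock_gettime(CLOCK_MONOTONIC) elsewhere. The function name elapsed_clock_ns() is made up for the example.

```c
#ifndef __APPLE__
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>

#ifdef __APPLE__
#include <mach/mach_time.h>

/* Nanoseconds from the Mach tick counter (OS X path). */
uint64_t elapsed_clock_ns(void)
{
    static mach_timebase_info_data_t timebase;

    if (timebase.denom == 0)
        mach_timebase_info(&timebase);  /* ticks to ns is numer/denom */
    /* The multiplication can overflow after a very long uptime. */
    return mach_absolute_time() * timebase.numer / timebase.denom;
}
#else
#include <time.h>

/* Nanoseconds from the POSIX monotonic clock. */
uint64_t elapsed_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif
```

Only the difference between two readings is meaningful. The absolute value has an arbitrary starting point on both platforms.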
Richard