The upcoming Y2038 catastrophe

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: If you can read this...

Post by sje »

bob wrote:The problem is, when interrupts are disabled, and two RTC interrupts occur, you will only see one when you re-enable.
Yes, that's true. But how often are interrupts disabled? Usually, the only time this happens is during boot when the interrupt vector dispatch table is initialized. Maybe it happens during shutdown, too. Either way, no user code is running.

I suppose there could be an exception when a vector address is changed, but this happens so fast that it's unlikely that TWO clock interrupts could occur during the swap.
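The usual defense against exactly this coalescing is to stop counting interrupts and instead read a free-running hardware counter each time the tick handler actually runs. A toy Python simulation of the idea (the class names are illustrative, not any real kernel's API):

```python
# Simulation: a tick counter that survives missed interrupts by reading
# a free-running hardware counter instead of adding 1 per interrupt.
class HardwareCounter:
    """Stand-in for a free-running counter (TSC-like)."""
    def __init__(self):
        self.value = 0
    def advance(self, ticks):
        self.value += ticks

class NaiveClock:
    """Adds 1 per serviced interrupt: loses time when interrupts coalesce."""
    def __init__(self):
        self.ticks = 0
    def on_interrupt(self):
        self.ticks += 1

class CompensatingClock:
    """Reads the counter on every serviced interrupt, so coalesced
    interrupts do not lose time."""
    def __init__(self, counter):
        self.counter = counter
        self.ticks = 0
        self.last_read = counter.value
    def on_interrupt(self):
        now = self.counter.value
        self.ticks += now - self.last_read
        self.last_read = now

hw = HardwareCounter()
naive, comp = NaiveClock(), CompensatingClock(hw)

# Two RTC ticks elapse while interrupts are "disabled"; the CPU delivers
# only one interrupt when they are re-enabled.
hw.advance(2)
naive.on_interrupt()   # sees one interrupt, counts one tick
comp.on_interrupt()    # reads the counter, recovers both ticks

print(naive.ticks, comp.ticks)  # 1 2
```

This mirrors how modern timekeeping generally works: the periodic interrupt only schedules the read, while the counter itself carries the time.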
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If you can read this...

Post by bob »

sje wrote:
bob wrote:The problem is, when interrupts are disabled, and two RTC interrupts occur, you will only see one when you re-enable.
Yes, that's true. But how often are interrupts disabled? Usually, the only time this happens is during boot when the interrupt vector dispatch table is initialized. Maybe it happens during shutdown, too. Either way, no user code is running.

I suppose there could be an exception when a vector address is changed, but this happens so fast that it's unlikely that TWO clock interrupts could occur during the swap.
The most common example is whenever an interrupt is being processed; there, in fact, it is automatic.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Door knockers

Post by sje »

bob wrote:The most common example is whenever an interrupt is being processed; there, in fact, it is automatic.
Well, that depends on the hardware.

With Intel CPUs and the like, there are explicit instructions for enabling and disabling interrupts (EI and DI on the 8080 and Z80; STI and CLI on x86). There's an interrupt flip-flop that comes up reset at power-up and stays that way until set by an EI, then later reset with a DI.

I vaguely recall some boot code for various CPUs where the very first instruction executed was a DI or the equivalent, just to make sure that there would be no door knockers before the CPU could put its mind together.

Now a CPU, or an attached programmable interrupt controller chip, can have an interrupt mask which allows only certain interrupts to be acknowledged; the CPU can also have an interrupt level register which permits interrupts based on priority (e.g., Motorola CPUs). Either way, interrupts are not automatically disabled when one occurs, but one interrupt can suspend all others of lower priority.

And if a CPU has an NMI (non-maskable interrupt) capability, nothing can block that. On the early Macintosh models, an NMI could be manually generated by pressing the hidden Programmer's Switch; this would tickle the CPU's NMI pin to unfreeze a locked-up Mac and activate the internal debugger. I used this many times during the development of Spector. Next to the NMI button was a second secret button which would send a reset to the CPU's RESET pin, and I used that a lot as well.

Now I know that you know all of this, but perhaps there are some readers here who are less familiar with this particular low level technology.

--------

It was long ago and far away, but I can still recall an undergraduate programming class where there were only two assignments:

1) Write a microkernel which serviced interrupts with the servicing done with interrupts disabled;
2) Write a microkernel which NEVER disabled interrupts.

Many students had quite a bit of difficulty with the latter of the two.
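The two assignments can be contrasted in a few lines. Below is a toy Python simulation (all names invented): style 1 wraps a read-modify-write in a simulated DI/EI pair, while style 2 gives each interrupt source a counter that only it writes, so its handler is safe to run between any two steps and interrupts never need to be disabled:

```python
# Toy model: 'interrupts' are callbacks delivered between steps of
# non-atomic code, unless masked.

pending_irq = []      # interrupts waiting to be delivered
masked = False        # the simulated interrupt-enable flip-flop

def deliver():        # called wherever preemption could occur
    while pending_irq and not masked:
        pending_irq.pop(0)()

# --- Style 1: service with interrupts disabled -----------------------
shared_total = 0
def style1_update(n):
    """Read-modify-write guarded by a simulated DI/EI pair."""
    global shared_total, masked
    masked = True                 # DI
    tmp = shared_total            # read
    deliver()                     # nothing delivered: masked
    shared_total = tmp + n        # modify + write
    masked = False                # EI
    deliver()                     # held-off interrupts run now

# --- Style 2: never disable interrupts -------------------------------
per_source = {"clock": 0, "disk": 0}
def style2_irq(source):
    """Each source increments a counter only it writes, so this
    handler is harmless no matter when it runs."""
    per_source[source] += 1

def style2_total():               # the reader just sums the slots
    return sum(per_source.values())

# A clock interrupt arrives while style 1 is mid-update: it is held
# off until EI, so the update is not torn.
pending_irq.append(lambda: style2_irq("clock"))
style1_update(5)
print(shared_total)    # 5
print(style2_total())  # 1
```

The hard part of the second assignment is visible even here: every shared structure must be redesigned so that each writer owns its data exclusively, instead of just bracketing the update with DI/EI.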
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Door knockers

Post by bob »

sje wrote:
bob wrote:The most common example is whenever an interrupt is being processed; there, in fact, it is automatic.
Well, that depends on the hardware.

With Intel CPUs and the like, there are explicit instructions for enabling and disabling interrupts (EI and DI on the 8080 and Z80; STI and CLI on x86). There's an interrupt flip-flop that comes up reset at power-up and stays that way until set by an EI, then later reset with a DI.

I vaguely recall some boot code for various CPUs where the very first instruction executed was a DI or the equivalent, just to make sure that there would be no door knockers before the CPU could put its mind together.

Now a CPU, or an attached programmable interrupt controller chip, can have an interrupt mask which allows only certain interrupts to be acknowledged; the CPU can also have an interrupt level register which permits interrupts based on priority (e.g., Motorola CPUs). Either way, interrupts are not automatically disabled when one occurs, but one interrupt can suspend all others of lower priority.

And if a CPU has an NMI (non-maskable interrupt) capability, nothing can block that. On the early Macintosh models, an NMI could be manually generated by pressing the hidden Programmer's Switch; this would tickle the CPU's NMI pin to unfreeze a locked-up Mac and activate the internal debugger. I used this many times during the development of Spector. Next to the NMI button was a second secret button which would send a reset to the CPU's RESET pin, and I used that a lot as well.

Now I know that you know all of this, but perhaps there are some readers here who are less familiar with this particular low level technology.

--------

It was long ago and far away, but I can still recall an undergraduate programming class where there were only two assignments:

1) Write a microkernel which serviced interrupts with the servicing done with interrupts disabled;
2) Write a microkernel which NEVER disabled interrupts.

Many students had quite a bit of difficulty with the latter of the two.
The Intel CPU automatically disables interrupts when one is triggered, because the OS sets up the interrupt gate so that the interrupt flag is cleared on entry. Otherwise it would be chaos: an I/O interrupt occurs, and while you are removing that request from the appropriate I/O queue, another I/O interrupt occurs and you re-enter the interrupt handler, but with a broken I/O queue. You CAN allow interrupts to remain enabled, but I don't know of a system that does this.
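The broken-queue scenario is easy to reproduce in miniature. In this toy Python sketch (all names invented), the handler removes the head of a request queue in two non-atomic steps; forcing a nested invocation between them completes the same request twice and silently loses the other:

```python
# Simulation: an I/O-completion handler removes the head of a request
# queue in two non-atomic steps. If a second interrupt re-enters the
# handler between those steps, both invocations "complete" the same
# request and the other request is lost.

queue = ["req-A", "req-B"]
completed = []

def handler(nested_irq=None):
    head = queue[0]          # step 1: read the head
    if nested_irq:           # a second interrupt fires right here
        nested_irq()
    del queue[0]             # step 2: unlink it
    completed.append(head)

handler(nested_irq=handler)
print(completed)   # ['req-A', 'req-A'] -- req-A completed twice
print(queue)       # [] -- req-B silently lost
```

Masking interrupts around the two steps (or deferring the nested request) is exactly what prevents this.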

I modified the kernel on our old Xerox computer 30+ years ago to not depend on disabling interrupts. It was a royal pain that brought very little performance improvement. The only solution I found was to use software interrupts to defer work that could be deferred and might otherwise race, since the machine did have adjustable interrupt priorities. I/O is a bear and a half there, however.

And yes, non-maskable interrupts have always been around, dating back to the '60s. The most common examples were "power failure" and "machine check".
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Door knockers

Post by sje »

Again, it depends on the hardware.

If interrupts are ordered by priority, then on most (all?) systems, when an interrupt is admitted, interrupts of equal or lower priority are not admitted; but interrupts of greater priority including the NMI door basher are still enabled -- unless a DI type instruction turns off all non-NMI interrupts.

After an interrupt is serviced, the last instruction in the service routine is (usually) a RETI (return from interrupt). This resets the interrupt priority level to what it was when the interrupt occurred.
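The admit-only-higher-priority rule plus RETI behaves like a small stack of masking levels. A toy Python model (names invented, with the NMI modeled as infinite priority):

```python
# Toy model of priority-ordered admission: an interrupt is admitted only
# if its priority is strictly higher than the current level; RETI
# restores the level that was in effect when it was admitted.
NMI = float("inf")               # nothing outranks the NMI

class InterruptController:
    def __init__(self):
        self.level_stack = [0]   # base level: everything is admitted
        self.log = []
    def request(self, name, priority):
        if priority > self.level_stack[-1]:
            self.level_stack.append(priority)   # masks equal/lower
            self.log.append(("enter", name))
        else:
            self.log.append(("held", name))
    def reti(self):
        self.level_stack.pop()   # back to the interrupted level

pic = InterruptController()
pic.request("disk", 3)           # admitted
pic.request("clock", 3)          # held: equal priority
pic.request("serial", 2)         # held: lower priority
pic.request("powerfail", NMI)    # admitted: non-maskable
pic.reti()                       # RETI: back to the disk handler's level
```

A real controller would also re-deliver the held interrupts once the level drops; the sketch only records the admission decision.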

--------

When I wrote my interrupts-enabled-all-the-time microkernel, it wasn't that difficult; at least not once I figured out how and why all routines needed to be re-entrant.

Incidentally, this was done back in the late punch card era. My instructor, a silver-haired elderly woman, was the only member of the faculty who did not have a PhD. However, she was one of the world's first programmers, having worked with the Harvard Mark I machine during WW II.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Door knockers

Post by bob »

sje wrote:Again, it depends on the hardware.

If interrupts are ordered by priority, then on most (all?) systems, when an interrupt is admitted, interrupts of equal or lower priority are not admitted; but interrupts of greater priority including the NMI door basher are still enabled -- unless a DI type instruction turns off all non-NMI interrupts.

After an interrupt is serviced, the last instruction in the service routine is (usually) a RETI (return from interrupt). This resets the interrupt priority level to what it was when the interrupt occurred.

--------

When I wrote my interrupts-enabled-all-the-time microkernel, it wasn't that difficult; at least not once I figured out how and why all routines needed to be re-entrant.

Incidentally, this was done back in the late punch card era. My instructor, a silver-haired elderly woman, was the only member of the faculty who did not have a PhD. However, she was one of the world's first programmers, having worked with the Harvard Mark I machine during WW II.
Those days were a lot simpler. You can't make all code reentrant. For example, when an I/O interrupt occurs you have to update an I/O queue, AND process table information. If you are in the "top half" of the kernel (the non-interrupt handler part) you can also be updating such things, and the solution has always been a "disable interrupts" instruction to avoid those problems. But they also exist in interrupt handlers as well. If you try to use priorities, then you are still disabling interrupts, just in a non-direct way, since the same interrupt can't happen again until the current one is dismissed.

In any case, it is an interesting thing to work with. But somewhere you always have to disable interrupts while handling interrupts, even if only indirectly through multiple levels of priority (as one might do with Intel and the APIC). And in the real world, when an interrupt cannot reach the CPU, it is disabled, whether you disabled it, the CPU disabled it, or the APIC is just holding on to it due to priority.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Door knockers

Post by sje »

bob wrote:If you try to use priorities, then you are still disabling interrupts, just in a non-direct way, since the same interrupt can't happen again until the current one is dismissed.
Yes, at some point, the current interrupt has to be disabled, or de-prioritized, for everything to work consistently. Depending on other factors, the use of a global disable/enable may or may not be the best approach, although it's certainly the simplest one.

One way I've seen is for an ISR (interrupt service routine, the first responder to an interrupt) to snapshot the data associated with the interrupt and then stick the snapshot into a deque for a top-half routine to process later. The problem is what the ISR should do if the deque is currently busy or full; the simplest answer is to just toss the snapshot.
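That snapshot-and-defer scheme is a natural fit for a bounded single-producer/single-consumer ring buffer, where "toss the snapshot" becomes a drop counter. A toy Python sketch (names invented):

```python
# A bounded ring buffer in the spirit described: the ISR (sole producer)
# snapshots the interrupt's data and pushes it; the top half (sole
# consumer) drains it later. When the buffer is full, the ISR drops the
# snapshot and bumps a counter, so overload degrades service instead of
# freezing the system.
class SnapshotRing:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0        # written only by the consumer
        self.tail = 0        # written only by the producer
        self.dropped = 0
    def isr_push(self, snapshot):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:         # full: toss the snapshot
            self.dropped += 1
            return False
        self.buf[self.tail] = snapshot
        self.tail = nxt
        return True
    def top_half_pop(self):
        if self.head == self.tail:   # empty
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item

ring = SnapshotRing(capacity=3)      # holds capacity-1 = 2 items
for n in range(4):
    ring.isr_push(("io-complete", n))
print(ring.dropped)                  # 2
print(ring.top_half_pop())           # ('io-complete', 0)
```

With one producer (the ISR) and one consumer (the top half), head and tail each have exactly one writer, so the buffer needs neither locking nor interrupt masking.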

--------

On the old Control Data 6000 series mainframes, there was an optional (!) real time clock attached to one of the I/O channels. The O/S would read the RTC at boot and also once a day at midnight. The system time jumped several seconds every midnight, but only the system operator ever noticed this. The RTC was nothing fancy, just a digital clock which used the power company's 60 Hz sine wave. I suppose Seymour Cray already had a nice wristwatch and he didn't need another one.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Door knockers

Post by bob »

sje wrote:
bob wrote:If you try to use priorities, then you are still disabling interrupts, just in a non-direct way, since the same interrupt can't happen again until the current one is dismissed.
Yes, at some point, the current interrupt has to be disabled, or de-prioritized, for everything to work consistently. Depending on other factors, the use of a global disable/enable may or may not be the best approach, although it's certainly the simplest one.

One way I've seen is for an ISR (interrupt service routine, the first responder to an interrupt) to snapshot the data associated with the interrupt and then stick the snapshot into a deque for a top-half routine to process later. The problem is what the ISR should do if the deque is currently busy or full; the simplest answer is to just toss the snapshot.

--------

On the old Control Data 6000 series mainframes, there was an optional (!) real time clock attached to one of the I/O channels. The O/S would read the RTC at boot and also once a day at midnight. The system time jumped several seconds every midnight, but only the system operator ever noticed this. The RTC was nothing fancy, just a digital clock which used the power company's 60 Hz sine wave. I suppose Seymour Cray already had a nice wristwatch and he didn't need another one.
tossing the snapshot sounds ugly. A process that is blocked waiting on an I/O might be pretty unhappy to sit there forever. :)
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Door knockers

Post by sje »

bob wrote:tossing the snapshot sounds ugly. A process that is blocked waiting on an I/O might be pretty unhappy to sit there forever. :)
Yep, it's bad, but not as bad as a system freeze. I'm thinking of the 1201/1202 alarms which came from the Apollo 11 lunar module guidance computer in the last minute or so prior to landing. Too many external requests and not enough time to handle all the interrupts. The alarms were scary, but not as scary as would have been a seized up computer.

--------

On some 1970s Unix systems, it was possible for a fast typist to cause the terminal line input buffer to overflow; input characters would be tossed without warning. This was one reason that full duplex operation was preferred, so that a very fast typist could see when they were being too fast.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Lethal consequences of losing interrupts

Post by sje »

A true story about the lethal consequences of losing interrupts

Vaguely related: I recall an engineering screw-up which resulted in multiple cancers and some deaths from a failure to catch interrupts in a timely manner.

It was perhaps a few decades ago that there was a particular model of a hospital x-ray imaging machine which produced large scale radiographs. When a patient was positioned in the machine, a hospital technician would set the beam intensity using a dial which had an optical shaft encoder. There were many different possible intensity settings and it was common for the tech to spin the dial, perhaps several times, to have it set in the prescribed position.

The first design error was that the signal coming from the shaft encoder did not give the absolute position of the dial, but rather only a pulse train that counted the number of position transitions.

The second design error was using a cheap microcontroller to catch the shaft interrupts. The chip just couldn't handle incoming interrupts at a rate greater than perhaps 10 Hz, and so a technician's dial spinning would sometimes produce a setting which appeared correct visually, but had little relation to what the microcontroller had counted. This would at times set the beam intensity/duration far, far above what it should have been. Many unfortunate patients developed cancers and several died before the nature of the problem was determined.
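The failure mode is easy to model: with a relative encoder the controller must count every pulse, so any interrupt it cannot service in time is silently subtracted from the total, while an absolute encoder only reports a position to be read. A toy Python simulation (the numbers are invented for illustration):

```python
# Simulation of the two design errors: a relative (pulse-train) encoder
# counted by a controller that can service at most `max_per_slice`
# interrupts per time slice silently undercounts a fast spin. An
# absolute encoder is read rather than counted, so a missed interrupt
# only delays the reading instead of corrupting it.
def relative_count(pulses_per_slice, max_per_slice):
    counted = 0
    for burst in pulses_per_slice:
        counted += min(burst, max_per_slice)  # excess pulses are lost
    return counted

def absolute_read(true_position):
    return true_position      # the encoder reports position directly

# A technician spins the dial fast: 40 pulses arriving 10 per slice,
# against a controller that can only service 3 per slice.
bursts = [10, 10, 10, 10]
true_pos = sum(bursts)                       # dial really moved 40 steps
print(relative_count(bursts, 3))             # 12 -- looks valid, is wrong
print(absolute_read(true_pos))               # 40
```

The dangerous part is that the undercounted value is a perfectly plausible setting; nothing in the pulse train marks it as wrong, which is why an absolute position readout (or a count cross-checked against one) is the safer design.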