Interrupt Handling Internals in Linux Kernel
By Gaurav Dhiman
Reviewed by Mathieu Deschamps
Introduction
This article covers the internal details of interrupt handling in the Linux
kernel. It discusses the hardware perspective of interrupt handling on the
CPU, the Linux kernel's interrupt routing subsystem, and the role device
drivers play in interrupt handling.
The term interrupt is largely self-explanatory: interrupts are signals sent
to the CPU on the INTR line (connected to the CPU) whenever a device wants
the CPU's attention. As soon as the interrupt signal occurs, the CPU defers
its current activity and services the interrupt by executing the interrupt
handler corresponding to that interrupt number (also known as the IRQ number).
One way to classify interrupts is as follows:
- Synchronous Interrupts (also known as software interrupts)
- Asynchronous Interrupts (also known as hardware interrupts)
The basic difference between these is that synchronous interrupts are
generated by the CPU itself, either when its control unit detects an
abnormal condition or when it executes a special instruction such as 'int'
or 'int3'; these are known as exceptions in Intel's terminology.
Asynchronous interrupts, on the other hand, are generated by the outside
world (devices connected to the CPU). Because they can occur at any point
in time, they are known as asynchronous interrupts.
It is important to note that the CPU handles interrupts between
instructions, not in the middle of one. A machine instruction is not
executed in a single CPU cycle; it takes several cycles to complete. An
interrupt that arrives while an instruction is executing is not handled
immediately; rather, the CPU checks for pending interrupts when the
instruction completes.
CPU support to handle interrupts
To handle interrupts, there are a few things we expect the CPU to do every
time an interrupt occurs. Whenever an interrupt occurs, the CPU performs
some hardware checks that are essential to keeping the system secure.
Before explaining these checks, we need to understand how interrupts are
routed from hardware devices to the CPU.
Details of Programmable Interrupt Controller
On Intel architecture, system devices (device controllers) are connected
to a special device known as the PIC (Programmable Interrupt Controller).
The CPU has two lines for receiving interrupt signals (NMI and INTR). The
NMI line receives non-maskable interrupts, that is, interrupts that cannot
be masked or blocked under any circumstances. These interrupts have the
highest priority and are rarely used.
The INTR line is the line on which all interrupts from system devices are
received. These interrupts can be masked or blocked. Since all interrupt
signals have to be multiplexed onto a single CPU line, we need a mechanism
that routes interrupts from different device controllers to that single
line. This routing or multiplexing is done by the PIC.
The PIC sits between the system devices and the CPU. It has multiple input
lines, each connected to a different device controller, but only one output
line, which is connected to the CPU's INTR line and carries the signal to
the CPU.
Two PICs are cascaded: the output of the second PIC is connected to input
line 2 (IRQ2) of the first PIC. This setup allows a maximum of 15 input
lines to which different system device controllers can be connected. The
PIC has some programmable registers through which the CPU communicates
with it (to give commands, mask or unmask interrupt lines, and read
status). Each of the two PICs has its own set of the following registers:
- Mask Register
- Status Register
The mask register is used to mask or unmask a specific interrupt line. The
CPU asks the PIC to mask (block) a specific interrupt by setting the
corresponding bit in the mask register, and unmasks it by clearing that
bit. While an interrupt line is masked, the PIC still receives interrupts
on that line but does not forward them to the CPU. Masked interrupts are
not lost; the PIC remembers them and sends them to the CPU once the CPU
unmasks that line. Masking is different from blocking all interrupts to
the CPU. The CPU can ignore all interrupts arriving on the INTR line by
clearing the IF flag in its EFLAGS register. While this bit is clear,
interrupts arriving on the INTR line are simply ignored by the CPU; we can
think of this as blocking interrupts. So masking is done at the PIC level,
where individual interrupt lines can be masked or unmasked, whereas
blocking is done at the CPU level and applies to all interrupts except the
NMI (Non-Maskable Interrupt), which is received on the CPU's NMI line and
cannot be blocked or ignored.
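The difference between masking and blocking can be sketched with a small
user-space model. This is only an illustration of the behaviour described
above, not real PIC programming: the structure pic_model and the functions
pic_raise(), pic_set_mask() and pic_deliver() are hypothetical names, and
the model covers a single 8-input PIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 8-input PIC; not a real hardware interface. */
struct pic_model {
    uint8_t mask;     /* bit n set: line n is masked at the PIC */
    uint8_t pending;  /* bit n set: line n was raised and not yet delivered */
};

/* A device raises line `line` (0..7). The PIC records it even if masked. */
static void pic_raise(struct pic_model *pic, unsigned line)
{
    pic->pending |= (uint8_t)(1u << line);
}

/* Masking: set or clear the bit for one line in the mask register. */
static void pic_set_mask(struct pic_model *pic, unsigned line, bool masked)
{
    if (masked)
        pic->mask |= (uint8_t)(1u << line);
    else
        pic->mask &= (uint8_t)~(1u << line);
}

/*
 * Returns the lowest-numbered line the CPU would service now, or -1.
 * Delivery needs both conditions: the line is unmasked at the PIC and
 * the CPU's IF flag is set (interrupts not blocked at the CPU).
 */
static int pic_deliver(struct pic_model *pic, bool cpu_if_flag)
{
    uint8_t ready = pic->pending & (uint8_t)~pic->mask;

    if (!cpu_if_flag || ready == 0)
        return -1;
    for (unsigned line = 0; line < 8; line++) {
        if (ready & (1u << line)) {
            pic->pending &= (uint8_t)~(1u << line);
            return (int)line;
        }
    }
    return -1;
}
```

In this model an interrupt raised on a masked line stays in pending and is
delivered only after the line is unmasked and the IF flag is set.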
Today's interrupt architecture is not as simple as described above. Modern
machines use the APIC (Advanced Programmable Interrupt Controller)
architecture, which supports up to 256 interrupt vectors. Each CPU has a
built-in local APIC, and device interrupts are routed to the CPUs through
an IO-APIC. We won't go into those details right now (they will be covered
in future articles).
Hardware checks performed by CPU
Once the CPU receives an interrupt signal, it performs some hardware
checks for which no software instructions are executed. Before looking at
what these checks are, we need to understand some architecture-specific
data structures maintained by the kernel.
Details of Interrupt Descriptor Table
The kernel needs to maintain one IDT (Interrupt Descriptor Table), which
maps each interrupt vector to an interrupt handler routine. The table has
256 entries of 8 bytes each. The first 32 entries are used for exceptions
and the rest are used for hardware interrupts received from the outside
world. The table can contain three different types of entries:
- Task Gates
- Trap Gates
- Interrupt Gates
Task Gates
Here is the task gate format:
Bits | Description
0-15 | reserved (not used)
16-31 | points to the TSS (Task State Segment) entry of the process to which we need to switch
32-39 | reserved, not currently used
40-43 | type of entry (for a task gate the value is 0101)
44 | always 0, not used
45-46 | DPL (Descriptor Privilege Level) of the gate entry
47 | whether this entry is valid (1 - valid, 0 - invalid)
48-63 | reserved (not used)
Task gates are used in the IDT to let a user process make a context switch
to another process without asking the kernel to do it. As soon as this
gate is hit (an interrupt is received on a line for which there is a task
gate in the IDT), the CPU saves the context (the state of the processor
registers) of the currently running process to that process's TSS (Task
State Segment), whose address is held in the CPU's TR (Task Register).
After saving the context of the current process, the CPU loads its
registers with the values stored in the TSS of the new process, which is
identified by bits 16-31 of the task gate. Once the registers hold those
new values, the processor is running the new process and the context
switch is done.
Linux does not use task gates; it uses only trap and interrupt gates in
the IDT, so I will not explain task gates any further.
Trap Gates
The format of trap gates is as follows:
Bits | Description
0-15 | low 16 bits of the pointer to the kernel function to be invoked when this gate is hit
16-31 | index of the segment descriptor in the GDT (Global Descriptor Table)
32-36 | reserved, not currently used
37-39 | always 000, not used
40-43 | type of entry (for a trap gate the value is 1111)
44 | always 0, not used
45-46 | DPL (Descriptor Privilege Level) of the gate entry
47 | whether this entry is valid (1 - valid, 0 - invalid)
48-63 | high 16 bits of the pointer to the kernel function to be invoked when this gate is hit
Trap gates are used to handle exceptions generated by the CPU. Bits 0-15
and bits 48-63 together form the pointer to a kernel function (an offset
within the segment identified by bits 16-31 of the entry).
The only difference between trap gates and interrupt gates is that
whenever an interrupt gate is hit, the CPU automatically disables
interrupts by clearing the IF flag in its EFLAGS register, whereas for a
trap gate this is not done and interrupts remain enabled. As mentioned
earlier, trap gates are used for exceptions, so the first 32 entries of
the IDT are initialized with trap gates. In addition, the Linux kernel
uses a trap gate for the system call entry (entry 128 of the IDT).
Interrupt Gates
The format of interrupt gates is the same as that of trap gates explained
above, except for the value of the type field (bits 40-43). For trap gates
the value is 1111, and for interrupt gates it is 1110. The format is as
follows:
Bits | Description
0-15 | low 16 bits of the pointer to the kernel function to be invoked when this gate is hit
16-31 | index of the segment descriptor in the GDT (Global Descriptor Table)
32-36 | reserved, not currently used
37-39 | always 000, not used
40-43 | type of entry (for an interrupt gate the value is 1110)
44 | always 0, not used
45-46 | DPL (Descriptor Privilege Level) of the gate entry
47 | whether this entry is valid (1 - valid, 0 - invalid)
48-63 | high 16 bits of the pointer to the kernel function to be invoked when this gate is hit
Note: whenever an interrupt gate is hit, interrupts are disabled automatically.
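To make the bit layout in the tables above concrete, here is a minimal
sketch that packs and unpacks a 64-bit gate entry in user space. The
functions make_gate(), gate_handler(), gate_type() and gate_dpl() are
hypothetical helpers written for this illustration; they are not the
kernel's own routines, and the handler address and selector used with them
are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

/* Type field values (bits 40-43) as listed in the gate format tables. */
#define GATE_TYPE_TASK      0x5u  /* 0101 */
#define GATE_TYPE_INTERRUPT 0xEu  /* 1110 */
#define GATE_TYPE_TRAP      0xFu  /* 1111 */

/* Pack a trap or interrupt gate from its fields (hypothetical helper). */
static uint64_t make_gate(uint32_t handler, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    uint64_t gate = 0;

    gate |= (uint64_t)(handler & 0xFFFFu);      /* bits 0-15: low half of pointer */
    gate |= (uint64_t)selector << 16;           /* bits 16-31: segment in GDT */
    gate |= (uint64_t)(type & 0xFu) << 40;      /* bits 40-43: entry type */
    gate |= (uint64_t)(dpl & 0x3u) << 45;       /* bits 45-46: DPL */
    gate |= (uint64_t)1 << 47;                  /* bit 47: entry is valid */
    gate |= (uint64_t)(handler >> 16) << 48;    /* bits 48-63: high half of pointer */
    return gate;
}

/* Reassemble the handler pointer from bits 0-15 and 48-63. */
static uint32_t gate_handler(uint64_t gate)
{
    return (uint32_t)(gate & 0xFFFFu) | (uint32_t)((gate >> 48) << 16);
}

static unsigned gate_type(uint64_t gate)
{
    return (unsigned)((gate >> 40) & 0xFu);
}

static unsigned gate_dpl(uint64_t gate)
{
    return (unsigned)((gate >> 45) & 0x3u);
}
```

For example, make_gate(0xC0123456, 0x10, GATE_TYPE_TRAP, 3) builds a trap
gate that user mode may enter, and gate_handler() on the result returns
0xC0123456 again.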
Hardware Checks for Interrupts and Exceptions
Whenever an exception or interrupt occurs, the corresponding trap or
interrupt gate is hit and the CPU performs some checks using the fields of
that gate. The CPU does the following:
1. Gets the ith entry from the IDT (the base address and size of the IDT are stored in the CPU's IDTR register).
2. Reads the segment descriptor index from bits 16-31 of the IDT entry; let us call it 'n'.
3. Gets the segment descriptor from the 'n'th entry of the GDT (the base address and size of the GDT are stored in the CPU's GDTR register).
4. Checks that the DPL of the 'n'th GDT entry is less than or equal to the CPL (Current Privilege Level, held in the read-only lowest two bits of the CS register). If DPL > CPL, the CPU generates a general protection exception. We will see below what this check means and why it is done. Simply put:
   - DPL (of GDT entry) <= CPL : ok, switches stack if DPL < CPL
   - DPL (of GDT entry) > CPL : general protection exception
   If DPL (of GDT entry) < CPL, we are entering a higher privilege level
   (probably from user to kernel mode). In this case the CPU switches the
   hardware stack (SS and ESP registers) from the running process's user
   mode stack to its kernel mode stack. We will see below exactly how this
   stack switch is done.
   Note: the stack switch is mentioned here, but it actually happens after
   the 5th step below.
5. For software interrupts (generated by the 'int' assembly instruction), performs one more check. This check is not performed for hardware interrupts (which are generated by system devices and forwarded by the PIC). Simply put:
   - DPL (of IDT entry) >= CPL : ok, we have permission to enter through this gate
   - DPL (of IDT entry) < CPL : general protection exception
6. Switches the stack if DPL (of GDT entry) < CPL. In addition, the CPU mode (the least significant two bits of CS) changes from CPL to DPL (of GDT entry).
7. If a stack switch has taken place (SS and ESP now point to the kernel stack), pushes the old values of SS and ESP (which point to the user stack) on this new stack (the kernel stack).
8. Pushes the EFLAGS, CS and EIP registers on the stack (note: we are now working on the kernel stack). This saves the pointer to the user application instruction to which we must return after servicing the interrupt or exception.
9. In the case of exceptions, if there is a hardware error code, pushes that on the kernel stack as well.
10. Loads CS with the segment selector from the IDT entry (bits 16-31) and EIP with the offset from the IDT entry (bits 0-15 and 48-63).
All of the above is done by the CPU hardware without executing any
software instruction. The checks performed in the 4th and 5th steps are
important.
The 4th check makes sure that the code we are about to execute (the
Interrupt Service Routine) does not lie in a segment with lesser privilege.
Obviously the ISR cannot be in a less privileged segment than the one we
are currently in. DPL and CPL can take 4 values (0, 1, 2 for kernel mode
and 3 for user mode). Of these four, only two are used: 0 (for kernel
mode) and 3 (for user mode).
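The two privilege checks from the list above (the DPL of the GDT entry
against the CPL, and, for 'int' instructions only, the DPL of the IDT entry
against the CPL) can be written down as a small decision function. This is
a sketch of the rules as stated in this article; check_gate() and the
gate_result enumeration are hypothetical names, and the real checks are
done by the CPU hardware, not by software.

```c
#include <assert.h>
#include <stdbool.h>

enum gate_result {
    GATE_OK,               /* enter handler, same privilege level */
    GATE_OK_STACK_SWITCH,  /* enter handler, switching to a more privileged stack */
    GATE_GP_FAULT          /* general protection exception */
};

/*
 * cpl          - current privilege level (0 = kernel, 3 = user)
 * seg_dpl      - DPL of the GDT entry selected by the gate
 * gate_dpl     - DPL of the IDT entry (the gate itself)
 * software_int - true if the gate was reached through an 'int' instruction
 */
static enum gate_result check_gate(unsigned cpl, unsigned seg_dpl,
                                   unsigned gate_dpl, bool software_int)
{
    /* 4th check: the handler's segment must not be less privileged. */
    if (seg_dpl > cpl)
        return GATE_GP_FAULT;

    /* 5th check: applied to software interrupts only. */
    if (software_int && gate_dpl < cpl)
        return GATE_GP_FAULT;

    /* A stack switch happens only when privilege is raised. */
    return (seg_dpl < cpl) ? GATE_OK_STACK_SWITCH : GATE_OK;
}
```

With these rules, a user program (CPL 3) executing 'int' on a gate whose
DPL is 3 enters the kernel with a stack switch, while the same instruction
on a gate whose DPL is 0 yields a general protection exception. A hardware
interrupt arriving in user mode skips the second check and is accepted.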
The 5th check makes sure that an application can enter kernel mode only
through specific gates; in Linux, only through gate entry 128, which is
used for system call invocation. If we set an IDT entry's DPL field to 0,
1 or 2, an application program (running with CPL 3) cannot enter through
that gate entry. If it tries, the CPU generates a general protection
exception. This is why, in Linux, the DPL fields of all IDT entries
(except entry 128, used for system calls) are initialized to 0; this
makes sure that only kernel code can use these gates, not application
code. In Linux, entry 128 (used for system calls) is a trap gate and its
DPL is initialized to 3, so that application code can enter through this
gate with the assembly instruction "int 0x80".
Now let us see how the stack switch happens when DPL (of GDT entry) < CPL.
The CPU has a TR (Task Register), which points to the TSS (Task State
Segment) of the currently running process. The TSS is an
architecture-defined data structure that holds the processor state
registers of a process whenever that process is switched out. The TSS
includes three sets of SS and ESP fields, one for each privileged
processor level (0, 1 and 2). These fields specify the stack to be used
whenever we enter that processor level. Let us say the DPL value in the
GDT entry is 0; in this case the CPU loads the SS register with the SS
value in the TSS field for level 0 and the ESP register with the ESP value
in the TSS field for level 0. After loading SS and ESP with these values,
the CPU is pointing to the kernel level stack of the current process. The
old values of SS and ESP (which the CPU keeps internally during the
switch) are then pushed on this new kernel level stack; this is needed
because we must return to the old stack once we have serviced the
interrupt, exception or system call.
Attentive readers may be wondering why there is no field for a level 3
stack in the TSS. The reason is that we never use the CPU's stack
switching mechanism to switch from a higher privilege level (kernel mode -
0, 1 and 2) to a lower one (user mode - 3). This is why the CPU, on
entering a higher level (kernel mode), saves the previously used lower
level stack (user mode) on the kernel stack.
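The order in which the CPU fills the kernel stack can be illustrated with a
toy model. Everything here is hypothetical (the kstack and cpu_state
structures, the enter_kernel() function, and the register values used with
it); it only mirrors the push order described above, with the user SS and
ESP saved first and only when the privilege level changes.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_WORDS 16

/* Toy kernel stack: grows downwards from slot[KSTACK_WORDS - 1]. */
struct kstack {
    uint32_t slot[KSTACK_WORDS];
    int top;                      /* index of the most recently pushed word */
};

/* Register values of the interrupted code (illustrative only). */
struct cpu_state {
    uint32_t ss, esp, eflags, cs, eip;
};

static void push(struct kstack *ks, uint32_t value)
{
    ks->slot[--ks->top] = value;
}

/*
 * Model of entering a level-0 handler. Returns the number of words pushed.
 * cpl is the privilege level of the interrupted code.
 */
static int enter_kernel(struct kstack *ks, const struct cpu_state *cpu,
                        unsigned cpl)
{
    int before = ks->top;

    if (cpl > 0) {                /* stack switch: save the old stack first */
        push(ks, cpu->ss);
        push(ks, cpu->esp);
    }
    push(ks, cpu->eflags);        /* then the return context */
    push(ks, cpu->cs);
    push(ks, cpu->eip);
    return before - ks->top;
}
```

An interrupt taken in user mode leaves five words on the kernel stack,
while one taken in kernel mode leaves three, because no stack switch is
needed. The saved user SS and ESP on the kernel stack are what make a
level 3 stack field in the TSS unnecessary.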
Once all of this is done, the CPU's CS and EIP registers point to the
kernel function written to handle the interrupt or exception. The CPU
simply starts executing instructions at this point (we are now in kernel
mode - level 0).
Kernel Support for Handling Interrupts
In this section, we cover the kernel code executed in interrupt context.
I will be referring to the code of the 2.4.18 kernel release.
Low Level Interrupt Stubs
Whenever an interrupt occurs, the CPU performs the hardware checks
described above and starts executing the following assembly instructions
in the kernel, whose pointer (offset in the kernel code segment) is stored
in the corresponding IDT entry.
File: include/asm-i386/hw_irq.h
#define BUILD_COMMON_IRQ() \
asmlinkage void call_do_IRQ(void); \
__asm__( \
    "\n" __ALIGN_STR"\n" \
    "common_interrupt:\n\t" \
    SAVE_ALL \
    SYMBOL_NAME_STR(call_do_IRQ)":\n\t" \
    "call " SYMBOL_NAME_STR(do_IRQ) "\n\t" \
    "jmp ret_from_intr\n");
[...]
#define BUILD_IRQ(nr) \
asmlinkage void IRQ_NAME(nr); \
__asm__( \
    "\n"__ALIGN_STR"\n" \
    SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
    "pushl $"#nr"-256\n\t" \
    "jmp common_interrupt");
These macros are expanded when the kernel is compiled to emit the
lowest-level interrupt stubs, which can be reached from the IDT once their
offsets (pointers) are stored in the IDT gates. The kernel maintains one
global array of function pointers (named interrupt) in which it stores
pointers to these stubs. The code that creates the stubs (using the
BUILD_IRQ macro above) and saves their pointers in the global array
"interrupt[NR_IRQS]" can be seen in the file below. In this file you will
see the BUILD_IRQ macro used to create the interrupt stubs as follows:
File: arch/i386/kernel/i8259.c
#define BI(x,y) \
    BUILD_IRQ(x##y)

#define BUILD_16_IRQS(x) \
    BI(x,0) BI(x,1) BI(x,2) BI(x,3) \
    BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
    BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
    BI(x,c) BI(x,d) BI(x,e) BI(x,f)

/*
 * ISA PIC or low IO-APIC triggered (INTA-cycle or APIC) interrupts:
 * (these are usually mapped to vectors 0x20-0x2f)
 */
BUILD_16_IRQS(0x0)

#ifdef CONFIG_X86_IO_APIC
/*
 * The IO-APIC gives us many more interrupt sources. Most of these
 * are unused but an SMP system is supposed to have enough memory ...
 * sometimes (mostly wrt. hw bugs) we get corrupted vectors all
 * across the spectrum, so we really want to be prepared to get all
 * of these. Plus, more powerful systems might have more than 64
 * IO-APIC registers.
 *
 * (these are usually mapped into the 0x30-0xff vector range)
 */
BUILD_16_IRQS(0x1) BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3)
BUILD_16_IRQS(0x4) BUILD_16_IRQS(0x5) BUILD_16_IRQS(0x6) BUILD_16_IRQS(0x7)
BUILD_16_IRQS(0x8) BUILD_16_IRQS(0x9) BUILD_16_IRQS(0xa) BUILD_16_IRQS(0xb)
BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd)
#endif

#undef BUILD_16_IRQS
#undef BI
The above code only creates the interrupt stubs; it does not place their
pointers in the interrupt[NR_IRQS] array. The code that places the stub
pointers in the global array is as follows and can be found in the same
file, "arch/i386/kernel/i8259.c":
File: arch/i386/kernel/i8259.c
#define IRQ(x,y) \
    IRQ##x##y##_interrupt

#define IRQLIST_16(x) \
    IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), \
    IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \
    IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
    IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)

void (*interrupt[NR_IRQS])(void) = {
    IRQLIST_16(0x0),

#ifdef CONFIG_X86_IO_APIC
    IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
    IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
    IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
    IRQLIST_16(0xc), IRQLIST_16(0xd)
#endif
};

#undef IRQ
#undef IRQLIST_16
The above code fills the global array of function pointers
(interrupt[NR_IRQS]). Once the global array has been initialized with
pointers to the interrupt stubs, we initialize the IDT (Interrupt
Descriptor Table) in the function "init_IRQ()" using this global array as
follows:
File: arch/i386/kernel/i8259.c, Function: init_IRQ()
for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
    int vector = FIRST_EXTERNAL_VECTOR + i;
    if (i >= NR_IRQS)
        break;
    if (vector != IA32_SYSCALL_VECTOR && vector != KDB_VECTOR) {
        set_intr_gate(vector, interrupt[i]);
    }
}
In the above loop, we walk over all IDT entries starting from
"FIRST_EXTERNAL_VECTOR" (32, because the first 32 entries are for
exceptions) and call the "set_intr_gate()" function, which sets up an
interrupt gate descriptor. For entry 128, which is used for system call
invocation, no interrupt gate is set; a trap gate is set instead, and
that is done in the function trap_init(). In the same function
init_IRQ(), after this loop, we initialize the IPIs (Interprocessor
Interrupts). These interrupts are sent from one CPU to another on SMP
machines.
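The stub-generating and table-filling macros shown earlier both rely on
the preprocessor's token pasting operator (##). The following reduced
sketch uses the same trick in user space with four stubs instead of
sixteen. The names BUILD_STUB, BS, BUILD_4_STUBS, STUB, STUBLIST_4,
stub_table and last_irq are hypothetical, and each stub here is an
ordinary C function that records its number instead of an assembly entry
point.

```c
#include <assert.h>

static int last_irq = -1;   /* set by whichever stub ran last */

/* Hypothetical stand-in for BUILD_IRQ: define one stub function per number. */
#define BUILD_STUB(nr) \
    static void IRQ##nr##_stub(void) { last_irq = nr; }

/* Paste the two halves of the number together, as BI(x,y) does. */
#define BS(x, y) BUILD_STUB(x##y)
#define BUILD_4_STUBS(x) BS(x, 0) BS(x, 1) BS(x, 2) BS(x, 3)

BUILD_4_STUBS(0x0)   /* defines IRQ0x00_stub ... IRQ0x03_stub */

/* Rebuild the same identifiers to fill a table, as IRQ(x,y) does. */
#define STUB(x, y) IRQ##x##y##_stub
#define STUBLIST_4(x) STUB(x, 0), STUB(x, 1), STUB(x, 2), STUB(x, 3)

static void (*stub_table[])(void) = { STUBLIST_4(0x0) };
```

Calling stub_table[2]() runs IRQ0x02_stub, which sets last_irq to 2. The
kernel uses the same two-step pattern: one macro family emits the stubs
and a second one regenerates their names to initialize the array.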
Once these IDT entries are set, whenever an interrupt occurs the CPU jumps
directly to the code generated by the BUILD_IRQ macro. Now let us analyse
what this macro does. Following is the code of the BUILD_IRQ macro:
File: include/asm-i386/hw_irq.h
#define BUILD_IRQ(nr) \
asmlinkage void IRQ_NAME(nr); \
__asm__( \
    "\n.p2align\n" \
    "IRQ" #nr "_interrupt:\n\t" \
    "push $" #nr "-256 ; " \
    "jmp common_interrupt");
This assembly code pushes the IRQ number minus 256 (a negative value) on
the kernel stack. It then jumps to the "common_interrupt" assembly label,
which saves the context of the interrupted process (the CPU registers) on
the kernel stack and then calls the C language function "do_IRQ()".
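The value pushed by the stub and the way do_IRQ() later recovers the IRQ
number from it (by masking with 0xff) can be checked with a few lines of
C. The helper names stub_pushed_value() and irq_from_orig_eax() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What the stub pushes: the IRQ number minus 256 (negative for 0..255). */
static int32_t stub_pushed_value(int irq)
{
    return (int32_t)irq - 256;
}

/* What do_IRQ() does with the saved value: keep only the low 8 bits. */
static int irq_from_orig_eax(int32_t orig_eax)
{
    return (int)((uint32_t)orig_eax & 0xffu);
}
```

For IRQ 5 the stub pushes -251, which is 0xFFFFFF05 as a 32-bit value, and
masking with 0xff gives 5 again. The same round trip holds for every IRQ
number from 0 to 255.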
Details of do_IRQ() function, interrupt handling core
do_IRQ() is the function common to all hardware interrupts. It is the most
important function to understand from the interrupt handling perspective.
We will interleave the function's code with an explanation of each group
of lines.
File: arch/i386/kernel/irq.c
563 asmlinkage unsigned int do_IRQ(struct pt_regs regs)
564 {
565 /*
566 * We ack quickly, we don't want the irq controller
567 * thinking we're snobs just because some other CPU has
568 * disabled global interrupts (we have already done the
569 * INT_ACK cycles, it's too late to try to pretend to the
570 * controller that we aren't taking the interrupt).
571 *
572 * 0 return value means that this irq is already being
573 * handled by some other CPU. (or is disabled)
574 */
575 int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_code */
576 int cpu = smp_processor_id();
577 irq_desc_t *desc = irq_desc + irq;
578 struct irqaction * action;
579 unsigned int status;
580
Line - 575 to 577
Get the number of the interrupt that was triggered; the stub pushed it on
the kernel stack before the interrupted process's context was saved. Get
the id of the CPU on which this code is executing, in other words the CPU
that is handling this interrupt. Get a pointer to the IRQ descriptor. The
IRQ descriptor is a kernel data structure that binds together the
different ISRs (Interrupt Service Routines) registered by device drivers
on the same IRQ line. The same IRQ line can be shared between different
devices, so each device driver needs to register its own ISR to handle
the interrupts generated by its device. The IRQ descriptor data structure
is defined as follows:
typedef struct {
    unsigned int status;
    hw_irq_controller *handler;
    struct irqaction *action;
    unsigned int depth;
    spinlock_t lock;
} ____cacheline_aligned irq_desc_t;
Here is a description of these fields:
- status : A bit mask of flags that identify the state of this particular IRQ line.
We will see the different flags used further on in this article.
- handler : A pointer to a structure whose elements are pointers to the functions that
deal with the physical PIC. These functions are used to mask or unmask particular interrupt lines in the PIC
or to acknowledge interrupts to the PIC. Definitions of these PIC related
functions can be found in the file "arch/i386/kernel/i8259.c"
- action : A pointer to the list of ISRs registered by
different device drivers for this IRQ line. When a device driver
registers its ISR with the kernel using the kernel function "request_irq()", the
ISR is added to this list for that particular IRQ line.
- depth : The nesting count of disable requests for this IRQ line.
- lock : A spinlock that serializes access to
the elements of the IRQ descriptor. Different kernel execution contexts access
the elements of the IRQ descriptor, and before doing so they must
acquire this spinlock so that concurrent accesses stay synchronized.
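How several ISRs share one IRQ line through the action list can be
sketched in user space as follows. This is a simplified model, not the
kernel's code: action_model, desc_model, model_request_irq(),
model_handle_event() and count_isr() are hypothetical names, the handler
signature is reduced to two arguments, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's per-ISR record. */
struct action_model {
    void (*handler)(int irq, void *dev_id);  /* the driver's ISR */
    void *dev_id;                            /* identifies the driver's device */
    struct action_model *next;               /* next ISR sharing this line */
};

/* Simplified stand-in for the IRQ descriptor: only the action list. */
struct desc_model {
    struct action_model *action;
};

/* Register an ISR by appending it to the line's action list. */
static void model_request_irq(struct desc_model *desc,
                              struct action_model *new_action)
{
    struct action_model **p = &desc->action;

    while (*p)
        p = &(*p)->next;
    new_action->next = NULL;
    *p = new_action;
}

/* Call every ISR registered on the line; returns how many were called. */
static int model_handle_event(int irq, struct desc_model *desc)
{
    int called = 0;

    for (struct action_model *a = desc->action; a; a = a->next) {
        a->handler(irq, a->dev_id);
        called++;
    }
    return called;
}

/* Example ISR: counts its invocations in the int that dev_id points to. */
static void count_isr(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
}
```

When two drivers register on the same line, one interrupt causes both ISRs
to run; each ISR is then expected to check whether its own device actually
raised the interrupt.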
File: arch/i386/kernel/irq.c
581 kstat.irqs[cpu][irq]++;
582 spin_lock(&desc->lock);
583 desc->handler->ack(irq);
Line - 581 to 583
Here we increment the count of interrupts received by this CPU; this is
maintained for accounting purposes. We take the spinlock before accessing
any element of the IRQ descriptor for our interrupt line. We also mask the
interrupt and acknowledge it to the PIC using the ack() function of our
IRQ descriptor's handler.
584 /*
585 REPLAY is when Linux resends an IRQ that was dropped earlier
586 WAITING is used by probe to mark irqs that are being tested
587 */
588 status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
589 status |= IRQ_PENDING; /* we _want_ to handle it */
590
Line - 588 to 589
Now we clear the IRQ_REPLAY and IRQ_WAITING flags in our copy of the IRQ
descriptor's status. As mentioned earlier, the status field tracks the
state of an interrupt line. We clear these flags because we are about to
handle this interrupt, so it is no longer in replay or waiting mode. The
IRQ_WAITING flag is used by device drivers, in conjunction with the
IRQ_AUTODETECT flag, to auto-detect the IRQ line to which their device is
connected. Device drivers call the probe_irq_on() function, which sets the
IRQ_AUTODETECT and IRQ_WAITING flags on all IRQ descriptors for which no
ISR has been registered yet. After calling probe_irq_on(), the device
driver instructs the device to trigger an interrupt and then calls the
probe_irq_off() function. probe_irq_off() looks for the IRQ descriptors
whose IRQ_AUTODETECT flag is still set but whose IRQ_WAITING flag has been
cleared, and returns that IRQ line number to the device driver.
After clearing the IRQ_REPLAY and IRQ_WAITING flags in do_IRQ(), we set
the IRQ_PENDING flag. This indicates that we intend to handle this
interrupt, provided it is not disabled and not already being handled by
another CPU (on SMP machines). The purpose of setting IRQ_PENDING is
explained in detail in the next few lines.
We have seen the interrupt and want to handle it by calling the set of
ISRs (Interrupt Service Routines) registered by the different device
drivers. We set the IRQ_PENDING flag because seeing an interrupt does not
guarantee that we will handle it. The IRQ_PENDING flag helps us in the
following two cases:
- If the interrupt is disabled (flag IRQ_DISABLED is set), we do not
service the interrupt and just keep it marked as pending (flag
IRQ_PENDING set). Once the interrupt is enabled again (flag IRQ_DISABLED
cleared), the ISRs are called to service the interrupt. So IRQ_PENDING
helps us remember an interrupt that occurred while that interrupt was
disabled for some reason.
Note: Disabling an interrupt here does not mean masking a particular line
at the PIC level or disabling all interrupts at the CPU level by clearing
the IF flag in the CPU's EFLAGS register. Disabling here means that the
kernel has been asked not to service the interrupt; the hardware
triggering of the interrupt signal is not stopped at all.
- If another CPU is already handling a previous interrupt request on this
IRQ line, the IRQ_INPROGRESS flag will already have been set by that other
CPU. Our role is then just to mark the interrupt as IRQ_PENDING and, in a
way, ask that other CPU to service this interrupt request as well. When
that CPU finishes handling the previous interrupt, it checks this flag.
Because we have set the flag, that CPU calls all the ISRs once again to
service the interrupt request we received on this IRQ line.
591 /*
592 * If the IRQ is disabled for whatever reason, we cannot
593 * use the action we have.
594 */
595 action = NULL;
596 if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
597 action = desc->action;
598 status &= ~IRQ_PENDING; /* we commit to handling */
599 status |= IRQ_INPROGRESS; /* we are handling it */
600 }
601 desc->status = status;
602
Line - 595 to 601
Now we check whether this interrupt is not disabled (flag IRQ_DISABLED is
clear) and at the same time is not being handled by another CPU (flag
IRQ_INPROGRESS is also clear). If so, we clear the IRQ_PENDING flag and
set the IRQ_INPROGRESS flag to indicate that we take responsibility for
handling this interrupt request. If another CPU receives an interrupt on
the same IRQ line while we are handling this request, that CPU simply sets
the IRQ_PENDING flag and hands its responsibility over to us; in that case
we (the CPU already executing the handlers) are responsible for serving
that interrupt request as well.
603 /*
604 * If there is no IRQ handler or it was disabled, exit early.
605 Since we set PENDING, if another processor is handling
606 a different instance of this same irq, the other processor
607 will take care of it.
608 */
609 if (!action)
610 goto out;
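Line - 609 to 610
If action is NULL, either because no ISR is registered for this IRQ line
or because the IRQ is disabled or already being handled on another CPU, we
simply return from interrupt context after releasing the lock we hold and
serving the softirqs (if any are pending).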
611
612 /*
613 * Edge triggered interrupts need to remember
614 * pending events.
615 * This applies to any hw interrupts that allow a second
616 * instance of the same irq to arrive while we are in do_IRQ
617 * or in the handler. But the code here only handles the _second_
618 * instance of the irq, not the third or fourth. So it is mostly
619 * useful for irq hardware that does not mask cleanly in an
620 * SMP environment.
621 */
622 for (;;) {
623 spin_unlock(&desc->lock);
624 handle_IRQ_event(irq, &regs, action);
625 spin_lock(&desc->lock);
626
627 if (!(desc->status & IRQ_PENDING))
628 break;
629 desc->status &= ~IRQ_PENDING;
630 }
Line - 622 to 630
Now we are ready to call the registered ISRs (the device drivers'
functions), so that they can figure out which device connected to this IRQ
line actually triggered the interrupt and serve it properly. Before
calling the ISRs, we release the IRQ descriptor spinlock so that, while we
are executing the ISRs, the spinlock can be acquired by another interrupt
context, which may execute on another CPU for the same IRQ line. That
interrupt context on the other CPU simply sets the IRQ_PENDING flag and
returns without handling the interrupt itself. In this infinite loop we
call the handle_IRQ_event() function, which calls all the ISRs registered
for this IRQ line one by one. After going through the list of ISRs, we
acquire the IRQ descriptor spinlock again, because we need to check and
update the descriptor's status field. With the spinlock held, we check the
IRQ_PENDING flag: if it is clear we break out of the infinite loop;
otherwise we clear the IRQ_PENDING flag of our IRQ descriptor and call
handle_IRQ_event() again to serve the new interrupt request indicated by
the IRQ_PENDING flag.
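The interplay of the IRQ_PENDING, IRQ_INPROGRESS and IRQ_DISABLED flags
described above can be tried out with a single-threaded model. This is a
sketch under strong simplifications: the spinlock, the PIC calls and the
real ISR list are left out, the M_IRQ_* constants are arbitrary values
chosen for the model, and irq_model, model_do_irq() and second_irq_once()
are hypothetical names. The hook during_handler stands in for an interrupt
arriving on another CPU while the ISRs are running.

```c
#include <assert.h>
#include <stddef.h>

/* Model flag values; chosen for this sketch, not taken from kernel headers. */
#define M_IRQ_INPROGRESS 1u
#define M_IRQ_DISABLED   2u
#define M_IRQ_PENDING    4u

struct irq_model {
    unsigned status;
    int handled;                                 /* times the ISR list ran */
    void (*during_handler)(struct irq_model *);  /* simulates another CPU */
};

static void model_do_irq(struct irq_model *d)
{
    unsigned status = d->status | M_IRQ_PENDING;  /* we want to handle it */
    int run = 0;

    if (!(status & (M_IRQ_DISABLED | M_IRQ_INPROGRESS))) {
        status &= ~M_IRQ_PENDING;                 /* we commit to handling */
        status |= M_IRQ_INPROGRESS;               /* we are handling it */
        run = 1;
    }
    d->status = status;
    if (!run)
        return;                                   /* stays marked pending */

    for (;;) {
        d->handled++;                             /* stands for running the ISRs */
        if (d->during_handler)
            d->during_handler(d);                 /* may set M_IRQ_PENDING */
        if (!(d->status & M_IRQ_PENDING))
            break;
        d->status &= ~M_IRQ_PENDING;              /* serve the new request too */
    }
    d->status &= ~M_IRQ_INPROGRESS;
}

/* Simulates exactly one further interrupt on the same line, on another CPU. */
static void second_irq_once(struct irq_model *d)
{
    d->during_handler = NULL;   /* fire only during the first pass */
    model_do_irq(d);            /* sees INPROGRESS, so it only marks PENDING */
}
```

With the hook installed, one call to model_do_irq() runs the ISR list
twice: once for the original interrupt and once for the request left
behind as pending. On a disabled line the ISRs are not run at all and the
pending flag stays set.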
631 desc->status &= ~IRQ_INPROGRESS;
Line - 631
We come out of the infinite loop described above only when there is no
pending request for this IRQ line. Once we are out, the handling is mostly
done, so we clear the IRQ_INPROGRESS flag.
632 out:
633 /*
634 * The ->end() handler has to deal with interrupts which got
635 * disabled while the handler was running.
636 */
637 desc->handler->end(irq);
638 spin_unlock(&desc->lock);
639
640 if (softirq_pending(cpu))
641 do_softirq();
642 return 1;
643 }
Line - 637 to 638
Now we call end(), the last of the PIC related functions stored in our
IRQ descriptor's handler field. This function takes care of the situation
where the interrupt we were handling was disabled while we were handling
it. Let's say that while we were serving the interrupt by calling all its
ISRs, the interrupt was disabled (flag IRQ_DISABLED set), for instance by
code running on another CPU; in this case we should not unmask the
interrupt line (which we masked by calling the PIC related ack() function
at line 583). If the IRQ has not been disabled, end() simply unmasks the
interrupt line at the PIC level and returns.
After this we go ahead and serve the pending softirqs (if any are marked).
We will see in the next section what softirqs are.
I will soon post details of softirqs, tasklets and bottom halves, so keep
looking for that on my blog.
Aggregated and rearranged content, posted by Linux Kernel Group @ Blogspot