TITRE_DOC_KERNEL

Interrupt Handling Internals in Linux Kernel
By Gaurav Dhiman
Reviewed by Mathieu Deschamps

Introduction

This article describes the internals of interrupt handling in the Linux kernel. It discusses interrupt handling from the CPU's hardware perspective, the Linux kernel's interrupt routing subsystem, and the role device drivers play in interrupt handling.

The term "interrupt" is largely self-explanatory: interrupts are signals sent to the CPU on an INTR line (connected to the CPU) whenever a device wants the CPU's attention. As soon as an interrupt signal arrives, the CPU defers its current activity and services the interrupt by executing the interrupt handler corresponding to that interrupt number (also known as the IRQ number).

Interrupts can be classified in several ways; one common classification is the following:

Synchronous interrupts (also known as software interrupts)
Asynchronous interrupts (also known as hardware interrupts)

The basic difference between them is that synchronous interrupts are generated by the CPU's control unit when it encounters some abnormal condition; in Intel's terminology these are known as exceptions. They are generated by the CPU itself, either when it detects an abnormal condition or when it executes special instructions such as 'int' or 'int3'. Asynchronous interrupts, on the other hand, are generated by the outside world (devices connected to the CPU). Because they can occur at any point in time, they are called asynchronous.

It is important to note that the CPU does not service an interrupt in the middle of a machine instruction. Executing an instruction takes several CPU cycles, and a hardware interrupt that arrives while an instruction is executing is not handled immediately: the CPU checks for pending interrupts only once the current instruction has completed. Exceptions are tied to the instruction that causes them; a fault such as a page fault is reported before the faulting instruction completes, so that the instruction can be restarted after the handler returns.

CPU support to handle interrupts

To handle interrupts, we expect the CPU to do a few things on every interrupt. Whenever an interrupt occurs, the CPU performs some hardware checks that are needed to keep the system secure. Before explaining these checks, we will look at how interrupts are routed from hardware devices to the CPU.

Details of Programmable Interrupt Controller

On the Intel architecture, system devices (device controllers) are connected to a special device known as the PIC (Programmable Interrupt Controller).
The CPU has two lines for receiving interrupt signals: NMI and INTR. The NMI line receives non-maskable interrupts, that is, interrupts that cannot be masked or blocked at all. These interrupts have the highest priority and are rarely used.
The INTR line is where all interrupts from system devices are received; these interrupts can be masked or blocked. Because all of these interrupt signals must be multiplexed onto a single CPU line, we need a mechanism that routes interrupts from the different device controllers to that one line. This routing, or multiplexing, is done by the PIC. The PIC sits between the system devices and the CPU and has multiple input lines, each connected to a different device controller, but only one output line, which is connected to the CPU's INTR line and on which it signals the CPU.
On a PC, two PICs are cascaded: the output of the second (slave) PIC is connected to input line 2 (IRQ2) of the first (master) PIC. This setup provides at most 15 usable input lines to which system device controllers can be connected (8 + 8, minus the line used for the cascade). Each PIC has programmable registers through which the CPU communicates with it (to issue commands, mask or unmask interrupt lines, and read status). Each PIC has its own copy of the following registers:

Mask Register
Status Register

The mask register is used to mask or unmask a specific interrupt line. The CPU asks the PIC to mask (block) a specific interrupt by setting the corresponding bit in the mask register, and unmasks it by clearing that bit. While a line is masked, the PIC still receives interrupts on it but does not forward them to the CPU. Masked interrupts are not lost: the PIC remembers them and signals the CPU once that line is unmasked. Masking is different from blocking all interrupts at the CPU. The CPU can ignore all interrupts arriving on the INTR line by clearing the IF flag in its EFLAGS register; while this flag is clear, interrupts on the INTR line are simply not taken, which we can think of as blocking interrupts. So masking is done at the PIC level, for individual interrupt lines, whereas blocking is done at the CPU level and applies to all interrupts except the NMI (Non-Maskable Interrupt), which arrives on the CPU's NMI line and cannot be blocked or ignored.
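To make the difference between masking and blocking concrete, here is a small user-space model in C. It is a sketch under simplifying assumptions, not kernel code: the two mask registers are kept in one 16-bit variable instead of being written with outb() to the 8259A ports 0x21 (master) and 0xA1 (slave), a raised request simply stays latched, and the helper names (mask_irq(), raise_irq(), cpu_sees() and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model of two cascaded 8259A PICs (hypothetical names).
 * On a PC the master mask register is I/O port 0x21 and the slave's is
 * 0xA1; here both are kept in one 16-bit variable instead. */
#define PIC_MASTER_IMR 0x21
#define PIC_SLAVE_IMR  0xA1

static uint16_t irq_mask = 0xFFFF;  /* bit n set => IRQ line n masked      */
static uint16_t irq_latched;        /* requests the PIC has seen           */
static bool cpu_if = true;          /* EFLAGS.IF: false => INTR is ignored */

/* Which mask register controls a given line. */
static unsigned imr_port(unsigned irq)
{
    assert(irq < 16);
    return irq < 8 ? PIC_MASTER_IMR : PIC_SLAVE_IMR;
}

static void mask_irq(unsigned irq)   { assert(irq < 16); irq_mask |= (uint16_t)(1u << irq); }
static void unmask_irq(unsigned irq) { assert(irq < 16); irq_mask &= (uint16_t)~(1u << irq); }

/* A device raises line irq: the PIC records it even if the line is masked. */
static void raise_irq(unsigned irq)  { assert(irq < 16); irq_latched |= (uint16_t)(1u << irq); }

/* Would the CPU take an interrupt from line irq right now?
 * Masking works per line at the PIC; clearing IF blocks every INTR interrupt. */
static bool cpu_sees(unsigned irq)
{
    assert(irq < 16);
    bool pending  = (irq_latched >> irq) & 1u;
    bool unmasked = !((irq_mask >> irq) & 1u);
    if (irq >= 8)                   /* slave output enters master line 2 */
        unmasked = unmasked && !((irq_mask >> 2) & 1u);
    return pending && unmasked && cpu_if;
}
```

For example, a request raised on line 1 while it is masked becomes visible as soon as the line is unmasked, whereas clearing cpu_if hides every line at once; a slave line such as 12 is only delivered if master line 2 is unmasked too.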

Interrupt architecture on modern machines is not as simple as described above. Modern machines use the APIC (Advanced Programmable Interrupt Controller) architecture, which supports up to 256 interrupt vectors: each CPU has a built-in local APIC, and one or more I/O APICs route device interrupts to the CPUs. We won't go into those details right now (they will be covered in future articles).

Hardware checks performed by CPU

Once the CPU receives an interrupt signal, it performs some hardware checks without executing any software instructions. Before looking at these checks, we need to understand some architecture-specific data structures maintained by the kernel.

Details of Interrupt Descriptor Table

The kernel needs to maintain an IDT (Interrupt Descriptor Table), which maps each interrupt vector to an interrupt handler routine. The table has 256 entries of 8 bytes each. The first 32 entries are used for exceptions, and the remaining ones are used for hardware interrupts received from the outside world and for software interrupts such as the system call vector. The table can contain three types of entries:

Task Gates
Trap Gates
Interrupt Gates
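The CPU finds this table through its IDTR register, which holds the table's base address and its limit (size in bytes minus one). The following is a minimal sketch, assuming the 32-bit layout described here (8-byte entries), of how the address of the entry for a vector is computed and bounds-checked; struct idtr and idt_entry_addr() are illustrative names, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDT_ENTRIES    256
#define IDT_ENTRY_SIZE 8          /* bytes per gate on 32-bit x86 */

/* Illustrative model of the IDTR register. */
struct idtr {
    uint32_t base;                /* address of IDT entry 0         */
    uint16_t limit;               /* table size in bytes, minus one */
};

/* Compute the address of the entry for `vector`. Returns false when the
 * entry would lie beyond the limit, where the CPU raises a general
 * protection exception instead. */
static bool idt_entry_addr(const struct idtr *r, unsigned vector, uint32_t *addr)
{
    assert(r && addr);
    uint32_t offset = (uint32_t)vector * IDT_ENTRY_SIZE;
    if (offset + IDT_ENTRY_SIZE - 1 > r->limit)
        return false;
    *addr = r->base + offset;
    return true;
}
```

With a full table the limit is 256 * 8 - 1 = 2047, and the system call entry 0x80 sits 0x400 bytes after the base.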

Task Gates

Here is task gates format table :

`Bits`	Description
`0-15`	reserved (not used)
`16-31`	segment selector of the TSS (Task State Segment) descriptor of the process to which we need to switch
`32-39`	these bits are reserved and are not currently used
`40-43`	specify the type of entry (its value for task gate is 0101)
`44`	always 0, not used
`45-46`	this specifies the DPL (Descriptor Privilege Level) of the gate entry
`47`	specifies if this entry is valid or not (1 - valid, 0 - invalid)
`48-63`	reserved (not used)

A task gate in the IDT makes the CPU itself perform a context switch to another task, without the kernel doing the switch in software. As soon as this gate is hit (an interrupt is received on a line for which there is a task gate in the IDT), the CPU saves the context (state of the processor registers) of the currently running process in the TSS (Task State Segment) of the current process, whose address is held in the CPU's TR (Task Register). After saving the context of the current process, the CPU loads its registers with the values stored in the TSS of the new process, which is identified by bits 16-31 of the task gate. Once the registers hold these new values, the processor is running the new process and the context switch is complete. Linux does not use task gates; it uses only trap and interrupt gates in the IDT, so I will not explain task gates any further.

Trap Gates

Format of trap gates is as follows:

`Bits`	Description
`0-15`	low 16 bits of the address (offset) of the kernel function to invoke when this gate is hit
`16-31`	segment selector (index) of the code segment descriptor in the GDT (Global Descriptor Table)
`32-36`	these bits are reserved and are not currently used
`37-39`	always 000, not used
`40-43`	entry type (its value for a trap gate is 1111)
`44`	always 0, not used
`45-46`	this specifies the DPL (Descriptor Privilege Level) of the gate entry
`47`	specifies whether this entry is valid (1 - valid, 0 - invalid)
`48-63`	high 16 bits of the address (offset) of the kernel function to invoke when this gate is hit

Trap gates are mainly used to handle exceptions generated by the CPU. Bits 0-15 and 48-63 together form the address of a kernel function (an offset within the segment identified by bits 16-31 of the entry). The only difference between trap gates and interrupt gates is that whenever an interrupt gate is hit, the CPU automatically disables interrupts by clearing the IF flag in its EFLAGS register, whereas with a trap gate this is not done and interrupts remain enabled. As mentioned earlier, trap gates are used for exceptions, so most of the first 32 entries in the IDT are initialized with trap gates. In addition, the Linux kernel uses a trap gate for the system call entry (entry 128, i.e. 0x80, of the IDT).
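As a sketch of the layout in the tables above, the following C function packs the fields of a 32-bit trap or interrupt gate into one 64-bit value and recovers the handler address from it. The names make_gate(), gate_offset(), GATE_TRAP and GATE_INTERRUPT are hypothetical helpers for illustration (the 2.4 kernel builds gates with its own helpers, such as set_intr_gate()), and the handler address and selector used below are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_INTERRUPT 0xEu   /* type bits 40-43 = 1110 */
#define GATE_TRAP      0xFu   /* type bits 40-43 = 1111 */

/* Pack a 32-bit x86 gate descriptor following the bit layout above. */
static uint64_t make_gate(uint32_t handler, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    assert(type == GATE_INTERRUPT || type == GATE_TRAP);
    assert(dpl <= 3);
    uint64_t d = 0;
    d |= handler & 0xFFFFu;                 /* bits 0-15: offset, low half  */
    d |= (uint64_t)selector << 16;          /* bits 16-31: code selector    */
                                            /* bits 32-39 stay zero         */
    d |= (uint64_t)type << 40;              /* bits 40-43: gate type        */
    d |= (uint64_t)dpl << 45;               /* bits 45-46: DPL (bit 44 = 0) */
    d |= (uint64_t)1 << 47;                 /* bit 47: entry is valid       */
    d |= (uint64_t)(handler >> 16) << 48;   /* bits 48-63: offset, high half */
    return d;
}

/* Rebuild the handler address from bits 0-15 and 48-63. */
static uint32_t gate_offset(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | ((d >> 48) << 16));
}
```

For a DPL 0 interrupt gate the byte holding bits 40-47 comes out as 0x8E; for a DPL 3 trap gate, such as a system call gate, it is 0xEF.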

Interrupt Gates

The format of interrupt gates is the same as that of the trap gates explained above, except for the value of the type field (bits 40-43): it is 1111 for trap gates and 1110 for interrupt gates. The format is as follows:

`Bits`	Description
`0-15`	low 16 bits of the address (offset) of the kernel function to invoke when this gate is hit
`16-31`	segment selector (index) of the code segment descriptor in the GDT (Global Descriptor Table)
`32-36`	these bits are reserved and are not currently used
`37-39`	always 000, not used
`40-43`	entry type (its value for an interrupt gate is 1110)
`44`	always 0, not used
`45-46`	this specifies the DPL (Descriptor Privilege Level) of the gate entry
`47`	specifies whether this entry is valid (1 - valid, 0 - invalid)
`48-63`	high 16 bits of the address (offset) of the kernel function to invoke when this gate is hit

Note: whenever the interrupt gate is hit, interrupts are disabled automatically.
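The effect of this note on EFLAGS can be sketched as follows. This is a simplified model, not the CPU's full behaviour: it only tracks the IF bit (bit 9 of EFLAGS) and ignores the other flags the CPU also clears on entry, such as TF and NT; gate_type() and eflags_on_entry() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag, bit 9 of EFLAGS */
#define GATE_INTR 0xEu        /* type field 1110 */

/* Type field, bits 40-43 of a 64-bit gate descriptor. */
static unsigned gate_type(uint64_t desc) { return (unsigned)(desc >> 40) & 0xFu; }

/* Simplified entry through a gate: the old EFLAGS is saved (iret later
 * restores it, so IF comes back on return), and only an interrupt gate
 * clears IF. Other flags the CPU clears on entry are not modelled. */
static uint32_t eflags_on_entry(uint64_t desc, uint32_t eflags, uint32_t *saved)
{
    assert(saved);
    *saved = eflags;
    if (gate_type(desc) == GATE_INTR)
        eflags &= ~EFLAGS_IF;
    return eflags;
}
```

Entering through an interrupt gate with EFLAGS = 0x202 leaves IF clear, while a trap gate leaves EFLAGS unchanged.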

Hardware Checks for Interrupts and Exceptions

Whenever an exception or interrupt occurs, the corresponding trap or interrupt gate is hit and the CPU performs some checks using the fields of these gates. The CPU does the following:

1. Gets the ith entry from the IDT (the base address and size of the IDT are stored in the CPU's IDTR register).

2. Reads the segment selector from bits 16-31 of the IDT entry; let's call the descriptor index it selects 'n'.

3. Gets the segment descriptor from the 'n'th entry of the GDT (the base address and size of the GDT are stored in the CPU's GDTR register).

4. Checks that the DPL of the 'n'th GDT entry is less than or equal to the CPL (Current Privilege Level, held in the lowest two bits of the CS register, which software cannot load directly). If DPL > CPL, the CPU raises a general protection exception. We will see below what this check means and why it is done. In short:
- DPL (of GDT entry) <= CPL : ok, switches stack if DPL < CPL
- DPL (of GDT entry) > CPL : general protection exception

Note: the stack switch is mentioned here, but it actually happens after step 5 below.

5. For software interrupts (generated by assembly instructions such as 'int'), one more check is done. This check is not performed for hardware interrupts (interrupts generated by system devices and forwarded by the PIC). In short:
- DPL (of IDT entry) >= CPL : ok, we have permission to enter through this gate
- DPL (of IDT entry) < CPL : general protection exception

6. Switches stack if DPL (of GDT entry) < CPL. In addition, the CPU privilege level (the two least significant bits of CS) changes from CPL to the DPL of the GDT entry.

7. If a stack switch has taken place (SS and ESP loaded with the kernel stack), pushes the old values of SS and ESP (pointing to the user stack) onto the new stack (the kernel stack).

8. Pushes the EFLAGS, CS and EIP registers onto the stack (note: we are now working on the kernel stack). This saves the address of the user application instruction to return to after servicing the interrupt or exception.

9. For exceptions that have a hardware error code, the processor pushes that code onto the kernel stack as well.

10. Loads CS with the segment selector from the IDT entry and EIP with the offset from the IDT entry (bits 0-15 + bits 48-63).

All of the above is done by the CPU hardware without executing any software instruction. The checks performed in steps 4 and 5 are important.
Check 4 makes sure that the code we are about to execute (the Interrupt Service Routine) is not in a segment with less privilege than the current one; obviously, the ISR cannot run with less privilege than the code it interrupts. DPL and CPL can each take 4 values (0, 1 and 2 for kernel mode and 3 for user mode); of these, only two are used: 0 (for kernel mode) and 3 (for user mode).

Check 5 makes sure that an application can enter kernel mode only through specific gates: in Linux, mainly the gate at entry 128 (0x80), which is used for system call invocation. If an IDT entry's DPL field is 0, 1 or 2, an application program (running with CPL 3) cannot enter through that gate; if it tries, the CPU raises a general protection exception. This is why Linux initializes the DPL field of the IDT entries with '0', apart from the system call gate and a few exception gates that user code is allowed to trigger (such as the one for the 'int3' breakpoint instruction); this makes sure that only kernel code can use those gates. In Linux, entry 128 (used for system calls) is a trap gate whose DPL is initialized to 3, so that application code can enter through it with the assembly instruction "int 0x80".
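The two checks can be summarised as a small decision function. This is a sketch of the rules as described in this section, not a full model of x86 protection (it ignores, for example, conforming code segments and the gate's valid bit); the enum and check_gate() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege levels: 0 = most privileged (kernel), 3 = least (user). */
enum gate_result { GATE_OK, GATE_OK_STACK_SWITCH, GATE_GP_FAULT };

static enum gate_result check_gate(unsigned cpl,        /* CPL from CS           */
                                   unsigned gate_dpl,   /* DPL of the IDT entry  */
                                   unsigned target_dpl, /* DPL of the GDT entry  */
                                   bool software_int)   /* raised by 'int n'?    */
{
    assert(cpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);
    /* Check 4: the handler must not run with less privilege than now. */
    if (target_dpl > cpl)
        return GATE_GP_FAULT;
    /* Check 5, software interrupts only: the gate must be reachable
     * from the current privilege level. */
    if (software_int && gate_dpl < cpl)
        return GATE_GP_FAULT;
    /* Moving to a more privileged level switches to that level's stack. */
    return target_dpl < cpl ? GATE_OK_STACK_SWITCH : GATE_OK;
}
```

For instance, "int 0x80" from user mode (CPL 3) through a DPL 3 gate succeeds with a stack switch, the same instruction aimed at a DPL 0 gate faults, and a hardware interrupt arriving in user mode passes check 5 regardless of the gate's DPL.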

Now let's see how the stack switch happens when DPL (of GDT entry) < CPL. The CPU has a TR (Task Register), which points to the TSS (Task State Segment) of the currently running process. The TSS is an architecture-defined data structure that holds the processor state of the process, used when a hardware context switch of this process happens. The TSS includes three pairs of SS and ESP fields, one for each of the privilege levels 0, 1 and 2. These fields specify the stack to use whenever the CPU enters that level. Say the DPL value in the GDT entry is 0: the CPU loads the SS register with the level-0 SS field of the TSS and the ESP register with the level-0 ESP field. After loading SS and ESP with these values, the CPU is using the kernel-level stack of the current process. The old values of SS and ESP (which the CPU keeps internally) are then pushed onto this new kernel stack, because we need to return to the old stack once we have serviced the interrupt, exception or system call.
Careful readers may wonder why the TSS has no field for a level-3 stack. The reason is that the CPU's stack switching mechanism is never used to move from a more privileged level (kernel mode, 0, 1 or 2) to a less privileged one (user mode, 3). Instead, when the CPU enters a more privileged level (kernel mode), it saves the previously used, less privileged stack (the user-mode stack) on the kernel stack, and the return to user mode (iret) restores it from there.
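A sketch of what the CPU pushes onto the kernel stack on entry, as described above, modelled in user space. Assumptions: the kernel stack is a plain array indexed by byte offset, only the level-0 SS/ESP pair of the TSS is modelled, the segment values in the usage are arbitrary, and all names (struct tss_level0, struct cpu, enter_kernel()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tss_level0 { uint32_t ss0, esp0; };      /* level-0 stack from the TSS */
struct cpu { uint32_t ss, esp, eflags, cs, eip; uint32_t *mem; };

/* The stack grows downward; mem[] models the kernel stack memory. */
static void push(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static void enter_kernel(struct cpu *c, const struct tss_level0 *tss,
                         bool stack_switch, bool has_error_code, uint32_t err)
{
    uint32_t old_ss = c->ss, old_esp = c->esp;
    if (stack_switch) {           /* load the level-0 stack from the TSS */
        c->ss = tss->ss0;
        c->esp = tss->esp0;
        push(c, old_ss);          /* remember the user-mode stack */
        push(c, old_esp);
    }
    push(c, c->eflags);
    push(c, c->cs);
    push(c, c->eip);              /* where to resume after iret */
    if (has_error_code)
        push(c, err);
}
```

After entry from user mode without an error code, the top of the kernel stack holds, from lowest address upward: EIP, CS, EFLAGS, the old ESP and the old SS.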

Once all of this CPU work is done, the CPU's CS and EIP registers point to the kernel function written to handle the interrupt or exception. The CPU simply starts executing instructions at this point (we are now in kernel mode, level 0).

Kernel Support for Handling Interrupts

In this section, we cover the kernel code executed in interrupt context. I will refer to the code of the 2.4.18 kernel release.

Low Level Interrupt Stubs

Whenever an interrupt occurs, the CPU performs the hardware checks mentioned above and starts executing the following assembly instructions in the kernel, whose address (offset in the kernel code segment) is stored in the corresponding IDT entry.

File: include/asm-i386/hw_irq.h

#define BUILD_COMMON_IRQ() \
asmlinkage void call_do_IRQ(void); \
__asm__( \
	"\n" __ALIGN_STR"\n" \
	"common_interrupt:\n\t" \
	SAVE_ALL \
	SYMBOL_NAME_STR(call_do_IRQ)":\n\t" \
	"call " SYMBOL_NAME_STR(do_IRQ) "\n\t" \
	"jmp ret_from_intr\n");
[...]
#define BUILD_IRQ(nr) \
asmlinkage void IRQ_NAME(nr); \
__asm__( \
"\n"__ALIGN_STR"\n" \
SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
	"pushl $"#nr"-256\n\t" \
	"jmp common_interrupt");

These macros are expanded when the kernel is compiled to generate the lowest-level interrupt stubs, which are reached from the IDT by storing their offsets (addresses) in IDT gates. The kernel keeps a global array of function pointers (named interrupt) holding the addresses of these stubs. The code that creates these stubs (using the BUILD_IRQ macro above) and saves their addresses in the global array "interrupt[NR_IRQS]" can be seen in the file below, where the BUILD_IRQ macro is used to create the interrupt stubs as follows:

File: arch/i386/kernel/i8259.c

#define BI(x,y) \
        BUILD_IRQ(x##y)

#define BUILD_16_IRQS(x) \
        BI(x,0) BI(x,1) BI(x,2) BI(x,3) \
        BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
        BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
        BI(x,c) BI(x,d) BI(x,e) BI(x,f)

/*
 * ISA PIC or low IO-APIC triggered (INTA-cycle or APIC) interrupts:
 * (these are usually mapped to vectors 0x20-0x2f)
 */
BUILD_16_IRQS(0x0)

#ifdef CONFIG_X86_IO_APIC
/*
 * The IO-APIC gives us many more interrupt sources. Most of these
 * are unused but an SMP system is supposed to have enough memory ...
 * sometimes (mostly wrt. hw bugs) we get corrupted vectors all
 * across the spectrum, so we really want to be prepared to get all
 * of these. Plus, more powerful systems might have more than 64
 * IO-APIC registers.
 *
 * (these are usually mapped into the 0x30-0xff vector range)
 */
                   BUILD_16_IRQS(0x1) BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3)
BUILD_16_IRQS(0x4) BUILD_16_IRQS(0x5) BUILD_16_IRQS(0x6) BUILD_16_IRQS(0x7)
BUILD_16_IRQS(0x8) BUILD_16_IRQS(0x9) BUILD_16_IRQS(0xa) BUILD_16_IRQS(0xb)
BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd)
#endif

#undef BUILD_16_IRQS
#undef BI

The code above creates the interrupt stubs but does not place their addresses in the interrupt[NR_IRQS] array. The code that places the addresses of these stubs in the global array is as follows, and can be found in the same file, "arch/i386/kernel/i8259.c":

File: arch/i386/kernel/i8259.c

#define IRQ(x,y) \
        IRQ##x##y##_interrupt

#define IRQLIST_16(x) \
        IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), \
        IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \
        IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
        IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)

void (*interrupt[NR_IRQS])(void) = {
        IRQLIST_16(0x0),

#ifdef CONFIG_X86_IO_APIC
        IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
        IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
        IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
        IRQLIST_16(0xc), IRQLIST_16(0xd)
#endif
};

#undef IRQ
#undef IRQLIST_16

The code above fills the global array of function pointers (interrupt[NR_IRQS]). Once this array is initialized with the addresses of the interrupt stubs, the IDT (Interrupt Descriptor Table) is initialized from it in the function "init_IRQ()" as follows:

File: arch/i386/kernel/i8259.c, Function: init_IRQ()


for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (i >= NR_IRQS)
                break;
        if (vector != IA32_SYSCALL_VECTOR && vector != KDB_VECTOR) {
                set_intr_gate(vector, interrupt[i]);
        }
}

In the loop above, we iterate over the IDT entries starting at "FIRST_EXTERNAL_VECTOR" (32, because the first 32 entries are for exceptions) and call "set_intr_gate()", which sets up an interrupt gate descriptor. For entry 128, which is used for system call invocation, no interrupt gate is set; a trap gate is set for it instead, in the function trap_init(). Later in the same function init_IRQ(), after this loop, the IPIs (Interprocessor Interrupts) are initialized. These interrupts are sent from one CPU to another on SMP machines.
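The same token-pasting technique can be tried outside the kernel. The sketch below is a user-space analogue, not kernel code: each generated stub is an ordinary C function that just records its IRQ number (the real stubs are assembly that push nr - 256 and jump to common_interrupt), the IDT is modelled as a plain array of function pointers, and SYSCALL_VECTOR, idt[] and init_irq_gates() are hypothetical stand-ins for the kernel's vector constants, the real IDT and the init_IRQ() loop.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS            256
#define FIRST_EXTERNAL_VECTOR 0x20
#define SYSCALL_VECTOR        0x80
#define NR_IRQS               16

static int last_irq = -1;          /* records which stub ran last */

/* One stub per line: IRQ0x00_interrupt ... IRQ0x0f_interrupt. */
#define BUILD_IRQ(nr) \
    static void IRQ##nr##_interrupt(void) { last_irq = nr; }
#define BI(x, y) BUILD_IRQ(x##y)
#define BUILD_16_IRQS(x) \
    BI(x,0) BI(x,1) BI(x,2) BI(x,3) BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
    BI(x,8) BI(x,9) BI(x,a) BI(x,b) BI(x,c) BI(x,d) BI(x,e) BI(x,f)
BUILD_16_IRQS(0x0)

/* Table of stub addresses, built with the same naming scheme. */
#define IRQ(x, y) IRQ##x##y##_interrupt
#define IRQLIST_16(x) \
    IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), IRQ(x,4), IRQ(x,5), \
    IRQ(x,6), IRQ(x,7), IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
    IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)
static void (*interrupt[NR_IRQS])(void) = { IRQLIST_16(0x0) };

/* Stand-in for the IDT: vector -> handler. */
static void (*idt[NR_VECTORS])(void);

/* Mirrors the init_IRQ() loop; the real code calls set_intr_gate(). */
static void init_irq_gates(void)
{
    for (int i = 0; i < NR_VECTORS - FIRST_EXTERNAL_VECTOR; i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (i >= NR_IRQS)
            break;
        if (vector != SYSCALL_VECTOR)
            idt[vector] = interrupt[i];
    }
}
```

After init_irq_gates(), vector 0x2a dispatches to IRQ0x0a_interrupt (line 10), and vectors past the 16 generated stubs stay empty.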
Once these IDT entries are set, whenever an interrupt occurs the CPU jumps directly to the code generated by the BUILD_IRQ macro. Let's analyse what this macro does. Here is the code of the BUILD_IRQ macro:

File: include/asm-i386/hw_irq.h


#define BUILD_IRQ(nr) \
asmlinkage void IRQ_NAME(nr); \
__asm__( \
"\n.p2align\n" \
"IRQ" #nr "_interrupt:\n\t" \
"push $" #nr "-256 ; " \
"jmp common_interrupt");

This assembly code pushes the value nr - 256 (the IRQ number minus 256, a negative number) onto the kernel stack. It then jumps to the "common_interrupt" assembly label, which saves the context of the interrupted process (the CPU registers) on the kernel stack and then calls the C function "do_IRQ()".
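A quick check of the arithmetic: the stub pushes nr - 256, and do_IRQ() recovers the IRQ number with regs.orig_eax & 0xff (line 575 of the listing below). Because nr is below 256, nr - 256 is negative and its low byte is exactly nr. A plausible reason for the negative encoding, stated here as an assumption, is that it keeps interrupt frames distinguishable from system call frames, where orig_eax holds a non-negative system call number. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value the BUILD_IRQ stub pushes for line nr. */
static int32_t stub_push_value(int nr) { return nr - 256; }

/* What do_IRQ() does with it: keep only the low byte. */
static int irq_from_orig_eax(int32_t orig_eax)
{
    return (int)((uint32_t)orig_eax & 0xff);
}
```

For IRQ 5 the pushed word is 0xFFFFFF05, and masking with 0xff gives 5 back.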

Details of do_IRQ() function, interrupt handling core

do_IRQ() is the function common to all hardware interrupts, and it is the most important one for understanding interrupt handling. We will interleave the function's code with a line-by-line explanation.

File: arch/i386/kernel/irq.c


563 asmlinkage unsigned int do_IRQ(struct pt_regs regs)
564 {
565         /*
566          * We ack quickly, we don't want the irq controller
567          * thinking we're snobs just because some other CPU has
568          * disabled global interrupts (we have already done the
569          * INT_ACK cycles, it's too late to try to pretend to the
570          * controller that we aren't taking the interrupt).
571          *
572          * 0 return value means that this irq is already being
573          * handled by some other CPU. (or is disabled)
574          */
575         int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_code  */
576         int cpu = smp_processor_id();
577         irq_desc_t *desc = irq_desc + irq;
578         struct irqaction * action;
579         unsigned int status;
580

Line - 575 to 577
Get the number of the interrupt that was triggered; the stub pushed it onto the kernel stack before the interrupted process's context was saved. Get the ID of the processor (CPU) on which this code is executing, in other words the CPU handling this interrupt. Get a pointer to the IRQ descriptor. The IRQ descriptor is a kernel data structure that ties together the different ISRs (Interrupt Service Routines) registered by device drivers on the same IRQ line. Since the same IRQ line can be shared between different devices, their device drivers each need to register their own ISR to handle the interrupts generated by their devices. The IRQ descriptor data structure is defined as follows:

typedef struct {
        unsigned int status;
        hw_irq_controller *handler;
        struct irqaction *action;
        unsigned int depth;
        spinlock_t lock;
} ____cacheline_aligned irq_desc_t;

Here is a description of these fields:

status: a bit mask of flags describing the state of a particular IRQ line. We will see its different flags later in this article.

handler: a pointer to a structure whose elements are pointers to functions that drive the physical PIC. These functions are used to mask and unmask particular interrupt lines in the PIC, or to acknowledge interrupts to the PIC. The definitions of these PIC-related functions can be found in the file "arch/i386/kernel/i8259.c".

action: a pointer to the list of ISRs registered by different device drivers for this IRQ line. When a device driver registers its ISR with the kernel using the kernel function "request_irq()", the ISR is added to this list for that IRQ line.

lock: a spinlock that synchronizes access to the elements of the IRQ descriptor. Several kernel execution contexts access the elements of the IRQ descriptor, and each must acquire this spinlock before doing so, so that concurrent accesses are serialized.
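To illustrate how the action list supports shared lines, here is a user-space sketch. It is simplified and hypothetical: the real 2.4 handlers also receive a struct pt_regs pointer, real registration goes through request_irq() with flags such as SA_SHIRQ, and fake_dev, add_action() and run_actions() are made-up names. What it shows is that every ISR on the list is called, so each driver must check whether its own device raised the interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified version of an action list entry. */
struct irqaction {
    void (*handler)(int irq, void *dev_id);
    void *dev_id;                 /* identifies the device to its driver */
    struct irqaction *next;
};

struct fake_dev { int raised; int serviced; };   /* hypothetical device */

/* A driver's ISR: ignore the interrupt unless our device raised it. */
static void dev_isr(int irq, void *dev_id)
{
    struct fake_dev *d = dev_id;
    (void)irq;
    if (!d->raised)
        return;
    d->raised = 0;
    d->serviced++;
}

/* Append to the end of the list, as registration on a shared line does. */
static void add_action(struct irqaction **head, struct irqaction *a)
{
    while (*head)
        head = &(*head)->next;
    a->next = NULL;
    *head = a;
}

/* Core of handle_IRQ_event(): call every registered ISR for this line. */
static void run_actions(int irq, struct irqaction *action)
{
    for (; action; action = action->next)
        action->handler(irq, action->dev_id);
}
```

With two devices sharing a line, only the one whose raised flag is set counts a serviced interrupt, even though both ISRs run.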

File: arch/i386/kernel/irq.c

581         kstat.irqs[cpu][irq]++;
582         spin_lock(&desc->lock);
583         desc->handler->ack(irq);

Line - 581 to 583
Here we increment the count of interrupts received by this CPU, which is maintained for accounting purposes. We take the spinlock before accessing any element of the IRQ descriptor of our interrupt line. We also mask and acknowledge the interrupt at the PIC using the handler functions of our IRQ descriptor.

584         /*
585            REPLAY is when Linux resends an IRQ that was dropped earlier
586            WAITING is used by probe to mark irqs that are being tested
587         */
588         status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
589         status |= IRQ_PENDING; /* we _want_ to handle it */
590

Line - 588 to 589
Now we clear the IRQ_REPLAY and IRQ_WAITING flags in our IRQ descriptor's status; as mentioned earlier, this field holds the state of an interrupt line. We clear these flags because we are now about to handle this interrupt, so it is no longer in the replay or waiting state. The IRQ_WAITING flag is actually used by device drivers together with the IRQ_AUTODETECT flag to auto-detect the IRQ line to which their device is connected. A device driver calls the probe_irq_on() function, which sets the IRQ_AUTODETECT and IRQ_WAITING flags on all IRQ descriptors for which no ISR has been registered yet. After calling probe_irq_on(), the driver instructs the device to trigger an interrupt and then calls the probe_irq_off() function. probe_irq_off() looks for the IRQ descriptors whose IRQ_AUTODETECT flag is still set but whose IRQ_WAITING flag has been cleared (by do_IRQ()), and returns that IRQ line number to the device driver.
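The probing handshake can be simulated with one status word per line. This is a simplified sketch, not the kernel's implementation: the real probe_irq_on() also waits for and filters out spurious interrupts and returns a mask that is passed to probe_irq_off(), and the flag values below are illustrative. The return convention used here (the line number, 0 if none fired, minus the first candidate if several did) follows the usual description of probe_irq_off(); the function names are hypothetical.

```c
#include <assert.h>

#define NR_LINES       16
#define IRQ_REPLAY     8u    /* illustrative flag values */
#define IRQ_AUTODETECT 16u
#define IRQ_WAITING    32u

static unsigned status[NR_LINES];   /* desc->status of each line       */
static int has_action[NR_LINES];    /* nonzero if an ISR is registered */

/* Like probe_irq_on(), simplified: arm every line nobody has claimed. */
static void probe_on(void)
{
    for (int i = 0; i < NR_LINES; i++)
        if (!has_action[i])
            status[i] |= IRQ_AUTODETECT | IRQ_WAITING;
}

/* What do_IRQ() does to these flags when line irq fires. */
static void interrupt_arrives(int irq)
{
    assert(irq >= 0 && irq < NR_LINES);
    status[irq] &= ~(IRQ_REPLAY | IRQ_WAITING);
}

/* Like probe_irq_off(), simplified: autodetect lines whose WAITING bit
 * was cleared are the ones that fired. */
static int probe_off(void)
{
    int found = 0, nr = 0;
    for (int i = 0; i < NR_LINES; i++) {
        if (!(status[i] & IRQ_AUTODETECT))
            continue;
        if (!(status[i] & IRQ_WAITING)) {
            if (nr == 0)
                found = i;
            nr++;
        }
        status[i] &= ~IRQ_AUTODETECT;   /* probing of this line is over */
    }
    return nr > 1 ? -found : found;
}
```

A driver whose device fires on line 5 between probe_on() and probe_off() gets 5 back; if lines 3 and 7 both fire, the result is -3.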

After clearing the IRQ_REPLAY and IRQ_WAITING flags, do_IRQ() sets the IRQ_PENDING flag. This indicates that we intend to handle this interrupt, provided it is not disabled and is not already being handled by another CPU (on SMP machines). The purpose of setting IRQ_PENDING is explained in detail below.

We have seen the interrupt and want to handle it by calling the set of ISRs (Interrupt Service Routines) registered by different device drivers. We set the IRQ_PENDING flag because seeing an interrupt does not mean we will necessarily handle it. The IRQ_PENDING flag helps in the following two cases:

If the interrupt is disabled (IRQ_DISABLED flag set), we do not service it; we just keep it marked as pending (IRQ_PENDING flag set). Once the interrupt is enabled again (IRQ_DISABLED cleared), the ISRs are called to service it. So IRQ_PENDING helps the kernel remember an interrupt that occurred while it was disabled for some reason.

Note: disabling an interrupt here means setting the kernel's IRQ_DISABLED flag for the line, i.e. the kernel has been asked not to service the interrupt. It is a software state tracked by the kernel, distinct from the PIC mask and from the IF flag in the CPU's EFLAGS register.

The second case is when another CPU is already handling a previous interrupt request on this IRQ line. In this case, the IRQ_INPROGRESS flag has already been set by that other CPU. Our role is just to mark the interrupt as IRQ_PENDING, which in effect asks the other CPU to service this request as well. When that CPU finishes handling the previous interrupt, it checks this flag; because we set it, that CPU calls all the ISRs once again to service the interrupt request we received on this IRQ line.

591         /*
592          * If the IRQ is disabled for whatever reason, we cannot
593          * use the action we have.
594          */
595         action = NULL;
596         if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
597                 action = desc->action;
598                 status &= ~IRQ_PENDING; /* we commit to handling */
599                 status |= IRQ_INPROGRESS; /* we are handling it */
600         }
601         desc->status = status;
602

Line - 595 to 601
Now, if this interrupt is not disabled (IRQ_DISABLED is clear) and is not being handled by another CPU (IRQ_INPROGRESS is also clear), we clear the IRQ_PENDING flag and set the IRQ_INPROGRESS flag to indicate that we take responsibility for handling this interrupt request. If, while we are handling it, another CPU receives an interrupt on the same IRQ line, that CPU simply sets the IRQ_PENDING flag and hands its responsibility over to us; in that case we (the CPU already executing the handlers) are responsible for serving that interrupt request too.
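Lines 588 to 601 boil down to a pure function of the status word. Here is a sketch with illustrative flag values and a hypothetical function name, which makes the two outcomes easy to check:

```c
#include <assert.h>
#include <stdbool.h>

#define IRQ_INPROGRESS 1u    /* illustrative flag values */
#define IRQ_DISABLED   2u
#define IRQ_PENDING    4u
#define IRQ_REPLAY     8u
#define IRQ_WAITING    32u

/* Mirrors lines 588-601 of do_IRQ(): returns the new status word and
 * reports whether this CPU commits to running the ISRs. */
static unsigned irq_entry(unsigned status, bool *commit)
{
    status &= ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;                      /* we want to handle it  */
    *commit = !(status & (IRQ_DISABLED | IRQ_INPROGRESS));
    if (*commit) {
        status &= ~IRQ_PENDING;                 /* we commit to handling */
        status |= IRQ_INPROGRESS;               /* we are handling it    */
    }
    return status;
}
```

An idle line ends up IRQ_INPROGRESS with the CPU committed; a line already in progress, or disabled, only gains IRQ_PENDING.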

603         /*
604          * If there is no IRQ handler or it was disabled, exit early.
605            Since we set PENDING, if another processor is handling
606            a different instance of this same irq, the other processor
607            will take care of it.
608          */
609         if (!action)
610                 goto out;

Line - 609 to 610
If there is no registered ISR for this IRQ line (or the line is disabled, or another CPU is already handling it), we jump to the out label: we release the lock we hold and return from interrupt context after serving any pending softirqs.

611
612         /*
613          * Edge triggered interrupts need to remember
614          * pending events.
615          * This applies to any hw interrupts that allow a second
616          * instance of the same irq to arrive while we are in do_IRQ
617          * or in the handler. But the code here only handles the _second_
618          * instance of the irq, not the third or fourth. So it is mostly
619          * useful for irq hardware that does not mask cleanly in an
620          * SMP environment.
621          */
622         for (;;) {
623                 spin_unlock(&desc->lock);
624                 handle_IRQ_event(irq, &regs, action);
625                 spin_lock(&desc->lock);
626
627                 if (!(desc->status & IRQ_PENDING))
628                         break;
629                 desc->status &= ~IRQ_PENDING;
630         }

Line - 622 to 630
Now we are ready to call the registered ISRs (the device drivers' functions), so that they can figure out which device connected to this IRQ line actually triggered the interrupt and serve it properly. Before calling the ISRs, we release the IRQ descriptor spinlock so that, while we are executing the ISRs, it can be acquired by another interrupt context running on another CPU for the same IRQ line. That interrupt context on the other CPU simply sets the IRQ_PENDING flag and returns without handling the interrupt itself. In this infinite loop we call the handle_IRQ_event() function, which calls all the ISRs registered for this IRQ line one by one. After going through the ISR list, we acquire the IRQ descriptor spinlock again, because we need to check and update the status of the IRQ descriptor. Once we hold the spinlock, we check the IRQ_PENDING flag: if it is clear, we break out of the infinite loop; otherwise we clear IRQ_PENDING and call handle_IRQ_event() again to serve the new interrupt request indicated by that flag.
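The effect of the loop can be simulated on a single thread by letting the stand-in for handle_IRQ_event() set IRQ_PENDING, as another CPU's do_IRQ() would. This is a hypothetical sketch with locking left out and illustrative flag values. Note that IRQ_PENDING is a single bit, so, as the kernel comment at lines 617 to 619 points out, requests that arrive during one pass can trigger at most one extra pass.

```c
#include <assert.h>

#define IRQ_INPROGRESS 1u    /* illustrative flag values */
#define IRQ_PENDING    4u

static unsigned desc_status;   /* stands in for desc->status          */
static int runs;               /* how many times the ISRs ran         */
static int extra_arrivals;     /* interrupts "another CPU" will see   */

/* Stand-in for handle_IRQ_event(). While it runs, another CPU may take
 * the same IRQ; finding IRQ_INPROGRESS set, it only marks IRQ_PENDING. */
static void handle_event(void)
{
    runs++;
    if (extra_arrivals > 0) {
        extra_arrivals--;
        desc_status |= IRQ_PENDING;
    }
}

/* The replay loop of lines 622-631, without the spinlock. */
static void service_loop(void)
{
    for (;;) {
        handle_event();
        if (!(desc_status & IRQ_PENDING))
            break;
        desc_status &= ~IRQ_PENDING;
    }
    desc_status &= ~IRQ_INPROGRESS;
}
```

Starting from IRQ_INPROGRESS with two further arrivals, one per pass, the ISRs run three times and the status word ends up clear.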

631         desc->status &= ~IRQ_INPROGRESS;

Line - 631
We come out of the infinite loop above only when there is no pending request for this IRQ line. At that point the work for this interrupt is done, so we clear the IRQ_INPROGRESS flag.

632 out:
633         /*
634          * The ->end() handler has to deal with interrupts which got
635          * disabled while the handler was running.
636          */
637         desc->handler->end(irq);
638         spin_unlock(&desc->lock);
639
640         if (softirq_pending(cpu))
641                 do_softirq();
642         return 1;
643 }

Line - 637 to 638
Now we call the last of the PIC-related functions stored in our IRQ descriptor's handler field. This function, end(), takes care of the case where the interrupt we were handling got disabled while we were handling it. Suppose that while we were serving the interrupt by calling all its ISRs, code running on another CPU disabled it (set the IRQ_DISABLED flag); in this case we should not unmask the interrupt line (which we masked by calling the PIC-related ack() function, line 583). If the IRQ has not been disabled, end() simply unmasks the interrupt line at the PIC level and returns. After this, we release the lock and serve any pending softirqs. Softirqs are not covered here: I will soon post details of softirqs, tasklets and bottom halves, so keep an eye on my blog.
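For the 8259A, the end() handler in 2.4's i8259.c comes down, roughly, to a check of the following kind; the sketch models it in user space with hypothetical names, illustrative flag values and a mask variable instead of real port writes. Unmasking is skipped when IRQ_DISABLED is set, and also when IRQ_INPROGRESS is still set, since in that case another CPU is still running the handlers and will reach end() itself when it finishes.

```c
#include <assert.h>

#define IRQ_INPROGRESS 1u    /* illustrative flag values */
#define IRQ_DISABLED   2u

static unsigned irq_status[16];   /* desc->status per line      */
static unsigned pic_mask;         /* bit n set => line n masked */

/* ack(): mask the line (the real 8259A code also sends an EOI). */
static void ack_irq(unsigned irq)
{
    assert(irq < 16);
    pic_mask |= 1u << irq;
}

/* end(): unmask only if the line was not disabled meanwhile and no CPU
 * is still running its handlers. */
static void end_irq(unsigned irq)
{
    assert(irq < 16);
    if (!(irq_status[irq] & (IRQ_DISABLED | IRQ_INPROGRESS)))
        pic_mask &= ~(1u << irq);
}
```

An ack()/end() pair on an enabled line leaves it unmasked again; if the line is disabled in between, it stays masked.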

Aggregated and rearranged content, posted by Linux Kernel Group @ Blogspot

COPYRIGHT_DOC

May 9, 2025

Last updated on 28 Oct 07