6. PCIe for DV Engineers - Interrupts (INTx, MSI, MSI-X)
Part 6 of the PCIe for DV Engineers series
Your DUT just completed a DMA transfer — 4 KB of data sitting in host memory, ready for the CPU. But how does the CPU know? It doesn't poll. It doesn't spin-wait. Something has to tap it on the shoulder and say "your data is ready." That something is an interrupt.
PCIe supports three generations of interrupt mechanisms, each born from the pain points of the one before it. In this post, we'll trace the end-to-end delivery flow for each — from the moment a device decides to interrupt, through the PCIe fabric, to the CPU running your ISR. Along the way, we'll build verification scenarios for the tricky corners where bugs love to hide.
Prerequisites: Part 1 — Architecture, Part 4 — Transaction Layer (TLP basics)
Legacy INTx: Shared Wires, Shared Pain
PCI had four physical interrupt pins — INTA# through INTD#. Multiple devices shared them, and the ISR had to poll every device to find which one actually interrupted. PCIe has no physical interrupt wires, but it emulates this legacy behavior using in-band message TLPs.
How INTx Works
When a PCIe endpoint needs to raise an interrupt, it sends an Assert_INTx message TLP upstream to the Root Complex. When the interrupt condition is cleared (after the ISR runs), the device sends a Deassert_INTx message TLP. This pair emulates level-triggered behavior over a packet-based link.
sequenceDiagram
participant EP as Endpoint
participant SW as Switch
participant RC as Root Complex
participant CPU as CPU
EP->>SW: Assert_INTA Message TLP
SW->>RC: Forward (with swizzle)
RC->>CPU: Assert interrupt line
Note over CPU: ISR runs, clears device
EP->>SW: Deassert_INTA Message TLP
SW->>RC: Forward
RC->>CPU: Deassert interrupt line
The four interrupt lines map to message codes 0x20–0x27:
| Message Code | Assert | Deassert |
|---|---|---|
| INTA | 0x20 | 0x24 |
| INTB | 0x21 | 0x25 |
| INTC | 0x22 | 0x26 |
| INTD | 0x23 | 0x27 |
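For checkers, that mapping is easy to capture in a small helper. Here is a minimal SystemVerilog sketch that decodes a received message code into its line and assert/deassert polarity; the enum and function names are illustrative, not from any particular VIP.
// Hypothetical helper for an INTx monitor: decode a message code (0x20-0x27)
typedef enum bit [1:0] {INTA, INTB, INTC, INTD} intx_line_e;

function automatic bit decode_intx_msg(
    input  bit [7:0]   msg_code,    // Message Code field from the Msg TLP header
    output intx_line_e line,
    output bit         is_assert    // 1 = Assert_INTx, 0 = Deassert_INTx
);
  if (msg_code inside {[8'h20:8'h27]}) begin
    line      = intx_line_e'(msg_code[1:0]);  // 0x20/0x24 -> INTA ... 0x23/0x27 -> INTD
    is_assert = (msg_code[2] == 1'b0);        // 0x20-0x23 assert, 0x24-0x27 deassert
    return 1'b1;                              // recognized INTx message
  end
  return 1'b0;                                // not an INTx message code
endfunction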
Config Space: Interrupt Pin & Line
Two registers in the standard config header control INTx:
- Interrupt Pin (offset 0x3D, read-only): Which virtual pin this function uses (1=INTA, 2=INTB, 3=INTC, 4=INTD, 0=none)
- Interrupt Line (offset 0x3C, read-write): Written by firmware/OS during enumeration. Maps to a system interrupt controller input. Has no hardware effect on PCIe — purely software bookkeeping.
Bit 10 of the Command register (Interrupt Disable) suppresses Assert_INTx messages. If software sets it while INTx is already asserted, the device must send a Deassert_INTx.
The Problems
INTx has real issues that matter for verification:
- Sharing — Multiple devices on the same interrupt line. ISR must poll every device.
- Two TLPs per interrupt — Assert + Deassert pair, vs. a single message for MSI.
- Race conditions — If the ISR clears the device interrupt but the Deassert TLP is delayed, spurious interrupts can occur.
- No vector info — The ISR only knows "something on INTA interrupted" — it must determine the cause.
DV Insight: Verify that Assert/Deassert ordering is strict — no Deassert without a prior Assert, no duplicate Assert without an intervening Deassert. Also test the Command Register Interrupt Disable behavior: setting bit 10 while asserted must trigger a Deassert_INTx TLP.
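Those checks translate naturally into SVAs in a link monitor. A hedged sketch, assuming the testbench already tracks the asserted state and exposes pulse signals for observed messages (all names and the latency bound below are hypothetical):
// Hypothetical signals (names illustrative):
//   intx_asserted        - 1 while the virtual INTx wire is logically asserted
//                          (state prior to the current message)
//   assert_intx_sent     - pulses when an Assert_INTx Msg TLP is observed on the link
//   deassert_intx_sent   - pulses when a Deassert_INTx Msg TLP is observed
//   cmd_intx_disable     - shadow copy of Command register bit 10
//   MAX_DEASSERT_LATENCY - testbench parameter bounding the response time

// No duplicate Assert without an intervening Deassert
assert property (@(posedge clk) disable iff (!rst_n)
  assert_intx_sent |-> !intx_asserted
) else $error("Duplicate Assert_INTx without intervening Deassert");

// Setting Interrupt Disable while asserted must eventually produce a Deassert
assert property (@(posedge clk) disable iff (!rst_n)
  $rose(cmd_intx_disable) && intx_asserted |-> ##[1:MAX_DEASSERT_LATENCY] deassert_intx_sent
) else $error("Interrupt Disable set while asserted, but no Deassert_INTx observed");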
$ lspci -s 03:00.0 -vv | grep -i interrupt
Interrupt: pin A routed to IRQ 16
This tells us the device uses INTA (pin A) and firmware mapped it to system IRQ 16.
MSI: Messages Replace Wires
MSI (Message Signaled Interrupts), introduced in PCI 2.2, replaced the entire wire-based approach with a beautifully simple idea: an interrupt is just a memory write.
How MSI Works
Instead of toggling a virtual wire, the device writes a specific data value to a specific memory address. That address targets the CPU's interrupt controller (Local APIC on x86, at address 0xFEExxxxx). The write is a normal Memory Write TLP — no special handling needed by the PCIe fabric.
sequenceDiagram
participant EP as Endpoint
participant Fabric as PCIe Fabric
participant RC as Root Complex
participant APIC as Interrupt Controller
participant CPU as CPU
Note over EP: Interrupt condition
EP->>Fabric: Memory Write TLP<br/>Addr: 0xFEExxxxx<br/>Data: vector info
Fabric->>RC: Normal posted write routing
RC->>APIC: Deliver to LAPIC
APIC->>CPU: Interrupt vector N
Note over CPU: ISR runs directly
One TLP. No assert/deassert pair. No sharing. The data payload tells the interrupt controller exactly which vector to fire.
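A scoreboard can classify MSI deliveries by matching decoded Memory Write TLPs against shadow copies of the function's MSI capability registers. A minimal sketch, with illustrative names:
// Hypothetical scoreboard check: is this posted MemWr actually an MSI, and which vector?
function automatic bit is_msi_write(
    input  bit [63:0] tlp_addr,
    input  bit [31:0] tlp_data,
    input  bit [63:0] msi_addr,   // shadow of Message Address (+ Upper, if 64-bit capable)
    input  bit [15:0] msi_data,   // shadow of Message Data
    input  bit [2:0]  mme,        // Multiple Message Enable: 2^MME vectors granted
    output int        vector
);
  bit [15:0] upper_mask = 16'hFFFF << mme;   // bits that must match the base data
  if ((tlp_addr == msi_addr) &&
      ((tlp_data[15:0] & upper_mask) == (msi_data & upper_mask))) begin
    vector = tlp_data[15:0] & ~upper_mask;   // lower MME bits select the vector
    return 1'b1;
  end
  return 1'b0;
endfunction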
MSI Capability Structure (ID 0x05)
The MSI capability lives in configuration space:
| Register | Key Fields |
|---|---|
| Message Control | MSI Enable (bit 0), Multiple Message Capable (bits 3:1, log₂ of vectors, max 32), Multiple Message Enable (bits 6:4, granted by software), 64-bit Capable (bit 7), Per-Vector Masking (bit 8) |
| Message Address | Target address for interrupt delivery (typically APIC region) |
| Message Data | Base vector info; lower bits modified for multi-vector mode |
| Mask Bits | Per-vector mask register (optional) |
| Pending Bits | Read-only, set when masked interrupt fires (optional) |
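For register modeling, the Message Control layout maps neatly onto a packed struct. A sketch of one way to shadow it in the testbench (type and function names are made up for illustration):
// Sketch: shadow model of the MSI Message Control register (16 bits, MSB first)
typedef struct packed {
  bit [6:0] rsvd;            // bits 15:9 - reserved
  bit       per_vec_mask;    // bit 8     - Per-Vector Masking Capable
  bit       cap_64bit;       // bit 7     - 64-bit Address Capable
  bit [2:0] mme;             // bits 6:4  - Multiple Message Enable (granted by software)
  bit [2:0] mmc;             // bits 3:1  - Multiple Message Capable (requested by device)
  bit       msi_enable;      // bit 0     - MSI Enable
} msi_msg_ctrl_t;

// Example: decode a config-read value and sanity-check the grant against the request
function automatic void check_msi_msg_ctrl(input bit [15:0] cfg_rd_data);
  msi_msg_ctrl_t ctrl = msi_msg_ctrl_t'(cfg_rd_data);
  if (ctrl.mme > ctrl.mmc)
    $error("MME (%0d) exceeds MMC (%0d): more vectors granted than the device requested",
           ctrl.mme, ctrl.mmc);
endfunction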
Vector Selection
When a function requests vector N out of M allocated vectors, it modifies the lower log₂(M) bits of the Message Data:
Allocated: 8 vectors (MME = 3, so lower 3 bits used)
Base Data: 0x0040
Vector 0 → Data = 0x0040 (lower 3 bits = 000)
Vector 5 → Data = 0x0045 (lower 3 bits = 101)
Vector 7 → Data = 0x0047 (lower 3 bits = 111)
Software must allocate contiguous, naturally-aligned vectors. All vectors share the same address — you can't route different vectors to different CPUs.
DV Insight: The vector negotiation is a common bug source. The device requests M vectors (Multiple Message Capable), but software may grant fewer (Multiple Message Enable ≤ MMC). Verify the device operates correctly with fewer vectors than requested and never uses more than granted.
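A small helper of the kind a scoreboard could use to predict the Message Data for vector N and flag use of an unallocated vector; base_data and mme are assumed to be shadow copies of the capability registers, and the function name is hypothetical.
// Sketch: expected Message Data for vector N under the rule above
function automatic bit [15:0] msi_expected_data(
    input bit [15:0] base_data,   // shadow of the Message Data register
    input bit [2:0]  mme,         // Multiple Message Enable: 2^MME vectors granted
    input int        vector
);
  int unsigned granted = 1 << mme;
  if (vector >= granted)
    $error("Vector %0d used but only %0d vectors granted (MME=%0d)", vector, granted, mme);
  // Upper bits come from the base data, lower MME bits carry the vector number
  return (base_data & (16'hFFFF << mme)) | 16'(vector);
endfunction

// e.g. msi_expected_data(16'h0040, 3, 5) returns 16'h0045, matching the table above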
MSI-X: Scalable, Per-Queue Interrupts
MSI-X, introduced in PCI 3.0, takes the MSI concept and makes it fully scalable. Instead of a single address/data pair in config space, MSI-X uses a memory-mapped table in BAR space with independent entries per vector.
How MSI-X Works
sequenceDiagram
participant EP as Endpoint
participant Table as MSI-X Table<br/>(in BAR)
participant Fabric as PCIe Fabric
participant RC as Root Complex
participant CPU as CPU
Note over EP: Interrupt on vector N
EP->>Table: Read Table[N]
Note over Table: Addr, Data, Mask
alt Vector unmasked
EP->>Fabric: Memory Write TLP<br/>Addr: Table[N].Addr<br/>Data: Table[N].Data
Fabric->>RC: Posted write
RC->>CPU: Interrupt vector N
else Vector masked
Note over EP: Set PBA[N] = 1
Note over EP: No TLP sent
end
MSI-X Capability Structure (ID 0x11)
The capability header is compact — just 3 DWORDs — because the actual data lives in BAR memory:
| Register | Key Fields |
|---|---|
| Message Control | MSI-X Enable (bit 15), Function Mask (bit 14), Table Size (bits 10:0, value = N-1, max 2048) |
| Table Offset / BIR | Bits 2:0 = which BAR contains the table; Bits 31:3 = byte offset within that BAR |
| PBA Offset / BIR | Bits 2:0 = which BAR contains the PBA; Bits 31:3 = byte offset |
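The BIR/offset decode is a frequent off-by-one trap, so it is worth spelling out. A hedged sketch of the resolution step a config sequence or scoreboard might perform, assuming it already holds the raw DWORD and the resolved BAR bases (names are illustrative):
// Sketch: resolve the MSI-X table base from the Table Offset/BIR DWORD
function automatic bit [63:0] msix_table_base(
    input bit [31:0] table_offset_bir,   // raw DWORD at capability offset 0x04
    input bit [63:0] bar_base[6]         // resolved BAR base addresses (BIR 0-5 are valid)
);
  bit [2:0]  bir    = table_offset_bir[2:0];            // which BAR holds the table
  bit [31:0] offset = {table_offset_bir[31:3], 3'b000}; // QWORD-aligned byte offset
  return bar_base[bir] + offset;                        // entry N sits at this base + N*16
endfunction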
The MSI-X Table
Each table entry is 16 bytes, independently programmable:
| Offset | Field | Size | Notes |
|---|---|---|---|
| +0x00 | Message Address | 32-bit | Lower address (can target different CPUs per vector) |
| +0x04 | Message Upper Address | 32-bit | Upper address (for 64-bit addressing) |
| +0x08 | Message Data | 32-bit | Full 32-bit data (vs. 16-bit for MSI) |
| +0x0C | Vector Control | 32-bit | Bit 0 = Mask (1=masked, 0=unmasked) |
This is the key advantage over MSI: each vector has its own address/data pair. Vector 0 can target CPU 0, vector 1 can target CPU 3 — enabling true per-queue interrupt affinity. Modern NVMe controllers use this to assign one interrupt vector per submission queue, each targeting the CPU that owns that queue.
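The 16-byte entry also maps directly onto a packed struct for scoreboard shadowing. A sketch, with the +0x0C DWORD first so the struct reads MSB to LSB (type and helper names are made up):
// Sketch: shadow model of one 16-byte MSI-X table entry
typedef struct packed {
  bit [30:0] vec_ctrl_rsvd;   // +0x0C bits 31:1 - reserved
  bit        mask;            // +0x0C bit 0     - 1 = vector masked
  bit [31:0] msg_data;        // +0x08 - full 32-bit Message Data
  bit [31:0] msg_addr_hi;     // +0x04 - Message Upper Address
  bit [31:0] msg_addr_lo;     // +0x00 - Message Address (lower 32 bits)
} msix_entry_t;

// Rebuild a shadow entry from four DWORD reads of entry N (dw0 = +0x00 ... dw3 = +0x0C)
function automatic msix_entry_t msix_pack_entry(
    input bit [31:0] dw0, input bit [31:0] dw1,
    input bit [31:0] dw2, input bit [31:0] dw3
);
  return msix_entry_t'({dw3, dw2, dw1, dw0});
endfunction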
The Pending Bit Array (PBA)
The PBA solves a critical problem: what happens when an interrupt fires while masked?
- Vector masked + interrupt fires → hardware sets PBA[N] = 1, no TLP sent
- Software unmasks the vector → hardware checks the PBA
- If PBA[N] == 1 → hardware sends the interrupt TLP and clears the PBA bit
This guarantees no interrupts are lost during masking. The PBA is read-only to software — only hardware sets and clears it.
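These rules translate almost directly into per-vector properties. A sketch, where NUM_VECTORS, MAX_DELIVERY_LATENCY, and every signal name are hypothetical testbench hooks, not DUT ports:
// Sketch: per-vector PBA rules (place in a bound checker module)
genvar i;
generate
  for (i = 0; i < NUM_VECTORS; i++) begin : g_pba_chk
    // A masked interrupt event must set the pending bit and must not send a TLP
    assert property (@(posedge clk) disable iff (!rst_n)
      (irq_event[i] && vec_masked[i]) |=> pba[i] && !msi_tlp_sent[i]
    ) else $error("Vector %0d: masked interrupt did not set PBA (or a TLP leaked)", i);

    // Unmasking a pending vector must deliver the deferred TLP and then clear the bit
    assert property (@(posedge clk) disable iff (!rst_n)
      ($fell(vec_masked[i]) && pba[i]) |->
        ##[1:MAX_DELIVERY_LATENCY] msi_tlp_sent[i] ##[0:2] !pba[i]
    ) else $error("Vector %0d: pending interrupt lost or PBA stuck after unmask", i);
  end
endgenerate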
Function Mask vs. Per-Vector Mask
MSI-X has two masking layers:
- Function Mask (Message Control bit 14): Globally masks all vectors. Doesn't alter individual mask bits.
- Per-Vector Mask (Vector Control bit 0): Masks individual vectors.
Both must be 0 for a vector to deliver interrupts. Clearing Function Mask re-evaluates all per-vector masks.
DV Insight: The masking interaction is a verification goldmine. Test: set Function Mask while interrupts are pending → verify PBA bits get set. Clear Function Mask → verify only individually-unmasked vectors fire. This is where mask/unmask race condition bugs live.
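The combined gating condition is compact enough to keep as a reference-model helper. A sketch with illustrative names; MSI-X Enable is folded in as well, since no vector can deliver while the capability itself is disabled:
// Sketch: reference-model delivery gate for vector i (all three conditions required)
function automatic bit msix_can_deliver(
    input bit msix_enable,   // Message Control bit 15
    input bit func_mask,     // Message Control bit 14
    input bit vec_mask_i     // Vector Control bit 0 of table entry i
);
  return msix_enable && !func_mask && !vec_mask_i;
endfunction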
$ lspci -vv -s 03:00.0 | grep -A3 MSI-X
Capabilities: [70] MSI-X: Enable+ Count=33 Masked-
Vector table: BAR=0 offset=00003000
PBA: BAR=0 offset=00003100
$ cat /proc/interrupts | grep nvme
31: 1024 0 0 0 PCI-MSI 524288-edge nvme0q0
32: 0 8451 0 0 PCI-MSI 524289-edge nvme0q1
33: 0 0 7234 0 PCI-MSI 524290-edge nvme0q2
34: 0 0 0 6118 PCI-MSI 524291-edge nvme0q3
Notice each queue targets a different CPU — that's MSI-X per-vector addressing in action.
Side-by-Side Comparison
| Feature | INTx | MSI | MSI-X |
|---|---|---|---|
| Mechanism | Assert/Deassert message TLPs | Memory Write TLP | Memory Write TLP |
| Max Vectors | 4 (shared) | 32 | 2048 |
| Sharing | Yes | No | No |
| Per-Vector Address | N/A | No (single address) | Yes |
| Per-Vector Masking | N/A | Optional | Mandatory |
| Config Space | Interrupt Pin/Line | Capability 0x05 | Capability 0x11 + BAR table |
| TLPs per Interrupt | 2 (Assert + Deassert) | 1 | 1 |
| Data Ordering | Separate from data TLPs | Guaranteed (posted write) | Guaranteed (posted write) |
The ordering guarantee is worth emphasizing: MSI and MSI-X interrupts are Memory Write TLPs. PCIe ordering rules guarantee that posted writes are delivered in order. So when the ISR runs, all preceding DMA data is guaranteed to be in host memory. No explicit flush needed — the protocol handles it.
Verification Deep Dive
End-to-End Interrupt Test Sequence
// UVM test: MSI-X interrupt delivery with masking
class pcie_msix_interrupt_test extends pcie_base_test;
  `uvm_component_utils(pcie_msix_interrupt_test)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual task run_phase(uvm_phase phase);
    pcie_config_seq cfg_seq;
    pcie_dma_seq    dma_seq;

    phase.raise_objection(this);

    // Step 1: Enable MSI-X
    cfg_seq = pcie_config_seq::type_id::create("cfg_seq");
    cfg_seq.write_msix_enable(1'b1);
    cfg_seq.start(env.agent.sequencer);

    // Step 2: Program vector 0 (address, data, unmask)
    cfg_seq.write_msix_table_entry(
      .vector(0),
      .addr  (APIC_BASE_ADDR),
      .data  (32'h0040),
      .mask  (1'b0)
    );
    cfg_seq.start(env.agent.sequencer);

    // Step 3: Trigger DMA that generates interrupt
    dma_seq = pcie_dma_seq::type_id::create("dma_seq");
    dma_seq.set_interrupt_vector(0);
    dma_seq.start(env.agent.sequencer);

    // Step 4: Wait for and verify the interrupt TLP
    wait_for_interrupt_tlp(
      .expected_addr(APIC_BASE_ADDR),
      .expected_data(32'h0040),
      .timeout      (INTERRUPT_TIMEOUT)
    );

    // Step 5: Verify DMA data arrived before the interrupt
    check_dma_data_valid();

    phase.drop_objection(this);
  endtask
endclass
Coverage Model
covergroup cg_interrupt_mechanisms @(posedge clk);
  // Which mechanism is active
  cp_active_mode: coverpoint active_interrupt_mode {
    bins intx = {MODE_INTX};
    bins msi  = {MODE_MSI};
    bins msix = {MODE_MSIX};
  }
  // Mode transitions (only one active at a time)
  cp_mode_transition: coverpoint active_interrupt_mode {
    bins intx_to_msi  = (MODE_INTX => MODE_MSI);
    bins intx_to_msix = (MODE_INTX => MODE_MSIX);
    bins msi_to_msix  = (MODE_MSI => MODE_MSIX);
    bins msix_to_intx = (MODE_MSIX => MODE_INTX);
  }
  // MSI-X masking scenarios, encoded as {func_mask, vec_mask, pba_pending}
  cp_msix_mask_state: coverpoint {func_mask, vec_mask, pba_pending} {
    bins unmasked_idle       = {3'b000};
    bins vec_masked_pending  = {3'b011};
    bins func_masked_pending = {3'b101};
    bins both_masked_pending = {3'b111};
    bins unmask_with_pending = {3'b001}; // just unmasked, PBA still set (delivery imminent)
  }
  // MSI vector utilization
  cp_msi_vector: coverpoint msi_vector_used {
    bins vectors[] = {[0:31]};
  }
endgroup
Key Assertions
// Mutual exclusion: only one mechanism active at a time
assert property (@(posedge clk)
  $onehot0({intx_active, msi_enabled, msix_enabled})
) else $error("Multiple interrupt mechanisms active simultaneously");

// INTx: no Deassert without prior Assert
assert property (@(posedge clk)
  deassert_intx_sent |-> intx_is_currently_asserted
) else $error("Deassert_INTx sent without prior Assert");

// MSI-X: PBA set when masked vector has pending interrupt
assert property (@(posedge clk)
  (interrupt_pending && vector_masked) |=> pba_bit_set
) else $error("PBA not set for masked pending interrupt");

// MSI-X: interrupt TLP not sent while vector is masked
assert property (@(posedge clk)
  vector_masked |-> !interrupt_tlp_sent_for_vector
) else $error("Interrupt TLP sent for masked vector");

// Ordering: interrupt TLP must not pass preceding data writes
assert property (@(posedge clk)
  interrupt_tlp_queued |-> all_prior_posted_writes_committed
) else $error("Interrupt TLP overtook preceding data write");
Common Bugs to Hunt
| Bug | Where | What to Check |
|---|---|---|
| Wrong vector in MSI data | MSI | Lower bits of Message Data not matching vector number |
| Stale PBA bits | MSI-X | PBA not cleared after unmask + delivery → ghost interrupts |
| Mask/pending race | MSI-X | Interrupt fires between software writing mask and hardware reading it |
| INTx not deasserted | INTx | Device clears interrupt source but never sends Deassert TLP |
| Wrong BIR offset | MSI-X | Table/PBA BIR points to wrong BAR → accesses wrong memory |
| Interrupt after FLR | MSI-X | Function Level Reset doesn't clear PBA → spurious interrupt post-reset |
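For the last row in particular, a hedged property sketch: assuming the environment exposes a flr_done pulse (Function Level Reset completion) and a PBA shadow (both names hypothetical), something along these lines catches stale pending bits surviving reset:
// Sketch: no PBA bit may survive a Function Level Reset
assert property (@(posedge clk) disable iff (!rst_n)
  flr_done |=> (pba == '0)
) else $error("PBA not cleared by FLR: risk of a spurious interrupt after reset");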
Interview Corner
Q1: Why can't MSI route different vectors to different CPUs?
All MSI vectors share a single Message Address register. Since the address determines which CPU's APIC receives the interrupt, all vectors go to the same CPU. MSI-X solves this with per-vector address/data entries.
Q2: How does PCIe guarantee data arrives before the interrupt?
MSI/MSI-X interrupts are Memory Write TLPs (posted). PCIe ordering rules require posted writes to be delivered in order. Since the interrupt TLP is queued after the DMA data writes, the data is guaranteed to be in memory when the ISR runs.
Q3: What happens when you enable MSI-X while INTx is asserted?
The device must send Deassert_INTx before MSI-X takes over. The PCIe spec requires mutual exclusion — only one mechanism is active at a time. Verification must cover this transition to catch stale assert states.
Q4: Why does MSI-X use a BAR-mapped table instead of config space?
Config space access is slow (Type 0/1 Configuration TLPs). MSI-X can have up to 2048 entries at 16 bytes each = 32 KB of data. This doesn't fit in config space and would be painfully slow to program. BAR-mapped memory allows fast MMIO access using regular Memory Write TLPs.
Key Takeaways
- INTx emulates legacy interrupt wires with Assert/Deassert message TLPs — shared, slow, two TLPs per interrupt
- MSI replaces wires with a Memory Write TLP — no sharing, single TLP, up to 32 vectors, but all vectors share one address
- MSI-X scales MSI with a BAR-mapped table — per-vector addressing, up to 2048 vectors, mandatory per-vector masking with PBA
- The ordering guarantee (interrupt can't pass preceding data) is a protocol-level feature of MSI/MSI-X — it comes free from PCIe posted write ordering
- PBA behavior (set on masked interrupt, clear on unmask) is a rich verification target — mask/unmask races and stale bits are common bugs
- Only one mechanism can be active per function at a time — transitions between them must be verified
What's Next
In Part 7, we'll explore DMA & IOMMU — how PCIe devices read and write host memory directly, how the IOMMU provides address translation and isolation, and the verification challenges of bus mastering, scatter-gather lists, and IOMMU bypass testing.
Previous: Part 5 — Configuration Space & BARs | Next: Part 7 — DMA & IOMMU (coming soon)