6. PCIe for DV Engineers - Interrupts (INTx, MSI, MSI-X)
Part 6 of the PCIe for DV Engineers series
Your DUT just completed a DMA transfer — 4 KB of data sitting in host memory, ready for the CPU. But how does the CPU know? It doesn't poll. It doesn't spin-wait. Something has to tap it on the shoulder and say "your data is ready." That something is an interrupt.
PCIe supports three generations of interrupt mechanisms, each born from the pain points of the one before it. In this post, we'll trace the end-to-end delivery flow for each — from the moment a device decides to interrupt, through the PCIe fabric, to the CPU running your ISR. Along the way, we'll build verification scenarios for the tricky corners where bugs love to hide.
Prerequisites: Part 1 — Architecture, Part 4 — Transaction Layer (TLP basics)
Legacy INTx: Shared Wires, Shared Pain
PCI had four physical interrupt pins — INTA# through INTD#. Multiple devices shared them, and the ISR had to poll every device to find which one actually interrupted. PCIe has no physical interrupt wires, but it emulates this legacy behavior using in-band message TLPs.
How INTx Works
When a PCIe endpoint needs to raise an interrupt, it sends an Assert_INTx message TLP upstream to the Root Complex. When the interrupt condition is cleared (after the ISR runs), the device sends a Deassert_INTx message TLP. This pair emulates level-triggered behavior over a packet-based link.
sequenceDiagram
participant EP as Endpoint
participant SW as Switch
participant RC as Root Complex
participant CPU as CPU
EP->>SW: Assert_INTA Message TLP
SW->>RC: Forward (with swizzle)
RC->>CPU: Assert interrupt line
Note over CPU: ISR runs, clears device
EP->>SW: Deassert_INTA Message TLP
SW->>RC: Forward
RC->>CPU: Deassert interrupt line
The four interrupt lines map to message codes 0x20–0x27:
| Message Code | Assert | Deassert |
|---|---|---|
| INTA | 0x20 | 0x24 |
| INTB | 0x21 | 0x25 |
| INTC | 0x22 | 0x26 |
| INTD | 0x23 | 0x27 |
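For checkers, that mapping is easy to capture in a small helper. Here is a minimal SystemVerilog sketch that decodes a received message code into its line and assert/deassert polarity; the enum and function names are illustrative, not from any particular VIP.
// Hypothetical helper for an INTx monitor: decode a message code (0x20-0x27)
typedef enum bit [1:0] {INTA, INTB, INTC, INTD} intx_line_e;

function automatic bit decode_intx_msg(
    input  bit [7:0]   msg_code,    // Message Code field from the Msg TLP header
    output intx_line_e line,
    output bit         is_assert    // 1 = Assert_INTx, 0 = Deassert_INTx
);
  if (msg_code inside {[8'h20:8'h27]}) begin
    line      = intx_line_e'(msg_code[1:0]);  // 0x20/0x24 -> INTA ... 0x23/0x27 -> INTD
    is_assert = (msg_code[2] == 1'b0);        // 0x20-0x23 assert, 0x24-0x27 deassert
    return 1'b1;                              // recognized INTx message
  end
  return 1'b0;                                // not an INTx message code
endfunction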
Config Space: Interrupt Pin & Line
Two registers in the standard config header control INTx:
- Interrupt Pin (offset 0x3D, read-only): Which virtual pin this function uses (1=INTA, 2=INTB, 3=INTC, 4=INTD, 0=none)
- Interrupt Line (offset 0x3C, read-write): Written by firmware/OS during enumeration. Maps to a system interrupt controller input. Has no hardware effect on PCIe — purely software bookkeeping.
Bit 10 of the Command register (Interrupt Disable) suppresses Assert_INTx messages. If software sets it while INTx is already asserted, the device must send a Deassert_INTx.
The Problems
INTx has real issues that matter for verification:
- Sharing — Multiple devices on the same interrupt line. ISR must poll every device.
- Two TLPs per interrupt — Assert + Deassert pair, vs. a single message for MSI.
- Race conditions — If the ISR clears the device interrupt but the Deassert TLP is delayed, spurious interrupts can occur.
- No vector info — The ISR only knows "something on INTA interrupted" — it must determine the cause.
DV Insight: Verify that Assert/Deassert ordering is strict — no Deassert without a prior Assert, no duplicate Assert without an intervening Deassert. Also test the Command Register Interrupt Disable behavior: setting bit 10 while asserted must trigger a Deassert_INTx TLP.
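Those checks translate naturally into SVAs in a link monitor. A hedged sketch, assuming the testbench already tracks the asserted state and exposes pulse signals for observed messages (all names and the latency bound below are hypothetical):
// Hypothetical signals (names illustrative):
//   intx_asserted        - 1 while the virtual INTx wire is logically asserted
//                          (state prior to the current message)
//   assert_intx_sent     - pulses when an Assert_INTx Msg TLP is observed on the link
//   deassert_intx_sent   - pulses when a Deassert_INTx Msg TLP is observed
//   cmd_intx_disable     - shadow copy of Command register bit 10
//   MAX_DEASSERT_LATENCY - testbench parameter bounding the response time

// No duplicate Assert without an intervening Deassert
assert property (@(posedge clk) disable iff (!rst_n)
  assert_intx_sent |-> !intx_asserted
) else $error("Duplicate Assert_INTx without intervening Deassert");

// Setting Interrupt Disable while asserted must eventually produce a Deassert
assert property (@(posedge clk) disable iff (!rst_n)
  $rose(cmd_intx_disable) && intx_asserted |-> ##[1:MAX_DEASSERT_LATENCY] deassert_intx_sent
) else $error("Interrupt Disable set while asserted, but no Deassert_INTx observed");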
$ lspci -s 03:00.0 -vv | grep -i interrupt
Interrupt: pin A routed to IRQ 16
This tells us the device uses INTA (pin A) and firmware mapped it to system IRQ 16.
MSI: Messages Replace Wires
MSI (Message Signaled Interrupts), introduced in PCI 2.2, replaced the entire wire-based approach with a beautifully simple idea: an interrupt is just a memory write.
How MSI Works
Instead of toggling a virtual wire, the device writes a specific data value to a specific memory address. That address targets the CPU's interrupt controller (Local APIC on x86, at address 0xFEExxxxx). The write is a normal Memory Write TLP — no special handling needed by the PCIe fabric.
sequenceDiagram
participant EP as Endpoint
participant Fabric as PCIe Fabric
participant RC as Root Complex
participant APIC as Interrupt Controller
participant CPU as CPU
Note over EP: Interrupt condition
EP->>Fabric: Memory Write TLP<br/>Addr: 0xFEExxxxx<br/>Data: vector info
Fabric->>RC: Normal posted write routing
RC->>APIC: Deliver to LAPIC
APIC->>CPU: Interrupt vector N
Note over CPU: ISR runs directly
One TLP. No assert/deassert pair. No sharing. The data payload tells the interrupt controller exactly which vector to fire.
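A scoreboard can classify MSI deliveries by matching decoded Memory Write TLPs against shadow copies of the function's MSI capability registers. A minimal sketch, with illustrative names:
// Hypothetical scoreboard check: is this posted MemWr actually an MSI, and which vector?
function automatic bit is_msi_write(
    input  bit [63:0] tlp_addr,
    input  bit [31:0] tlp_data,
    input  bit [63:0] msi_addr,   // shadow of Message Address (+ Upper, if 64-bit capable)
    input  bit [15:0] msi_data,   // shadow of Message Data
    input  bit [2:0]  mme,        // Multiple Message Enable: 2^MME vectors granted
    output int        vector
);
  bit [15:0] upper_mask = 16'hFFFF << mme;   // bits that must match the base data
  if ((tlp_addr == msi_addr) &&
      ((tlp_data[15:0] & upper_mask) == (msi_data & upper_mask))) begin
    vector = tlp_data[15:0] & ~upper_mask;   // lower MME bits select the vector
    return 1'b1;
  end
  return 1'b0;
endfunction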
MSI Capability Structure (ID 0x05)
The MSI capability lives in configuration space:
| Register | Key Fields |
|---|---|
| Message Control | MSI Enable (bit 0), Multiple Message Capable (bits 3:1, log₂ of vectors, max 32), Multiple Message Enable (bits 6:4, granted by software), 64-bit Capable (bit 7), Per-Vector Masking (bit 8) |
| Message Address | Target address for interrupt delivery (typically APIC region) |
| Message Data | Base vector info; lower bits modified for multi-vector mode |
| Mask Bits | Per-vector mask register (optional) |
| Pending Bits | Read-only, set when masked interrupt fires (optional) |
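For register modeling, the Message Control layout maps neatly onto a packed struct. A sketch of one way to shadow it in the testbench (type and function names are made up for illustration):
// Sketch: shadow model of the MSI Message Control register (16 bits, MSB first)
typedef struct packed {
  bit [6:0] rsvd;            // bits 15:9 - reserved
  bit       per_vec_mask;    // bit 8     - Per-Vector Masking Capable
  bit       cap_64bit;       // bit 7     - 64-bit Address Capable
  bit [2:0] mme;             // bits 6:4  - Multiple Message Enable (granted by software)
  bit [2:0] mmc;             // bits 3:1  - Multiple Message Capable (requested by device)
  bit       msi_enable;      // bit 0     - MSI Enable
} msi_msg_ctrl_t;

// Example: decode a config-read value and sanity-check the grant against the request
function automatic void check_msi_msg_ctrl(input bit [15:0] cfg_rd_data);
  msi_msg_ctrl_t ctrl = msi_msg_ctrl_t'(cfg_rd_data);
  if (ctrl.mme > ctrl.mmc)
    $error("MME (%0d) exceeds MMC (%0d): more vectors granted than the device requested",
           ctrl.mme, ctrl.mmc);
endfunction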
Vector Selection
When a function requests vector N out of M allocated vectors, it modifies the lower log₂(M) bits of the Message Data:
Allocated: 8 vectors (MME = 3, so lower 3 bits used)
Base Data: 0x0040
Vector 0 → Data = 0x0040 (lower 3 bits = 000)
Vector 5 → Data = 0x0045 (lower 3 bits = 101)
Vector 7 → Data = 0x0047 (lower 3 bits = 111)
Software must allocate contiguous, naturally-aligned vectors. All vectors share the same address — you can't route different vectors to different CPUs.
DV Insight: The vector negotiation is a common bug source. The device requests M vectors (Multiple Message Capable), but software may grant fewer (Multiple Message Enable ≤ MMC). Verify the device operates correctly with fewer vectors than requested and never uses more than granted.
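A small helper of the kind a scoreboard could use to predict the Message Data for vector N and flag use of an unallocated vector; base_data and mme are assumed to be shadow copies of the capability registers, and the function name is hypothetical.
// Sketch: expected Message Data for vector N under the rule above
function automatic bit [15:0] msi_expected_data(
    input bit [15:0] base_data,   // shadow of the Message Data register
    input bit [2:0]  mme,         // Multiple Message Enable: 2^MME vectors granted
    input int        vector
);
  int unsigned granted = 1 << mme;
  if (vector >= granted)
    $error("Vector %0d used but only %0d vectors granted (MME=%0d)", vector, granted, mme);
  // Upper bits come from the base data, lower MME bits carry the vector number
  return (base_data & (16'hFFFF << mme)) | 16'(vector);
endfunction

// e.g. msi_expected_data(16'h0040, 3, 5) returns 16'h0045, matching the table above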
MSI-X: Scalable, Per-Queue Interrupts
MSI-X, introduced in PCI 3.0, takes the MSI concept and makes it fully scalable. Instead of a single address/data pair in config space, MSI-X uses a memory-mapped table in BAR space with independent entries per vector.
How MSI-X Works
sequenceDiagram
participant EP as Endpoint
participant Table as MSI-X Table<br/>(in BAR)
participant Fabric as PCIe Fabric
participant RC as Root Complex
participant CPU as CPU
Note over EP: Interrupt on vector N
EP->>Table: Read Table[N]
Note over Table: Addr, Data, Mask
alt Vector unmasked
EP->>Fabric: Memory Write TLP<br/>Addr: Table[N].Addr<br/>Data: Table[N].Data
Fabric->>RC: Posted write
RC->>CPU: Interrupt vector N
else Vector masked
Note over EP: Set PBA[N] = 1
Note over EP: No TLP sent
end
MSI-X Capability Structure (ID 0x11)
The capability header is compact — just 3 DWORDs — because the actual data lives in BAR memory:
| Register | Key Fields |
|---|---|
| Message Control | MSI-X Enable (bit 15), Function Mask (bit 14), Table Size (bits 10:0, value = N-1, max 2048) |
| Table Offset / BIR | Bits 2:0 = which BAR contains the table; Bits 31:3 = byte offset within that BAR |
| PBA Offset / BIR | Bits 2:0 = which BAR contains the PBA; Bits 31:3 = byte offset |
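The BIR/offset decode is a frequent off-by-one trap, so it is worth spelling out. A hedged sketch of the resolution step a config sequence or scoreboard might perform, assuming it already holds the raw DWORD and the resolved BAR bases (names are illustrative):
// Sketch: resolve the MSI-X table base from the Table Offset/BIR DWORD
function automatic bit [63:0] msix_table_base(
    input bit [31:0] table_offset_bir,   // raw DWORD at capability offset 0x04
    input bit [63:0] bar_base[6]         // resolved BAR base addresses (BIR 0-5 are valid)
);
  bit [2:0]  bir    = table_offset_bir[2:0];            // which BAR holds the table
  bit [31:0] offset = {table_offset_bir[31:3], 3'b000}; // QWORD-aligned byte offset
  return bar_base[bir] + offset;                        // entry N sits at this base + N*16
endfunction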
The MSI-X Table
Each table entry is 16 bytes, independently programmable:
| Offset | Field | Size | Notes |
|---|---|---|---|
| +0x00 | Message Address | 32-bit | Lower address (can target different CPUs per vector) |
| +0x04 | Message Upper Address | 32-bit | Upper address (for 64-bit addressing) |
| +0x08 | Message Data | 32-bit | Full 32-bit data (vs. 16-bit for MSI) |
| +0x0C | Vector Control | 32-bit | Bit 0 = Mask (1=masked, 0=unmasked) |
This is the key advantage over MSI: each vector has its own address/data pair. Vector 0 can target CPU 0, vector 1 can target CPU 3 — enabling true per-queue interrupt affinity. Modern NVMe controllers use this to assign one interrupt vector per submission queue, each targeting the CPU that owns that queue.
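The 16-byte entry also maps directly onto a packed struct for scoreboard shadowing. A sketch, with the +0x0C DWORD first so the struct reads MSB to LSB (type and helper names are made up):
// Sketch: shadow model of one 16-byte MSI-X table entry
typedef struct packed {
  bit [30:0] vec_ctrl_rsvd;   // +0x0C bits 31:1 - reserved
  bit        mask;            // +0x0C bit 0     - 1 = vector masked
  bit [31:0] msg_data;        // +0x08 - full 32-bit Message Data
  bit [31:0] msg_addr_hi;     // +0x04 - Message Upper Address
  bit [31:0] msg_addr_lo;     // +0x00 - Message Address (lower 32 bits)
} msix_entry_t;

// Rebuild a shadow entry from four DWORD reads of entry N (dw0 = +0x00 ... dw3 = +0x0C)
function automatic msix_entry_t msix_pack_entry(
    input bit [31:0] dw0, input bit [31:0] dw1,
    input bit [31:0] dw2, input bit [31:0] dw3
);
  return msix_entry_t'({dw3, dw2, dw1, dw0});
endfunction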
The Pending Bit Array (PBA)
The PBA solves a critical problem: what happens when an interrupt fires while masked?
- Vector masked + interrupt fires → hardware sets PBA[N] = 1, no TLP sent
- Software unmasks the vector → hardware checks the PBA
- If PBA[N] == 1 → hardware sends the interrupt TLP and clears the PBA bit
This guarantees no interrupts are lost during masking. The PBA is read-only to software — only hardware sets and clears it.
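These rules translate almost directly into per-vector properties. A sketch, where NUM_VECTORS, MAX_DELIVERY_LATENCY, and every signal name are hypothetical testbench hooks, not DUT ports:
// Sketch: per-vector PBA rules (place in a bound checker module)
genvar i;
generate
  for (i = 0; i < NUM_VECTORS; i++) begin : g_pba_chk
    // A masked interrupt event must set the pending bit and must not send a TLP
    assert property (@(posedge clk) disable iff (!rst_n)
      (irq_event[i] && vec_masked[i]) |=> pba[i] && !msi_tlp_sent[i]
    ) else $error("Vector %0d: masked interrupt did not set PBA (or a TLP leaked)", i);

    // Unmasking a pending vector must deliver the deferred TLP and then clear the bit
    assert property (@(posedge clk) disable iff (!rst_n)
      ($fell(vec_masked[i]) && pba[i]) |->
        ##[1:MAX_DELIVERY_LATENCY] msi_tlp_sent[i] ##[0:2] !pba[i]
    ) else $error("Vector %0d: pending interrupt lost or PBA stuck after unmask", i);
  end
endgenerate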
Function Mask vs. Per-Vector Mask
MSI-X has two masking layers:
- Function Mask (Message Control bit 14): Globally masks all vectors. Doesn't alter individual mask bits.
- Per-Vector Mask (Vector Control bit 0): Masks individual vectors.
Both must be 0 for a vector to deliver interrupts. Clearing Function Mask re-evaluates all per-vector masks.
DV Insight: The masking interaction is a verification goldmine. Test: set Function Mask while interrupts are pending → verify PBA bits get set. Clear Function Mask → verify only individually-unmasked vectors fire. This is where mask/unmask race condition bugs live.
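The combined gating condition is compact enough to keep as a reference-model helper. A sketch with illustrative names; MSI-X Enable is folded in as well, since no vector can deliver while the capability itself is disabled:
// Sketch: reference-model delivery gate for vector i (all three conditions required)
function automatic bit msix_can_deliver(
    input bit msix_enable,   // Message Control bit 15
    input bit func_mask,     // Message Control bit 14
    input bit vec_mask_i     // Vector Control bit 0 of table entry i
);
  return msix_enable && !func_mask && !vec_mask_i;
endfunction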
$ lspci -vv -s 03:00.0 | grep -A3 MSI-X
Capabilities: [70] MSI-X: Enable+ Count=33 Masked-
Vector table: BAR=0 offset=00003000
PBA: BAR=0 offset=00003100
$ cat /proc/interrupts | grep nvme
31: 1024 0 0 0 PCI-MSI 524288-edge nvme0q0
32: 0 8451 0 0 PCI-MSI 524289-edge nvme0q1
33: 0 0 7234 0 PCI-MSI 524290-edge nvme0q2
34: 0 0 0 6118 PCI-MSI 524291-edge nvme0q3
Notice each queue targets a different CPU — that's MSI-X per-vector addressing in action.
Side-by-Side Comparison
| Feature | INTx | MSI | MSI-X |
|---|---|---|---|
| Mechanism | Assert/Deassert message TLPs | Memory Write TLP | Memory Write TLP |
| Max Vectors | 4 (shared) | 32 | 2048 |
| Sharing | Yes | No | No |
| Per-Vector Address | N/A | No (single address) | Yes |
| Per-Vector Masking | N/A | Optional | Mandatory |
| Config Space | Interrupt Pin/Line | Capability 0x05 | Capability 0x11 + BAR table |
| TLPs per Interrupt | 2 (Assert + Deassert) | 1 | 1 |
| Data Ordering | Separate from data TLPs | Guaranteed (posted write) | Guaranteed (posted write) |
The ordering guarantee is worth emphasizing: MSI and MSI-X interrupts are Memory Write TLPs. PCIe ordering rules guarantee that posted writes are delivered in order. So when the ISR runs, all preceding DMA data is guaranteed to be in host memory. No explicit flush needed — the protocol handles it.
Verification Deep Dive
End-to-End Interrupt Test Sequence
// UVM test: MSI-X interrupt delivery with masking
class pcie_msix_interrupt_test extends pcie_base_test;
  `uvm_component_utils(pcie_msix_interrupt_test)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual task run_phase(uvm_phase phase);
    pcie_config_seq cfg_seq;
    pcie_dma_seq    dma_seq;

    phase.raise_objection(this);

    // Step 1: Enable MSI-X
    cfg_seq = pcie_config_seq::type_id::create("cfg_seq");
    cfg_seq.write_msix_enable(1'b1);
    cfg_seq.start(env.agent.sequencer);

    // Step 2: Program vector 0 (address, data, unmask)
    cfg_seq.write_msix_table_entry(
      .vector(0),
      .addr  (APIC_BASE_ADDR),
      .data  (32'h0040),
      .mask  (1'b0)
    );
    cfg_seq.start(env.agent.sequencer);

    // Step 3: Trigger DMA that generates interrupt
    dma_seq = pcie_dma_seq::type_id::create("dma_seq");
    dma_seq.set_interrupt_vector(0);
    dma_seq.start(env.agent.sequencer);

    // Step 4: Wait for and verify the interrupt TLP
    wait_for_interrupt_tlp(
      .expected_addr(APIC_BASE_ADDR),
      .expected_data(32'h0040),
      .timeout      (INTERRUPT_TIMEOUT)
    );

    // Step 5: Verify DMA data arrived before the interrupt
    check_dma_data_valid();

    phase.drop_objection(this);
  endtask
endclass
Coverage Model
covergroup cg_interrupt_mechanisms @(posedge clk);
  // Which mechanism is active
  cp_active_mode: coverpoint active_interrupt_mode {
    bins intx = {MODE_INTX};
    bins msi  = {MODE_MSI};
    bins msix = {MODE_MSIX};
  }
  // Mode transitions (only one active at a time)
  cp_mode_transition: coverpoint active_interrupt_mode {
    bins intx_to_msi  = (MODE_INTX => MODE_MSI);
    bins intx_to_msix = (MODE_INTX => MODE_MSIX);
    bins msi_to_msix  = (MODE_MSI => MODE_MSIX);
    bins msix_to_intx = (MODE_MSIX => MODE_INTX);
  }
  // MSI-X masking scenarios, encoded as {func_mask, vec_mask, pba_pending}
  cp_msix_mask_state: coverpoint {func_mask, vec_mask, pba_pending} {
    bins unmasked_idle       = {3'b000};
    bins vec_masked_pending  = {3'b011};
    bins func_masked_pending = {3'b101};
    bins both_masked_pending = {3'b111};
    bins unmask_with_pending = {3'b001}; // just unmasked, PBA still set (delivery imminent)
  }
  // MSI vector utilization
  cp_msi_vector: coverpoint msi_vector_used {
    bins vectors[] = {[0:31]};
  }
endgroup
Key Assertions
// Mutual exclusion: only one mechanism active at a time
assert property (@(posedge clk)
  $onehot0({intx_active, msi_enabled, msix_enabled})
) else $error("Multiple interrupt mechanisms active simultaneously");

// INTx: no Deassert without prior Assert
assert property (@(posedge clk)
  deassert_intx_sent |-> intx_is_currently_asserted
) else $error("Deassert_INTx sent without prior Assert");

// MSI-X: PBA set when masked vector has pending interrupt
assert property (@(posedge clk)
  (interrupt_pending && vector_masked) |=> pba_bit_set
) else $error("PBA not set for masked pending interrupt");

// MSI-X: interrupt TLP not sent while vector is masked
assert property (@(posedge clk)
  vector_masked |-> !interrupt_tlp_sent_for_vector
) else $error("Interrupt TLP sent for masked vector");

// Ordering: interrupt TLP must not pass preceding data writes
assert property (@(posedge clk)
  interrupt_tlp_queued |-> all_prior_posted_writes_committed
) else $error("Interrupt TLP overtook preceding data write");
Common Bugs to Hunt
| Bug | Where | What to Check |
|---|---|---|
| Wrong vector in MSI data | MSI | Lower bits of Message Data not matching vector number |
| Stale PBA bits | MSI-X | PBA not cleared after unmask + delivery → ghost interrupts |
| Mask/pending race | MSI-X | Interrupt fires between software writing mask and hardware reading it |
| INTx not deasserted | INTx | Device clears interrupt source but never sends Deassert TLP |
| Wrong BIR offset | MSI-X | Table/PBA BIR points to wrong BAR → accesses wrong memory |
| Interrupt after FLR | MSI-X | Function Level Reset doesn't clear PBA → spurious interrupt post-reset |
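For the last row in particular, a hedged property sketch: assuming the environment exposes a flr_done pulse (Function Level Reset completion) and a PBA shadow (both names hypothetical), something along these lines catches stale pending bits surviving reset:
// Sketch: no PBA bit may survive a Function Level Reset
assert property (@(posedge clk) disable iff (!rst_n)
  flr_done |=> (pba == '0)
) else $error("PBA not cleared by FLR: risk of a spurious interrupt after reset");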
Interview Corner
Q1: Why can't MSI route different vectors to different CPUs?
All MSI vectors share a single Message Address register. Since the address determines which CPU's APIC receives the interrupt, all vectors go to the same CPU. MSI-X solves this with per-vector address/data entries.
Q2: How does PCIe guarantee data arrives before the interrupt?
MSI/MSI-X interrupts are Memory Write TLPs (posted). PCIe ordering rules require posted writes to be delivered in order. Since the interrupt TLP is queued after the DMA data writes, the data is guaranteed to be in memory when the ISR runs.
Q3: What happens when you enable MSI-X while INTx is asserted?
The device must send Deassert_INTx before MSI-X takes over. The PCIe spec requires mutual exclusion — only one mechanism is active at a time. Verification must cover this transition to catch stale assert states.
Q4: Why does MSI-X use a BAR-mapped table instead of config space?
Config space access is slow (Type 0/1 Configuration TLPs). MSI-X can have up to 2048 entries at 16 bytes each = 32 KB of data. This doesn't fit in config space and would be painfully slow to program. BAR-mapped memory allows fast MMIO access using regular Memory Write TLPs.
Key Takeaways
- INTx emulates legacy interrupt wires with Assert/Deassert message TLPs — shared, slow, two TLPs per interrupt
- MSI replaces wires with a Memory Write TLP — no sharing, single TLP, up to 32 vectors, but all vectors share one address
- MSI-X scales MSI with a BAR-mapped table — per-vector addressing, up to 2048 vectors, mandatory per-vector masking with PBA
- The ordering guarantee (interrupt can't pass preceding data) is a protocol-level feature of MSI/MSI-X — it comes free from PCIe posted write ordering
- PBA behavior (set on masked interrupt, clear on unmask) is a rich verification target — mask/unmask races and stale bits are common bugs
- Only one mechanism can be active per function at a time — transitions between them must be verified
What's Next
In Part 7, we'll explore DMA & IOMMU — how PCIe devices read and write host memory directly, how the IOMMU provides address translation and isolation, and the verification challenges of bus mastering, scatter-gather lists, and IOMMU bypass testing.
Previous: Part 5 — Configuration Space & BARs | Next: Part 7 — DMA & IOMMU (coming soon)