4. PCIe for DV Engineers - Transaction Layer

Part 4 of the PCIe for DV Engineers series | Part 3: Data Link Layer

The Transaction Layer is where PCIe gets interesting for DV engineers. This layer defines how devices communicate through Transaction Layer Packets (TLPs), implementing memory reads, writes, configuration access, and more. Understanding TLP structure and ordering rules is essential for thorough verification.

Where the Transaction Layer Fits

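The Transaction Layer sits at the top of the PCIe stack, above the Data Link and Physical layers. It receives requests from software or device logic and turns them into TLPs that flow down through those two layers. On the receive side it decodes incoming TLPs and hands requests and completions back to the core.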

1. Transaction Layer Overview

The Transaction Layer handles:

  • TLP Assembly: Packing requests into properly formatted packets
  • Transaction Types: Memory, I/O, Configuration, Messages
  • Ordering Rules: Ensuring correct transaction sequencing
  • Flow Control: Coordinating with Data Link Layer credits
  • Completion Handling: Matching responses to requests

flowchart TB
    subgraph TL["Transaction Layer"]
        subgraph TX["Transmit"]
            REQ[Request from Core]
            TLP_GEN[TLP Generator]
            ORD[Ordering Check]
            FC_CHK[Credit Check]
        end
        subgraph RX["Receive"]
            TLP_RX[TLP from DLL]
            TLP_DEC[TLP Decoder]
            CPL_MATCH[Completion Matching]
            RESP[Response to Core]
        end
    end
    REQ --> TLP_GEN --> ORD --> FC_CHK
    TLP_RX --> TLP_DEC --> CPL_MATCH --> RESP

2. TLP Structure

Every TLP consists of a header (3 or 4 DWs), optional data payload, and optional ECRC:

TLP Format Overview

| Component | Size | Description |
|---|---|---|
| Header | 3 or 4 DW | Type, routing, address, length |
| Data Payload | 0-1024 DW | Write data or completion data |
| ECRC | 1 DW (optional) | End-to-end CRC for data integrity |

TLP Header - First DW (Common to All TLPs)

Byte 0 (Fmt + Type):
  [7:5] Fmt    - Format: header size + data presence
  [4:0] Type   - Transaction type

Byte 1:
  [7]   R      - Reserved
  [6:4] TC     - Traffic Class (0-7)
  [3]   R      - Reserved
  [2]   Attr   - ID-Based Ordering
  [1]   R      - Reserved
  [0]   TH     - TLP Processing Hints

Byte 2:
  [7]   TD     - TLP Digest (ECRC present)
  [6]   EP     - Error Poisoned
  [5:4] Attr   - Relaxed Ordering, No Snoop
  [3:2] AT     - Address Type
  [1:0] Length - Length[9:8]

Byte 3:
  [7:0] Length - Length[7:0] (in DW, 0=1024 DW)
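
A packed struct is one convenient way to model this first DW in a testbench. The sketch below is illustrative only: the type name `tlp_dw0_t` and the helper `tlp_length_dw` are invented for this example, and the fields simply mirror the layout above.

```systemverilog
// Hypothetical model of TLP header DW0 (names are illustrative).
// Fields are listed MSB first, so the struct maps onto
// {Byte 0, Byte 1, Byte 2, Byte 3} as laid out above.
typedef struct packed {
  bit [2:0] fmt;       // Byte 0 [7:5]
  bit [4:0] tlp_type;  // Byte 0 [4:0]
  bit       rsvd7;     // Byte 1 [7]
  bit [2:0] tc;        // Byte 1 [6:4]
  bit       rsvd3;     // Byte 1 [3]
  bit       attr2;     // Byte 1 [2]   ID-Based Ordering
  bit       rsvd1;     // Byte 1 [1]
  bit       th;        // Byte 1 [0]
  bit       td;        // Byte 2 [7]
  bit       ep;        // Byte 2 [6]
  bit [1:0] attr;      // Byte 2 [5:4] {Relaxed Ordering, No Snoop}
  bit [1:0] at;        // Byte 2 [3:2]
  bit [9:0] length;    // Byte 2 [1:0] + Byte 3 [7:0]
} tlp_dw0_t;

// Length in DW, applying the "0 means 1024 DW" rule.
function automatic int unsigned tlp_length_dw(tlp_dw0_t dw0);
  return (dw0.length == 0) ? 1024 : dw0.length;
endfunction
```

Casting a raw 32-bit word, for example `tlp_dw0_t'(32'h4000_0001)`, yields Fmt 010 and Type 00000 with a Length of 1 DW, which is a 3DW Memory Write carrying one DW of data.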

Format (Fmt) Field

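| Fmt[2:0] | Header | Data | Description |
|---|---|---|---|
| 000 | 3 DW | No | 32-bit address, no data |
| 001 | 4 DW | No | 64-bit address, no data |
| 010 | 3 DW | Yes | 32-bit address, with data |
| 011 | 4 DW | Yes | 64-bit address, with data |
| 100 | - | - | TLP Prefix (not a header format; the Type field identifies the prefix) |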

DV Insight: Always verify the Fmt field matches the actual header size and data presence. Mismatches are a common bug source.
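
The check in the DV Insight above can be sketched with a few small helpers. This is a minimal sketch, not a reference implementation: the function names are made up, and it assumes the monitor already knows how many header and payload DWs it actually captured.

```systemverilog
// Hypothetical helpers (names are illustrative, not from any VIP).
// Header size in DW implied by Fmt; 0 for a TLP Prefix or reserved value.
function automatic int unsigned fmt_header_dw(bit [2:0] fmt);
  case (fmt)
    3'b000, 3'b010: return 3;
    3'b001, 3'b011: return 4;
    default:        return 0;
  endcase
endfunction

// Only Fmt 010 and 011 announce a data payload.
function automatic bit fmt_has_data(bit [2:0] fmt);
  return (fmt inside {3'b010, 3'b011});
endfunction

// Returns 1 when the observed header size and payload agree with Fmt.
function automatic bit fmt_matches(bit [2:0] fmt,
                                   int unsigned observed_header_dw,
                                   int unsigned observed_payload_dw);
  if (fmt_header_dw(fmt) == 0) return 0;  // prefix or reserved encoding
  return (fmt_header_dw(fmt) == observed_header_dw) &&
         (fmt_has_data(fmt) == (observed_payload_dw > 0));
endfunction
```

A monitor could call `fmt_matches` once per captured TLP and raise an error whenever it returns 0.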

3. Transaction Types

PCIe defines several transaction types, each with specific use cases:

Memory Transactions

| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| Memory Read (32-bit) | 000_00000 | No | Yes | Read from memory-mapped device |
| Memory Read (64-bit) | 001_00000 | No | Yes | Read from address >4GB |
| Memory Write (32-bit) | 010_00000 | Yes | No | Write to memory-mapped device |
| Memory Write (64-bit) | 011_00000 | Yes | No | Write to address >4GB |
| Memory Read Lock | 000_00001 | No | Yes | Atomic read (legacy) |

I/O Transactions

| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| IO Read | 000_00010 | No | Yes | Legacy I/O port read |
| IO Write | 010_00010 | No | Yes | Legacy I/O port write |

Configuration Transactions

| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| Config Read Type 0 | 000_00100 | No | Yes | Read config of target device |
| Config Write Type 0 | 010_00100 | No | Yes | Write config of target device |
| Config Read Type 1 | 000_00101 | No | Yes | Config access through bridge |
| Config Write Type 1 | 010_00101 | No | Yes | Config access through bridge |

Message Transactions

| Type | Fmt+Type | Posted? | Use Case |
|---|---|---|---|
| Message (no data) | 001_10xxx | Yes | Interrupts, PM, errors |
| Message (with data) | 011_10xxx | Yes | Vendor-defined messages |

Completion Transactions

| Type | Fmt+Type | Description |
|---|---|---|
| Completion (no data) | 000_01010 | Response to Non-Posted write, or to a read that failed |
| Completion (with data) | 010_01010 | Response to read request |
| Completion Locked | 000_01011 | Response to locked read |

4. Posted vs Non-Posted Transactions

Understanding this distinction is crucial for DV:

flowchart LR
    subgraph POSTED["Posted Transactions"]
        direction TB
        MW[Memory Write]
        MSG[Messages]
        P_NOTE["Fire and forget<br/>No completion required"]
    end
    subgraph NONPOSTED["Non-Posted Transactions"]
        direction TB
        MR[Memory Read]
        IOR[IO Read/Write]
        CFG[Config Read/Write]
        NP_NOTE["Requires completion<br/>Blocks until response"]
    end
    POSTED -.-> P_NOTE
    NONPOSTED -.-> NP_NOTE

Key Differences

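| Aspect | Posted | Non-Posted |
|---|---|---|
| Completion | Not required | Required |
| Requester tracking | Done once the TLP is sent | Outstanding until the Cpl arrives or times out |
| Flow Control | PH/PD credits | NPH/NPD credits |
| Examples | Memory Write, Messages | Reads, Config, I/O |
| Performance | Higher throughput | Lower (round-trip) |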

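DV Insight: Non-Posted transactions require tracking outstanding requests. Verify that your DUT handles the completion timeout correctly. The specification's default range is 50 µs to 50 ms, and devices that implement the Completion Timeout Value field in Device Control 2 can be programmed to longer ranges, up to 64 s.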
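
For scoreboard bookkeeping it helps to classify each TLP from its Fmt and Type fields. The sketch below covers only the encodings listed in section 3. The enum and function names are illustrative, and anything not listed there (AtomicOps, TLP Prefixes) falls through to `TLP_OTHER`.

```systemverilog
// Hypothetical classifier (names are illustrative).
typedef enum { TLP_POSTED, TLP_NON_POSTED, TLP_COMPLETION, TLP_OTHER } tlp_class_e;

function automatic tlp_class_e classify_tlp(bit [2:0] fmt, bit [4:0] tlp_type);
  bit has_data = fmt[1];
  if (fmt[2]) return TLP_OTHER;  // TLP Prefix, not a request or completion
  casez (tlp_type)
    5'b00000: begin              // MWr (with data) or MRd (no data)
      if (has_data) return TLP_POSTED;
      else          return TLP_NON_POSTED;
    end
    5'b00001: return TLP_NON_POSTED;  // MRdLk
    5'b00010: return TLP_NON_POSTED;  // IORd, IOWr
    5'b0010?: return TLP_NON_POSTED;  // CfgRd/CfgWr, Type 0 and Type 1
    5'b10???: return TLP_POSTED;      // Msg, MsgD
    5'b0101?: return TLP_COMPLETION;  // Cpl, CplD and the locked variants
    default:  return TLP_OTHER;       // encodings not covered by this sketch
  endcase
endfunction
```

A scoreboard would typically start a completion timer and reserve a tag entry only when the result is `TLP_NON_POSTED`.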

5. TLP Routing

TLPs are routed through the PCIe fabric using one of three methods:

Address Routing

Used by Memory and I/O transactions. The address in the TLP header determines the destination:

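  • Switches forward based on each port's base/limit (bridge window) registers; endpoints claim addresses that hit their BARs
  • 32-bit addresses: 3DW header
  • 64-bit addresses: 4DW header (addresses below 4GB must use the 3DW format)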

ID Routing

Used by Configuration and Completion transactions. Routes based on Bus:Device.Function (BDF):

Requester ID / Completer ID:
  [15:8]  Bus Number
  [7:3]   Device Number
  [2:0]   Function Number
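
In a testbench the 16-bit ID is easier to handle as a packed struct. A minimal sketch, with an invented type name and helper, and without considering ARI (where Device and Function merge into one 8-bit field):

```systemverilog
// Hypothetical BDF model (names are illustrative).
typedef struct packed {
  bit [7:0] bus;   // [15:8] Bus Number
  bit [4:0] dev;   // [7:3]  Device Number
  bit [2:0] func;  // [2:0]  Function Number
} bdf_t;

// Formats a raw Requester/Completer ID as bus:device.function for log messages.
function automatic string bdf_to_string(bit [15:0] id);
  bdf_t b = bdf_t'(id);
  return $sformatf("%h:%h.%h", b.bus, b.dev, b.func);
endfunction
```

Because the struct is packed, a `bdf_t` value can be assigned directly to a 16-bit ID field and compared against the Requester ID or Completer ID seen on the link.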

Implicit Routing

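Used by Messages. The routing method is selected by the 3-bit routing subfield in Type[2:0] (the xxx in the Message encodings above), not by the Message Code. Two of the six options reuse address and ID routing; the other four are implicit:

  • 000: Routed to Root Complex
  • 001: Routed by Address
  • 010: Routed by ID
  • 011: Broadcast from Root Complex
  • 100: Local - terminate at receiver
  • 101: Gathered and routed to Root Complex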
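
In the base specification the six options are encoded, in the order listed, as 000b through 101b in Type[2:0] of a Message TLP. A small enum keeps that mapping in one place. The names below are invented for this sketch.

```systemverilog
// Hypothetical enum for the Message routing subfield, Type[2:0].
typedef enum bit [2:0] {
  MSG_ROUTE_TO_RC        = 3'b000,
  MSG_ROUTE_BY_ADDR      = 3'b001,
  MSG_ROUTE_BY_ID        = 3'b010,
  MSG_ROUTE_BROADCAST    = 3'b011,
  MSG_ROUTE_LOCAL        = 3'b100,
  MSG_ROUTE_GATHER_TO_RC = 3'b101
} msg_routing_e;

// Message TLPs have Type[4:3] = 10.
function automatic bit is_message(bit [4:0] tlp_type);
  return (tlp_type[4:3] == 2'b10);
endfunction

// Routing values 110 and 111 are reserved.
function automatic bit msg_routing_is_valid(bit [4:0] tlp_type);
  return is_message(tlp_type) && (tlp_type[2:0] <= 3'b101);
endfunction
```

A monitor can flag any Message whose routing subfield fails `msg_routing_is_valid` before trying to predict where the TLP should go.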

6. Completion Handling

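Non-Posted transactions require completions. The requester matches each completion to its outstanding request using the Requester ID and Tag it sent, which the completer copies into the completion header: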

Completion Header Fields

Completer ID:   [15:0]  BDF of completer
Status:         [15:13] Completion status
  000 = Successful Completion (SC)
  001 = Unsupported Request (UR)
  010 = Config Request Retry Status (CRS)
  100 = Completer Abort (CA)
BCM:            [12]    Byte Count Modified
Byte Count:     [11:0]  Remaining bytes
Requester ID:   [15:0]  Original requester BDF
Tag:            [7:0]   Original transaction tag
Lower Address:  [6:0]   Lower bits of byte address

Split Completions

Large read requests may return multiple completions:

sequenceDiagram
    participant REQ as Requester
    participant CPL as Completer
    REQ->>CPL: Memory Read (256 bytes)
    Note over CPL: MPS=128 bytes
    CPL-->>REQ: CplD (128 bytes, BC=256)
    CPL-->>REQ: CplD (128 bytes, BC=128)
    Note over REQ: Request complete
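
The Byte Count sequence in this diagram (256, then 128) follows a simple rule: each completion reports the bytes still owed, including its own payload. A minimal sketch of a per-request tracker is shown below. The class and method names are hypothetical, and it assumes the caller has already decoded the 12-bit Byte Count field (where 0 means 4096).

```systemverilog
// Hypothetical tracker for one outstanding read (names are illustrative).
class cpl_byte_count_tracker;
  int unsigned remaining;  // bytes still owed to the requester

  function new(int unsigned request_bytes);
    remaining = request_bytes;
  endfunction

  // Call once per CplD, in arrival order. Returns 1 if it is consistent.
  //   byte_count    : decoded Byte Count field of this completion
  //   payload_bytes : valid data bytes carried by this completion
  function bit accept(int unsigned byte_count, int unsigned payload_bytes);
    if (byte_count != remaining) return 0;  // BC must equal bytes still owed
    if (payload_bytes == 0 || payload_bytes > remaining) return 0;
    remaining -= payload_bytes;
    return 1;
  endfunction

  // True once all requested bytes have been returned.
  function bit done();
    return (remaining == 0);
  endfunction
endclass
```

For the read in the diagram, `accept(256, 128)` followed by `accept(128, 128)` both succeed and `done()` then returns 1. A repeated or missing completion makes `accept` return 0.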

Tag Management

Tags uniquely identify outstanding Non-Posted requests:

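  • 5-bit Tag: 32 outstanding requests (when Extended Tag Field Enable is clear)
  • 8-bit Tag: 256 outstanding requests (Extended Tag Field Enable set)
  • 10-bit Tag: introduced in PCIe 4.0; up to 768 usable tags, because a 10-bit Tag Requester must not use values with Tag[9:8] = 00
  • 14-bit Tag: up to 16384 tag values, introduced with PCIe 6.0 Flit Mode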

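// Tag tracking in verification
import uvm_pkg::*;
`include "uvm_macros.svh"

class tag_manager;
  bit tag_in_use[1024];  // one flag per tag, sized for 10-bit tags
  int unsigned outstanding_count;

  // Returns the lowest free tag, or reports an error if none is free
  function bit [9:0] allocate_tag();
    for (int i = 0; i < 1024; i++) begin
      if (!tag_in_use[i]) begin
        tag_in_use[i] = 1;
        outstanding_count++;
        return i[9:0];
      end
    end
    `uvm_error("TAG", "No tags available!")
    return 0;  // not a valid allocation: every tag, including 0, is in use
  endfunction

  // Frees a tag once its final completion has arrived
  function void release_tag(bit [9:0] tag);
    if (!tag_in_use[tag]) begin
      `uvm_error("TAG", $sformatf("Releasing unused tag %0d", tag))
      return;  // leave outstanding_count untouched
    end
    tag_in_use[tag] = 0;
    outstanding_count--;
  endfunction
endclass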

7. Ordering Rules

PCIe ordering rules ensure data coherency and are critical for DV. Violations can cause silent data corruption.

The Ordering Table

This table defines when a later transaction can pass an earlier one:

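| Row passes column? | Posted | Non-Posted | Completion |
|---|---|---|---|
| Posted | No | Yes | Y/N |
| Non-Posted | No | Y/N | Y/N |
| Completion | No | Yes | Y/N |

Yes means the row transaction must be able to pass the column transaction (needed to avoid deadlock). No means it must not pass. Y/N means passing is permitted but not required. The No entries can be relaxed in specific cases by the RO and IDO attributes described below, and completions that belong to the same request must stay in order.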

Key Ordering Rules Explained

1. Posted cannot pass Posted (Producer-Consumer Model)

  • Ensures write ordering is preserved
  • Write A followed by Write B: B must not reach the destination before A

2. Non-Posted cannot pass Posted (Read after Write)

  • A read must observe all earlier writes sent in the same direction
  • Prevents reading stale data

3. Completion cannot pass Posted (Write before Completion)

  • A read completion pushes ahead of it any writes the completer issued earlier in the same direction
  • When the requester sees the completion (for example, after reading a status flag), the data written before it has already arrived
Relaxed Ordering (RO)

The RO attribute bit allows relaxing certain rules for performance:

  • When RO=1, a Posted Write can pass earlier Posted Writes
  • Useful for independent data streams
  • Must be used carefully to avoid coherency issues

ID-Based Ordering (IDO)

Allows transactions from different sources to pass each other:

  • A request with IDO set may pass earlier Posted requests that carry a different Requester ID
  • Improves performance when independent streams, such as the functions of a multi-function device, share a link

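// Ordering rule checker (sketch)
// Assumes the checker subscribes to the DUT egress monitor and that
// pcie_tlp.timestamp holds the time the TLP entered the DUT, so a larger
// timestamp means "issued later". IDO is not modeled, and the queues are
// never pruned.
class ordering_checker extends uvm_subscriber #(pcie_tlp);
  `uvm_component_utils(ordering_checker)

  pcie_tlp posted_queue[$];     // Posted TLPs already seen at egress
  pcie_tlp nonposted_queue[$];  // Non-Posted TLPs already seen at egress

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void write(pcie_tlp t);
    if (t.is_posted()) begin
      // Check: Non-Posted cannot pass Posted.
      // A Non-Posted TLP issued after t must not have left before t.
      foreach (nonposted_queue[i]) begin
        if (nonposted_queue[i].timestamp > t.timestamp)
          `uvm_error("ORDER", $sformatf(
            "Non-Posted (Tag=%0d) passed earlier Posted!",
            nonposted_queue[i].tag))
      end

      // Check: Posted cannot pass Posted (unless the passing TLP has RO set)
      foreach (posted_queue[i]) begin
        if (posted_queue[i].timestamp > t.timestamp &&
            !posted_queue[i].relaxed_ordering)
          `uvm_error("ORDER", "Posted passed earlier Posted!")
      end
      posted_queue.push_back(t);
    end
    else if (t.is_nonposted()) begin
      nonposted_queue.push_back(t);
    end
  endfunction
endclass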

8. Traffic Classes and Virtual Channels

Traffic Classes (TC)

3-bit field (TC0-TC7) providing QoS differentiation:

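  • TC0: Default, best-effort traffic; every device must support it
  • TC1-TC7: Optional classes for differentiated traffic
  • The TC number alone does not imply priority; priority comes from the TC-to-VC mapping and the VC arbitration scheme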

Virtual Channels (VC)

Independent flow control domains mapped from TCs:

  • VC0: Required, carries TC0 by default
  • VC1-VC7: Optional additional channels
  • Each VC has independent credit pools
  • TC-to-VC mapping is configurable

flowchart LR
    subgraph TC["Traffic Classes"]
        TC0[TC0]
        TC1[TC1]
        TC7[TC7]
    end
    subgraph VC["Virtual Channels"]
        VC0[VC0]
        VC1[VC1]
    end
    TC0 --> VC0
    TC1 --> VC1
    TC7 --> VC1
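
A scoreboard that checks per-VC credits needs the same TC-to-VC lookup the DUT uses. The sketch below mirrors the example mapping in the diagram. The class and its methods are hypothetical, and leaving TC2 through TC6 on VC0 is an assumption, since the diagram does not show them.

```systemverilog
// Hypothetical TC-to-VC lookup (names are illustrative).
class tc_vc_map;
  bit [2:0] vc_of_tc[8];  // VC ID for each TC; all zero puts everything on VC0

  // Example mapping from the diagram: TC0 -> VC0, TC1 and TC7 -> VC1.
  // TC2-TC6 are left on VC0 (assumption, not shown in the diagram).
  function void load_example();
    foreach (vc_of_tc[i]) vc_of_tc[i] = 0;
    vc_of_tc[1] = 1;
    vc_of_tc[7] = 1;
  endfunction

  // Returns the VC that carries the given TC.
  function bit [2:0] lookup(bit [2:0] tc);
    return vc_of_tc[tc];
  endfunction

  // TC0 must always be mapped to VC0.
  function bit is_legal();
    return (vc_of_tc[0] == 0);
  endfunction
endclass
```

The credit checker then charges each TLP against the pool of `lookup(tlp.tc)` rather than assuming VC0.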

DV Insight: Most designs only implement VC0. If your DUT supports multiple VCs, verify credit management and arbitration across all channels.

9. Max Payload Size and Read Request Size

Max Payload Size (MPS)

Maximum TLP data payload size:

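| MPS Value | Max Payload |
|---|---|
| 000 | 128 bytes |
| 001 | 256 bytes |
| 010 | 512 bytes |
| 011 | 1024 bytes |
| 100 | 2048 bytes |
| 101 | 4096 bytes |

  • Configured in the Device Control register; the supported maximum is reported in Device Capabilities
  • Software normally programs every device on a path to the same MPS, no larger than the smallest supported value
  • A receiver treats a payload larger than its MPS as a Malformed TLP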

Max Read Request Size (MRRS)

Maximum size for memory read requests:

  • Same encoding as MPS (128 to 4096 bytes)
  • Larger MRRS = fewer requests = better bandwidth
  • Affects completion buffer sizing

// MPS/MRRS compliance checking
class payload_checker extends uvm_component;
  `uvm_component_utils(payload_checker)

  int mps  = 128;  // Current system MPS in bytes
  int mrrs = 512;  // Current MRRS in bytes

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void check_tlp(pcie_tlp tlp);
    // Length is in DW, and an encoded value of 0 means 1024 DW
    int length_dw     = (tlp.length == 0) ? 1024 : tlp.length;
    int payload_bytes = length_dw * 4;

    // Check MPS for every TLP that carries data (writes and completions)
    if (tlp.has_data()) begin
      if (payload_bytes > mps)
        `uvm_error("MPS", $sformatf(
          "Payload %0d exceeds MPS %0d", payload_bytes, mps))
    end

    // Check MRRS for reads (Length is the amount of data requested)
    if (tlp.is_memory_read()) begin
      if (payload_bytes > mrrs)
        `uvm_error("MRRS", $sformatf(
          "Read request %0d exceeds MRRS %0d", payload_bytes, mrrs))
    end

    // Check 4KB boundary crossing (meaningful for memory requests)
    if (crosses_4kb_boundary(tlp.address, payload_bytes))
      `uvm_error("4KB", "TLP crosses 4KB address boundary!")
  endfunction
endclass

10. Address Boundary Rules

TLPs must not cross certain address boundaries:

4KB Boundary Rule

  • No TLP can cross a 4KB (0x1000) address boundary
  • Must split into multiple TLPs if request spans boundary
  • Critical for address translation (pages are 4KB)

Read Completion Boundary (RCB)

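  • 64 or 128 bytes for a Root Complex (reported in Link Control); 128 bytes for other completers
  • A read that crosses an RCB boundary may be returned in several completions
  • Every completion except the last must end on an RCB-aligned address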

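// 4KB boundary check (length in bytes, must be greater than 0)
function automatic bit crosses_4kb_boundary(bit [63:0] addr, int length);
  bit [63:0] end_addr = addr + length - 1;
  return (addr[63:12] != end_addr[63:12]);
endfunction

// Split request at 4KB boundary.
// If the request does not reach the boundary, second_len is 0.
// A request of up to 4096 bytes crosses at most one boundary.
function automatic void split_at_4kb(bit [63:0] addr, int length,
                           output int first_len, output int second_len);
  bit [63:0] boundary = (addr & ~64'hFFF) + 64'h1000;
  int to_boundary = int'(boundary - addr);  // 1 to 4096 bytes
  first_len  = (length < to_boundary) ? length : to_boundary;
  second_len = length - first_len;
endfunction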
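
The RCB rule can be checked in a similar way. The function below is a sketch under stated assumptions: its name and arguments are invented, `rcb` must be 64 or 128, and the caller is expected to know whether a completion is the final one for its request (for example, from the Byte Count field).

```systemverilog
// Hypothetical RCB check (name and arguments are illustrative).
// start_addr    : byte address of the first data byte in this completion
// payload_bytes : data bytes carried by this completion
// rcb           : Read Completion Boundary in bytes (64 or 128)
// is_last       : 1 if this completion finishes the request
function automatic bit cpl_end_is_legal(bit [63:0]   start_addr,
                                        int unsigned payload_bytes,
                                        int unsigned rcb,
                                        bit          is_last);
  bit [63:0] end_addr = start_addr + payload_bytes;  // first byte after this Cpl
  if (is_last) return 1;           // the final completion ends where the request ends
  return ((end_addr % rcb) == 0);  // all others must stop on an RCB multiple
endfunction
```

For a read starting at 0x1010 with a 64-byte RCB, a first completion of 48 bytes ends at 0x1040 and is legal, while a first completion of 64 bytes ends at 0x1050 and is not.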

11. DV Verification Scenarios

Must-Have Test Cases

| Scenario | Description | Coverage Goal |
|---|---|---|
| All TLP Types | Generate each transaction type | Functional |
| 32/64-bit Addressing | Both 3DW and 4DW headers | Format coverage |
| Ordering Rules | Verify all ordering constraints | Protocol compliance |
| Tag Exhaustion | Use all available tags | Resource limits |
| MPS Boundary | Requests at exact MPS size | Boundary |
| 4KB Crossing | Requests spanning 4KB boundary | Error handling |
| Split Completions | Large reads with multiple Cpls | Completion handling |
| Completion Timeout | Non-Posted without response | Error handling |
| UR/CA Status | Unsuccessful completions | Error paths |
| Relaxed Ordering | RO bit behavior | Attribute coverage |

TLP Coverage Model


covergroup tlp_cg with function sample(pcie_tlp tlp);
  
  // Transaction type coverage
  tlp_type: coverpoint tlp.tlp_type {
    bins mem_rd     = {MEM_RD_32, MEM_RD_64};
    bins mem_wr     = {MEM_WR_32, MEM_WR_64};
    bins io_rd      = {IO_RD};
    bins io_wr      = {IO_WR};
    bins cfg_rd     = {CFG_RD_0, CFG_RD_1};
    bins cfg_wr     = {CFG_WR_0, CFG_WR_1};
    bins msg        = {MSG, MSG_D};
    bins cpl        = {CPL, CPL_D};
  }
  
  // Header format
  fmt: coverpoint tlp.fmt {
    bins dw3_no_data = {3'b000};
    bins dw4_no_data = {3'b001};
    bins dw3_data    = {3'b010};
    bins dw4_data    = {3'b011};
  }
  
  // Payload length
  length: coverpoint tlp.length {
    bins zero     = {0};        // 1024 DW
    bins small    = {[1:4]};
    bins medium   = {[5:32]};
    bins large    = {[33:256]};
    bins max_mps  = {[257:$]};  // 257-1023 DW, only legal when MPS/MRRS exceed 1024 bytes
  }
  
  // Traffic class
  tc: coverpoint tlp.tc {
    bins tc0 = {0};
    bins tc_high = {[1:7]};
  }
  
  // Attributes
  relaxed_order: coverpoint tlp.attr[1];
  no_snoop: coverpoint tlp.attr[0];
  
  // Completion status
  cpl_status: coverpoint tlp.cpl_status iff (tlp.is_completion()) {
    bins sc  = {3'b000};  // Success
    bins ur  = {3'b001};  // Unsupported Request
    bins crs = {3'b010};  // Config Retry
    bins ca  = {3'b100};  // Completer Abort
  }
  
  // Cross coverage
  type_x_length: cross tlp_type, length;
  type_x_tc: cross tlp_type, tc;
  
endgroup

12. Common Transaction Layer Bugs

Watch for these issues during verification:

  • Tag Reuse: Reusing tag before completion received
  • Ordering Violation: Posted passing Posted incorrectly
  • 4KB Crossing: Single TLP spanning 4KB boundary
  • MPS Violation: Payload exceeding Max Payload Size
  • Completion Mismatch: Wrong tag or requester ID in completion
  • Byte Count Error: Incorrect byte count in split completions
  • Missing Completion: Non-Posted request never completed
  • Spurious Completion: Completion for non-existent request
  • Format/Type Mismatch: Fmt doesn't match TLP content

Key Takeaways

  • TLPs have 3DW or 4DW headers based on address size
  • Posted transactions (memory writes, messages) don't require completion
  • Non-Posted transactions (reads, I/O and config writes) require completion
  • Ordering rules prevent data corruption - verify thoroughly
  • Tags track outstanding Non-Posted requests
  • 4KB boundary cannot be crossed by single TLP
  • MPS limits payload size; MRRS limits read request size

Next Up

In Part 5: Configuration Space & BARs, we'll explore how devices are discovered, enumerated, and their address spaces are programmed.

Author
Mayur Kubavat
VLSI Design and Verification Engineer sharing knowledge about SystemVerilog, UVM, and hardware verification methodologies.
