4. PCIe for DV Engineers - Transaction Layer
Part 4 of the PCIe for DV Engineers series | Part 3: Data Link Layer
The Transaction Layer is where PCIe gets interesting for DV engineers. This layer defines how devices communicate through Transaction Layer Packets (TLPs), implementing memory reads, writes, configuration access, and more. Understanding TLP structure and ordering rules is essential for thorough verification.
Where the Transaction Layer Fits
The Transaction Layer sits at the top of the PCIe stack, above the Data Link Layer and the Physical Layer.
It receives requests from software or device logic and builds the TLPs that flow down through those two layers.
1. Transaction Layer Overview
The Transaction Layer handles:
- TLP Assembly: Packing requests into properly formatted packets
- Transaction Types: Memory, I/O, Configuration, Messages
- Ordering Rules: Ensuring correct transaction sequencing
- Flow Control: Coordinating with Data Link Layer credits
- Completion Handling: Matching responses to requests
flowchart TB
subgraph TL["Transaction Layer"]
subgraph TX["Transmit"]
REQ[Request from Core]
TLP_GEN[TLP Generator]
ORD[Ordering Check]
FC_CHK[Credit Check]
end
subgraph RX["Receive"]
TLP_RX[TLP from DLL]
TLP_DEC[TLP Decoder]
CPL_MATCH[Completion Matching]
RESP[Response to Core]
end
end
REQ --> TLP_GEN --> ORD --> FC_CHK
TLP_RX --> TLP_DEC --> CPL_MATCH --> RESP
2. TLP Structure
Every TLP consists of a header (3 or 4 DWs), optional data payload, and optional ECRC:
TLP Format Overview
| Component | Size | Description |
|---|---|---|
| Header | 3 or 4 DW | Type, routing, address, length |
| Data Payload | 0-1024 DW | Write data or completion data |
| ECRC | 1 DW (optional) | End-to-end CRC for data integrity |
TLP Header - First DW (Common to All TLPs)
Byte 0 (Fmt + Type):
[7:5] Fmt - Format: header size + data presence
[4:0] Type - Transaction type
Byte 1:
[7] R - Reserved
[6:4] TC - Traffic Class (0-7)
[3] R - Reserved
[2] Attr - ID-Based Ordering
[1] R - Reserved
[0] TH - TLP Processing Hints
Byte 2:
[7] TD - TLP Digest (ECRC present)
[6] EP - Error Poisoned
[5:4] Attr - Relaxed Ordering, No Snoop
[3:2] AT - Address Type
[1:0] Length - Length[9:8]
Byte 3:
[7:0] Length - Length[7:0] (in DW, 0=1024 DW)
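The byte layout above can be sketched as a small decode helper. This is only an illustration: the package, struct, and function names are hypothetical, and the code assumes the monitor presents the first header DW with header byte 0 in bits [31:24].

```systemverilog
package tlp_dw0_pkg;  // hypothetical package name

  // Fields of the first header DW, named after the layout above.
  typedef struct packed {
    bit [2:0] fmt;
    bit [4:0] tlp_type;
    bit [2:0] tc;
    bit       ido;     // Attr[2], ID-Based Ordering
    bit       th;      // TLP Processing Hints
    bit       td;      // TLP Digest (ECRC present)
    bit       ep;      // Error Poisoned
    bit [1:0] attr;    // {Relaxed Ordering, No Snoop}
    bit [1:0] at;      // Address Type
    bit [9:0] length;  // raw field; 0 encodes 1024 DW
  } tlp_dw0_t;

  // Assumption: header byte 0 sits in dw[31:24] and byte 3 in dw[7:0].
  function automatic tlp_dw0_t decode_dw0(bit [31:0] dw);
    tlp_dw0_t h;
    h.fmt      = dw[31:29];
    h.tlp_type = dw[28:24];
    h.tc       = dw[22:20];
    h.ido      = dw[18];
    h.th       = dw[16];
    h.td       = dw[15];
    h.ep       = dw[14];
    h.attr     = dw[13:12];
    h.at       = dw[11:10];
    h.length   = dw[9:0];
    return h;
  endfunction

  // Payload length in DW, applying the "0 means 1024 DW" rule.
  function automatic int unsigned length_dw(bit [9:0] length);
    return (length == 0) ? 1024 : length;
  endfunction

endpackage
```

A monitor could call `decode_dw0()` on the first DW it captures and hand the struct to the checkers, so the bit slicing lives in one place.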
Format (Fmt) Field
| Fmt[2:0] | Header | Data | Description |
|---|---|---|---|
| 000 | 3 DW | No | 32-bit address, no data |
| 001 | 4 DW | No | 64-bit address, no data |
| 010 | 3 DW | Yes | 32-bit address, with data |
| 011 | 4 DW | Yes | 64-bit address, with data |
| 100 | - | - | TLP Prefix (a 1 DW prefix, not a TLP header) |
DV Insight: Always verify the Fmt field matches the actual header size and data presence. Mismatches are a common bug source.
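One way to automate the Fmt check is sketched below. The package and function names are hypothetical, and the sketch assumes the monitor can count the header and payload DWs it actually saw on the link.

```systemverilog
package tlp_fmt_pkg;  // hypothetical package name

  // Header size in DW implied by Fmt. Returns 0 for encodings that are
  // not a normal TLP header (TLP Prefix or values not in the table).
  function automatic int unsigned fmt_hdr_dw(bit [2:0] fmt);
    if (fmt[2]) return 0;
    return fmt[0] ? 4 : 3;
  endfunction

  // 1 when Fmt says a data payload follows the header.
  function automatic bit fmt_has_data(bit [2:0] fmt);
    return (fmt[2] == 1'b0) && fmt[1];
  endfunction

  // Compare Fmt against what was actually observed.
  function automatic bit fmt_matches(bit [2:0]    fmt,
                                     int unsigned hdr_dw_seen,
                                     int unsigned payload_dw_seen);
    if (fmt_hdr_dw(fmt) != hdr_dw_seen) return 0;
    return fmt_has_data(fmt) == (payload_dw_seen != 0);
  endfunction

endpackage
```

For example, `fmt_matches(3'b010, 3, 4)` holds for a 3 DW header with a 4 DW payload, while the same Fmt with a 4 DW header would be flagged.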
3. Transaction Types
PCIe defines several transaction types, each with specific use cases:
Memory Transactions
| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| Memory Read (32-bit) | 000_00000 | No | Yes | Read from memory-mapped device |
| Memory Read (64-bit) | 001_00000 | No | Yes | Read from address >4GB |
| Memory Write (32-bit) | 010_00000 | Yes | No | Write to memory-mapped device |
| Memory Write (64-bit) | 011_00000 | Yes | No | Write to address >4GB |
| Memory Read Lock | 000_00001 | No | Yes | Locked read (legacy) |
I/O Transactions
| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| IO Read | 000_00010 | No | Yes | Legacy I/O port read |
| IO Write | 010_00010 | No | Yes | Legacy I/O port write |
Configuration Transactions
| Type | Fmt+Type | Posted? | Completion? | Use Case |
|---|---|---|---|---|
| Config Read Type 0 | 000_00100 | No | Yes | Read config of target device |
| Config Write Type 0 | 010_00100 | No | Yes | Write config of target device |
| Config Read Type 1 | 000_00101 | No | Yes | Config access through bridge |
| Config Write Type 1 | 010_00101 | No | Yes | Config access through bridge |
Message Transactions
| Type | Fmt+Type | Posted? | Use Case |
|---|---|---|---|
| Message (no data) | 001_10xxx | Yes | Interrupts, PM, errors |
| Message (with data) | 011_10xxx | Yes | Vendor-defined messages |
Completion Transactions
| Type | Fmt+Type | Description |
|---|---|---|
| Completion (no data) | 000_01010 | Response to Non-Posted write |
| Completion (with data) | 010_01010 | Response to read request |
| Completion Locked | 000_01011 | Response to locked read |
4. Posted vs Non-Posted Transactions
Understanding this distinction is crucial for DV:
flowchart LR
subgraph POSTED["Posted Transactions"]
direction TB
MW[Memory Write]
MSG[Messages]
P_NOTE["Fire and forget
No completion required"]
end
subgraph NONPOSTED["Non-Posted Transactions"]
direction TB
MR[Memory Read]
IOR[IO Read/Write]
CFG[Config Read/Write]
NP_NOTE["Requires completion
Tracked until response"]
end
POSTED -.-> P_NOTE
NONPOSTED -.-> NP_NOTE
Key Differences
| Aspect | Posted | Non-Posted |
|---|---|---|
| Completion | Not required | Required |
| Blocking | No | Requester tracks the request until the Cpl arrives |
| Flow Control | PH/PD credits | NPH/NPD credits |
| Examples | Memory Write, Messages | Reads, Config, I/O |
| Performance | Higher throughput | Lower (round-trip) |
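DV Insight: Non-Posted transactions require tracking outstanding requests. Verify that your DUT handles the Completion Timeout correctly: by default the timer must not expire in less than 50 µs and must expire by 50 ms, and devices that implement the Completion Timeout Value field in Device Control 2 can be programmed to other ranges.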
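The Posted/Non-Posted split can be derived directly from the Fmt and Type fields. The sketch below covers only the transaction types listed in the tables above; anything else (for example AtomicOps) falls into `TLP_OTHER`. The package, enum, and function names are hypothetical.

```systemverilog
package tlp_class_pkg;  // hypothetical package name

  typedef enum { TLP_POSTED, TLP_NON_POSTED, TLP_COMPLETION, TLP_OTHER } tlp_class_e;

  // Classify a TLP from its 3-bit Fmt and 5-bit Type fields.
  function automatic tlp_class_e classify(bit [2:0] fmt, bit [4:0] tlp_type);
    bit has_data = fmt[1];
    if (fmt[2]) return TLP_OTHER;  // TLP Prefix, not a request or completion
    casez (tlp_type)
      5'b0_0000: return has_data ? TLP_POSTED : TLP_NON_POSTED;  // MWr / MRd
      5'b0_0001: return has_data ? TLP_OTHER  : TLP_NON_POSTED;  // MRdLk
      5'b0_0010,                                                 // IORd / IOWr
      5'b0_0100,                                                 // CfgRd0 / CfgWr0
      5'b0_0101: return TLP_NON_POSTED;                          // CfgRd1 / CfgWr1
      5'b1_0???: return TLP_POSTED;                              // Msg / MsgD
      5'b0_101?: return TLP_COMPLETION;                          // Cpl / CplD / CplLk
      default:   return TLP_OTHER;                               // not covered here
    endcase
  endfunction

endpackage
```

A scoreboard can use the returned class to decide whether to open an outstanding-request entry (Non-Posted), close one (Completion), or do neither (Posted).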
5. TLP Routing
TLPs are routed through the PCIe fabric using one of three methods:
Address Routing
Used by Memory and I/O transactions. The address in the TLP header determines the destination:
- Switches compare the address against each port's base/limit windows; endpoints compare it against their BAR ranges
- 32-bit addresses: 3DW header
- 64-bit addresses: 4DW header
ID Routing
Used by Configuration and Completion transactions. Routes based on Bus:Device.Function (BDF):
Requester ID / Completer ID:
[15:8] Bus Number
[7:3] Device Number
[2:0] Function Number
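The 16-bit ID layout maps naturally onto a packed struct. A minimal sketch, with hypothetical package, type, and function names:

```systemverilog
package pcie_bdf_pkg;  // hypothetical package name

  // Same bit layout as the Requester ID / Completer ID shown above.
  typedef struct packed {
    bit [7:0] bus;   // [15:8]
    bit [4:0] dev;   // [7:3]
    bit [2:0] func;  // [2:0]
  } bdf_t;

  function automatic bdf_t unpack_id(bit [15:0] id);
    return bdf_t'(id);
  endfunction

  function automatic bit [15:0] pack_id(bit [7:0] bus, bit [4:0] dev, bit [2:0] func);
    return {bus, dev, func};
  endfunction

  // Human-readable form for log messages, e.g. in a routing checker.
  function automatic string bdf_str(bit [15:0] id);
    bdf_t b = unpack_id(id);
    return $sformatf("%0h:%0h.%0d", b.bus, b.dev, b.func);
  endfunction

endpackage
```

Keeping the pack and unpack helpers together makes it harder for a testbench to disagree with itself about where the Device Number starts.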
Implicit Routing
Used by Messages. Routing is selected by the r[2:0] sub-field of the Type field (the xxx in 10xxx):
- 000: Routed to Root Complex
- 001: Routed by Address
- 010: Routed by ID
- 011: Broadcast from Root Complex
- 100: Local - terminate at receiver
- 101: Gathered and routed to Root Complex
6. Completion Handling
Non-Posted transactions require completions. Here's the matching process:
Completion Header Fields
Completer ID: [15:0] BDF of completer
Status: [15:13] Completion status
000 = Successful Completion (SC)
001 = Unsupported Request (UR)
010 = Config Request Retry Status (CRS)
100 = Completer Abort (CA)
BCM: [12] Byte Count Modified
Byte Count: [11:0] Remaining bytes
Requester ID: [15:0] Original requester BDF
Tag: [7:0] Original transaction tag
Lower Address: [6:0] Lower bits of byte address
Split Completions
Large read requests may return multiple completions:
sequenceDiagram
participant REQ as Requester
participant CPL as Completer
REQ->>CPL: Memory Read (256 bytes)
Note over CPL: MPS=128 bytes
CPL-->>REQ: CplD (128 bytes, BC=256)
CPL-->>REQ: CplD (128 bytes, BC=128)
Note over REQ: Request complete
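The Byte Count behaviour in the diagram (BC=256, then BC=128) suggests a simple check: every completion's Byte Count should equal the bytes still owed for that tag. The class below is a sketch of that idea, not a full scoreboard. Its names are hypothetical, and it assumes the Byte Count has already been decoded to a plain byte value and that requests are DW-aligned.

```systemverilog
package cpl_track_pkg;  // hypothetical package name

  class cpl_tracker;
    // Bytes still owed for each outstanding tag
    int unsigned remaining[bit [9:0]];

    function void add_request(bit [9:0] tag, int unsigned bytes);
      remaining[tag] = bytes;
    endfunction

    // Returns 1 if the completion is consistent with the outstanding request.
    function bit on_cpld(bit [9:0]    tag,
                         int unsigned byte_count,
                         int unsigned payload_bytes);
      if (!remaining.exists(tag))         return 0;  // unexpected completion
      if (byte_count != remaining[tag])   return 0;  // BC must equal bytes still owed
      if (payload_bytes > remaining[tag]) return 0;  // more data than requested
      remaining[tag] -= payload_bytes;
      if (remaining[tag] == 0) remaining.delete(tag);  // request complete
      return 1;
    endfunction

    function bit is_outstanding(bit [9:0] tag);
      return remaining.exists(tag) != 0;
    endfunction
  endclass

endpackage
```

Replaying the diagram: after `add_request(tag, 256)`, the calls `on_cpld(tag, 256, 128)` and `on_cpld(tag, 128, 128)` both pass, and the tag is no longer outstanding.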
Tag Management
Tags uniquely identify outstanding Non-Posted requests:
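- 8-bit Tag: 256 outstanding requests (Extended Tag Field enabled; the baseline 5-bit tag allows 32)
- 10-bit Tag: 1024 tag values, of which a 10-Bit Tag Requester may use 768 because Tag[9:8] = 00b is excluded (PCIe 4.0 and later)
- 14-bit Tag: 16384 tag values (PCIe 6.0 Flit Mode)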
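// Tag tracking in verification
class tag_manager;
  bit tag_in_use[1024];  // one in-use flag per 10-bit tag value
  int unsigned outstanding_count;

  // Simplified: allocates from the full 0-1023 range (a real 10-Bit Tag
  // Requester skips Tag[9:8] = 00b). Reports an error and returns 0 when
  // every tag is busy, so callers should treat that error as fatal.
  function bit [9:0] allocate_tag();
    for (int i = 0; i < 1024; i++) begin
      if (!tag_in_use[i]) begin
        tag_in_use[i] = 1;
        outstanding_count++;
        return i[9:0];
      end
    end
    `uvm_error("TAG", "No tags available!")
    return 0;
  endfunction

  function void release_tag(bit [9:0] tag);
    if (!tag_in_use[tag]) begin
      `uvm_error("TAG", $sformatf("Releasing unused tag %0d", tag))
      return;  // do not decrement the count for a tag that was never allocated
    end
    tag_in_use[tag] = 0;
    outstanding_count--;
  endfunction
endclass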
7. Ordering Rules
PCIe ordering rules ensure data coherency and are critical for DV. Violations can cause silent data corruption.
The Ordering Table
This table defines when a later transaction can pass an earlier one:
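| Can Row pass Column? | Posted | Non-Posted | Completion |
|---|---|---|---|
| Posted | No | Yes | Y/N |
| Non-Posted | No | Y/N | Y/N |
| Completion | No | Yes | Y/N |
Yes = must be allowed to pass (this prevents deadlock); No = must not pass; Y/N = may pass but is not required to. Completions that belong to the same request must stay in order, and the "No" entries can be relaxed by the RO and IDO attributes described below.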
Key Ordering Rules Explained
1. Posted cannot pass Posted (Producer-Consumer Model)
- Ensures write ordering is preserved
- Write A followed by Write B: B must not become visible at the destination before A
2. Non-Posted cannot pass Posted (Read after Write)
- A read must see all previous writes
- Prevents reading stale data
3. Completion cannot pass Posted (Write after Read)
- Critical for memory-mapped I/O
- Completion data reflects state after all previous writes
Relaxed Ordering (RO)
The RO attribute bit allows relaxing certain rules for performance:
- When RO=1, a Posted Write can pass earlier Posted Writes
- Useful for independent data streams
- Must be used carefully to avoid coherency issues
ID-Based Ordering (IDO)
Allows transactions from independent sources to pass each other:
- A TLP with IDO set may pass an earlier TLP that carries a different Requester ID
- Improves performance in multi-function devices
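// Ordering rule checker. Assumes write() sees TLPs in egress order and that
// t.timestamp records when each TLP entered the DUT. IDO is not modeled.
class ordering_checker extends uvm_subscriber #(pcie_tlp);
  `uvm_component_utils(ordering_checker)

  pcie_tlp posted_queue[$];     // Posted TLPs already seen at egress
  pcie_tlp nonposted_queue[$];  // Non-Posted TLPs already seen at egress

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void write(pcie_tlp t);
    if (t.is_posted()) begin
      // Check: Non-Posted cannot pass Posted
      foreach (nonposted_queue[i]) begin
        if (nonposted_queue[i].timestamp > t.timestamp)
          `uvm_error("ORDER", $sformatf(
            "Non-Posted (Tag=%0d) passed earlier Posted!",
            nonposted_queue[i].tag))
      end
      // Check: Posted cannot pass Posted (unless the passing TLP has RO set)
      foreach (posted_queue[i]) begin
        if (posted_queue[i].timestamp > t.timestamp &&
            !posted_queue[i].relaxed_ordering)
          `uvm_error("ORDER", "Posted passed earlier Posted!")
      end
      posted_queue.push_back(t);
    end
    else if (t.is_nonposted()) begin
      nonposted_queue.push_back(t);
    end
  endfunction
endclass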
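The RO and IDO exceptions described above can be folded into a single predicate that answers one question: may this later TLP be reordered ahead of an earlier Posted TLP? The sketch below encodes only what this section states, namely that RO lets a Posted write pass and that IDO lets traffic with a different Requester ID pass. All names are hypothetical, and it simplifies by giving each TLP a single ID.

```systemverilog
package order_rule_pkg;  // hypothetical package name

  typedef enum { POSTED, NON_POSTED, COMPLETION } tlp_class_e;

  // 1 if the later TLP is permitted to pass an earlier Posted TLP.
  function automatic bit may_pass_posted(tlp_class_e later_class,
                                         bit         later_ro,
                                         bit         later_ido,
                                         bit [15:0]  later_req_id,
                                         bit [15:0]  earlier_req_id);
    // IDO: independent streams (different IDs) may reorder
    if (later_ido && (later_req_id != earlier_req_id)) return 1;
    // RO: a Posted write with RO set may pass earlier Posted writes
    if ((later_class == POSTED) && later_ro) return 1;
    // Default: nothing passes an earlier Posted TLP
    return 0;
  endfunction

endpackage
```

A checker can call this before raising an ordering error, so that legal RO or IDO reordering is not reported as a violation.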
8. Traffic Classes and Virtual Channels
Traffic Classes (TC)
3-bit field (TC0-TC7) providing QoS differentiation:
- TC0: Default, best-effort traffic
- TC1-TC7: Optional differentiated classes
- The TC number carries no built-in priority; relative service comes from the TC-to-VC mapping and VC arbitration
Virtual Channels (VC)
Independent flow control domains mapped from TCs:
- VC0: Required, carries TC0 by default
- VC1-VC7: Optional additional channels
- Each VC has independent credit pools
- TC-to-VC mapping is configurable
flowchart LR
subgraph TC["Traffic Classes"]
TC0[TC0]
TC1[TC1]
TC7[TC7]
end
subgraph VC["Virtual Channels"]
VC0[VC0]
VC1[VC1]
end
TC0 --> VC0
TC1 --> VC1
TC7 --> VC1
DV Insight: Most designs only implement VC0. If your DUT supports multiple VCs, verify credit management and arbitration across all channels.
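For multi-VC designs, a small model of the TC-to-VC map helps when checking which credit pool a TLP should consume. The sketch below uses hypothetical names. It assumes TC0 must always map to VC0 and that every TC must map to an enabled VC. The example map extends the diagram by placing all of TC1 through TC7 on VC1, which is an assumption, since the diagram only shows TC1 and TC7.

```systemverilog
package tc_vc_pkg;  // hypothetical package name

  // m[tc] holds the VC ID assigned to that traffic class
  typedef bit [7:0][2:0] tc_vc_map_t;

  function automatic bit [2:0] vc_for_tc(tc_vc_map_t m, bit [2:0] tc);
    return m[tc];
  endfunction

  // vc_enabled[v] is 1 when VC v is implemented and enabled.
  function automatic bit map_is_legal(tc_vc_map_t m, bit [7:0] vc_enabled);
    if (m[0] != 3'd0) return 0;            // TC0 must stay on VC0
    for (int tc = 0; tc < 8; tc++)
      if (!vc_enabled[m[tc]]) return 0;    // TC mapped to a disabled VC
    return 1;
  endfunction

  // Example map: TC0 -> VC0, TC1..TC7 -> VC1
  function automatic tc_vc_map_t example_map();
    tc_vc_map_t m;
    for (int tc = 0; tc < 8; tc++)
      m[tc] = (tc == 0) ? 3'd0 : 3'd1;
    return m;
  endfunction

endpackage
```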
9. Max Payload Size and Read Request Size
Max Payload Size (MPS)
Maximum TLP data payload size:
| MPS Value | Max Payload |
|---|---|
| 000 | 128 bytes |
| 001 | 256 bytes |
| 010 | 512 bytes |
| 011 | 1024 bytes |
| 100 | 2048 bytes |
| 101 | 4096 bytes |
- Configured in the Device Control register
- All devices in the hierarchy must agree on MPS
- The usable value is the smaller of the device's Max_Payload_Size Supported capability and the system setting
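The encoding in the table is a power-of-two scale starting at 128 bytes, so decoding it is a single shift. The sketch below also shows one way to pick a common MPS for a set of devices. The package and function names are hypothetical.

```systemverilog
package mps_pkg;  // hypothetical package name

  // Decode the 3-bit MPS/MRRS encoding to bytes; returns 0 for values
  // that are not in the table (110 and 111).
  function automatic int unsigned size_bytes(bit [2:0] code);
    if (code > 3'b101) return 0;
    return 128 << code;
  endfunction

  // Largest encoding that every listed device supports, i.e. the minimum
  // of the supported codes. Returns 101 for an empty list.
  function automatic bit [2:0] common_mps_code(bit [2:0] supported[]);
    bit [2:0] lowest = 3'b101;
    foreach (supported[i])
      if (supported[i] < lowest) lowest = supported[i];
    return lowest;
  endfunction

endpackage
```

With devices that support codes 010, 001, and 101, `common_mps_code()` returns 001, which `size_bytes()` decodes to 256 bytes.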
Max Read Request Size (MRRS)
Maximum size for memory read requests:
- Same encoding as MPS (128 to 4096 bytes)
- Larger MRRS = fewer requests = better bandwidth
- Affects completion buffer sizing
// MPS/MRRS compliance checking
class payload_checker extends uvm_component;
  `uvm_component_utils(payload_checker)

  int mps  = 128;  // Current system MPS in bytes
  int mrrs = 512;  // Current MRRS in bytes

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void check_tlp(pcie_tlp tlp);
    // A Length field of 0 encodes 1024 DW
    int length_dw     = (tlp.length == 0) ? 1024 : tlp.length;
    int payload_bytes = length_dw * 4;
    // Check MPS for writes
    if (tlp.is_posted() && tlp.has_data()) begin
      if (payload_bytes > mps)
        `uvm_error("MPS", $sformatf(
          "Write payload %0d exceeds MPS %0d", payload_bytes, mps))
    end
    // Check MRRS for reads
    if (tlp.is_memory_read()) begin
      if (payload_bytes > mrrs)
        `uvm_error("MRRS", $sformatf(
          "Read request %0d exceeds MRRS %0d", payload_bytes, mrrs))
    end
    // Check 4KB boundary crossing (meaningful only when tlp.address is valid,
    // i.e. for memory requests)
    if (crosses_4kb_boundary(tlp.address, payload_bytes))
      `uvm_error("4KB", "TLP crosses 4KB address boundary!")
  endfunction
endclass
10. Address Boundary Rules
TLPs must not cross certain address boundaries:
4KB Boundary Rule
- No memory request TLP may cross a 4KB (0x1000) address boundary
- Must split into multiple TLPs if request spans boundary
- Critical for address translation (pages are 4KB)
Read Completion Boundary (RCB)
- 64 bytes or 128 bytes (Root Complex dependent)
- When a read is split, every completion except the last must end on an RCB-aligned address
- Affects how reads are split into completions
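A per-completion check for the RCB rule could look like the sketch below. It assumes the rule being checked is that every completion of a split read, other than the final one, ends on an RCB-aligned address. The package and function names are hypothetical, and `rcb` is expected to be 64 or 128.

```systemverilog
package rcb_pkg;  // hypothetical package name

  // start_addr: byte address of the first byte returned by this completion
  // bytes:      number of bytes it carries
  // is_last:    1 if this completion finishes the request
  // rcb:        Read Completion Boundary in bytes (64 or 128)
  function automatic bit cpl_end_ok(bit [63:0]   start_addr,
                                    int unsigned bytes,
                                    bit          is_last,
                                    int unsigned rcb);
    bit [63:0] end_addr = start_addr + bytes;  // first byte after this completion
    if (is_last) return 1;                     // the final completion may end anywhere
    return (end_addr % rcb) == 0;
  endfunction

endpackage
```

For a read starting at 0x1010 with RCB = 64, a first completion of 48 bytes ends at 0x1040 and passes, while a first completion of 64 bytes ends at 0x1050 and is flagged.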
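// 4KB boundary check. Declared automatic so the local initializer is
// evaluated on every call rather than once at time 0.
function automatic bit crosses_4kb_boundary(bit [63:0] addr, int length);
  bit [63:0] end_addr = addr + length - 1;
  return (addr[63:12] != end_addr[63:12]);
endfunction
// Split request at 4KB boundary (for requests of up to 4KB, so there is
// at most one crossing). second_len is 0 when no split is needed.
function automatic void split_at_4kb(bit [63:0] addr, int length,
                                     output int first_len, output int second_len);
  bit [63:0] boundary = (addr & ~64'hFFF) + 64'h1000;
  if (!crosses_4kb_boundary(addr, length)) begin
    first_len  = length;
    second_len = 0;
  end else begin
    first_len  = int'(boundary - addr);
    second_len = length - first_len;
  end
endfunction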
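In a stimulus generator it is often handy to split a transfer by both limits at once: the 4KB boundary and a size cap such as MPS or MRRS. The function below is one possible sketch of that. The package, type, and function names are hypothetical, and `max_bytes` stands for whichever limit applies to the request type.

```systemverilog
package tlp_split_pkg;  // hypothetical package name

  typedef int unsigned len_q_t[$];

  // Returns the byte length of each TLP needed to cover [addr, addr+length)
  // without exceeding max_bytes or crossing a 4KB boundary.
  function automatic len_q_t split_lengths(bit [63:0]   addr,
                                           int unsigned length,
                                           int unsigned max_bytes);
    len_q_t chunks;
    if (max_bytes == 0) return chunks;           // guard against an endless loop
    while (length > 0) begin
      int unsigned to_4kb = 4096 - addr[11:0];   // bytes left before the boundary
      int unsigned chunk  = length;
      if (chunk > max_bytes) chunk = max_bytes;
      if (chunk > to_4kb)    chunk = to_4kb;
      chunks.push_back(chunk);
      addr   += chunk;
      length -= chunk;
    end
    return chunks;
  endfunction

endpackage
```

For a 512-byte transfer starting at 0xF80 with a 256-byte limit, the result is three chunks of 128, 256, and 128 bytes: the first stops at the 4KB boundary, the second is capped by the size limit, and the third carries the remainder.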
11. DV Verification Scenarios
Must-Have Test Cases
| Scenario | Description | Coverage Goal |
|---|---|---|
| All TLP Types | Generate each transaction type | Functional |
| 32/64-bit Addressing | Both 3DW and 4DW headers | Format coverage |
| Ordering Rules | Verify all ordering constraints | Protocol compliance |
| Tag Exhaustion | Use all available tags | Resource limits |
| MPS Boundary | Requests at exact MPS size | Boundary |
| 4KB Crossing | Requests spanning 4KB boundary | Error handling |
| Split Completions | Large reads with multiple Cpls | Completion handling |
| Completion Timeout | Non-Posted without response | Error handling |
| UR/CA Status | Unsuccessful completions | Error paths |
| Relaxed Ordering | RO bit behavior | Attribute coverage |
TLP Coverage Model
covergroup tlp_cg with function sample(pcie_tlp tlp);
  // Transaction type coverage
  tlp_type: coverpoint tlp.tlp_type {
    bins mem_rd = {MEM_RD_32, MEM_RD_64};
    bins mem_wr = {MEM_WR_32, MEM_WR_64};
    bins io_rd  = {IO_RD};
    bins io_wr  = {IO_WR};
    bins cfg_rd = {CFG_RD_0, CFG_RD_1};
    bins cfg_wr = {CFG_WR_0, CFG_WR_1};
    bins msg    = {MSG, MSG_D};
    bins cpl    = {CPL, CPL_D};
  }
  // Header format
  fmt: coverpoint tlp.fmt {
    bins dw3_no_data = {3'b000};
    bins dw4_no_data = {3'b001};
    bins dw3_data    = {3'b010};
    bins dw4_data    = {3'b011};
  }
  // Payload length (raw Length field, in DW)
  length: coverpoint tlp.length {
    bins zero    = {0};          // Length = 0 encodes 1024 DW
    bins small   = {[1:4]};
    bins medium  = {[5:32]};
    bins large   = {[33:256]};
    bins max_mps = {[257:$]};    // 257 DW and up, toward the 4096-byte MPS limit
  }
  // Traffic class
  tc: coverpoint tlp.tc {
    bins tc0     = {0};
    bins tc_high = {[1:7]};
  }
  // Attributes
  relaxed_order: coverpoint tlp.attr[1];
  no_snoop:      coverpoint tlp.attr[0];
  // Completion status
  cpl_status: coverpoint tlp.cpl_status iff (tlp.is_completion()) {
    bins sc  = {3'b000};  // Success
    bins ur  = {3'b001};  // Unsupported Request
    bins crs = {3'b010};  // Config Retry
    bins ca  = {3'b100};  // Completer Abort
  }
  // Cross coverage
  type_x_length: cross tlp_type, length;
  type_x_tc:     cross tlp_type, tc;
endgroup
12. Common Transaction Layer Bugs
Watch for these issues during verification:
- Tag Reuse: Reusing tag before completion received
- Ordering Violation: Posted passing Posted incorrectly
- 4KB Crossing: Single TLP spanning 4KB boundary
- MPS Violation: Payload exceeding Max Payload Size
- Completion Mismatch: Wrong tag or requester ID in completion
- Byte Count Error: Incorrect byte count in split completions
- Missing Completion: Non-Posted request never completed
- Spurious Completion: Completion for non-existent request
- Format/Type Mismatch: Fmt doesn't match TLP content
Key Takeaways
- TLPs have 3DW or 4DW headers based on address size
- Posted transactions (memory writes, messages) don't require completion
- Non-Posted transactions (reads, I/O, config) require completion
- Ordering rules prevent data corruption - verify thoroughly
- Tags track outstanding Non-Posted requests
- 4KB boundary cannot be crossed by single TLP
- MPS limits payload size; MRRS limits read request size
Next Up
In Part 5: Configuration Space & BARs, we'll explore how devices are discovered, enumerated, and their address spaces are programmed.