Memory Management: Surviving Long-Running Simulations
Your block-level testbench runs beautifully. Then you integrate at SoC level, launch a 24-hour regression, and watch memory consumption climb until the OOM killer terminates your simulation at 3 AM. Welcome to the memory management problem that every DV engineer eventually faces.
Unlike software engineers who have decades of memory management tools and techniques, verification engineers work in SystemVerilog—a language with primitive garbage collection and no explicit memory control. This post explores why testbenches leak memory, what SystemVerilog can't do, and how modern approaches including hybrid languages offer solutions.
The Problem: Memory That Never Dies
Consider a typical verification scenario:
- Block-level test: 10,000 transactions, 2 hours, 4GB RAM—no problem
- SoC-level test: 10,000,000 transactions, 48 hours, ???
If each transaction object consumes 1KB and you store them all, that's 10GB just for transactions—before counting scoreboards, coverage, and infrastructure. But the real problem isn't size; it's accumulation. Objects that should be temporary become permanent.
class typical_scoreboard extends uvm_scoreboard;
transaction expected_q[$];
transaction history[$]; // "For debug"—grows forever
function void check(transaction actual);
foreach (expected_q[i]) begin
if (matches(expected_q[i], actual)) begin
history.push_back(expected_q[i]); // Never cleaned
expected_q.delete(i);
return;
end
end
// No match? expected_q keeps the orphan forever
`uvm_error("SB", "Unexpected transaction")
endfunction
endclass
This scoreboard has two leaks: history grows without bound, and unmatched expectations accumulate. At block level, you'll never notice. At SoC level, it's fatal.
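To put rough numbers on the jump from block level to SoC level, here is a back-of-the-envelope estimate. It is only a sketch: the 1 KB per-transaction figure is the working assumption used above, and `retained_gb` is a hypothetical helper name.

```python
# Estimate how much memory is held if no transaction is ever released.
# BYTES_PER_TXN is the article's working assumption, not a measurement.
BYTES_PER_TXN = 1_000  # 1 KB (decimal)


def retained_gb(num_transactions: int, bytes_per_txn: int = BYTES_PER_TXN) -> float:
    """Return decimal gigabytes retained if every transaction stays referenced."""
    return num_transactions * bytes_per_txn / 1e9


block_level_gb = retained_gb(10_000)       # block-level transaction count
soc_level_gb = retained_gb(10_000_000)     # SoC-level transaction count
print(f"block: {block_level_gb} GB, SoC: {soc_level_gb} GB")
```

The same per-object cost that is invisible at block level scales by three orders of magnitude at SoC level, before any scoreboard or coverage overhead is counted.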
Why SystemVerilog Memory Management Is Primitive
SystemVerilog provides garbage collection, but it's far from modern GC implementations:
| Feature | Modern GC (Java, Python, Go) | SystemVerilog |
|---|---|---|
| Generational collection | Yes—optimizes for short-lived objects | No—all objects treated equally |
| Explicit deallocation | No direct free, but references can be dropped explicitly (e.g., Python `del`) | Not available; a handle can only be set to null |
| Weak references | Yes—references that don't prevent collection | No |
| Memory pools | Built-in or library support | Manual implementation only |
| Profiling tools | Rich ecosystem | Simulator-specific, limited |
| Deterministic cleanup | try-with-resources (Java), context managers (Python), `defer` (Go) | No |
| Cycle detection | Standard | Simulator-dependent |
Worse, SystemVerilog GC is implementation-dependent. VCS, Xcelium, and Questa each implement it differently. Code that's leak-free on one simulator may leak on another.
The Fundamental Mismatch
SystemVerilog was designed for hardware modeling, where most objects are static (modules, interfaces) or short-lived (transactions that flow through). Modern testbenches break this assumption:
- Scoreboards accumulate state across millions of transactions
- Coverage databases grow continuously
- Analysis subscribers may store history indefinitely
- Configuration objects persist for entire simulation
The Five Memory Leak Patterns
Pattern 1: The Unbounded Queue
// Grows without limit
class packet_history;
packet packets[$];
function void record(packet p);
packets.push_back(p); // Never removed
endfunction
endclass
Fix: Bounded buffer with eviction
class bounded_history #(int MAX_SIZE = 10000);
packet packets[$];
function void record(packet p);
if (packets.size() >= MAX_SIZE)
void'(packets.pop_front()); // Evict oldest
packets.push_back(p);
endfunction
endclass
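For testbench components that live on the Python side, the same eviction policy can be sketched with the standard library's `collections.deque`, which drops the oldest entry automatically once `maxlen` is reached. `BoundedHistory` is a hypothetical name mirroring `bounded_history` above.

```python
from collections import deque


class BoundedHistory:
    """Keep only the most recent max_size packets; older ones are evicted."""

    def __init__(self, max_size: int = 10_000):
        # deque(maxlen=N) discards from the left when a new item is appended
        self._packets = deque(maxlen=max_size)

    def record(self, packet) -> None:
        self._packets.append(packet)

    def oldest(self):
        return self._packets[0]

    def __len__(self) -> int:
        return len(self._packets)
```

Unlike a list with `pop(0)`, appending to a full deque is constant time, which matters when the buffer is hit on every transaction.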
Pattern 2: The Orphaned Expectation
// Expected transactions that never get matched
class scoreboard;
transaction expected[$];
function void add_expected(transaction t);
expected.push_back(t);
endfunction
function void check(transaction actual);
// If protocol error causes mismatch, expected grows forever
foreach (expected[i]) begin
if (expected[i].id == actual.id) begin
expected.delete(i);
return;
end
end
endfunction
endclass
Fix: Timeout-based cleanup
class robust_scoreboard;
typedef struct {
transaction txn;
time added_time;
} expected_entry_t;
expected_entry_t expected[$];
time TIMEOUT = 1ms;
function void add_expected(transaction t);
expected.push_back('{txn: t, added_time: $time});
endfunction
function void check(transaction actual);
// First, clean stale entries
cleanup_stale();
// Then match
foreach (expected[i]) begin
if (expected[i].txn.id == actual.id) begin
expected.delete(i);
return;
end
end
endfunction
function void cleanup_stale();
for (int i = expected.size() - 1; i >= 0; i--) begin
if ($time - expected[i].added_time > TIMEOUT) begin
`uvm_warning("SB", $sformatf("Stale expectation removed: id=%0h",
expected[i].txn.id))
expected.delete(i);
end
end
endfunction
endclass
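The timeout logic can be modeled and unit-tested outside a simulator. The sketch below is one possible shape, assuming transaction IDs are unique while in flight and that the caller passes the current simulation time in nanoseconds; `RobustScoreboard`, `timeout_ns`, and `stale_count` are illustrative names.

```python
class RobustScoreboard:
    """Expectation store that drops entries older than timeout_ns."""

    def __init__(self, timeout_ns: int = 1_000_000):  # 1 ms, as in the SV version
        self.timeout_ns = timeout_ns
        self._expected = {}      # txn id -> time the expectation was added (ns)
        self.stale_count = 0     # a counter, so the bookkeeping itself stays bounded

    def add_expected(self, txn_id: int, now_ns: int) -> None:
        self._expected[txn_id] = now_ns

    def check(self, txn_id: int, now_ns: int) -> bool:
        """Purge stale entries, then return True if txn_id was expected."""
        self._cleanup_stale(now_ns)
        return self._expected.pop(txn_id, None) is not None

    def _cleanup_stale(self, now_ns: int) -> None:
        # Collect keys first; a dict must not be mutated while iterating over it
        stale = [i for i, t in self._expected.items()
                 if now_ns - t > self.timeout_ns]
        for i in stale:
            del self._expected[i]
            self.stale_count += 1

    def pending(self) -> int:
        return len(self._expected)
```

Counting stale entries instead of storing them avoids replacing one unbounded container with another.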
Pattern 3: The Deep Copy Cascade
// Every subscriber clones the transaction
class coverage_subscriber extends uvm_subscriber #(packet);
packet stored[$];
function void write(packet t);
packet copy;
$cast(copy, t.clone()); // Deep copy #1
stored.push_back(copy); // Stored forever
endfunction
endclass
// If 5 subscribers all clone: 5× memory per transaction
Fix: Read-only access, selective storage
class efficient_subscriber extends uvm_subscriber #(packet);
// Only store what you need, not entire objects
bit [31:0] seen_addresses[$];
function void write(packet t);
// Extract only necessary data
seen_addresses.push_back(t.addr);
// Don't clone entire packet
endfunction
endclass
Pattern 4: The Circular Reference
class tree_node;
tree_node parent; // Points up
tree_node children[$]; // Points down
function void add_child(tree_node child);
children.push_back(child);
child.parent = this; // Circular reference!
endfunction
endclass
// On a simulator whose collector lacks cycle detection, "forgetting" the root still leaves every node alive
Fix: Break cycles explicitly
class tree_node;
tree_node parent;
tree_node children[$];
// Call before discarding tree
function void destroy();
foreach (children[i]) begin
children[i].destroy(); // Recursive cleanup
end
children.delete();
parent = null; // Break upward reference
endfunction
endclass
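In Python-side components, a weak parent reference is an alternative to a manual `destroy()` pass; SystemVerilog has no equivalent, as the comparison table notes. A minimal sketch, with `TreeNode` as a hypothetical counterpart to `tree_node`:

```python
import weakref


class TreeNode:
    """Tree node whose upward link is weak, so no reference cycle forms."""

    def __init__(self, name: str):
        self.name = name
        self.children = []     # strong references downward
        self._parent = None    # weakref.ref to the parent, or None

    @property
    def parent(self):
        # Dereference the weak link; yields None once the parent is gone
        return self._parent() if self._parent is not None else None

    def add_child(self, child: "TreeNode") -> None:
        self.children.append(child)
        child._parent = weakref.ref(self)  # does not keep the parent alive
```

Because only the downward links are strong, dropping the last handle to the root is enough for the whole tree to become collectable.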
Pattern 5: The Config DB Accumulator
// Configuration entries persist forever
class dynamic_test extends uvm_test;
task run_phase(uvm_phase phase);
for (int i = 0; i < 1000; i++) begin
config_object cfg = new($sformatf("cfg_%0d", i));
// Each iteration adds permanent config entry
uvm_config_db#(config_object)::set(this, "*", $sformatf("cfg_%0d", i), cfg);
end
endtask
endclass
Fix: Reuse config objects, limit scope
// Set once, modify in place
config_object cfg = new("cfg");
uvm_config_db#(config_object)::set(this, "*", "cfg", cfg);
// Later: modify, don't recreate
cfg.update_values(new_values);
Object Pooling: The Traditional Solution
Since you can't control when GC runs, reuse objects instead of creating new ones:
class transaction_pool #(type T = uvm_sequence_item);
protected T pool[$];
protected int created = 0;
protected int reused = 0;
// Get object from pool or create new
function T acquire();
if (pool.size() > 0) begin
reused++;
return pool.pop_back();
end
created++;
return T::type_id::create("pooled_item");
endfunction
// Return object to pool
function void release(T item);
item.clear(); // Reset to clean state (assumes T defines a user-written clear(); uvm_sequence_item does not provide one)
pool.push_back(item);
endfunction
function void report();
real reuse_pct = (created > 0) ? (100.0 * reused / (created + reused)) : 0;
`uvm_info("POOL", $sformatf(
"Created: %0d | Reused: %0d | Reuse rate: %.1f%% | Pool size: %0d",
created, reused, reuse_pct, pool.size()), UVM_LOW)
endfunction
endclass
// Usage
class my_sequence extends uvm_sequence #(my_transaction);
static transaction_pool #(my_transaction) pool = new();
task body();
my_transaction txn;
repeat (1000000) begin
txn = pool.acquire();
// ... configure and send txn ...
start_item(txn);
finish_item(txn);
pool.release(txn); // Return to pool; safe only if no driver, monitor, or scoreboard still holds this handle
end
endtask
endclass
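The pool's bookkeeping is easy to get subtly wrong, so it can help to model it where it is cheap to unit-test. The following Python sketch mirrors the acquire/release logic above; `TransactionPool`, `Txn`, and `reuse_rate` are illustrative names, and `Txn.clear()` stands in for whatever reset method the real transaction class provides.

```python
class Txn:
    """Stand-in transaction with a reset method, as the pool requires."""

    def __init__(self):
        self.addr = 0
        self.data = 0

    def clear(self) -> None:
        self.addr = 0
        self.data = 0


class TransactionPool:
    """Reuse released objects instead of constructing new ones."""

    def __init__(self, factory):
        self._factory = factory   # callable that builds a fresh object
        self._free = []
        self.created = 0
        self.reused = 0

    def acquire(self):
        if self._free:
            self.reused += 1
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, item) -> None:
        item.clear()              # never hand out stale field values
        self._free.append(item)

    def reuse_rate(self) -> float:
        total = self.created + self.reused
        return 100.0 * self.reused / total if total else 0.0
```

In a strictly sequential acquire/release loop, a single object ends up serving every iteration, which is the steady state the SystemVerilog sequence above is aiming for.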
Hybrid Language Approaches
SystemVerilog's memory limitations aren't fundamental to verification—they're language limitations. Modern verification increasingly uses hybrid approaches.
Python with cocotb
Python offers mature memory management that SystemVerilog lacks:
# Python: Explicit cleanup with context managers
class TransactionBuffer:
    def __init__(self, max_size=1000):
        self.buffer = []
        self.max_size = max_size

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.buffer.clear()  # Guaranteed cleanup

    def add(self, txn):
        if len(self.buffer) >= self.max_size:
            self.buffer.pop(0)  # Evict oldest
        self.buffer.append(txn)

# Usage: Automatic cleanup when block exits
async def test_sequence(dut):
    with TransactionBuffer() as buf:
        for _ in range(1000000):
            txn = await generate_transaction(dut)
            buf.add(txn)
            await verify_response(dut, txn)
    # Buffer automatically cleared here
Python Memory Advantages:
# 1. Generators: Process without storing
def transaction_stream(count):
    for _ in range(count):
        yield Transaction()  # One at a time, not all in memory

async def test_streaming(dut):
    for txn in transaction_stream(1_000_000):
        await drive_and_check(dut, txn)
        # The previous txn becomes unreachable as soon as the loop rebinds the name

# 2. Weak references: Cache without preventing cleanup
import weakref

class TransactionCache:
    def __init__(self):
        self.cache = weakref.WeakValueDictionary()

    def get(self, key):
        return self.cache.get(key)  # Returns None if collected

    def put(self, key, txn):
        self.cache[key] = txn  # Won't prevent GC

# 3. Memory profiling: Find leaks easily
import tracemalloc

tracemalloc.start()
# ... run test ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)  # Shows top memory consumers by line
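A single snapshot shows what is large; comparing two snapshots shows what is growing, which is usually the more useful question when hunting a leak. The sketch below uses `Snapshot.compare_to` from the standard `tracemalloc` module; `top_growth` and `leaky_workload` are hypothetical helpers written for this illustration.

```python
import tracemalloc


def top_growth(workload, limit=5):
    """Run workload() and return the source lines whose allocations grew most.

    The workload's return value is handed back as well, so that whatever it
    retained is still alive when the second snapshot is taken.
    """
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        retained = workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to returns StatisticDiff entries, largest change first
    return after.compare_to(before, "lineno")[:limit], retained


def leaky_workload():
    # Deliberately retains about 5 MB to give the profiler something to find
    return [bytes(1024) for _ in range(5_000)]
```

Calling `top_growth(leaky_workload)` and printing the returned entries should point at the list comprehension inside `leaky_workload`, since that line accounts for nearly all of the retained memory.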
C++ via DPI
For performance-critical components, C++ offers full memory control:
// C++ side: Custom allocator with pooling
#include <memory>
#include <vector>
class TransactionPool {
std::vector<std::unique_ptr<Transaction>> pool;
public:
Transaction* acquire() {
if (!pool.empty()) {
auto txn = std::move(pool.back());
pool.pop_back();
return txn.release();
}
return new Transaction();
}
void release(Transaction* txn) {
txn->reset();
pool.push_back(std::unique_ptr<Transaction>(txn));
}
};
// C-linkage entry points; SystemVerilog pulls these in with import "DPI-C"
extern "C" {
static TransactionPool g_pool;
void* dpi_acquire_transaction() {
return g_pool.acquire();
}
void dpi_release_transaction(void* txn) {
g_pool.release(static_cast<Transaction*>(txn));
}
}
// SystemVerilog side
import "DPI-C" function chandle dpi_acquire_transaction();
import "DPI-C" function void dpi_release_transaction(chandle txn);
class dpi_managed_sequence extends uvm_sequence;
task body();
chandle txn_handle;
repeat (1000000) begin
txn_handle = dpi_acquire_transaction();
// ... use transaction via DPI ...
dpi_release_transaction(txn_handle); // Explicit cleanup
end
endtask
endclass
The Hybrid Architecture
flowchart LR
subgraph SV["SystemVerilog"]
DRV[Driver]
MON[Monitor]
end
subgraph PY["Python"]
SB[Scoreboard]
COV[Coverage]
ANAL[Analysis]
end
subgraph CPP["C++ via DPI"]
POOL[Object Pool]
REF[Reference Model]
end
MON -->|Transactions| PY
DRV <-->|Pooled Objects| CPP
PY <-->|High-perf checks| CPP
style SV fill:#dbeafe,stroke:#3b82f6
style PY fill:#d1fae5,stroke:#10b981
style CPP fill:#fef3c7,stroke:#f59e0b
Division of responsibility:
- SystemVerilog: Signal-level driving/monitoring (tightest coupling to the DUT)
- Python: Scoreboarding, coverage, analysis (memory-managed, easy to debug)
- C++: Reference models, high-performance algorithms (full memory control)
Memory Profiling Techniques
Simulator-Specific Profiling
# VCS
vcs -simprofile mem ...
simv +simprofile+mem
profile_viewer profile.db # GUI shows memory over time
# Xcelium
xrun -memprof ...
# Generates memory profile report
# Questa
vsim -memprofile ...
# Memory stats in transcript
DIY Memory Tracking
// Track object lifecycle manually
class tracked_object extends uvm_object;
`uvm_object_utils(tracked_object)
static int total_created = 0;
static int total_active = 0;
function new(string name = "tracked_object");
super.new(name);
total_created++;
total_active++;
endfunction
// Call explicitly before releasing
function void release();
total_active--;
endfunction
static function void report_stats();
`uvm_info("MEM_TRACK", $sformatf(
"Created: %0d | Active: %0d | Potentially leaked: %0d",
total_created, total_active, total_active), UVM_LOW)
endfunction
endclass
// Call at end of simulation
final begin
tracked_object::report_stats();
end
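The SystemVerilog tracker depends on every owner remembering to call `release()`. On the Python side, a `weakref.WeakSet` can maintain the live count without that discipline, because entries vanish when their objects are collected. A minimal sketch, with `Tracked` as a hypothetical base class:

```python
import gc
import weakref


class Tracked:
    """Base class that counts live instances without an explicit release()."""

    _live = weakref.WeakSet()   # holds weak references only
    total_created = 0

    def __init__(self):
        Tracked.total_created += 1
        Tracked._live.add(self)

    @classmethod
    def active(cls) -> int:
        gc.collect()            # also reclaim objects kept alive only by cycles
        return len(cls._live)
```

A count that keeps climbing between two checkpoints of a test means something is still holding references, which narrows the search to the containers populated in that interval.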
Phase-Based Memory Monitoring
class memory_monitor extends uvm_component;
  `uvm_component_utils(memory_monitor)
  int phase_start_objects;
  // Constructor with this signature is required by `uvm_component_utils
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction
  // Illustrative helper: reads the static counter kept by tracked_object above
  function int get_active_object_count();
    return tracked_object::total_active;
  endfunction
  function void phase_started(uvm_phase phase);
    phase_start_objects = get_active_object_count();
    `uvm_info("MEM", $sformatf("%s started: %0d active objects",
      phase.get_name(), phase_start_objects), UVM_MEDIUM)
  endfunction
  function void phase_ended(uvm_phase phase);
    int phase_end_objects = get_active_object_count();
    int delta = phase_end_objects - phase_start_objects;
    `uvm_info("MEM", $sformatf("%s ended: %0d active objects (delta: %0d)",
      phase.get_name(), phase_end_objects, delta), UVM_MEDIUM)
    if (delta > 1000)
      `uvm_warning("MEM", "Significant object accumulation detected")
  endfunction
endclass
Modern and Future Solutions
Cloud-Based Isolation
Run each test in a fresh container/VM:
# Kubernetes job for simulation
apiVersion: batch/v1
kind: Job
metadata:
  name: sim-test-001
spec:
  template:
    spec:
      containers:
      - name: simulator
        image: verification/sim:latest
        resources:
          limits:
            memory: "32Gi" # Hard limit
        command: ["./run_test.sh", "test_001"]
      restartPolicy: Never
Benefits:
- Memory leaks don't accumulate across tests
- Each test starts with clean state
- OOM kills one test, not entire regression
- Easy to identify memory-hungry tests
Streaming Verification Architecture
Process transactions without storing:
// Don't store transactions—stream them
class streaming_scoreboard extends uvm_scoreboard;
// Minimal state: just what's needed for pending checks
bit [15:0] pending_ids[$];
function void expected(transaction t);
pending_ids.push_back(t.id);
endfunction
function void actual(transaction t);
int idx[$] = pending_ids.find_first_index(x) with (x == t.id); // Locator methods return a queue of indices
if (idx.size() > 0) begin
pending_ids.delete(idx[0]);
// Transaction checked and discarded—no storage
end else begin
`uvm_error("SB", "Unexpected transaction")
end
endfunction
// Only store IDs, not full transactions
// 16 bits (2 bytes) vs 1KB+ per transaction = roughly 512× less payload memory
endclass
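The streaming idea carries over directly to a Python-side scoreboard. In the sketch below, a `collections.Counter` keyed by ID replaces the queue of pending IDs, so a match is a dictionary lookup rather than a linear search. Treating expectations as order-independent is an assumption of this sketch, and `StreamingScoreboard` is an illustrative name.

```python
from collections import Counter


class StreamingScoreboard:
    """Track only outstanding transaction IDs, never full transactions."""

    def __init__(self):
        self._pending = Counter()   # txn id -> number of outstanding expectations
        self.errors = 0

    def expected(self, txn_id: int) -> None:
        self._pending[txn_id] += 1

    def actual(self, txn_id: int) -> bool:
        # Reading a missing key from a Counter yields 0 and does not insert it
        if self._pending[txn_id] > 0:
            self._pending[txn_id] -= 1
            if self._pending[txn_id] == 0:
                del self._pending[txn_id]   # keep the dict from accumulating zeros
            return True
        self.errors += 1
        return False

    def outstanding(self) -> int:
        return sum(self._pending.values())
```

Deleting keys whose count reaches zero matters here: without it, the counter would slowly accumulate one entry per distinct ID ever seen, which is the unbounded-queue pattern in another guise.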
Incremental Coverage
# Python: Stream coverage to database, don't accumulate in memory
import sqlite3

class StreamingCoverage:
    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS coverage
                             (bin TEXT PRIMARY KEY, hits INTEGER)''')

    def sample(self, bin_name):
        # Write directly to DB, don't store in memory.
        # Triple quotes are needed because the statement spans several lines.
        self.conn.execute(
            '''INSERT OR REPLACE INTO coverage VALUES (?,
               COALESCE((SELECT hits FROM coverage WHERE bin=?), 0) + 1)''',
            (bin_name, bin_name))

    def commit(self):
        self.conn.commit()  # Periodic flush
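How often to flush is a trade-off between memory held in the open transaction and disk I/O. One way to make the periodic flush automatic is to commit every N samples, sketched below. `BatchedCoverage`, `commit_every`, and `hits` are hypothetical names, and the default `:memory:` path is only there so the example runs without touching disk.

```python
import sqlite3


class BatchedCoverage:
    """Coverage counter that commits to SQLite every commit_every samples."""

    def __init__(self, db_path=":memory:", commit_every=1000):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS coverage (bin TEXT PRIMARY KEY, hits INTEGER)")
        self.commit_every = commit_every
        self._since_commit = 0

    def sample(self, bin_name: str) -> None:
        # Insert the bin with hits=1, or replace it with its old count plus one
        self.conn.execute(
            """INSERT OR REPLACE INTO coverage VALUES (
                   ?, COALESCE((SELECT hits FROM coverage WHERE bin = ?), 0) + 1)""",
            (bin_name, bin_name))
        self._since_commit += 1
        if self._since_commit >= self.commit_every:
            self.conn.commit()
            self._since_commit = 0

    def hits(self, bin_name: str) -> int:
        row = self.conn.execute(
            "SELECT hits FROM coverage WHERE bin = ?", (bin_name,)).fetchone()
        return row[0] if row else 0
```

A final explicit `conn.commit()` at end of test is still needed to flush the last partial batch.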
Key Takeaways
- SystemVerilog GC is primitive: No explicit control, no weak references, simulator-dependent
- Leaks hide at block level: Only manifest in long-running SoC simulations
- Five leak patterns: Unbounded queues, orphaned expectations, deep copies, circular references, config accumulation
- Object pooling helps: Reuse objects instead of relying on GC
- Hybrid languages offer solutions: Python for managed memory, C++ for explicit control
- Profile before optimizing: Use simulator tools or DIY tracking to find actual leaks
- Modern architectures: Cloud isolation, streaming verification, incremental coverage
Further Reading
- Effective Java, Chapter 2 (Creating and Destroying Objects) - Joshua Bloch
- Working Effectively with Legacy Code - Michael Feathers
- cocotb Documentation - docs.cocotb.org
- Python Memory Management - Python C API docs
- Your simulator's memory profiling guide
Interview Corner
Q: Your simulation runs out of memory after 12 hours. How do you debug?
A: First, use simulator memory profiling to identify which components are growing. Common culprits: scoreboards storing history, coverage with exploding crosses, analysis subscribers cloning transactions. Check for unbounded queues and missing cleanup on error paths. Consider whether you need to store full transactions or just essential data (IDs, addresses). If the design requires long-running tests, consider hybrid approaches with Python/C++ for better memory control, or cloud-based isolation where each test runs in a fresh container.
Q: Why can't you just call delete in SystemVerilog like in C++?
A: SystemVerilog uses automatic garbage collection without explicit deallocation. The language designers chose this to prevent use-after-free bugs common in C/C++, but it means you can't force immediate cleanup. Objects become eligible for collection when no references remain, but when GC actually runs is simulator-dependent and unpredictable. This is why object pooling and hybrid language approaches are important for memory-sensitive applications.