Architecture & Design Verification: Memory Management: Surviving Long-Running Simulations

Your block-level testbench runs beautifully. Then you integrate at SoC level, launch a 24-hour regression, and watch memory consumption climb until the OOM killer terminates your simulation at 3 AM. Welcome to the memory management problem that every DV engineer eventually faces.

Unlike software engineers who have decades of memory management tools and techniques, verification engineers work in SystemVerilog—a language with primitive garbage collection and no explicit memory control. This post explores why testbenches leak memory, what SystemVerilog can't do, and how modern approaches including hybrid languages offer solutions.

The Problem: Memory That Never Dies

Consider a typical verification scenario:

Block-level test: 10,000 transactions, 2 hours, 4GB RAM—no problem
SoC-level test: 10,000,000 transactions, 48 hours, ???

If each transaction object consumes 1KB and you store them all, that's 10GB just for transactions—before counting scoreboards, coverage, and infrastructure. But the real problem isn't size; it's accumulation. Objects that should be temporary become permanent.
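
The arithmetic behind that 10GB figure is easy to reproduce. The sketch below is only a back-of-the-envelope model: the function name is made up for illustration, and it assumes a decimal kilobyte (1,000 bytes) to match the round numbers above.

```python
def stored_transaction_bytes(num_transactions: int, bytes_per_transaction: int) -> int:
    """Memory held if every transaction object is kept alive."""
    return num_transactions * bytes_per_transaction


KB = 1_000              # decimal kilobyte, an assumption matching the round figures
GB = 1_000_000_000

block_level = stored_transaction_bytes(10_000, 1 * KB)
soc_level = stored_transaction_bytes(10_000_000, 1 * KB)

print(f"block level: {block_level / GB:.2f} GB")   # prints "block level: 0.01 GB"
print(f"SoC level:   {soc_level / GB:.2f} GB")     # prints "SoC level:   10.00 GB"
```

The same habit of storing everything costs 10MB at block level and 10GB at SoC level, which is why the leak goes unnoticed until integration.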

class typical_scoreboard extends uvm_scoreboard;
  
  transaction expected_q[$];
  transaction history[$];      // "For debug"—grows forever
  
  function void check(transaction actual);
    foreach (expected_q[i]) begin
      if (matches(expected_q[i], actual)) begin
        history.push_back(expected_q[i]);  // Never cleaned
        expected_q.delete(i);
        return;
      end
    end
    // No match? expected_q keeps the orphan forever
    `uvm_error("SB", "Unexpected transaction")
  endfunction
  
endclass

This scoreboard has two leaks: history grows without bound, and unmatched expectations accumulate. At block level, you'll never notice. At SoC level, it's fatal.

Why SystemVerilog Memory Management Is Primitive

SystemVerilog provides garbage collection, but it's far from modern GC implementations:

Feature	Modern GC (Java, Python, Go)	SystemVerilog
Generational collection	Yes—optimizes for short-lived objects	No—all objects treated equally
Explicit deallocation	Limited (e.g. Python del drops a reference, gc.collect() requests a collection)	Not available
Weak references	Yes—references that don't prevent collection	No
Memory pools	Built-in or library support	Manual implementation only
Profiling tools	Rich ecosystem	Simulator-specific, limited
Deterministic cleanup	Context managers (Python), try-with-resources (Java), defer (Go)	No
Cycle detection	Standard	Simulator-dependent

Worse, SystemVerilog GC is implementation-dependent. VCS, Xcelium, and Questa each implement it differently. Code that's leak-free on one simulator may leak on another.

The Fundamental Mismatch

SystemVerilog was designed for hardware modeling, where most objects are static (modules, interfaces) or short-lived (transactions that flow through). Modern testbenches break this assumption:

Scoreboards accumulate state across millions of transactions
Coverage databases grow continuously
Analysis subscribers may store history indefinitely
Configuration objects persist for entire simulation

The Five Memory Leak Patterns

Pattern 1: The Unbounded Queue

// Grows without limit
class packet_history;
  packet packets[$];
  
  function void record(packet p);
    packets.push_back(p);  // Never removed
  endfunction
endclass

Fix: Bounded buffer with eviction

class bounded_history #(int MAX_SIZE = 10000);
  packet packets[$];
  
  function void record(packet p);
    if (packets.size() >= MAX_SIZE)
      void'(packets.pop_front());  // Evict oldest
    packets.push_back(p);
  endfunction
endclass

Pattern 2: The Orphaned Expectation

// Expected transactions that never get matched
class scoreboard;
  transaction expected[$];
  
  function void add_expected(transaction t);
    expected.push_back(t);
  endfunction
  
  function void check(transaction actual);
    // If protocol error causes mismatch, expected grows forever
    foreach (expected[i]) begin
      if (expected[i].id == actual.id) begin
        expected.delete(i);
        return;
      end
    end
  endfunction
endclass

Fix: Timeout-based cleanup

class robust_scoreboard;
  typedef struct {
    transaction txn;
    time added_time;
  } expected_entry_t;
  
  expected_entry_t expected[$];
  time TIMEOUT = 1ms;
  
  function void add_expected(transaction t);
    expected.push_back('{txn: t, added_time: $time});
  endfunction
  
  function void check(transaction actual);
    // First, clean stale entries
    cleanup_stale();
    // Then match
    foreach (expected[i]) begin
      if (expected[i].txn.id == actual.id) begin
        expected.delete(i);
        return;
      end
    end
  endfunction
  
  function void cleanup_stale();
    for (int i = expected.size() - 1; i >= 0; i--) begin
      if ($time - expected[i].added_time > TIMEOUT) begin
        `uvm_warning("SB", $sformatf("Stale expectation removed: id=%0h", 
                                      expected[i].txn.id))
        expected.delete(i);
      end
    end
  endfunction
endclass
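
The timeout idea is easier to experiment with outside the simulator. Below is a minimal Python model of it, not a port of the class above: the class and method names are hypothetical, time is passed in explicitly as `now`, and it assumes expectations are added in non-decreasing time order so that the oldest entry is always first. Keying by id also replaces the linear scan with a dictionary lookup.

```python
from collections import OrderedDict


class ExpiringExpectations:
    """Pending expectations keyed by transaction id, oldest first."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._pending = OrderedDict()   # id -> (added_time, txn)

    def add(self, txn_id, txn, now):
        self._pending[txn_id] = (now, txn)
        self._pending.move_to_end(txn_id)   # keep insertion order == time order

    def expire(self, now):
        """Remove entries older than the timeout and return their ids."""
        stale = []
        while self._pending:
            txn_id, (added, _txn) = next(iter(self._pending.items()))
            if now - added <= self.timeout:
                break                        # everything after this is newer
            del self._pending[txn_id]
            stale.append(txn_id)
        return stale

    def match(self, txn_id, now):
        """Expire stale entries, then try to match. Returns (found, stale_ids)."""
        stale = self.expire(now)
        found = txn_id in self._pending
        if found:
            del self._pending[txn_id]
        return found, stale

    def __len__(self):
        return len(self._pending)
```

Because expiry stops at the first entry that is still fresh, each call does work proportional to the number of stale entries rather than to the whole pending set.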

Pattern 3: The Deep Copy Cascade

// Every subscriber clones the transaction
class coverage_subscriber extends uvm_subscriber #(packet);
  packet stored[$];
  
  function void write(packet t);
    packet copy;
    $cast(copy, t.clone());     // Deep copy #1
    stored.push_back(copy);     // Stored forever
  endfunction
endclass

// If 5 subscribers all clone: 5× memory per transaction

Fix: Read-only access, selective storage

class efficient_subscriber extends uvm_subscriber #(packet);
  
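  // Only store what you need, not entire objects.
  // An associative array keyed by address is bounded by the number of
  // distinct addresses; a queue would still grow on every write.
  bit seen_addresses[bit [31:0]];
  
  function void write(packet t);
    // Extract only necessary data
    seen_addresses[t.addr] = 1;
    // Don't clone entire packet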
  endfunction
endclass

Pattern 4: The Circular Reference

class tree_node;
  tree_node parent;        // Points up
  tree_node children[$];   // Points down
  
  function void add_child(tree_node child);
    children.push_back(child);
    child.parent = this;   // Circular reference!
  endfunction
endclass

// On a simulator whose collector cannot detect cycles, "forgetting" the root still leaves every node alive

Fix: Break cycles explicitly

class tree_node;
  tree_node parent;
  tree_node children[$];
  
  // Call before discarding tree
  function void destroy();
    foreach (children[i]) begin
      children[i].destroy();  // Recursive cleanup
    end
    children.delete();
    parent = null;  // Break upward reference
  endfunction
endclass
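
For comparison, languages with weak references can avoid the cycle in the first place instead of tearing it down by hand. The following is a small Python sketch under stated assumptions: `TreeNode` and `root_freed_after_drop` are illustrative names, and the immediate reclamation it demonstrates relies on CPython's reference counting (other interpreters may free the object later).

```python
import gc
import weakref


class TreeNode:
    def __init__(self, name):
        self.name = name
        self._parent = None      # weak reference to the parent, or None
        self.children = []       # strong references point downward only

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)   # upward link does not keep parent alive


def root_freed_after_drop():
    """Return True if dropping the root frees it without the cycle collector."""
    was_enabled = gc.isenabled()
    gc.disable()                 # rule out the cycle collector
    try:
        root = TreeNode("root")
        leaf = TreeNode("leaf")
        root.add_child(leaf)
        probe = weakref.ref(root)
        del root                 # last strong reference to the root
        return probe() is None and leaf.parent is None
    finally:
        if was_enabled:
            gc.enable()
```

SystemVerilog has no equivalent of `weakref.ref`, which is why the explicit `destroy()` call is needed there.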

Pattern 5: The Config DB Accumulator

// Configuration entries persist forever
class dynamic_test extends uvm_test;
  
  task run_phase(uvm_phase phase);
    for (int i = 0; i < 1000; i++) begin
      config_object cfg = new($sformatf("cfg_%0d", i));
      // Each iteration adds permanent config entry
      uvm_config_db#(config_object)::set(this, "*", $sformatf("cfg_%0d", i), cfg);
    end
  endtask
  
endclass

Fix: Reuse config objects, limit scope

// Set once, modify in place
config_object cfg = new("cfg");
uvm_config_db#(config_object)::set(this, "*", "cfg", cfg);

// Later: modify, don't recreate
cfg.update_values(new_values);

Object Pooling: The Traditional Solution

Since you can't control when GC runs, reuse objects instead of creating new ones:

class transaction_pool #(type T = uvm_sequence_item);
  
  protected T pool[$];
  protected int created = 0;
  protected int reused = 0;
  
  // Get object from pool or create new
  function T acquire();
    if (pool.size() > 0) begin
      reused++;
      return pool.pop_back();
    end
    created++;
    return T::type_id::create("pooled_item");
  endfunction
  
  // Return object to pool
  function void release(T item);
    item.clear();  // Reset to clean state; clear() is a user-defined method assumed to exist on T
    pool.push_back(item);
  endfunction
  
  function void report();
    real reuse_pct = (created > 0) ? (100.0 * reused / (created + reused)) : 0;
    `uvm_info("POOL", $sformatf(
      "Created: %0d | Reused: %0d | Reuse rate: %.1f%% | Pool size: %0d",
      created, reused, reuse_pct, pool.size()), UVM_LOW)
  endfunction
  
endclass

// Usage
class my_sequence extends uvm_sequence #(my_transaction);
  
  static transaction_pool #(my_transaction) pool = new();
  
  task body();
    my_transaction txn;
    
    repeat (1000000) begin
      txn = pool.acquire();
      // ... configure and send txn ...
      start_item(txn);
      finish_item(txn);
      pool.release(txn);  // Return to pool; safe only if no driver, monitor, or scoreboard still holds this handle
    end
  endtask
  
endclass
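
One detail the pool above leaves open is how many idle objects it may hold. If a burst of releases outnumbers later acquires, the pool itself becomes an unbounded queue. A minimal Python sketch of the same pattern with a cap follows; the `max_idle` parameter, the `Transaction` placeholder, and its `reset()` method are all assumptions made for illustration.

```python
class Transaction:
    """Placeholder item type with a reset hook."""

    def __init__(self):
        self.addr = 0
        self.data = b""

    def reset(self):
        self.addr = 0
        self.data = b""


class TransactionPool:
    def __init__(self, factory, max_idle=1024):
        self._factory = factory
        self._idle = []
        self._max_idle = max_idle
        self.created = 0
        self.reused = 0

    def acquire(self):
        if self._idle:
            self.reused += 1
            return self._idle.pop()
        self.created += 1
        return self._factory()

    def release(self, item):
        item.reset()                        # never hand out stale field values
        if len(self._idle) < self._max_idle:
            self._idle.append(item)
        # otherwise drop the item and let the collector reclaim it

    @property
    def idle_count(self):
        return len(self._idle)

    @property
    def reuse_rate(self):
        total = self.created + self.reused
        return self.reused / total if total else 0.0
```

In a strictly sequential acquire/release loop, a single object ends up serving every iteration, which is the behavior the SystemVerilog sequence above is aiming for.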

Hybrid Language Approaches

SystemVerilog's memory limitations aren't fundamental to verification—they're language limitations. Modern verification increasingly uses hybrid approaches.

Python with cocotb

Python offers mature memory management that SystemVerilog lacks:

# Python: Explicit cleanup with context managers
from collections import deque

class TransactionBuffer:
    def __init__(self, max_size=1000):
        # deque(maxlen=...) evicts the oldest entry automatically, in O(1)
        self.buffer = deque(maxlen=max_size)
    
    def __enter__(self):
        return self
    
    def __exit__(self, *args):
        self.buffer.clear()  # Runs when the with block exits, even on an exception
    
    def add(self, txn):
        self.buffer.append(txn)

# Usage: Automatic cleanup when block exits
async def test_sequence(dut):
    with TransactionBuffer() as buf:
        for _ in range(1000000):
            txn = await generate_transaction(dut)
            buf.add(txn)
            await verify_response(dut, txn)
    # Buffer automatically cleared here

Python Memory Advantages:

# 1. Generators: Process without storing
def transaction_stream(count):
    for _ in range(count):
        yield Transaction()  # One at a time, not all in memory

async def test_streaming(dut):
    for txn in transaction_stream(1_000_000):
        await drive_and_check(dut, txn)
        # txn is rebound on the next iteration, so the previous object can be freed

# 2. Weak references: Cache without preventing cleanup
import weakref

class TransactionCache:
    def __init__(self):
        self.cache = weakref.WeakValueDictionary()
    
    def get(self, key):
        return self.cache.get(key)  # Returns None if collected
    
    def put(self, key, txn):
        self.cache[key] = txn  # Won't prevent GC

# 3. Memory profiling: Find leaks easily
import tracemalloc

tracemalloc.start()
# ... run test ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)  # Shows top memory consumers by line
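
A single snapshot shows what is large; comparing two snapshots shows what is growing, which is usually the more useful question when hunting a leak. The sketch below uses the standard `Snapshot.compare_to` call. The helper name `top_growth` and the deliberately leaky `history` list are illustrative only.

```python
import tracemalloc


def top_growth(workload, limit=3):
    """Run workload() and return the source lines whose allocations grew most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Entries are sorted by absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]


history = []   # stands in for a "for debug" list that is never trimmed


def leaky_workload():
    for _ in range(10_000):
        history.append(bytes(100))   # each call allocates a new 100-byte object


if __name__ == "__main__":
    for diff in top_growth(leaky_workload):
        print(diff)   # file:line, size delta, and allocation count delta
```

Each entry carries `size_diff` and `count_diff` fields, so a test harness could fail a run when growth at one line exceeds a chosen budget.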

C++ via DPI

For performance-critical components, C++ offers full memory control:

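// C++ side: object pool that recycles Transaction instances
#include <memory>
#include <vector>

// Placeholder type for illustration; a real Transaction would carry protocol fields
struct Transaction {
    void reset() {}
};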

class TransactionPool {
    std::vector<std::unique_ptr<Transaction>> pool;
    
public:
    Transaction* acquire() {
        if (!pool.empty()) {
            auto txn = std::move(pool.back());
            pool.pop_back();
            return txn.release();
        }
        return new Transaction();
    }
    
    void release(Transaction* txn) {
        txn->reset();
        pool.push_back(std::unique_ptr<Transaction>(txn));
    }
};

// C-linkage entry points; SystemVerilog imports these through DPI-C
extern "C" {
    static TransactionPool g_pool;
    
    void* dpi_acquire_transaction() {
        return g_pool.acquire();
    }
    
    void dpi_release_transaction(void* txn) {
        g_pool.release(static_cast<Transaction*>(txn));
    }
}

// SystemVerilog side
import "DPI-C" function chandle dpi_acquire_transaction();
import "DPI-C" function void dpi_release_transaction(chandle txn);

class dpi_managed_sequence extends uvm_sequence;
  
  task body();
    chandle txn_handle;
    
    repeat (1000000) begin
      txn_handle = dpi_acquire_transaction();
      // ... use transaction via DPI ...
      dpi_release_transaction(txn_handle);  // Explicit cleanup
    end
  endtask
  
endclass

The Hybrid Architecture

flowchart LR
    subgraph SV["SystemVerilog"]
        DRV[Driver]
        MON[Monitor]
    end
    
    subgraph PY["Python"]
        SB[Scoreboard]
        COV[Coverage]
        ANAL[Analysis]
    end
    
    subgraph CPP["C++ via DPI"]
        POOL[Object Pool]
        REF[Reference Model]
    end
    
    MON -->|Transactions| PY
    DRV <-->|Pooled Objects| CPP
    PY <-->|High-perf checks| CPP
    
    style SV fill:#dbeafe,stroke:#3b82f6
    style PY fill:#d1fae5,stroke:#10b981
    style CPP fill:#fef3c7,stroke:#f59e0b

Division of responsibility:

SystemVerilog: Signal-level driving/monitoring (stays closest to the DUT interface)
Python: Scoreboarding, coverage, analysis (memory-managed, easy to debug)
C++: Reference models, high-performance algorithms (full memory control)

Memory Profiling Techniques

Simulator-Specific Profiling

# Option names vary by simulator release; check your tool's documentation before relying on these
# VCS
vcs -simprofile mem ... 
simv +simprofile+mem
profile_viewer profile.db  # GUI shows memory over time

# Xcelium
xrun -memprof ...
# Generates memory profile report

# Questa
vsim -memprofile ...
# Memory stats in transcript

DIY Memory Tracking

// Track object lifecycle manually
class tracked_object extends uvm_object;
  `uvm_object_utils(tracked_object)
  
  static int total_created = 0;
  static int total_active = 0;
  
  function new(string name = "tracked_object");
    super.new(name);
    total_created++;
    total_active++;
  endfunction
  
  // Call explicitly before releasing
  function void release();
    total_active--;
  endfunction
  
  static function void report_stats();
    `uvm_info("MEM_TRACK", $sformatf(
      "Created: %0d | Active: %0d | Potentially leaked: %0d",
      total_created, total_active, total_active), UVM_LOW)
  endfunction
  
endclass

// Call at end of simulation
final begin
  tracked_object::report_stats();
end

Phase-Based Memory Monitoring

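class memory_monitor extends uvm_component;
  `uvm_component_utils(memory_monitor)
  
  // Start counts keyed by phase name, since UVM phases can overlap
  int phase_start_objects[string];
  
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction
  
  // Assumed hook: reads the manual counter kept by tracked_object
  virtual function int get_active_object_count();
    return tracked_object::total_active;
  endfunction
  
  function void phase_started(uvm_phase phase);
    int count = get_active_object_count();
    phase_start_objects[phase.get_name()] = count;
    `uvm_info("MEM", $sformatf("%s started: %0d active objects",
      phase.get_name(), count), UVM_MEDIUM)
  endfunction
  
  function void phase_ended(uvm_phase phase);
    int phase_end_objects = get_active_object_count();
    int delta = phase_end_objects - phase_start_objects[phase.get_name()];
    
    `uvm_info("MEM", $sformatf("%s ended: %0d active objects (delta: %0d)",
      phase.get_name(), phase_end_objects, delta), UVM_MEDIUM)
    
    if (delta > 1000)
      `uvm_warning("MEM", "Significant object accumulation detected")
  endfunction
  
endclass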

Modern and Future Solutions

Cloud-Based Isolation

Run each test in a fresh container/VM:

# Kubernetes job for simulation
apiVersion: batch/v1
kind: Job
metadata:
  name: sim-test-001
spec:
  template:
    spec:
      containers:
      - name: simulator
        image: verification/sim:latest
        resources:
          limits:
            memory: "32Gi"  # Hard limit
        command: ["./run_test.sh", "test_001"]
      restartPolicy: Never

Benefits:

Memory leaks don't accumulate across tests
Each test starts with clean state
OOM kills one test, not entire regression
Easy to identify memory-hungry tests

Streaming Verification Architecture

Process transactions without storing:

// Don't store transactions—stream them
class streaming_scoreboard extends uvm_scoreboard;
  
  // Minimal state: just what's needed for pending checks
  bit [15:0] pending_ids[$];
  
  function void expected(transaction t);
    pending_ids.push_back(t.id);
  endfunction
  
  function void actual(transaction t);
    // find_first_index returns a queue of indices (empty if there is no match)
    int idx[$] = pending_ids.find_first_index(x) with (x == t.id);
    if (idx.size() > 0) begin
      pending_ids.delete(idx[0]);
      // Transaction checked and discarded; no storage
    end else begin
      `uvm_error("SB", "Unexpected transaction")
    end
  endfunction
  
  // Only store IDs, not full transactions
  // 2 bytes vs 1KB+ per transaction = roughly 500× less memory
  
endclass
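
Storing only IDs means the payload is never compared. A middle ground is to keep a small digest of the expected payload alongside each ID. The Python sketch below illustrates that variation and is not part of the scoreboard above: the class name is hypothetical, and CRC32 is chosen only because it ships in the standard library. A 32-bit digest can collide, so this is a memory-saving trade-off rather than an exact check.

```python
import zlib


class DigestScoreboard:
    """Keeps one 32-bit digest per pending id instead of the full transaction."""

    def __init__(self):
        self._pending = {}    # id -> CRC32 of the expected payload
        self.errors = []

    def expected(self, txn_id, payload: bytes):
        self._pending[txn_id] = zlib.crc32(payload)

    def actual(self, txn_id, payload: bytes) -> bool:
        if txn_id not in self._pending:
            self.errors.append(f"unexpected id {txn_id:#x}")
            return False
        want = self._pending.pop(txn_id)    # entry is removed on match or mismatch
        if want != zlib.crc32(payload):
            self.errors.append(f"payload mismatch for id {txn_id:#x}")
            return False
        return True

    def pending_count(self):
        return len(self._pending)
```

Unmatched IDs still accumulate in `_pending`, so a timeout or end-of-test check on `pending_count()` is still needed.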

Incremental Coverage

# Python: Stream coverage to database, don't accumulate in memory
import sqlite3

class StreamingCoverage:
    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS coverage
                            (bin TEXT PRIMARY KEY, hits INTEGER)''')
    
    def sample(self, bin_name):
        # Write directly to DB, don't store in memory
        self.conn.execute(
            '''INSERT OR REPLACE INTO coverage VALUES (?,
               COALESCE((SELECT hits FROM coverage WHERE bin=?), 0) + 1)''',
            (bin_name, bin_name))
    
    def commit(self):
        self.conn.commit()  # Periodic flush
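
Issuing one SQL statement per sample can become the bottleneck at millions of samples. A common compromise is to count in memory for a short window and flush in batches. The following is a sketch of that idea under assumptions: `BatchedCoverage`, `flush_every`, and `hits` are illustrative names, and the table layout mirrors the one above.

```python
import sqlite3
from collections import Counter


class BatchedCoverage:
    def __init__(self, db_path, flush_every=1000):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS coverage (bin TEXT PRIMARY KEY, hits INTEGER)")
        self._buffer = Counter()      # holds at most flush_every distinct bins
        self._buffered = 0
        self._flush_every = flush_every

    def sample(self, bin_name):
        self._buffer[bin_name] += 1
        self._buffered += 1
        if self._buffered >= self._flush_every:
            self.flush()

    def flush(self):
        with self.conn:               # one transaction per batch, committed on exit
            for bin_name, hits in self._buffer.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO coverage (bin, hits) VALUES (?, 0)",
                    (bin_name,))
                self.conn.execute(
                    "UPDATE coverage SET hits = hits + ? WHERE bin = ?",
                    (hits, bin_name))
        self._buffer.clear()
        self._buffered = 0

    def hits(self, bin_name):
        row = self.conn.execute(
            "SELECT hits FROM coverage WHERE bin = ?", (bin_name,)).fetchone()
        return row[0] if row else 0
```

Memory use is bounded by `flush_every`, and a final `flush()` at end of test writes whatever is still buffered.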

Key Takeaways

SystemVerilog GC is primitive: No explicit control, no weak references, simulator-dependent
Leaks hide at block level: Only manifest in long-running SoC simulations
Five leak patterns: Unbounded queues, orphaned expectations, deep copies, circular references, config accumulation
Object pooling helps: Reuse objects instead of relying on GC
Hybrid languages offer solutions: Python for managed memory, C++ for explicit control
Profile before optimizing: Use simulator tools or DIY tracking to find actual leaks
Modern architectures: Cloud isolation, streaming verification, incremental coverage

Interview Corner

Q: Your simulation runs out of memory after 12 hours. How do you debug?

A: First, use simulator memory profiling to identify which components are growing. Common culprits: scoreboards storing history, coverage with exploding crosses, analysis subscribers cloning transactions. Check for unbounded queues and missing cleanup on error paths. Consider whether you need to store full transactions or just essential data (IDs, addresses). If the design requires long-running tests, consider hybrid approaches with Python/C++ for better memory control, or cloud-based isolation where each test runs in a fresh container.
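
Before opening a profiler, it can help to confirm that memory really grows steadily and to estimate how long the run has left. The sketch below fits a least-squares line through periodic memory samples. It assumes the samples have already been collected (for example by logging the simulator process's resident memory once an hour), and the function name and units are illustrative.

```python
def hours_until_limit(samples, limit_gb):
    """Estimate hours left before memory reaches limit_gb.

    samples: list of (elapsed_hours, memory_gb) pairs.
    Returns None when the fitted trend is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope (GB per hour) and intercept
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_m - slope * mean_t
    t_limit = (limit_gb - intercept) / slope
    return max(0.0, t_limit - samples[-1][0])
```

For samples of 4, 6, 8, and 10 GB at hours 0 through 3 and a 32 GB limit, the fitted growth is 2 GB per hour and the estimate is 11 more hours.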

Q: Why can't you just call delete in SystemVerilog like in C++?

A: SystemVerilog relies on automatic garbage collection and has no explicit deallocation for class objects. That avoids the use-after-free bugs common in C/C++, but it also means you can't force immediate cleanup. The most you can do is drop references, by assigning null to a handle or calling delete() on a queue or array. Objects become eligible for collection when no references remain, but when the collector actually runs is simulator-dependent and unpredictable. This is why object pooling and hybrid language approaches matter for memory-sensitive testbenches.
