Architecture & Design Verification: 6. SystemC Tutorial

Introduction

In the semiconductor industry, verification is not a phase that happens after design — it is a parallel activity that starts the moment the first module is defined, and it consumes the majority of project resources. At Arm, verification engineers significantly outnumber RTL designers on complex blocks like the Cortex-A78 core. At Apple and Google, custom silicon teams for the M-series and Tensor processors each employ hundreds of verification engineers. The ratio is not 1:1 between designers and verification engineers — it is typically 2:1 or 3:1 on blocks that matter, because a silicon respin costs tens of millions of dollars and typically takes 6–12 months. One escaped logic bug can cost more than the entire verification effort that was supposed to catch it.

This is not an abstract concern. The Intel Pentium FDIV bug (1994) cost Intel $475 million in chip replacements. The Qualcomm Snapdragon 810 thermal issues (2015) delayed multiple flagship phones. The Apple M1 "GoFetch" side channel (2024), which abuses the data memory-dependent prefetcher, has to be mitigated in software at a performance cost. Each of these had a verification gap somewhere: a scenario that was never tested, a case that was considered "unlikely," or a structural assumption in the testbench that prevented the bug from being visible.

Testbench structure is the discipline that closes these gaps systematically. A well-structured testbench separates concerns: one component generates stimulus, one component observes outputs, one component checks correctness. When these concerns are mixed together in one monolithic function — as they were in our Post 5 sc_main — a bug in any one of them can contaminate all three. Worse, a testbench that mixes stimulus and checking often can't tell the difference between "the DUT is wrong" and "the testbench expected the wrong thing." The verification result becomes ambiguous.

Post 6 restructures the ALU test into a proper three-component testbench, adds passive monitoring, and introduces sc_trace for VCD waveform generation. The architecture we build here is not scaffolding; it is the permanent testbench pattern for this series. When we add UVM infrastructure in Posts 23–27, the structure stays identical. The vocabulary changes (the monitor becomes a uvm_monitor, the stimulus driver becomes a uvm_driver fed by a uvm_sequence), but the separation of concerns is the same.

Prerequisites

Completed Post 5 — the RV32I ALU module and its alu_op_t encoding
Post 5 — Building the RV32I ALU
Code for this post: GitHub — section1/post06

SystemC Language Reference

Quick-reference for the testbench infrastructure constructs introduced in this post:

Construct	Syntax	SV Equivalent	Key Difference
Create VCD trace file	`sc_create_vcd_trace_file("name")`	`$dumpfile("name.vcd")`	Returns a handle; must be called before `sc_start()`
Set VCD time unit	`tf->set_time_unit(1, SC_NS)`	Implicit from timescale	Explicit control over VCD timestamp resolution
Trace a signal	`sc_trace(tf, signal, "name")`	`$dumpvars(0, module)`	Per-object registration; works on signals, ports, and long-lived variables of built-in or SystemC types
Close trace file	`sc_close_vcd_trace_file(tf)`	`$dumpflush` at end	Must call after `sc_start()` or VCD is incomplete
FIFO channel	`sc_fifo<T> fifo("name", depth)`	`mailbox #(T)`	Blocking read/write by default; depth 16 if omitted
FIFO output port	`sc_fifo_out<T> port`	`mailbox #(T)` handle	`write()` blocks if FIFO full
FIFO input port	`sc_fifo_in<T> port`	`mailbox #(T)` handle	`read()` blocks if FIFO empty
Non-blocking write	`fifo_port.nb_write(val)`	`mailbox.try_put(val)`	Returns `bool` — false if FIFO full
Non-blocking read	`fifo_port.nb_read(val)`	`mailbox.try_get(val)`	Returns `bool` — false if FIFO empty
FIFO depth query	`fifo.num_available()`	`mailbox.num()`	Returns count of items currently in FIFO
Bind FIFO to ports	`mon.txn_out(txn_fifo)`	Implicit via handles	Same binding syntax as `sc_signal`
Wait one delta cycle	`wait(SC_ZERO_TIME)` in a thread; `sc_start(SC_ZERO_TIME)` from sc_main	`#0`	Runs one delta cycle; no time advance
Stop simulation	`sc_stop()`	`$finish`	Call from testbench thread; DUT never calls this

Testbench Architecture — The Formal Structure

The Canonical Three-Module Pattern

Professional SystemC verification environments use a canonical structure that maps directly to UVM agents. Understanding this structure as a formal pattern — not just as "how this particular testbench happens to be organized" — is essential for scaling to complex designs.

tb_top (SC_MODULE, or just sc_main)
├── dut (device under test — SC_METHOD processes)
├── stimulus driver (SC_THREAD — drives inputs, controls test lifecycle)
├── monitor (SC_METHOD or SC_THREAD — observes outputs, records transactions)
└── checker/scoreboard (SC_THREAD — compares against golden reference)

The monitor and checker communicate through a channel (here: sc_fifo). The driver and checker communicate through a shared reference (or via a prediction FIFO). The DUT communicates with all other modules only through sc_signal connections — it has no direct knowledge of the testbench.

Mapping to SystemVerilog:

module tb_top;
  // Signals (equivalent to sc_signal in SystemC)
  logic clk, rst;
  logic [31:0] a, b, result;
  logic [3:0]  op;
  logic        zero;

  // DUT instantiation (same in both)
  alu dut_inst(.a(a), .b(b), .op(op), .result(result), .zero(zero));

  // SV equivalent of SC_THREAD stimulus driver
  initial begin
    // Drive inputs, sequence through tests
    op = 4'd0; a = 32'd5; b = 32'd3;
    #10;
    // ... more stimulus ...
    $finish;  // equivalent to sc_stop()
  end

  // SV equivalent of SC_METHOD monitor
  always @(result or zero) begin  // fires when outputs change
    // Record transaction — in SV often uses mailbox or checker task
    check_result(result, zero);
  end

  // SV checker (often a separate module or task)
  task automatic check_result(input logic [31:0] got_result, input logic got_zero);
    // compare got_result against expected_queue.pop_front()
  endtask
endmodule

The SystemC version follows the same flow. The key difference: in SystemC, the monitor, checker, and driver are separate SC_MODULE instances connected through channels, whereas in SystemVerilog they are typically initial/always blocks in the same module, or tasks called from those blocks. SystemC does not force this split, but once each component is its own module, every connection is an explicit port binding that the kernel checks at elaboration.

Mapping to UVM:

Three-Module SystemC Pattern	UVM Equivalent
Stimulus driver SC_THREAD	`uvm_driver` + `uvm_sequence`
Monitor SC_METHOD	`uvm_monitor` with TLM analysis port
Checker SC_THREAD	`uvm_scoreboard`
`sc_fifo<alu_txn>`	`uvm_tlm_analysis_fifo`
`load_expected()`	Scoreboard prediction from reference model
`sc_stop()`	`phase.drop_objection()`

The Three-Component Pattern

Every testbench in this series follows the same three-component structure:

graph TD
    SD["Stimulus Driver\n(SC_THREAD)\nDrives DUT inputs\nControls test sequence"] -->|writes| SIG["sc_signals\n(input bus)"]
    SIG -->|inputs| DUT["DUT\n(ALU, register file, etc.)"]
    DUT -->|outputs| SIG2["sc_signals\n(output bus)"]
    SIG2 -->|reads| MON["Monitor\n(SC_METHOD)\nPassively observes\nRecords transactions"]
    SIG -->|reads for logging| MON
    MON -->|sc_fifo| CHK["Checker\n(SC_THREAD)\nCompares vs expected\nReports pass/fail"]
    SD -.->|load_expected| CHK
    style SD fill:#f59e0b,color:#fff
    style MON fill:#06b6d4,color:#fff
    style CHK fill:#10b981,color:#fff
    style DUT fill:#334155,color:#fff

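Stimulus Driver — the only component that writes to DUT input signals. It sequences through test vectors, applies inputs in order, and calls sc_stop() when done. It is an SC_THREAD because it sequences operations over time (wait() between stimulus changes). It never reads or judges DUT outputs. In this post it does hand the checker the expected result for each vector before applying it (predict, then observe), but the comparison itself belongs to the checker.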

Monitor — a passive observer. It reads DUT output signals, timestamps observations, and records transactions (here, into an sc_fifo read by the checker). It writes only to that transaction channel, never to any signal the DUT sees. A monitor that writes to a DUT input signal is not a monitor; it is a second driver, and two drivers on one sc_signal produce a multiple-driver error. The monitor fires on changes of the DUT output signals it is statically sensitive to.

Checker — compares the monitor's observed values against a golden reference. It holds an expected-value queue, dequeues expected values as the monitor observes actuals, and reports pass/fail. The checker is also read-only relative to the DUT — it never drives signals. It can call sc_assert() to halt simulation on a failure, or it can count errors and report a summary at the end.

Why this separation matters: When a test fails, you want to know immediately whether the failure is in the DUT logic, the expected values in the checker, or the stimulus generation. With mixed code, a wrong expected value and a wrong DUT output produce the same symptom — a mismatch — and you cannot distinguish them without reading all the code. With separated components, a failure in the checker is isolated to the checker's expected-value table. A failure in the DUT is isolated to the DUT logic. The stimulus driver is never suspected unless the monitor records impossible values.

sc_fifo — The Monitor-to-Checker Channel

Theory: sc_fifo as a Blocking Channel

sc_fifo<T> is a SystemC primitive channel that implements a first-in, first-out buffer with blocking semantics:

write(val) — if the FIFO has space, writes val and returns immediately. If the FIFO is full, the calling process blocks until space is available (another process reads from it).
read() — if the FIFO has data, reads and returns the next item immediately. If the FIFO is empty, the calling process blocks until data is written (another process writes to it).

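This blocking behavior makes sc_fifo ideal for producer-consumer communication between concurrent processes, which is exactly the monitor-to-checker relationship. Two details matter in practice. First, the blocking calls suspend the caller with wait(), so they are legal only in SC_THREAD (or SC_CTHREAD) processes; an SC_METHOD must use nb_write()/nb_read(). Second, a value written in one delta cycle becomes readable only after the FIFO's update phase, so the reader sees it in the next delta cycle.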
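The full/empty rules are easy to model without the SystemC kernel. The plain C++ sketch below mirrors the non-blocking half of the interface; the class name bounded_fifo is hypothetical, and where sc_fifo would suspend a calling thread, this model simply returns false. It deliberately leaves out the delta-cycle delay between a write and its visibility.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for sc_fifo<T>'s non-blocking interface (not SystemC).
// Where sc_fifo would suspend a calling SC_THREAD, this model returns false.
template <typename T>
class bounded_fifo {
public:
    explicit bounded_fifo(std::size_t depth) : depth_(depth) {
        assert(depth > 0);  // a zero-depth FIFO could never accept a write
    }

    // nb_write() analogue: refuse the write when the FIFO is full.
    bool nb_write(const T& val) {
        if (buf_.size() >= depth_) return false;
        buf_.push_back(val);
        return true;
    }

    // nb_read() analogue: refuse the read when the FIFO is empty.
    bool nb_read(T& val) {
        if (buf_.empty()) return false;
        val = buf_.front();
        buf_.pop_front();
        return true;
    }

    std::size_t num_available() const { return buf_.size(); }
    std::size_t num_free() const { return depth_ - buf_.size(); }

private:
    std::size_t depth_;
    std::deque<T> buf_;
};
```

With a depth of 2, two nb_write() calls succeed, a third returns false, and the next nb_read() returns the oldest value. A blocking write() on a real sc_fifo would instead wait until a read frees a slot.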

Comparison to SV mailbox:

Concept	SystemC `sc_fifo<T>`	SystemVerilog `mailbox #(T)`
Create	`sc_fifo<T> fifo("name", depth)`	`mailbox #(T) mb = new(depth)`
Write (blocking)	`fifo_out.write(val)`	`mb.put(val)` (blocks if bounded+full)
Read (blocking)	`fifo_in.read()`	`mb.get(val)` (blocks if empty)
Write (non-blocking)	`fifo_out.nb_write(val)` — returns bool	`mb.try_put(val)` — returns bool
Read (non-blocking)	`fifo_in.nb_read(val)` — returns bool	`mb.try_get(val)` — returns bool
Items available	`fifo.num_available()`	`mb.num()`
Unbounded depth	Not supported (fixed depth; 16 if omitted)	`new()` with no arg = unbounded
Bind to port	`module.port(fifo)` at elaboration	Pass mailbox handle in constructor

The key difference: in SystemC, sc_fifo is a channel that is bound to module ports (sc_fifo_in<T> and sc_fifo_out<T>) at elaboration time, the same way sc_signal is bound to sc_in/sc_out. In SV, a mailbox is passed as an object handle (constructor argument or config DB). The SystemC approach fixes the connections before simulation starts: a port/channel type mismatch is a compile error, and an unbound port is reported when elaboration completes. The SV approach allows more dynamic reconfiguration.

sc_mutex and sc_semaphore — two additional SystemC synchronization primitives:

sc_mutex — mutual exclusion lock. Analogous to SV semaphore initialized to 1. Two processes cannot both hold the mutex simultaneously. mutex.lock() blocks if already locked. mutex.unlock() releases it.
sc_semaphore — counting semaphore initialized to N. sem.wait() decrements (blocks if 0). sem.post() increments. Analogous to SV semaphore initialized to N.

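For the ALU testbench, sc_fifo is sufficient. For testbenches with shared resources (a shared memory model, a register file accessed by multiple bus functional models), sc_mutex serializes access. Because SystemC processes are scheduled cooperatively and switch only at wait(), the risk is not truly simultaneous execution but an access sequence that spans a wait(): another process can run in that gap and interleave its own reads and writes.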
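A lost update under cooperative scheduling needs a wait() between a read and the write that depends on it. The plain C++ model below hard-codes one such interleaving for two threads that each increment a shared counter; increment_twice and its locked flag are hypothetical stand-ins for two SC_THREADs and an sc_mutex, not SystemC code.

```cpp
#include <cassert>

// Hypothetical model (not SystemC): two threads each do
//   read shared -> wait() -> write shared + 1
// and the scheduler runs them round-robin, one step per turn. This is one
// legal ordering for two SC_THREADs that resume at the same time.
int increment_twice(bool use_mutex) {
    int shared = 0;
    bool locked = false;      // stands in for the sc_mutex state
    int state[2] = {0, 0};    // 0 = about to read, 1 = read done (in wait()), 2 = finished
    int local[2] = {0, 0};    // value each thread read before its wait()

    while (state[0] != 2 || state[1] != 2) {
        for (int t = 0; t < 2; ++t) {
            if (state[t] == 0) {
                if (use_mutex) {
                    if (locked) continue;   // lock() would block: retry next turn
                    locked = true;          // lock() succeeds
                }
                local[t] = shared;          // read, then yield at wait()
                state[t] = 1;
            } else if (state[t] == 1) {
                shared = local[t] + 1;      // resume after wait() and write back
                if (use_mutex) locked = false;  // unlock()
                state[t] = 2;
            }
        }
    }
    assert(!locked);  // every lock() was matched by an unlock()
    return shared;
}
```

Without the lock, both threads read 0 before either writes, so increment_twice(false) returns 1 and one update is lost; with the lock, the second thread cannot read until the first has written, and increment_twice(true) returns 2.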

sc_trace and VCD Waveforms

Theory: How sc_trace Works

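sc_trace hooks into the kernel's scheduling loop. By default, the VCD writer samples every traced object once per simulation time step, after all delta cycles at that time have settled, and writes a value-change entry with the current timestamp for each object whose value differs from the last recorded one. Intermediate delta-cycle values are not recorded unless you enable them with tf->delta_cycles(true). The VCD file records only changes: if a signal holds the same value for 1000 ns, only the entry at the time it changed is written.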
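The change-only rule is easy to state in code. This is a plain C++ sketch, not SystemC's VCD writer: the hypothetical change_recorder receives one settled value per time step (times assumed to be in nanoseconds and strictly increasing) and stores an entry only when the value differs from the previous entry. Delta-cycle values are assumed never to reach it, which matches the default once-per-time-step sampling.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of change-only waveform recording (not SystemC's VCD writer).
// sample() receives the settled value once per time step.
class change_recorder {
public:
    void sample(std::uint64_t time_ns, std::uint32_t value) {
        assert((!has_sample_ || time_ns > last_time_) && "time must move forward");
        has_sample_ = true;
        last_time_ = time_ns;
        if (entries_.empty() || entries_.back().second != value)
            entries_.emplace_back(time_ns, value);   // value changed: record it
    }

    // (time, value) pairs, one per change, like the #time / value lines of a VCD
    const std::vector<std::pair<std::uint64_t, std::uint32_t>>& entries() const {
        return entries_;
    }

private:
    bool has_sample_ = false;
    std::uint64_t last_time_ = 0;
    std::vector<std::pair<std::uint64_t, std::uint32_t>> entries_;
};
```

Sampling a result signal at 0, 10, 20, 30, and 40 ns with the values 8, 0, 0, 0, 7 yields three entries (at 0, 10, and 40 ns); the repeated samples in between cost nothing in the file.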

Mandatory ordering: sc_trace() calls must occur before sc_start(). The trace file is initialized when simulation starts, and from then on its list of traced objects is fixed. A call to sc_trace() after simulation starts does not add the object to the waveform; depending on the SystemC version, it is ignored with a warning or reported as an error.

// CORRECT ORDER — trace setup before simulation:
sc_trace_file* tf = sc_create_vcd_trace_file("alu_waveform");
tf->set_time_unit(1, SC_NS);
sc_trace(tf, sig_a,      "ALU.a");       // register sig_a for tracing
sc_trace(tf, sig_result, "ALU.result");  // register sig_result
// ... more sc_trace calls ...

sc_start();    // simulation runs — changes are written to VCD

sc_close_vcd_trace_file(tf);  // flush VCD buffers, close file

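What can be traced: sc_trace() has overloads for sc_signal<T> objects, for ports such as sc_in<T>, and for plain variables of the built-in C++ and SystemC data types (bool, int, sc_uint<N>, and so on). A plain variable is traced by reference and sampled like a signal, so it must outlive the simulation: a module member variable works, but a local variable inside a process does not, because it goes out of scope while the trace file still refers to it. A user-defined type such as a struct needs its own sc_trace() overload. To see internal DUT state in waveforms, trace the relevant member variables directly, or mirror them into sc_signal<T> members if other processes also need to observe them.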

Comparison to SV:

// SV: dumps ALL variables in the hierarchy
$dumpfile("out.vcd");
$dumpvars(0, tb_top);   // 0 = all levels, tb_top = root

// SystemC: explicit per-signal registration
sc_trace(tf, sig_a, "ALU.a");           // trace sig_a
sc_trace(tf, sig_b, "ALU.b");           // trace sig_b
sc_trace(tf, sig_result, "ALU.result"); // trace sig_result

SV's $dumpvars(0, tb_top) is simpler but less controlled: it captures everything below tb_top, including internal signals you may not care about, and can produce enormous VCD files. SystemC's explicit registration is more verbose but gives precise control over what is recorded. For large designs this matters: tracing every signal in a 10,000-flop design over a long run can produce VCD files measured in gigabytes.

GTKWave commands:

# View waveform: standard command
gtkwave alu_waveform.vcd &

# If the file on disk is alu_waveform.vcd.vcd, the name passed to
# sc_create_vcd_trace_file() already ended in .vcd (the kernel appends it)
gtkwave alu_waveform.vcd.vcd &

# Open with a save file (keeps the selected signals between sessions)
gtkwave alu_waveform.vcd signals.gtkw &

In GTKWave, you will see each traced signal as a track. ALU.op shows the opcode changing at each test step. ALU.result changes at the same timestamp as the operands that produced it: the one-delta lag between input and output is not visible, because by default the VCD is sampled only after each time step has settled. ALU.zero is high during tests where the result is zero. The name you passed to sc_trace() (e.g., "ALU.a") determines how the signal is labeled in the signal list panel.

Simulation Semantics

How the Three-Component Testbench Executes — Step by Step

Understanding the execution order within a delta cycle prevents the most common race conditions in testbench design.

Elaboration (before sc_start()):

1. sc_main allocates: sig_a, sig_b, sig_result, sig_op, sig_zero (sc_signals)
2. sc_main allocates: txn_fifo(32) — sc_fifo with depth 32
3. Instantiate: alu dut, alu_monitor mon, alu_checker chk, alu_driver drv
   (each constructor registers its SC_METHOD/SC_THREAD processes and static sensitivity)
4. Bind ports to signals: dut.a(sig_a), mon.result(sig_result), drv.a(sig_a), etc.
5. Bind FIFO: mon.txn_out(txn_fifo), chk.txn_in(txn_fifo)
6. Register sc_trace: before sc_start()
7. sc_start() finishes elaboration (an unbound port is reported as an error here)

sc_start() — first delta cycle:

8. Initialization: every process not marked dont_initialize() runs once. That includes the
   driver and checker threads and the ALU's compute() method; the monitor should be
   registered with dont_initialize() so that it waits for the first change on result
9. Driver: calls drive(), which loads the expected value and then calls
   a.write(5), b.write(3), op.write(ALU_ADD)
   These writes go to each signal's pending-value buffer
10. Driver: calls wait(10, SC_NS); the thread suspends and the kernel takes control
11. Kernel: update phase; sig_a, sig_b, sig_op take their new values
12. Monitor SC_METHOD: NOT sensitive to inputs, so these changes do not trigger it
13. ALU SC_METHOD: sensitive to a, b, op, so it runs in the next evaluate phase
14. ALU: reads a=5, b=3, op=ADD; writes result=8, zero=false (pending)
15. Kernel: update phase; sig_result and sig_zero take their new values
16. Monitor SC_METHOD: result changed, so it runs in the next evaluate phase
17. Monitor: reads a=5, b=3, op=ADD, result=8, zero=false; creates an alu_txn
18. Monitor: writes the alu_txn into txn_fifo; the FIFO commits it in its own update phase
19. Checker SC_THREAD: blocked in txn_in.read(); the FIFO's data-written event wakes it
    one delta cycle after the monitor's write
20. Checker: dequeues the expected value, compares, prints PASS/FAIL

ASCII timing — one test transaction:

t=0 (sc_start called)
│
├── Δ0 evaluate (initialization): Driver thread starts
│       a.write(5), b.write(3), op.write(ADD) → pending
│       wait(10, SC_NS) → driver suspends
│   Δ0 update: sig_a=5, sig_b=3, sig_op=ADD committed
│       ALU compute() triggered (a, b, op in sensitivity list)
│
├── Δ1 evaluate: ALU compute() runs
│       result=8, zero=false → pending
│   Δ1 update: sig_result=8, sig_zero=false committed
│       Monitor observe() triggered (result in sensitivity list)
│
├── Δ2 evaluate: Monitor observe() runs
│       Creates alu_txn{a=5,b=3,op=ADD,result=8,zero=false}
│       txn_out write → FIFO (pending)
│   Δ2 update: FIFO commits the write, notifies its data-written event
│
├── Δ3 evaluate: Checker wakes from txn_in.read()
│       Checker: expected={8, false} → PASS
│
t=10ns: Driver resumes
        Next stimulus written...

This is a critical insight: the checker does not run until delta cycle 3, three evaluate-update cycles after the inputs were written, even though simulation time never leaves 0. This is deterministic and correct. The pipeline is: input write → signal update → ALU compute → result update → monitor observe → FIFO update → checker read.

What happens if the monitor were sensitive to inputs instead of outputs:

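If the monitor's sensitivity were sensitive << a << b << op instead of sensitive << result, it would be triggered by the same input changes that trigger the ALU and would run in the same evaluate phase as compute(). Within that phase, result.read() still returns the old committed value, because the ALU's new result is not committed until the following update phase. The transaction recorded would have the new inputs but the old result. This is the "input-sensitive monitor" bug: it observes input changes and reports stale output values.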
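To see the stale read without the SystemC kernel, here is a heavily simplified evaluate/update model in plain C++. The names signal_u32, mini_kernel, observation, and run_add_5_3 are hypothetical. The model keeps only the rules that matter here: a write becomes visible after the update phase, and only processes sensitive to a signal whose value actually changed run in the next evaluate phase. The driver is not modeled as a process; its writes are simply pending when settle() starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Hypothetical, heavily simplified evaluate/update model (not the SystemC kernel).
struct signal_u32 {
    std::uint32_t cur = 0;   // committed value, what read() returns
    std::uint32_t nxt = 0;   // value requested by write() in this delta
    bool pending = false;
    void write(std::uint32_t v) { nxt = v; pending = true; }
    std::uint32_t read() const { return cur; }
};

struct mini_kernel {
    struct process {
        std::function<void()> body;
        std::vector<signal_u32*> sensitivity;
    };
    std::vector<signal_u32*> signals;
    std::vector<process> procs;

    // Update phase: commit pending writes, return processes sensitive to a
    // signal whose value actually changed.
    std::set<std::size_t> update() {
        std::set<signal_u32*> changed;
        for (signal_u32* s : signals) {
            if (!s->pending) continue;
            if (s->nxt != s->cur) changed.insert(s);
            s->cur = s->nxt;
            s->pending = false;
        }
        std::set<std::size_t> woken;
        for (std::size_t i = 0; i < procs.size(); ++i)
            for (signal_u32* s : procs[i].sensitivity)
                if (changed.count(s)) woken.insert(i);
        return woken;
    }

    // Alternate update and evaluate phases until no process is triggered.
    void settle() {
        int deltas = 0;
        for (std::set<std::size_t> run = update(); !run.empty(); run = update()) {
            for (std::size_t i : run) procs[i].body();   // evaluate phase
            ++deltas;
            assert(deltas < 1000 && "signals never settled (combinational loop?)");
        }
    }
};

struct observation {
    std::uint32_t output_sensitive;   // monitor triggered by result
    std::uint32_t input_sensitive;    // monitor triggered by a and b
};

// Drive a=5, b=3 into an adder and record what each monitor read.
observation run_add_5_3() {
    signal_u32 a, b, result;
    observation obs{0xFFFFFFFFu, 0xFFFFFFFFu};
    mini_kernel k;
    k.signals = {&a, &b, &result};
    k.procs.push_back({[&] { result.write(a.read() + b.read()); }, {&a, &b}});      // ALU
    k.procs.push_back({[&] { obs.output_sensitive = result.read(); }, {&result}});  // correct monitor
    k.procs.push_back({[&] { obs.input_sensitive = result.read(); }, {&a, &b}});    // buggy monitor
    a.write(5);   // the driver's writes: pending until the first update phase
    b.write(3);
    k.settle();
    return obs;
}
```

run_add_5_3() reports 8 for the output-sensitive monitor and 0 for the input-sensitive one. The model happens to run the ALU before the buggy monitor in the same evaluate phase, and the monitor still reads 0, because the ALU's write is not committed until the update phase: process order within a delta cycle cannot fix this bug.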

SC_THREAD for Stimulus Generation — The Right Pattern

The Synchronization-First Convention

For testbenches that drive clocked DUTs, there is a standard pattern for the stimulus SC_THREAD that every DV engineer should internalize:

void stimulus_proc() {
    // Phase 1: Reset
    rst.write(true);
    wait(3, SC_NS);           // hold reset for 3ns
    rst.write(false);
    wait(clk.posedge_event()); // synchronize to first clock edge after reset

    // Phase 2: Drive stimulus
    for (int i = 0; i < 256; i++) {
        a.write(i);
        wait(clk.posedge_event()); // hold each value for exactly one clock cycle
    }

    // Phase 3: Drain and stop
    wait(5, SC_NS);           // allow any pipeline to drain
    sc_stop();                // signal simulation end
}

Comparison to SystemVerilog:

// SV equivalent: identical structure to the SystemC version
initial begin
    // Phase 1: Reset
    rst = 1;
    #3;                      // wait 3ns
    rst = 0;
    @(posedge clk);          // sync to first clock edge

    // Phase 2: Drive stimulus
    for (int i = 0; i < 256; i++) begin
        a <= i;              // nonblocking, like sc_signal::write()
        @(posedge clk);      // hold for one cycle
    end

    // Phase 3: Drain and stop
    #5;
    $finish;                 // equivalent to sc_stop()
end

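The structures are nearly identical. The differences are mostly syntax: wait(clk.posedge_event()) vs @(posedge clk), and sc_stop() vs $finish. One semantic point is worth keeping in mind: sc_signal::write() takes effect only in the update phase, which behaves like an SV nonblocking assignment, so the SV loop should drive with <= (or through a clocking block) to avoid racing with DUT logic that samples on the same edge. The conceptual pattern (reset, synchronize, drive, drain, stop) is universal.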

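clk.posedge_event() vs clk.pos():

The two calls look similar but serve different purposes:
- wait(clk.posedge_event()) calls the posedge_event() method on the sc_in<bool> port, which returns the posedge sc_event; this is a dynamic wait inside a thread
- clk.pos() returns an event finder for static sensitivity. It is used in a sensitivity list (sensitive << clk.pos();) or in SC_CTHREAD(proc, clk.pos()), where a bare wait() then waits for the next posedge. It cannot be passed to wait() directly

For stimulus threads registered as plain SC_THREAD (not SC_CTHREAD), use wait(clk.posedge_event()) for explicit posedge synchronization, or declare sensitive << clk.pos() and call a bare wait().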

Implementation

Complete Testbench

The monitor and checker communicate through a shared transaction log — a simple struct passed via sc_fifo (a SystemC FIFO channel). The monitor records observed transactions to the FIFO. The checker reads from the FIFO and compares against its expected-value queue.

// File: alu_structured_tb.cpp
// Three-component testbench: Stimulus Driver + Monitor + Checker
// DUT: RV32I ALU from Post 5
#include <systemc.h>
#include <iostream>
#include <iomanip>
#include <deque>
#include <cstdint>    // uint32_t

// ─── ALU operation encoding (from Post 5) ────────────────────────────────────
enum alu_op_t {
    ALU_ADD=0, ALU_SUB=1, ALU_AND=2, ALU_OR=3, ALU_XOR=4,
    ALU_SLT=5, ALU_SLTU=6, ALU_SLL=7, ALU_SRL=8, ALU_SRA=9
};

// ─── ALU module (from Post 5 — unchanged) ────────────────────────────────────
SC_MODULE(alu) {
    sc_in<sc_uint<32>>  a, b;
    sc_in<sc_uint<4>>   op;
    sc_out<sc_uint<32>> result;
    sc_out<bool>        zero;

    void compute() {
        sc_uint<32> res = 0;
        switch ((int)op.read()) {
            case ALU_ADD:  res = a.read() + b.read(); break;
            case ALU_SUB:  res = a.read() - b.read(); break;
            case ALU_AND:  res = a.read() & b.read(); break;
            case ALU_OR:   res = a.read() | b.read(); break;
            case ALU_XOR:  res = a.read() ^ b.read(); break;
            case ALU_SLT:  res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0; break;
            case ALU_SLTU: res = (a.read() < b.read()) ? 1 : 0; break;
            case ALU_SLL:  res = a.read() << b.read().range(4,0); break;
            case ALU_SRL:  res = a.read() >> b.read().range(4,0); break;
            case ALU_SRA:  res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4,0)); break;
            default:       res = 0; break;
        }
        result.write(res);
        zero.write(res == 0);
    }

    SC_CTOR(alu) {
        SC_METHOD(compute);
        sensitive << a << b << op;
    }
};

// ─── Transaction record — passed from Monitor to Checker ─────────────────────
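struct alu_txn {
    sc_uint<32> a, b, result;
    sc_uint<4>  op;
    bool        zero;
    sc_time     timestamp;
};

// sc_fifo<T> requires operator<< for T (used by sc_fifo::print()/dump())
inline std::ostream& operator<<(std::ostream& os, const alu_txn& t) {
    return os << "alu_txn{op=" << t.op << " a=" << t.a << " b=" << t.b
              << " result=" << t.result << " zero=" << t.zero << "}";
}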

// ─── Monitor ──────────────────────────────────────────────────────────────────
// PASSIVE — never drives any signal. Observes result and zero outputs.
// Records a transaction every time the result signal changes.
// Communicates observed transactions to the Checker via sc_fifo.

SC_MODULE(alu_monitor) {
    sc_in<sc_uint<32>> a, b, result;
    sc_in<sc_uint<4>>  op;
    sc_in<bool>        zero;

    sc_fifo_out<alu_txn> txn_out;  // sends observed transactions to Checker

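    // Fires every time result changes: passive observation only.
    // Caveat: two consecutive vectors with the same result generate no
    // value-change event, so the second would be missed. Every vector in
    // this test sequence changes result.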
    void observe() {
        alu_txn t;
        t.a         = a.read();
        t.b         = b.read();
        t.op        = op.read();
        t.result    = result.read();
        t.zero      = zero.read();
        t.timestamp = sc_time_stamp();

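        std::cout << "[MON " << t.timestamp << "]"
                  << "  op=" << (int)t.op
                  << std::hex << std::setfill('0')
                  << "  a=0x"   << std::setw(8) << (uint32_t)t.a
                  << "  b=0x"   << std::setw(8) << (uint32_t)t.b
                  << "  res=0x" << std::setw(8) << (uint32_t)t.result
                  << std::dec << std::setfill(' ')
                  << "  zero=" << t.zero
                  << std::endl;

        // An SC_METHOD cannot block, so use nb_write(): a full FIFO is reported
        // as an error instead of calling wait() from a method process.
        if (!txn_out.nb_write(t)) {
            SC_REPORT_ERROR("alu_monitor", "txn FIFO full, transaction dropped");
        }
    }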

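    SC_CTOR(alu_monitor) {
        SC_METHOD(observe);
        sensitive << result;  // fire when result changes: passive trigger
        dont_initialize();    // skip the time-0 initialization call; otherwise a
                              // spurious txn (result=0) reaches the checker first
        // Note: NOT sensitive to a, b, op directly: we observe outputs, not inputs
    }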
};

// ─── Checker ──────────────────────────────────────────────────────────────────
// PASSIVE — never drives any signal. Reads from Monitor's transaction FIFO.
// Compares each observed transaction against a pre-loaded expected-value queue.
// Reports pass/fail per transaction and a final summary.

SC_MODULE(alu_checker) {
    sc_fifo_in<alu_txn> txn_in;   // receives observed transactions from Monitor

    struct expected_t {
        sc_uint<32> result;
        bool        zero;
        const char* description;
    };

    std::deque<expected_t> expected_queue;
    int pass_count = 0;
    int fail_count = 0;

    // Load expected values before simulation starts
    void load_expected(sc_uint<32> result, bool zero, const char* desc) {
        expected_queue.push_back({result, zero, desc});
    }

    void check_loop() {
        while (true) {
            alu_txn t = txn_in.read();   // blocks until Monitor writes a transaction

            if (expected_queue.empty()) {
                std::cout << "[CHK ERROR] Received unexpected transaction at "
                          << t.timestamp << std::endl;
                fail_count++;
                continue;
            }

            expected_t exp = expected_queue.front();
            expected_queue.pop_front();

            bool result_ok = (t.result == exp.result);
            bool zero_ok   = (t.zero   == exp.zero);
            bool ok        = result_ok && zero_ok;

            std::cout << "[CHK " << (ok ? "PASS" : "FAIL") << "]  "
                      << exp.description;
            if (!ok) {
                std::cout << "  got result=0x" << std::hex << (uint32_t)t.result
                          << " zero=" << std::dec << t.zero
                          << "  expected result=0x" << std::hex << (uint32_t)exp.result
                          << " zero=" << std::dec << exp.zero;
            }
            std::cout << std::endl;

            if (ok) pass_count++; else fail_count++;
        }
    }

    void report() {
        // Called after simulation ends — print final summary
        std::cout << "\n=== Checker Summary: "
                  << pass_count << " PASS, "
                  << fail_count << " FAIL ===" << std::endl;
        if (!expected_queue.empty()) {
            std::cout << "[CHK WARNING] " << expected_queue.size()
                      << " expected transactions were never observed" << std::endl;
        }
    }

    SC_CTOR(alu_checker) {
        SC_THREAD(check_loop);   // blocks on txn_in.read() — must be SC_THREAD
    }
};

// ─── Stimulus Driver ──────────────────────────────────────────────────────────
// The ONLY component that drives DUT input signals.
// Sequences through test vectors, waits between each, calls sc_stop() at end.
// Registers expected values in the Checker BEFORE applying stimulus — this is
// the "predict then observe" pattern used in UVM scoreboards.

SC_MODULE(alu_driver) {
    sc_out<sc_uint<32>> a, b;
    sc_out<sc_uint<4>>  op;

    alu_checker* chk;   // reference to checker — to load expected values

    void drive(sc_uint<32> a_val, sc_uint<32> b_val, alu_op_t op_val,
               sc_uint<32> expected_result, bool expected_zero,
               const char* desc) {
        // Load expected value into checker BEFORE driving stimulus
        chk->load_expected(expected_result, expected_zero, desc);

        // Apply stimulus
        a.write(a_val);
        b.write(b_val);
        op.write(op_val);
        wait(10, SC_NS);   // advance time — monitor fires when result settles
    }

    void run() {
        // ── Test sequence ─────────────────────────────────────────────────────
        drive(5,          3,          ALU_ADD,  8,          false, "ADD: 5+3=8");
        drive(0xFFFFFFFF, 1,          ALU_ADD,  0,          true,  "ADD: overflow wraps to 0");
        drive(10,         3,          ALU_SUB,  7,          false, "SUB: 10-3=7");
        drive(5,          5,          ALU_SUB,  0,          true,  "SUB: 5-5=0, BEQ check");
        drive(0xFF00FF00, 0x0F0F0F0F, ALU_AND,  0x0F000F00, false, "AND: mask operation");
        drive(0xFF000000, 0x00FF0000, ALU_OR,   0xFFFF0000, false, "OR:  combine flags");
        drive(0xDEADBEEF, 0xDEADBEEF, ALU_XOR,  0,          true,  "XOR: x^x=0 zero idiom");
        drive(0xFFFFFFFF, 1,          ALU_SLT,  1,          false, "SLT: -1 < 1 signed");
        drive(0xFFFFFFFF, 1,          ALU_SLTU, 0,          true,  "SLTU: 0xFFFFFFFF > 1 unsigned");
        drive(1,          4,          ALU_SLL,  16,         false, "SLL: 1<<4=16");
        drive(1,          31,         ALU_SLL,  0x80000000, false, "SLL: 1<<31=MSB");
        drive(0x80000000, 1,          ALU_SRL,  0x40000000, false, "SRL: MSB not preserved");
        drive(0x80000000, 1,          ALU_SRA,  0xC0000000, false, "SRA: sign-extend 0x80000000>>1");
        drive(0xFFFFFFFF, 4,          ALU_SRA,  0xFFFFFFFF, false, "SRA: all-ones stays all-ones");

        std::cout << "\n[DRV] All stimulus applied. Stopping simulation." << std::endl;
        wait(SC_ZERO_TIME);  // extra margin: the last vector already had 10 ns to settle
        sc_stop();
    }

    SC_CTOR(alu_driver) {
        SC_THREAD(run);
        chk = nullptr;  // set in sc_main after construction
    }
};

// ─── sc_main ──────────────────────────────────────────────────────────────────

int sc_main(int argc, char* argv[]) {
    // Signals connecting all components
    sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
    sc_signal<sc_uint<4>>  sig_op;
    sc_signal<bool>        sig_zero;

    // FIFO connecting Monitor → Checker (depth 32 — enough for test sequence)
    sc_fifo<alu_txn> txn_fifo(32);

    // Instantiate components
    alu        dut("dut");
    alu_monitor mon("monitor");
    alu_checker chk("checker");
    alu_driver  drv("driver");

    // Wire DUT
    dut.a(sig_a);  dut.b(sig_b);  dut.op(sig_op);
    dut.result(sig_result);  dut.zero(sig_zero);

    // Wire Monitor (read-only connections to DUT outputs + inputs for logging)
    mon.a(sig_a);  mon.b(sig_b);  mon.op(sig_op);
    mon.result(sig_result);  mon.zero(sig_zero);
    mon.txn_out(txn_fifo);

    // Wire Checker (receives from Monitor via FIFO)
    chk.txn_in(txn_fifo);

    // Wire Driver (drives DUT inputs only)
    drv.a(sig_a);  drv.b(sig_b);  drv.op(sig_op);
    drv.chk = &chk;  // give driver reference to checker for expected-value loading

    // ── VCD Trace Setup ────────────────────────────────────────────────────
    sc_trace_file* tf = sc_create_vcd_trace_file("alu_waveform");
    tf->set_time_unit(1, SC_NS);
    sc_trace(tf, sig_a,      "ALU.a");
    sc_trace(tf, sig_b,      "ALU.b");
    sc_trace(tf, sig_op,     "ALU.op");
    sc_trace(tf, sig_result, "ALU.result");
    sc_trace(tf, sig_zero,   "ALU.zero");

    // ── Run ────────────────────────────────────────────────────────────────
    sc_start();   // runs until sc_stop() called by driver
    sc_close_vcd_trace_file(tf);

    // Print checker summary after simulation ends
    chk.report();

    return (chk.fail_count > 0) ? 1 : 0;
}
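The expected values in run() are typed by hand, so a typo in the table shows up as a DUT failure. The "scoreboard prediction from reference model" row in the UVM mapping table points at the usual alternative: compute expected values with an independent golden model. Below is a minimal plain-C++ sketch under stated assumptions; alu_ref, predict, ref_op, ref_expect, and alu_ref_self_test are hypothetical names, and the opcode numbering is assumed to match alu_op_t. It is written with plain integer operations rather than the DUT's sc_int casts, so that it does not simply repeat the DUT's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical golden model for the Post 5 ALU in plain C++.
// Opcode numbering is assumed to match alu_op_t (ADD=0 ... SRA=9).
enum ref_op : unsigned {
    REF_ADD = 0, REF_SUB, REF_AND, REF_OR, REF_XOR,
    REF_SLT, REF_SLTU, REF_SLL, REF_SRL, REF_SRA
};

std::uint32_t alu_ref(std::uint32_t a, std::uint32_t b, unsigned op) {
    const unsigned shamt = b & 0x1Fu;   // RV32I shifts use only b[4:0]
    switch (op) {
        case REF_ADD:  return a + b;    // unsigned arithmetic wraps modulo 2^32
        case REF_SUB:  return a - b;
        case REF_AND:  return a & b;
        case REF_OR:   return a | b;
        case REF_XOR:  return a ^ b;
        case REF_SLT:  // two's-complement reinterpretation for the signed compare
            return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b) ? 1u : 0u;
        case REF_SLTU: return a < b ? 1u : 0u;
        case REF_SLL:  return a << shamt;
        case REF_SRL:  return a >> shamt;
        case REF_SRA:  // fill vacated bits with the sign bit, without signed >>
            return (a & 0x80000000u) ? (a >> shamt) | ~(0xFFFFFFFFu >> shamt)
                                     : a >> shamt;
        default:       return 0;        // matches the DUT's default branch
    }
}

// Expected (result, zero) pair in the form load_expected() takes.
struct ref_expect {
    std::uint32_t result;
    bool zero;
};

ref_expect predict(std::uint32_t a, std::uint32_t b, unsigned op) {
    const std::uint32_t r = alu_ref(a, b, op);
    return {r, r == 0};
}

// Spot-check two table rows where signed and unsigned interpretations differ.
void alu_ref_self_test() {
    assert(alu_ref(0xFFFFFFFFu, 1, REF_SLT) == 1u);    // -1 < 1 signed
    assert(alu_ref(0xFFFFFFFFu, 1, REF_SLTU) == 0u);   // 0xFFFFFFFF > 1 unsigned
}
```

Inside drive(), a call such as `ref_expect p = predict(a_val, b_val, op_val);` followed by `chk->load_expected(p.result, p.zero, desc);` could replace the two hand-typed columns; sc_uint<32> and alu_op_t values should convert implicitly to the integer parameters. The model only earns its keep if it is maintained independently of the DUT.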

Build & Run

# CMakeLists.txt in section1/post06/
cmake_minimum_required(VERSION 3.16)
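project(post06_testbench)

# Match the C++ standard your SystemC library was built with (17 is an assumption)
set(CMAKE_CXX_STANDARD 17)
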
set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
include_directories(${SYSTEMC_HOME}/include)
link_directories(${SYSTEMC_HOME}/lib-linux64)

add_executable(alu_structured_tb alu_structured_tb.cpp)
target_link_libraries(alu_structured_tb systemc)

mkdir build && cd build
cmake .. && make
./alu_structured_tb
gtkwave alu_waveform.vcd &   # open waveform viewer

Expected console output (abbreviated):

[MON 0 s]  op=0  a=0x00000005  b=0x00000003  res=0x00000008  zero=0
[CHK PASS]  ADD: 5+3=8
[MON 10 ns]  op=0  a=0xffffffff  b=0x00000001  res=0x00000000  zero=1
[CHK PASS]  ADD: overflow wraps to 0
...
[DRV] All stimulus applied. Stopping simulation.

=== Checker Summary: 14 PASS, 0 FAIL ===

The Monitor Is Read-Only — Why This Rule Is Absolute

The single most important architectural constraint in this testbench pattern: a monitor never drives any signal that the DUT can observe.

Consider what happens if a monitor drives a signal:
- The DUT responds to the monitor's drive, not just the stimulus driver's drive
- The monitor is now part of the stimulus — it is no longer passive
- A bug in the monitor can mask a bug in the DUT
- A bug in the DUT can cause the monitor to drive incorrect values, which then "fix" the DUT's wrong behavior in a later cycle

This sounds unlikely, but it happens regularly in testbenches written under time pressure. The monitor has access to all the signals. It's tempting to add a "correction" or "initialization" write from inside the monitor. The moment you do this, your testbench is no longer a verification environment — it is a co-simulator that contributes to the DUT's behavior.

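UVM does not enforce this rule structurally either: uvm_monitor is an ordinary uvm_component, and nothing in the library stops a monitor from writing through its virtual interface. Passivity comes from convention and code review (sampling only through a clocking block's input signals helps). In our SystemC testbench the type system does part of the work: the monitor's DUT-facing ports are all sc_in, which has no write() method, so an accidental drive fails to compile. If you catch yourself writing sc_out in a monitor module, stop and reconsider.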

Common Pitfalls for SV Engineers

Pitfall 1: sc_trace Called After sc_start — Silent No-Op or Assertion

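SV engineers are used to calling $dumpvars at any point during simulation, even conditionally from an initial block partway through a test. SystemC's sc_trace() does not work this way. SystemC initializes each trace file when simulation starts, so every sc_trace() call must come before that point: in sc_main before the first sc_start(), or during elaboration (for example, in a module constructor). A trace registered after simulation has started is either silently ignored or reported as an error that stops the run, depending on the SystemC version.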

// WRONG: trace registered after simulation has already started
sc_trace_file* tf = sc_create_vcd_trace_file("out");   // creates out.vcd
sc_start(100, SC_NS);                                  // simulation has started
sc_trace(tf, sig_result, "ALU.result");                // too late: ignored or an error
sc_start();

// CORRECT: create the file and register every trace before the first sc_start()
sc_trace_file* tf = sc_create_vcd_trace_file("out");
sc_trace(tf, sig_result, "ALU.result");
sc_start();
sc_close_vcd_trace_file(tf);

If your waveform file is empty or missing signals you expect to see, this is the first thing to check.

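Pitfall 2: Undersized sc_fifo Stalls the Monitor When the Checker Falls Behind

An sc_fifo has a fixed capacity set at construction. The default constructor gives a depth of 16; sc_fifo<alu_txn> txn_fifo(1) gives a depth of 1. In this testbench, the monitor writes one transaction per ALU result change. If it produces transactions faster than the checker's check_loop consumes them, the FIFO fills and the next txn_out.write() has to wait for space. In an SC_THREAD monitor, that wait suspends the monitor, so it can miss result changes while it is blocked. In an SC_METHOD monitor, the blocking write() is not allowed to wait at all: SystemC reports an error, because wait() may only be called from thread processes.

// FRAGILE: depth left at the default, or chosen without checking the traffic
sc_fifo<alu_txn> txn_fifo;         // depth = 16 (constructor default)
sc_fifo<alu_txn> txn_fifo(1);      // depth = 1: any backlog blocks the writer

// SAFE: explicit depth sized for the test sequence
sc_fifo<alu_txn> txn_fifo(32);     // depth = 32, room for all 14 transactions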

Size the FIFO to at least the maximum number of transactions that can be in-flight between producer (monitor) and consumer (checker) at any point.
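The sizing rule can be checked with a little arithmetic before you pick a depth. The sketch below is plain C++, not a SystemC API: peak_occupancy is a hypothetical helper that takes the time each transaction is written by the monitor and read by the checker (for example, taken from the log of an earlier run) and returns the largest number of transactions held in the FIFO at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Peak FIFO occupancy from per-transaction timestamps.
// Assumptions: transaction i is written at write_ns[i] and read at read_ns[i],
// both lists are in FIFO order (non-decreasing), and read_ns[i] >= write_ns[i].
// A read and a write at the same instant count the write first (conservative).
std::size_t peak_occupancy(const std::vector<long>& write_ns,
                           const std::vector<long>& read_ns) {
    assert(write_ns.size() == read_ns.size());
    std::size_t peak = 0;
    std::size_t reads_done = 0;
    for (std::size_t w = 0; w < write_ns.size(); ++w) {
        // Retire every read that happened strictly before this write.
        while (reads_done < read_ns.size() && read_ns[reads_done] < write_ns[w]) {
            ++reads_done;
        }
        // Transactions written so far minus those already read.
        peak = std::max(peak, w + 1 - reads_done);
    }
    return peak;
}
```

For instance, with writes at 10, 20, 30, and 40 ns and a checker that does not start reading until 50 ns, the peak is 4, so any depth below 4 would make the monitor wait. Add headroom on top of the measured peak, since another test or seed can produce a longer burst.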

Pitfall 3: SC_THREAD Stimulus Without sc_stop — Simulation Runs Forever

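When the driver's run() function returns without calling sc_stop(), the driver's SC_THREAD terminates, but sc_start() in sc_main (called with no arguments) keeps running as long as any event is still pending. In a design with a free-running sc_clock, there is always a next clock edge, so simulation time advances forever: the last [DRV] message prints, the program never exits, and one CPU core stays busy. Our combinational ALU test has no clock, so the kernel would eventually run out of events and sc_start() would return on its own. Relying on that is fragile: the first clock you add turns the testbench into a hang.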

void run() {
    drive(5, 3, ALU_ADD, 8, false, "ADD");
    // ... more drives ...
    // MISSING: sc_stop()
}   // thread ends here; if an sc_clock is running, sc_start() in sc_main never returns

The fix is to call sc_stop() at the end of the controlling thread. For testbenches with multiple drivers, make one thread the "controller" that calls sc_stop() only after all the others have finished: for example, each driver notifies its own sc_event when it completes, and the controller waits for all of those events before stopping.
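Whether sc_start() returns on its own depends on whether any events remain. The plain-C++ sketch below is a toy scheduler, not SystemC, and models three cases: no clock and no stop, a free-running clock and no stop, and a clock plus a stop() at the end of the stimulus. ToyKernel and simulate are hypothetical, and the 1 ms safety limit stands in for "runs forever".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy discrete-event kernel (NOT SystemC): runs until no events are left,
// stop() is called, or the safety limit is reached.
class ToyKernel {
public:
    using Action = std::function<void()>;
    void schedule(std::uint64_t at_ns, Action act) {
        assert(at_ns >= now_);                     // no scheduling into the past
        queue_.push(Event{at_ns, seq_++, std::move(act)});
    }
    void stop() { stopped_ = true; }
    std::uint64_t now() const { return now_; }
    // true: ended by event starvation or stop(); false: hit the safety limit.
    bool run(std::uint64_t limit_ns) {
        while (!queue_.empty() && !stopped_) {
            Event ev = queue_.top();
            if (ev.at > limit_ns) return false;
            queue_.pop();
            now_ = ev.at;
            ev.act();
        }
        return true;
    }
private:
    struct Event { std::uint64_t at; std::uint64_t seq; Action act; };
    struct Later {                                 // min-heap on (time, insertion order)
        bool operator()(const Event& x, const Event& y) const {
            return x.at != y.at ? x.at > y.at : x.seq > y.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    std::uint64_t now_ = 0, seq_ = 0;
    bool stopped_ = false;
};

// 14 stimulus events 10 ns apart, plus an optional free-running 5 ns "clock".
bool simulate(bool with_clock, bool driver_calls_stop, std::uint64_t& end_ns) {
    ToyKernel k;
    int applied = 0;
    std::function<void()> drive = [&] {
        if (++applied < 14) k.schedule(k.now() + 10, drive);
        else if (driver_calls_stop) k.stop();      // the sc_stop() equivalent
    };
    std::function<void()> tick = [&] { k.schedule(k.now() + 5, tick); };
    k.schedule(10, drive);
    if (with_clock) k.schedule(0, tick);
    const bool finished = k.run(1000000);
    end_ns = k.now();
    return finished;
}
```

Without a clock, the run ends by event starvation at 140 ns; with a clock and no stop(), it ends only when the safety limit is hit; with stop(), it ends at 140 ns regardless of the clock.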

Pitfall 4: Monitor SC_METHOD Fires on Every Clock Edge — Pipeline Latency Mismatch

For a clocked DUT (unlike our combinational ALU), making the monitor sensitive to the clock edge instead of the output signal creates a pipeline latency problem. If the DUT has 2 cycles of latency, the monitor fires on every clock edge and records the output value, but for the first 2 cycles after a new input the output still reflects an earlier input (or nothing valid at all). Unless the monitor accounts for that latency, it attributes stale values to the wrong transaction.

// DANGEROUS for pipelined DUTs: monitor fires on clock, not on output change
SC_METHOD(observe);
sensitive << clk.pos();  // fires every cycle — may observe stale outputs

// SAFER: monitor fires when the output actually changes
SC_METHOD(observe);
sensitive << result;  // fires only on a value change: identical back-to-back results give one event

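Value-change sensitivity has its own blind spot: an sc_signal notifies only when its value actually changes, so two consecutive operations with the same result produce one event, and the monitor records one transaction instead of two. If you do trigger on output changes, also check that each change arrives the expected number of cycles after its input. For pipelined DUTs, the more robust pattern is to sample on the clock edge only when an output-valid signal is asserted, and to pair each sampled output with the input issued the expected pipeline depth earlier.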
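Both blind spots show up in a small cycle-level model. The plain-C++ sketch below is not SystemC: run_pipeline is a hypothetical DUT with 2 cycles of latency that doubles its input and raises out_valid with each result, count_value_changes models a monitor sensitive only to result changes, and count_mismatches pairs each valid output with the input issued assumed_latency cycles earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

constexpr std::size_t kLatency = 2;   // hypothetical pipeline depth

struct CycleSample { bool out_valid; std::uint32_t result; };

// One sample per clock edge: the result for the input applied at cycle c
// appears at cycle c + kLatency. The "DUT" doubles its input (arbitrary choice).
std::vector<CycleSample> run_pipeline(const std::vector<std::uint32_t>& inputs) {
    std::deque<CycleSample> stages(kLatency, CycleSample{false, 0});
    std::vector<CycleSample> trace;
    for (std::size_t c = 0; c < inputs.size() + kLatency; ++c) {
        const bool in_valid = c < inputs.size();
        stages.push_back(CycleSample{in_valid, in_valid ? inputs[c] * 2 : 0});
        trace.push_back(stages.front());   // what a clock-edge monitor sees
        stages.pop_front();
    }
    return trace;
}

// Transactions a valid-qualified, clock-sampled monitor would record.
std::size_t count_valid(const std::vector<CycleSample>& trace) {
    std::size_t n = 0;
    for (const auto& s : trace) n += s.out_valid ? 1 : 0;
    return n;
}

// Events a monitor sensitive only to value changes of result would see.
std::size_t count_value_changes(const std::vector<CycleSample>& trace) {
    std::size_t events = 0;
    std::uint32_t prev = 0;                // assumed reset value of result
    for (const auto& s : trace) {
        if (s.result != prev) ++events;
        prev = s.result;
    }
    return events;
}

// Valid outputs that disagree with the reference (2 * input) when output c
// is paired with the input applied at cycle c - assumed_latency.
std::size_t count_mismatches(const std::vector<CycleSample>& trace,
                             const std::vector<std::uint32_t>& inputs,
                             std::size_t assumed_latency) {
    std::size_t bad = 0;
    for (std::size_t c = 0; c < trace.size(); ++c) {
        if (!trace[c].out_valid) continue;
        if (c < assumed_latency || c - assumed_latency >= inputs.size()) { ++bad; continue; }
        if (trace[c].result != inputs[c - assumed_latency] * 2) ++bad;
    }
    return bad;
}
```

For inputs {4, 4, 1} there are three valid results but only two value changes, because the two 8s are back to back. Pairing with the true latency of 2 gives no mismatches; pairing with a latency of 0 reports every result as wrong even though the DUT is correct.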

Pitfall 5: sc_close_vcd_trace_file Must Be Called — Incomplete VCD

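If sc_close_vcd_trace_file(tf) is never called (for example, because the program leaves sc_main through an exception, an abort, or an early return), the VCD file may be incomplete. The header and variable definitions are written when tracing starts, while value changes are written through a buffered file as simulation runs; if the process terminates abnormally before the file is closed, the last buffered changes may never reach the disk. The result is a file with a valid header but truncated value data, which GTKWave may display with missing or cut-off waveforms, or refuse to load. Closing the file explicitly flushes it deterministically instead of relying on process teardown.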

// WRONG: VCD file not properly closed
sc_start();
return (chk.fail_count > 0) ? 1 : 0;  // forgot sc_close_vcd_trace_file(tf)

// CORRECT: always close before return
sc_start();
sc_close_vcd_trace_file(tf);   // flush and finalize VCD
chk.report();
return (chk.fail_count > 0) ? 1 : 0;

For exception-safe cleanup, use RAII: wrap the trace file in a class whose destructor calls sc_close_vcd_trace_file, or hold it in a std::unique_ptr with a custom deleter. C++ has no finally block; RAII is the idiomatic replacement.
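The unique_ptr-with-custom-deleter approach is sketched below. To keep it compilable without SystemC, FakeTraceFile, fake_create_trace_file, and fake_close_trace_file are hypothetical stand-ins for sc_trace_file, sc_create_vcd_trace_file, and sc_close_vcd_trace_file; g_closed exists only to make the cleanup observable.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Stand-ins so the sketch compiles without SystemC.
struct FakeTraceFile {};
int g_closed = 0;                                  // demo counter only

FakeTraceFile* fake_create_trace_file() { return new FakeTraceFile; }
void fake_close_trace_file(FakeTraceFile* tf) {
    assert(tf != nullptr);
    delete tf;
    ++g_closed;
}

// Custom deleter: unique_ptr calls it when the owner goes out of scope,
// including during stack unwinding after an exception.
struct TraceFileCloser {
    void operator()(FakeTraceFile* tf) const { fake_close_trace_file(tf); }
};
using TraceFilePtr = std::unique_ptr<FakeTraceFile, TraceFileCloser>;

// A test body that may throw before reaching its normal end.
int run_test(bool throw_midway) {
    TraceFilePtr tf(fake_create_trace_file());
    // ... sc_trace(tf.get(), ...) registrations and sc_start() would go here ...
    if (throw_midway) throw std::runtime_error("checker aborted the run");
    return 0;                                      // tf is closed automatically here
}
```

With SystemC, the same idea would be std::unique_ptr<sc_core::sc_trace_file, void (*)(sc_core::sc_trace_file*)> holding the result of sc_create_vcd_trace_file("alu_waveform") with &sc_core::sc_close_vcd_trace_file as the deleter, assuming your SystemC version declares the close function with that signature. The exception must be caught somewhere: if it escapes entirely, C++ does not guarantee that the stack is unwound, so the deleter may never run.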

DV Insight

Two practical warnings from real verification projects.

VCD files grow fast. For a 10-signal design running 1 million clock cycles, the VCD file can reach 50–100 MB. At 1 billion cycles (the scale of a full regression), the same design would produce roughly 1000 times as much: 50–100 GB. For the ALU test above, 14 transactions at 10 ns each add up to 140 ns of simulation, so the VCD will be a few kilobytes. But as we add the full CPU pipeline, pipeline stalls, and multi-thousand-instruction programs in Posts 18–30, the VCD grows with every added signal and every added cycle. Two mitigation strategies: (1) enable VCD only in debug builds, not regression builds, controlled by a command-line flag or #ifdef ENABLE_VCD; (2) keep the traced set small: trace only the block under investigation, and when a regression test fails, rerun that single test with tracing enabled. Because every sc_trace() registration must happen before sc_start(), recording cannot simply be switched on at the moment the checker reports a mismatch.
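The arithmetic behind such figures can be reproduced with a back-of-the-envelope estimator. estimate_vcd_bytes is a hypothetical plain-C++ helper, not a SystemC facility, and it assumes every traced signal changes once per cycle, 2-character identifier codes, full-width vector values, and one timestamp line per cycle; real files vary with signal activity and the writer's formatting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Rough VCD size estimate under the assumptions stated above.
// 1-bit change:  "<bit><id>\n"         e.g. "1!#\n"      (1 + id_len + 1 bytes)
// vector change: "b<bits> <id>\n"      e.g. "b1010 !#\n" (1 + w + 1 + id_len + 1 bytes)
// timestamp:     "#<time>\n" per cycle, bounded by the length of the final time.
std::uint64_t estimate_vcd_bytes(const std::vector<unsigned>& widths,
                                 std::uint64_t cycles,
                                 std::uint64_t ticks_per_cycle,
                                 unsigned id_len = 2) {
    assert(ticks_per_cycle > 0);
    std::uint64_t per_cycle = 0;
    for (unsigned w : widths) {
        per_cycle += (w == 1) ? 1 + id_len + 1
                              : 1 + w + 1 + id_len + 1;
    }
    const std::uint64_t stamp = std::to_string(cycles * ticks_per_cycle).size() + 2;
    return cycles * (per_cycle + stamp);
}
```

Assuming a 1 ns timescale and 10 ns cycles, ten 1-bit signals over a million cycles come to about 50 MB, while ten 32-bit buses changing every cycle come to about 380 MB, so signal width and toggle rate matter as much as cycle count.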

The sc_fifo depth matters. The line sc_fifo<alu_txn> txn_fifo(32) sets the FIFO depth to 32. If the monitor produces transactions faster than the checker consumes them (possible if the checker does expensive computation), the FIFO fills and the monitor's next txn_out.write() cannot complete. For the simple ALU test this is not a problem. For a high-throughput bus functional model running thousands of transactions, the consequence depends on the process type: an SC_THREAD monitor is suspended inside write() and misses activity while it waits, and an SC_METHOD monitor cannot wait at all, so the blocking write() triggers a SystemC error. Size your FIFOs with headroom; from an SC_METHOD, use nb_write() and treat a false return as a testbench error; or, for pure record-keeping in complex scenarios, use an unbounded container such as std::deque (which has no events, so the consumer must be notified separately).
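The nb_write() approach can be sketched as follows. BoundedFifo is a hypothetical plain-C++ stand-in that borrows the nb_write(), nb_read(), and num_available() names from sc_fifo so the shape carries over; it has no events or scheduler behind it. MonitorStats and publish are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Bounded FIFO with non-blocking access only (not the SystemC sc_fifo class).
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }
    bool nb_write(const T& v) {            // never blocks: reports failure instead
        if (q_.size() == depth_) return false;
        q_.push_back(v);
        return true;
    }
    bool nb_read(T& out) {
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }
    std::size_t num_available() const { return q_.size(); }
private:
    std::size_t depth_;
    std::deque<T> q_;
};

// What an SC_METHOD monitor might do on each result change: a failed
// nb_write is counted as a testbench error instead of being ignored.
struct MonitorStats { std::size_t sent = 0, overflows = 0; };

template <typename T>
void publish(BoundedFifo<T>& fifo, const T& txn, MonitorStats& st) {
    if (fifo.nb_write(txn)) ++st.sent;
    else ++st.overflows;                   // a real monitor would report an error here
}
```

The design choice is to surface overflow as a counted error rather than stalling or silently dropping data; in a real SC_METHOD monitor, the overflow branch is where an SC_REPORT_ERROR call would go.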

Integration

This testbench pattern — monitor, checker, driver, with FIFO communication — is the permanent architecture for this series.

Why this maps directly to UVM: a UVM environment is this same pattern with a more formal structure. An agent bundles three components: a uvm_sequencer, a uvm_driver that drives the DUT, and a uvm_monitor that observes the DUT's interface. A uvm_scoreboard (our "checker") sits alongside the agent and compares observed transactions against a reference model. A uvm_sequence is the ordered list of test vectors (our drive() call sequence). When we add UVM infrastructure in Posts 23–27, the code we write here does not become obsolete; it becomes the reference implementation that the UVM version is validated against.
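The scoreboard-plus-reference-model idea can be sketched in plain C++ with no SystemC or UVM. The AluOp encoding, the AluTxn layout, and the AluScoreboard class are hypothetical; the enum from Post 5 and the series' alu_checker may differ. The reference model follows RV32I semantics and is written independently of any DUT code, which is what makes it useful as a second opinion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical op encoding for illustration only.
enum class AluOp { ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU };

// Reference model with RV32I semantics.
std::uint32_t alu_ref(AluOp op, std::uint32_t a, std::uint32_t b) {
    const unsigned sh = b & 0x1F;                  // RV32I uses the low 5 bits as shamt
    switch (op) {
        case AluOp::ADD:  return a + b;            // wraps modulo 2^32
        case AluOp::SUB:  return a - b;
        case AluOp::AND:  return a & b;
        case AluOp::OR:   return a | b;
        case AluOp::XOR:  return a ^ b;
        case AluOp::SLL:  return a << sh;
        case AluOp::SRL:  return a >> sh;
        // Arithmetic shift: guaranteed for signed >> since C++20, and what
        // GCC and Clang do in earlier modes.
        case AluOp::SRA:  return static_cast<std::uint32_t>(static_cast<std::int32_t>(a) >> sh);
        case AluOp::SLT:  return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b) ? 1u : 0u;
        case AluOp::SLTU: return a < b ? 1u : 0u;
    }
    return 0;
}

// Observed transaction as a monitor might publish it.
struct AluTxn { AluOp op; std::uint32_t a, b, result; bool zero; };

// Compares each observed transaction against the reference model.
class AluScoreboard {
public:
    bool write(const AluTxn& t) {
        const std::uint32_t expected = alu_ref(t.op, t.a, t.b);
        const bool ok = (t.result == expected) && (t.zero == (expected == 0));
        if (ok) ++pass_; else ++fail_;
        return ok;
    }
    int pass_count() const { return pass_; }
    int fail_count() const { return fail_; }
private:
    int pass_ = 0, fail_ = 0;
};
```

The design choice is that expected values come from the model rather than being typed in next to each stimulus. In UVM terms, write() plays the role of the callback that an analysis imp invokes for each transaction the monitor broadcasts.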

Post 7 (Capstone 1): This testbench becomes the foundation for the Section 1 capstone. The capstone extends the test suite to cover all edge cases — overflow, sign extension, shift boundary conditions, and the zero flag — and produces a final pass/fail report with a "Section 1 Complete" banner.

Posts 23–27 (UVM): The alu_monitor becomes a uvm_monitor with TLM analysis ports. The alu_checker becomes a uvm_scoreboard with a reference model. The alu_driver becomes a uvm_driver that receives items from a uvm_sequence through a uvm_sequencer. The sc_fifo becomes a uvm_tlm_analysis_fifo. Same structure, formal infrastructure.

Series progress:
- Post 1 — Modules, Ports & Signals ✓
- Post 2 — Simulation Time & Clocks ✓
- Post 3 — Delta Cycles & Event-Driven Semantics ✓
- Post 4 — SC_METHOD vs SC_THREAD ✓
- Post 5 — Building the RV32I ALU ✓
- Post 6 — Your First SystemC Testbench ✓
- Post 7 — Capstone 1: Complete ALU Testbench (next)

What's Next

Post 7: Capstone 1 — Complete ALU Testbench

Post 7 closes Section 1 with a capstone that proves everything from Posts 1–6 works together. It extends the test suite to full edge-case coverage, adds the complete Section 1 component chain in a diagram, and ends with a checklist of what you can now do independently.

After Post 7, every module you write in this series — register file, decoder, ALU control, each pipeline stage — will get the same structured testbench treatment. The patterns are established. The discipline is set. The remaining posts are about building more complex hardware on a solid foundation.



6. SystemC Tutorial - Your First SystemC Testbench