Architecture & Design Verification: 7. SystemC Tutorial

Introduction

Over the past six posts, we covered modules, clocks, delta cycles, process types, RTL design, and testbench structure. Now we prove it all works together.

Post 7 is the Section 1 capstone. No new concepts. No new SystemC syntax. What this post does is close the verification loop on the ALU — the first real hardware block in our RISC-V CPU — with a complete, edge-case-covering test suite built on the structured testbench pattern from Post 6.

This matters because "it works on basic cases" is not the same as "it is correct." The most dangerous bugs in hardware implementations are the ones that only appear at boundaries: overflow, underflow, sign extension with the most-negative value, shift by zero, shift by the maximum allowed amount. A testbench that only covers the simple middle of the valid range is a testbench that provides false confidence. The goal of a capstone is to provide actual confidence — a test suite comprehensive enough that if it passes, you can reasonably say the block is functionally correct for the operations it implements.

In real chip teams, this is called functional verification closure for a block. Before RTL freeze, every block must achieve a defined coverage metric — functional coverage points all hit, assertions passing, regression clean. For a production ALU, that might mean thousands of directed tests and millions of random tests. For our purposes — a 10-operation combinational block with no state — 30 carefully chosen directed tests covering all boundary cases is legitimate closure.

When this test suite passes cleanly, you have verified the RV32I ALU. Not "it seems to work." Verified.

Section 1 Recap

Before the code, a brief map of what Section 1 built and how the pieces connect:

graph LR
    P1["Post 1\nModules\nPorts\nSignals\npass_through"] --> P2["Post 2\nSimulation\nTime & Clocks\ndff module"]
    P2 --> P3["Post 3\nDelta Cycles\nsc_event\ntwo_stage_chain"]
    P3 --> P4["Post 4\nSC_METHOD\nvs SC_THREAD\nand_gate"]
    P4 --> P5["Post 5\nRV32I ALU\n10 operations\nsc_uint/sc_int"]
    P5 --> P6["Post 6\nTestbench\nArchitecture\nMonitor/Checker"]
    P6 --> P7["Post 7\nCapstone 1\nFull ALU Test\nVCD+Summary"]
    style P5 fill:#06b6d4,color:#fff
    style P6 fill:#10b981,color:#fff
    style P7 fill:#f59e0b,color:#fff

Each post added one foundational layer that every subsequent post builds on:
- Post 1 established the module/port/signal vocabulary — every module in this series uses it
- Post 2 established simulation time control — every timed testbench uses sc_start() and sc_time_stamp()
- Post 3 established the evaluate-update model — why delta cycles matter for the forwarding unit in Post 20
- Post 4 established SC_METHOD vs SC_THREAD — why the ALU is SC_METHOD and every testbench driver is SC_THREAD
- Post 5 built the first real RTL block — the ALU that will sit at the center of our CPU's execute stage
- Post 6 built the testbench pattern — monitor, checker, driver — that every block in this series will use

Synthesizable vs. Simulation-Only Code

The ALU we have built is synthesizable RTL. The testbench surrounding it is simulation-only. This distinction is fundamental in digital design and is one of the first things a new RTL engineer must internalize. SystemC makes this boundary explicit through process types and API choices.

What Makes the ALU Synthesizable

Five properties make our alu module synthesizable:

1. All processes are SC_METHOD with no blocking waits.
An SC_METHOD models a combinational cloud: given stable inputs, it computes stable outputs. There is no notion of "wait until something happens"; the process simply evaluates, and synthesis tools can map it directly to gates. An SC_THREAD with timed wait() calls implies simulation time passing inside the process body, which has no gate-level equivalent. Clocked waits in an SC_CTHREAD are a different case: the state carried between waits becomes registers, which is why the comparison table later in this section lists SC_CTHREAD as synthesizable.

2. All outputs depend only on current inputs — no hidden state.
The ALU has no member variables that persist between process activations. res is a local variable recomputed every time compute() runs. Compare this to the dff from Post 2, where the q output depends on both current input and the stored state from the previous clock edge. The ALU is purely combinational; the dff is sequential. Both are synthesizable — but the synthesizability conditions differ.

3. Only sc_uint, sc_int, and bool types — not sc_lv with X/Z logic values.
sc_lv (logic vector) models four-valued logic: 0, 1, X (unknown), and Z (high-impedance). X is a simulation artifact used for testbench modeling and reset analysis; it has no hardware counterpart, because a wire in silicon always settles to a definite voltage. Z corresponds only to tri-state drivers, which ordinary datapath logic does not use. The two-valued sc_uint/sc_int/bool types are therefore the conventional choice for synthesizable datapath RTL and the ones most readily accepted across SystemC synthesis flows.

4. No dynamic memory allocation.
new, delete, std::vector, std::list — none of these are synthesizable. Hardware structures must have fixed, statically-known dimensions at elaboration time. The synthesizer must know at compile time how many bits of state to allocate, how many multiplexers to instantiate. Dynamic allocation implies runtime-variable structure, which has no hardware analog.

5. No file I/O, no OS calls, no cout.
These operations reach into the operating system layer. Silicon does not have an OS. A synthesis tool encountering std::cout << "value" has no way to lower that to gates. File I/O, system calls, and console output exist only in the simulation environment.
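The first two properties can be pictured in plain C++, outside SystemC. The sketch below treats a combinational block as a pure function and a sequential block as an object that carries state from one clock edge to the next. The names `comb_add` and `DffModel` are illustrative only and are not part of the series' code.

```cpp
#include <cstdint>

// Combinational: the output is a pure function of the current inputs.
inline uint32_t comb_add(uint32_t a, uint32_t b) {
    return a + b;   // unsigned arithmetic wraps modulo 2^32
}

// Sequential: q is stored state, updated only when a clock edge is applied.
struct DffModel {
    uint32_t q = 0;

    // Returns the value q held before the edge, then captures d.
    uint32_t rising_edge(uint32_t d) {
        const uint32_t previous = q;
        q = d;
        return previous;
    }
};
```

Calling `comb_add` twice with the same arguments always gives the same answer, while `rising_edge` depends on what was captured earlier. That difference is the "hidden state" the second property refers to.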

What Makes the Testbench Non-Synthesizable

The testbench in this capstone uses every non-synthesizable construct deliberately:

// SC_THREAD with time waits — non-synthesizable
void run() {
    wait(10, SC_NS);      // "wait 10 nanoseconds" — no gate equivalent
    sc_stop();            // tells the simulator to stop — no hardware meaning
}

// cout output — non-synthesizable
std::cout << "[PASS] " << description << std::endl;

// VCD trace file — non-synthesizable
sc_trace_file* tf = sc_create_vcd_trace_file("capstone1_waveform");
sc_trace(tf, sig_a, "ALU.a");
sc_close_vcd_trace_file(tf);

The SC_THREAD with wait(10, SC_NS) models the testbench driving stimulus at 10 ns intervals. There is no hardware that "waits 10 ns and then decides what to drive" in the sense a C++ programmer imagines it — hardware is continuously reactive. The time construct exists in the simulation only.

The Exact SV Parallel

This synthesizability boundary maps directly to SystemVerilog practice:

Construct	Synthesizable?	SystemC	SystemVerilog
Combinational logic	Yes	`SC_METHOD`	`always_comb` / `assign`
Sequential flip-flop	Yes	`SC_CTHREAD` + `wait()`	`always_ff @(posedge clk)`
Timed stimulus	No	`SC_THREAD` + `wait(N, SC_NS)`	`initial #10 a = 1;`
Simulation control	No	`sc_stop()`	`$finish`
Waveform dump	No	`sc_create_vcd_trace_file`	`$dumpfile` / `$dumpvars`
Console output	No	`std::cout`	`$display` / `$monitor`

In SystemVerilog: the module alu with always_comb is synthesizable; the module tb_alu with initial blocks and $display calls is not. The language lets both coexist in the same simulation because the simulator runs all of it, but only the synthesizable subset goes to the foundry.

SystemC Language Reference: Synthesizability

Construct	Syntax	SV/Verilog Equivalent	Key Difference
Combinational process	`SC_METHOD(f); sensitive << a << b;`	`always_comb`	SV auto-detects sensitivity; SystemC requires manual listing
Synchronous sequential	`SC_CTHREAD(f, clk.pos()); async_reset_signal_is(rst, true);`	`always_ff @(posedge clk or posedge rst)`	SystemC uses sequential C++ with `wait()`; SV re-evaluates the block
Unsigned integer	`sc_uint<32>`	`logic [31:0]`	SystemC templates carry width; SV uses bit ranges
Signed integer	`sc_int<32>`	`logic signed [31:0]`	Same semantics; SystemC uses a distinct type, SV a `signed` keyword
Boolean	`bool`	`logic` (1-bit)	`bool` is two-valued; SV `logic` can also hold X/Z
Four-valued logic	`sc_lv<N>`	`logic [N-1:0]`	Both hold 0/1/X/Z; in SystemC mainly used for testbench and tri-state modeling
Fixed array	`uint32_t regs[32]`	`logic [31:0] regs [0:31]`	Both fixed-size; synthesizable
Simulation wait	`wait(10, SC_NS)`	`#10`	Not synthesizable in either language
Simulation stop	`sc_stop()`	`$finish`	Simulation-only
VCD trace	`sc_create_vcd_trace_file`	`$dumpfile`	Simulation-only

Synthesizability Rules Reference

The following table summarizes which patterns a synthesis tool will accept and which it will reject for SystemC RTL:

Synthesizable Pattern	Not Synthesizable
`SC_METHOD` with sensitivity list	`SC_THREAD` with `wait()` for time
`sc_uint<N>`, `sc_int<N>`, `bool`	`std::string`, `double`, `float`
Fixed-size arrays `T arr[N]`	`std::vector`, `new`/`delete`
`if/else`, `switch`, `for` with fixed bounds	Loop bounds depending on runtime signal values
Reading `sc_in` ports, writing `sc_out` ports	`std::cout`, `std::cin`, file I/O
Local variables, combinational computation	Member variables modified across invocations (without clock)
Bit-accurate integer arithmetic	Floating-point arithmetic

Compare to SystemVerilog synthesizability rules:
- always_comb is synthesizable; initial is not (for synthesis)
- logic, reg, wire are synthesizable; real is not
- for (i = 0; i < 8; i++) with constant 8 is synthesizable; for (i = 0; i < dynamic_count; i++) is not
- $display, $monitor, $finish are simulation-only

The pattern repeats: synthesis tools accept a carefully defined subset of the language where the hardware structure can be statically determined before any simulation runs.
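As a rough illustration of "statically determined structure", the sketch below counts set bits with a loop whose bound is a compile-time constant, so a tool could unroll it into 32 fixed stages. It is plain C++ rather than SystemC, and `popcount32` and `kWidth` are hypothetical names chosen for this example.

```cpp
#include <cstdint>

constexpr int kWidth = 32;   // fixed at compile time: the loop can be fully unrolled

// Counts the 1 bits in x. The trip count never depends on a runtime value,
// which is the property synthesis tools look for in a loop.
inline uint32_t popcount32(uint32_t x) {
    uint32_t count = 0;
    for (int i = 0; i < kWidth; ++i) {
        count += (x >> i) & 1u;
    }
    return count;
}
```

If the bound were a runtime argument instead of `kWidth`, the number of stages would no longer be known before simulation, which is the case the tables above classify as not synthesizable.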

Regression Testing Philosophy

The 30-test suite in this capstone is the seed of a regression database. Understanding what regression testing means for hardware — and why it matters — is as important as writing the tests themselves.

What a Regression Is

A regression is when a change to the design (or to its environment) breaks something that previously worked correctly. The test suite you have now defines "previously worked correctly" for the ALU. If you modify the ALU implementation tomorrow — perhaps to optimize the shift logic, or to add a new operation — and any of these 30 tests fail, you have introduced a regression. The failing test identifies exactly which operation broke and at which input values.

Without the test suite, you cannot say whether your change was safe. With it, you have a repeatable, automated oracle.
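To make the "automated oracle" idea concrete, here is a minimal sketch in plain C++ (no SystemC). It runs one small vector table against two shift implementations: a baseline, and a hypothetical "optimized" rewrite that carries a deliberate bug. All names (`ShiftVector`, `srl_baseline`, `srl_refactored`, `run_regression`, `shift_suite`) and the vectors themselves are assumptions made for this example, not the capstone's code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct ShiftVector {
    const char* name;
    uint32_t a, shamt, expected;
};

// Baseline behaviour: only the low 5 bits of the shift amount are used.
inline uint32_t srl_baseline(uint32_t a, uint32_t shamt) {
    return a >> (shamt & 31u);
}

// Hypothetical rewrite with a deliberate bug: amounts of 32 or more are
// treated as "everything shifted out" instead of being masked.
inline uint32_t srl_refactored(uint32_t a, uint32_t shamt) {
    return (shamt >= 32u) ? 0u : (a >> shamt);
}

// Runs every vector against one implementation and returns the names that fail.
inline std::vector<std::string> run_regression(
        uint32_t (*impl)(uint32_t, uint32_t),
        const std::vector<ShiftVector>& suite) {
    std::vector<std::string> failures;
    for (const ShiftVector& v : suite) {
        if (impl(v.a, v.shamt) != v.expected) failures.push_back(v.name);
    }
    return failures;
}

inline std::vector<ShiftVector> shift_suite() {
    return {
        {"SRL by 0",  0x80000000u, 0,  0x80000000u},
        {"SRL by 1",  0x80000000u, 1,  0x40000000u},
        {"SRL by 31", 0x80000000u, 31, 0x00000001u},
        {"SRL by 33", 0x80000000u, 33, 0x40000000u},  // 33 & 31 == 1
    };
}
```

Running `run_regression(srl_refactored, shift_suite())` reports only the "SRL by 33" vector, which pinpoints the regression to shift amounts above 31. The baseline passes every vector.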

The Industry Scale

The scale at which regression testing operates in real chip projects is humbling:
- ARM maintains millions of regression tests across its processor IP portfolio. Before a new release of Cortex-A or Cortex-M IP, the entire regression suite must pass. A single escaped bug in shipped RTL can force a metal-layer respin of the silicon, which costs millions of dollars and months of delay.
- RISC-V architectural conformance: The riscv-arch-test suite contains approximately 400 tests that any RISC-V implementation must pass to claim architectural compliance. These tests cover every instruction in the base ISA, every immediate encoding, every edge case in the specification. They are, precisely, a regression suite for the RISC-V ISA specification itself.
- OpenTitan (the open-source silicon root-of-trust project hosted by lowRISC, with Google as a founding partner): The regression suite runs in CI on every commit. The suite includes RTL simulation, formal property checks, and FPGA emulation regression, all automated.

What Our Suite Covers

Our 30 tests cover a carefully chosen set of cases that provide genuine confidence:

- Operation completeness: all 10 ALU operations are tested (not just the easy ones)
- Boundary arithmetic: overflow, underflow, zero, the maximum positive and minimum negative values
- Signed/unsigned distinction: same bit patterns, different interpretations, verified both ways
- Shift boundaries: shift-by-0 (identity), shift-by-31 (maximum useful shift), zero-fill vs. sign-fill
- Zero flag coverage: both zero=true and zero=false are tested for the arithmetic, logical, and comparison operations; the shift vectors produce only non-zero results, so their zero=true case remains untested
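Zero-flag coverage of this kind can be tallied mechanically. The sketch below, in plain C++, records for each operation whether some vector expects a zero result and whether some vector expects a non-zero one, then lists the operations still missing a bin. The types and function names (`CoveredVector`, `ZeroBins`, `zero_flag_coverage`, `zero_flag_holes`) are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct CoveredVector {
    std::string op;        // operation name, e.g. "ADD"
    uint32_t    expected;  // expected result of the vector
};

struct ZeroBins {
    bool saw_zero = false;      // some vector expects result == 0
    bool saw_nonzero = false;   // some vector expects result != 0
};

// Tallies, per operation, which of the two zero-flag bins the vectors hit.
inline std::map<std::string, ZeroBins> zero_flag_coverage(
        const std::vector<CoveredVector>& vectors) {
    std::map<std::string, ZeroBins> bins;
    for (const CoveredVector& v : vectors) {
        ZeroBins& b = bins[v.op];
        if (v.expected == 0) b.saw_zero = true;
        else                 b.saw_nonzero = true;
    }
    return bins;
}

// Lists the operations that are still missing one of the two bins.
inline std::vector<std::string> zero_flag_holes(
        const std::map<std::string, ZeroBins>& bins) {
    std::vector<std::string> holes;
    for (const auto& entry : bins) {
        if (!entry.second.saw_zero || !entry.second.saw_nonzero)
            holes.push_back(entry.first);
    }
    return holes;
}
```

Feeding it a list such as ADD→8, ADD→0, SLL→16, SLL→1 reports SLL as a hole, because no SLL vector in that list produces zero.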

The one gap: constrained-random testing. Our directed tests cover the cases we thought of. A constrained-random test would find cases we did not. That gap is closed in Posts 23–27.

What This Capstone Tests

The test suite targets four categories of cases that simple tests miss:

Boundary arithmetic: overflow wraps to zero, underflow wraps to the maximum unsigned value, the most-negative signed 32-bit integer (0x80000000 = -2147483648) behaves correctly in signed operations.

Signed vs. unsigned correctness: the same bit pattern 0xFFFFFFFF means -1 in signed context and 4294967295 in unsigned context. SLT and SLTU must produce different results for the same input pair. SRA must sign-extend (fill with 1s), SRL must zero-extend (fill with 0s).

Shift boundary conditions: shift by 0 is identity (output equals input), shift by 31 moves the LSB to the MSB position (or vice versa), shift amount above 31 uses only the lower 5 bits per the RISC-V spec.

Zero flag coverage: the zero flag must be true exactly when result is zero, and false otherwise. Both conditions must be explicitly tested — a broken implementation that always outputs zero = true or always zero = false should fail.
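The expected values in these categories can be derived from an independent reference model. The sketch below is one possible model in plain C++ using `uint32_t`; it is an assumption for illustration, not the checker used in this post, and the names `RefOp`, `RefResult`, and `alu_ref` are hypothetical. The signed view relies on the two's-complement conversion that C++20 guarantees and that mainstream compilers already provide.

```cpp
#include <cstdint>

enum class RefOp { Add, Sub, And, Or, Xor, Slt, Sltu, Sll, Srl, Sra };

struct RefResult {
    uint32_t result;
    bool     zero;
};

// Reference model: plain 32-bit C++ arithmetic, written independently of the DUT.
inline RefResult alu_ref(uint32_t a, uint32_t b, RefOp op) {
    const uint32_t shamt = b & 31u;                // RV32I uses only b[4:0]
    const int32_t  sa = static_cast<int32_t>(a);   // same bits, signed view
    const int32_t  sb = static_cast<int32_t>(b);
    uint32_t r = 0;
    switch (op) {
        case RefOp::Add:  r = a + b; break;        // wraps modulo 2^32
        case RefOp::Sub:  r = a - b; break;        // wraps modulo 2^32
        case RefOp::And:  r = a & b; break;
        case RefOp::Or:   r = a | b; break;
        case RefOp::Xor:  r = a ^ b; break;
        case RefOp::Slt:  r = (sa < sb) ? 1u : 0u; break;
        case RefOp::Sltu: r = (a < b) ? 1u : 0u; break;
        case RefOp::Sll:  r = a << shamt; break;
        case RefOp::Srl:  r = a >> shamt; break;   // zero fill
        case RefOp::Sra:                           // explicit sign fill
            r = (a >> shamt) |
                ((a & 0x80000000u) ? ~(0xFFFFFFFFu >> shamt) : 0u);
            break;
    }
    return {r, r == 0};
}
```

For example, `alu_ref(0xFFFFFFFFu, 1, RefOp::Slt)` yields 1 while `alu_ref(0xFFFFFFFFu, 1, RefOp::Sltu)` yields 0, which is the signed/unsigned split described above.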

Simulation Semantics: How the ALU Executes

Understanding how the simulator actually executes the capstone is important for reasoning about any failure you might see.

When sc_start() is called, the SystemC scheduler runs the evaluate-update loop:

1. Delta cycle 0 (initialization): All signals are at their default values (0). The ALU's compute() SC_METHOD is triggered once with all-zero inputs. It computes result=0, zero=true and stages both for update.
2. Driver activates: The alu_driver SC_THREAD begins executing at simulation time 0. It calls chk->load_expected(...) to register the expected value, then writes a=5, b=3, op=ADD to its output ports. These writes are staged, not immediately visible to other processes.
3. Next delta cycle: The scheduler sees that sig_a, sig_b, and sig_op have changed. It triggers compute(). Inside compute(), a.read() returns 5, b.read() returns 3. The switch case computes res=8, writes result=8, zero=false. These are staged.
4. Update phase: All staged writes become visible simultaneously. sig_result becomes 8, sig_zero becomes false.
5. Monitor activates: With the DUT outputs settled, alu_monitor's observe() process captures the inputs, the result, and the zero flag as one transaction and pushes it to txn_fifo.
6. Checker activates: alu_checker's check_loop() SC_THREAD reads from the FIFO, blocking while it is empty. When a transaction arrives, it pops the matching expected record and compares.
7. Time advances: Once no process has anything left to do at the current time, the scheduler advances simulation time until the driver's wait(10, SC_NS) expires. The cycle repeats for the next test vector.

The critical insight: the checker pairs transactions with expected values strictly in order, so the testbench stays in sync only if the monitor produces exactly one transaction per stimulus vector. If the driver applies 30 stimulus vectors, the monitor must produce 30 transactions, and the checker compares 30 expected values. The FIFO decouples them in time: the checker can be momentarily behind the monitor without losing data.
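The evaluate-update behaviour behind this walkthrough can be mimicked with a toy signal class in plain C++. This is only a sketch of the idea, not the SystemC `sc_signal` implementation, and `TwoPhaseSignal` is a hypothetical name.

```cpp
#include <cstdint>

// Toy model of a signal with evaluate/update semantics (not the SystemC class).
template <typename T>
class TwoPhaseSignal {
public:
    explicit TwoPhaseSignal(T initial = T()) : current_(initial), next_(initial) {}

    T    read() const   { return current_; }   // always the committed value
    void write(T value) { next_ = value; }     // staged, not yet visible

    // Update phase: commit the staged value. Returns true if the value changed,
    // which is the condition under which sensitive processes would be woken.
    bool update() {
        const bool changed = !(next_ == current_);
        current_ = next_;
        return changed;
    }

private:
    T current_;
    T next_;
};
```

After `write(8)` on a signal holding 0, `read()` still returns 0 until `update()` runs. A second `write(8)` followed by `update()` reports no change, which is why a process that is sensitive only to a signal's value does not wake when the same value is written again.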

Complete Capstone Code

The capstone reuses the alu module from Post 5 unchanged and the monitor/checker/driver pattern from Post 6. The only additions are an extended test sequence in the driver and a final === Section 1 Complete === banner.

// File: capstone1_alu.cpp
// Section 1 Capstone — Complete ALU verification closure
// Reuses: alu (Post 5), alu_monitor, alu_checker, alu_driver pattern (Post 6)
#include <systemc.h>
#include <iostream>
#include <iomanip>
#include <deque>

// ─── ALU operation encoding ───────────────────────────────────────────────────
enum alu_op_t {
    ALU_ADD=0, ALU_SUB=1, ALU_AND=2, ALU_OR=3, ALU_XOR=4,
    ALU_SLT=5, ALU_SLTU=6, ALU_SLL=7, ALU_SRL=8, ALU_SRA=9
};

// ─── ALU (from Post 5 — no changes) ──────────────────────────────────────────
SC_MODULE(alu) {
    sc_in<sc_uint<32>>  a, b;
    sc_in<sc_uint<4>>   op;
    sc_out<sc_uint<32>> result;
    sc_out<bool>        zero;

    void compute() {
        sc_uint<32> res = 0;
        switch ((int)op.read()) {
            case ALU_ADD:  res = a.read() + b.read(); break;
            case ALU_SUB:  res = a.read() - b.read(); break;
            case ALU_AND:  res = a.read() & b.read(); break;
            case ALU_OR:   res = a.read() | b.read(); break;
            case ALU_XOR:  res = a.read() ^ b.read(); break;
            case ALU_SLT:  res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0; break;
            case ALU_SLTU: res = (a.read() < b.read()) ? 1 : 0; break;
            case ALU_SLL:  res = a.read() << b.read().range(4,0); break;
            case ALU_SRL:  res = a.read() >> b.read().range(4,0); break;
            case ALU_SRA:  res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4,0)); break;
        }
        result.write(res);
        zero.write(res == 0);
    }

    SC_CTOR(alu) { SC_METHOD(compute); sensitive << a << b << op; }
};

// ─── Transaction record ───────────────────────────────────────────────────────
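struct alu_txn {
    sc_uint<32> a, b, result;
    sc_uint<4>  op;
    bool        zero;
    sc_time     timestamp;
};

// sc_fifo<T> streams its elements in print()/dump(), so a user-defined
// payload type needs an operator<< overload.
inline std::ostream& operator<<(std::ostream& os, const alu_txn& t) {
    return os << "a=" << t.a << " b=" << t.b << " op=" << t.op
              << " result=" << t.result << " zero=" << t.zero
              << " @" << t.timestamp;
}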

// ─── Monitor — passive observer of DUT outputs ────────────────────────────────
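// Samples once per stimulus vector rather than on result changes: sc_signal
// only notifies when its value changes, so back-to-back vectors with the same
// result (for example 0 + 0 followed by 0xFFFFFFFF + 1) would otherwise yield
// no transaction and the checker would fall out of step.
SC_MODULE(alu_monitor) {
    sc_in<sc_uint<32>>  a, b, result;
    sc_in<sc_uint<4>>   op;
    sc_in<bool>         zero;
    sc_fifo_out<alu_txn> txn_out;

    void observe() {
        wait(5, SC_NS);                  // sample mid-vector, after the DUT has settled
        while (true) {
            alu_txn t;
            t.a = a.read(); t.b = b.read(); t.op = op.read();
            t.result = result.read(); t.zero = zero.read();
            t.timestamp = sc_time_stamp();
            txn_out.write(t);
            wait(10, SC_NS);             // must match the driver's 10 ns vector period
        }
    }

    SC_CTOR(alu_monitor) { SC_THREAD(observe); }
};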

// ─── Checker — compares observed vs expected ──────────────────────────────────
SC_MODULE(alu_checker) {
    sc_fifo_in<alu_txn> txn_in;

    struct expected_t {
        sc_uint<32> result;
        bool        zero;
        const char* description;
    };

    std::deque<expected_t> expected_queue;
    int pass_count = 0;
    int fail_count = 0;

    void load_expected(sc_uint<32> result, bool zero, const char* desc) {
        expected_queue.push_back({result, zero, desc});
    }

    void check_loop() {
        while (true) {
            alu_txn t = txn_in.read();
            if (expected_queue.empty()) { fail_count++; continue; }

            expected_t exp = expected_queue.front();
            expected_queue.pop_front();

            bool ok = (t.result == exp.result) && (t.zero == exp.zero);
            std::cout << "  " << (ok ? "[PASS]" : "[FAIL]") << "  "
                      << std::left << std::setw(46) << exp.description;
            if (!ok) {
                std::cout << "  got=0x" << std::hex << std::setw(8) << std::setfill('0')
                          << (uint32_t)t.result << " z=" << t.zero
                          << "  exp=0x" << std::setw(8) << (uint32_t)exp.result
                          << " z=" << std::dec << exp.zero
                          << std::setfill(' ');
            }
            std::cout << std::endl;
            if (ok) pass_count++; else fail_count++;
        }
    }

    void report(int total) {
        std::cout << std::endl;
        std::cout << "╔══════════════════════════════════════════════════╗" << std::endl;
        std::cout << "║  === Section 1 Complete: "
                  << pass_count << "/" << total << " tests passed ===    ║" << std::endl;
        std::cout << "╚══════════════════════════════════════════════════╝" << std::endl;
        if (fail_count > 0)
            std::cout << "  FAIL: " << fail_count << " test(s) failed" << std::endl;
        if (!expected_queue.empty())
            std::cout << "  WARN: " << expected_queue.size()
                      << " expected transaction(s) never observed" << std::endl;
    }

    SC_CTOR(alu_checker) { SC_THREAD(check_loop); }
};

// ─── Driver — drives all 30 capstone test vectors ────────────────────────────
SC_MODULE(alu_driver) {
    sc_out<sc_uint<32>> a, b;
    sc_out<sc_uint<4>>  op;
    alu_checker* chk;

    void drive(sc_uint<32> av, sc_uint<32> bv, alu_op_t opv,
               sc_uint<32> exp_res, bool exp_zero, const char* desc) {
        chk->load_expected(exp_res, exp_zero, desc);
        a.write(av); b.write(bv); op.write(opv);
        wait(10, SC_NS);
    }

    void run() {
        std::cout << std::endl << "=== Section 1 Capstone: RV32I ALU ===" << std::endl;

        // ── ADD: normal, overflow, zero ───────────────────────────────────────
        std::cout << std::endl << "  ── ADD ──" << std::endl;
        drive(5,          3,          ALU_ADD, 8,          false, "ADD:  5 + 3 = 8");
        drive(0,          0,          ALU_ADD, 0,          true,  "ADD:  0 + 0 = 0  (zero flag)");
        drive(0xFFFFFFFF, 1,          ALU_ADD, 0,          true,  "ADD:  0xFFFFFFFF + 1 = 0  (overflow wraps)");
        drive(0x7FFFFFFF, 1,          ALU_ADD, 0x80000000, false, "ADD:  max_positive + 1 → MSB set");

        // ── SUB: normal, zero, underflow ──────────────────────────────────────
        std::cout << std::endl << "  ── SUB ──" << std::endl;
        drive(10,         3,          ALU_SUB, 7,          false, "SUB:  10 - 3 = 7");
        drive(5,          5,          ALU_SUB, 0,          true,  "SUB:  5 - 5 = 0  (BEQ zero check)");
        drive(5,          4,          ALU_SUB, 1,          false, "SUB:  5 - 4 = 1  (zero must be false)");
        drive(0,          1,          ALU_SUB, 0xFFFFFFFF, false, "SUB:  0 - 1 = 0xFFFFFFFF  (underflow wraps)");

        // ── AND, OR, XOR ──────────────────────────────────────────────────────
        std::cout << std::endl << "  ── AND / OR / XOR ──" << std::endl;
        drive(0xFF00FF00, 0x0F0F0F0F, ALU_AND, 0x0F000F00, false, "AND:  bit masking");
        drive(0xFFFFFFFF, 0,          ALU_AND, 0,          true,  "AND:  all_ones & 0 = 0");
        drive(0xFF000000, 0x00FF0000, ALU_OR,  0xFFFF0000, false, "OR:   combine two fields");
        drive(0,          0,          ALU_OR,  0,          true,  "OR:   0 | 0 = 0");
        drive(0xAAAAAAAA, 0x55555555, ALU_XOR, 0xFFFFFFFF, false, "XOR:  alternating bits → all ones");
        drive(0xDEADBEEF, 0xDEADBEEF, ALU_XOR, 0,         true,  "XOR:  x ^ x = 0  (zero idiom)");

        // ── SLT: signed comparisons ───────────────────────────────────────────
        std::cout << std::endl << "  ── SLT (signed) ──" << std::endl;
        drive(0xFFFFFFFF, 1,          ALU_SLT, 1,          false, "SLT:  -1 < 1 → 1  (signed)");
        drive(1,          0xFFFFFFFF, ALU_SLT, 0,          true,  "SLT:  1 < -1 → 0  (signed, zero)");
        drive(5,          3,          ALU_SLT, 0,          true,  "SLT:  5 < 3 → 0");
        drive(0x80000000, 0,          ALU_SLT, 1,          false, "SLT:  INT_MIN < 0 → 1  (signed)");

        // ── SLTU: unsigned comparisons ────────────────────────────────────────
        std::cout << std::endl << "  ── SLTU (unsigned) ──" << std::endl;
        drive(0xFFFFFFFF, 1,          ALU_SLTU, 0,         true,  "SLTU: 0xFFFFFFFF > 1 → 0  (unsigned)");
        drive(1,          0xFFFFFFFF, ALU_SLTU, 1,         false, "SLTU: 1 < 0xFFFFFFFF → 1  (unsigned)");

        // ── SLL: logical left shift ───────────────────────────────────────────
        std::cout << std::endl << "  ── SLL ──" << std::endl;
        drive(1,          0,          ALU_SLL, 1,          false, "SLL:  1 << 0 = 1  (shift by 0 is identity)");
        drive(1,          4,          ALU_SLL, 16,         false, "SLL:  1 << 4 = 16");
        drive(1,          31,         ALU_SLL, 0x80000000, false, "SLL:  1 << 31 = 0x80000000  (MSB)");
        drive(0xFFFFFFFF, 1,          ALU_SLL, 0xFFFFFFFE, false, "SLL:  0xFFFFFFFF << 1 (MSB shifted out)");
        drive(1,          33,         ALU_SLL, 2,          false, "SLL:  1 << 33 = 2  (only b[4:0] is used)");

        // ── SRL: logical right shift ──────────────────────────────────────────
        std::cout << std::endl << "  ── SRL ──" << std::endl;
        drive(0x80000000, 1,          ALU_SRL, 0x40000000, false, "SRL:  0x80000000 >> 1 = 0x40000000  (zero fill)");
        drive(0xFFFFFFFF, 4,          ALU_SRL, 0x0FFFFFFF, false, "SRL:  0xFFFFFFFF >> 4 = 0x0FFFFFFF");

        // ── SRA: arithmetic right shift ───────────────────────────────────────
        std::cout << std::endl << "  ── SRA ──" << std::endl;
        drive(0x80000000, 1,          ALU_SRA, 0xC0000000, false, "SRA:  0x80000000 >> 1 = 0xC0000000  (sign ext)");
        drive(0xFFFFFFFF, 4,          ALU_SRA, 0xFFFFFFFF, false, "SRA:  -1 >> 4 = -1  (sign fills with 1s)");
        drive(0x40000000, 1,          ALU_SRA, 0x20000000, false, "SRA:  positive: SRA == SRL");

        std::cout << std::endl;
        wait(SC_ZERO_TIME);
        sc_stop();
    }

    SC_CTOR(alu_driver) { SC_THREAD(run); chk = nullptr; }
};

// ─── sc_main ──────────────────────────────────────────────────────────────────
int sc_main(int argc, char* argv[]) {
    sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
    sc_signal<sc_uint<4>>  sig_op;
    sc_signal<bool>        sig_zero;
    sc_fifo<alu_txn>       txn_fifo(64);

    alu         dut("dut");
    alu_monitor mon("monitor");
    alu_checker chk("checker");
    alu_driver  drv("driver");

    dut.a(sig_a); dut.b(sig_b); dut.op(sig_op);
    dut.result(sig_result); dut.zero(sig_zero);

    mon.a(sig_a); mon.b(sig_b); mon.op(sig_op);
    mon.result(sig_result); mon.zero(sig_zero);
    mon.txn_out(txn_fifo);

    chk.txn_in(txn_fifo);

    drv.a(sig_a); drv.b(sig_b); drv.op(sig_op);
    drv.chk = &chk;

    // VCD for the capstone run
    sc_trace_file* tf = sc_create_vcd_trace_file("capstone1_waveform");
    tf->set_time_unit(1, SC_NS);
    sc_trace(tf, sig_a,      "ALU.a");
    sc_trace(tf, sig_b,      "ALU.b");
    sc_trace(tf, sig_op,     "ALU.op");
    sc_trace(tf, sig_result, "ALU.result");
    sc_trace(tf, sig_zero,   "ALU.zero");

    sc_start();
    sc_close_vcd_trace_file(tf);

    const int total = 30;
    chk.report(total);

    return (chk.fail_count > 0) ? 1 : 0;
}

Expected Output

=== Section 1 Capstone: RV32I ALU ===

  ── ADD ──
  [PASS]  ADD:  5 + 3 = 8
  [PASS]  ADD:  0 + 0 = 0  (zero flag)
  [PASS]  ADD:  0xFFFFFFFF + 1 = 0  (overflow wraps)
  [PASS]  ADD:  max_positive + 1 → MSB set

  ── SUB ──
  [PASS]  SUB:  10 - 3 = 7
  [PASS]  SUB:  5 - 5 = 0  (BEQ zero check)
  [PASS]  SUB:  5 - 4 = 1  (zero must be false)
  [PASS]  SUB:  0 - 1 = 0xFFFFFFFF  (underflow wraps)

  ... (all 30 PASS) ...

╔══════════════════════════════════════════════════╗
║  === Section 1 Complete: 30/30 tests passed ===  ║
╚══════════════════════════════════════════════════╝

Common Pitfalls for SV Engineers

1. sc_signal.write() is not immediate — delta cycles apply.
In SystemVerilog, assign result = a + b is usually thought of as updating result within the same time step. In SystemC, when compute() calls result.write(res), the new value is not visible to any process, including compute() itself, until the update phase at the end of the current delta cycle. For a pure combinational block tested combinationally this rarely causes bugs, but if you call result.read() in the same process activation that called result.write(res), you will read the old value. Always read outputs through the signal, after the evaluate-update cycle completes.

2. Missing signals in the sensitivity list silently break the ALU.
If sensitive << a << b << op accidentally becomes sensitive << a << op (missing b), the ALU will not recompute when b changes. The result will appear to be "stuck" at the previous value. SystemC gives no warning. SystemVerilog tools warn about incomplete sensitivity lists in always @(...) blocks, and with always_comb the tool infers the list automatically. In SystemC, manual lists require discipline.

3. The zero flag does not compute itself; it is derived from the result.
Some engineers new to RTL expect zero to be somehow "automatic." In our implementation, zero.write(res == 0) is an explicit assignment computed every time compute() runs. If you forget this line, zero stays at its initial value (false) regardless of the result. Test the zero flag explicitly: the test suite above covers both zero=true and zero=false for the arithmetic, logical, and comparison groups.

4. sc_uint<32> arithmetic wraps silently — there is no overflow exception.
When a = 0xFFFFFFFF and b = 1, a + b in sc_uint<32> arithmetic produces 0. No exception, no warning, no X value. This is correct hardware behavior: 32-bit adders wrap. If you compute in uint32_t internally and then assign to sc_uint<32>, the wrap behavior is identical, because both are 32-bit unsigned types. The test ADD: 0xFFFFFFFF + 1 = 0 verifies this explicitly. Do not write tests that expect overflow to throw.

5. SLT uses signed comparison but the ports are sc_uint<32> — you must cast.
sc_uint is unsigned. (a.read() < b.read()) is an unsigned comparison. For SLT, you need (sc_int<32>)a.read() < (sc_int<32>)b.read(). Forgetting the cast makes SLT behave identically to SLTU — the most dangerous kind of bug because both appear to work on positive inputs. The test vector SLT: -1 < 1 → 1 (signed) uses a = 0xFFFFFFFF which is -1 in signed context and 4294967295 in unsigned context — the two interpretations give opposite comparison results. That test will fail immediately if you forget the cast.
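Pitfall 5 is easy to reproduce outside SystemC. The following plain C++ sketch places a correct SLT next to a version that forgot the signed cast; the function names (`sltu`, `slt`, `slt_missing_cast`) are made up for this example. The signed reinterpretation assumes two's-complement conversion, which C++20 guarantees and mainstream compilers already provide.

```cpp
#include <cstdint>

// SLTU: compare the raw bit patterns as unsigned numbers.
inline uint32_t sltu(uint32_t a, uint32_t b) {
    return (a < b) ? 1u : 0u;
}

// SLT: reinterpret the same bits as two's-complement before comparing.
inline uint32_t slt(uint32_t a, uint32_t b) {
    return (static_cast<int32_t>(a) < static_cast<int32_t>(b)) ? 1u : 0u;
}

// The bug from pitfall 5: an "SLT" that forgot the signed cast.
inline uint32_t slt_missing_cast(uint32_t a, uint32_t b) {
    return (a < b) ? 1u : 0u;
}
```

On small positive inputs such as (5, 3) or (3, 5) the buggy and correct versions agree, so a test suite without negative operands cannot tell them apart. With a = 0xFFFFFFFF and b = 1 they disagree: `slt` returns 1 and `slt_missing_cast` returns 0.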

DV Insight

The ALU testbench you have written is a complete functional verification closure for this block. In a real chip team, this would be part of the block-level regression suite: run nightly in CI, required to pass 100% before RTL freeze, and gated by a senior verification engineer's sign-off.

The principles are identical whether the implementation technology is SystemC, UVM/SystemVerilog, cocotb/Python, or a custom C++ framework. The structure — stimulus driver, passive monitor, independent checker, self-reporting pass/fail — is the universal verification pattern. What changes between frameworks is the boilerplate: in UVM, the driver is a uvm_driver with a TLM seq_item_port, the monitor is a uvm_monitor with a uvm_analysis_port, and the checker is a uvm_scoreboard. The data flow, the passive-observer constraint, and the predict-then-observe pattern are exactly what we built here.

There is one gap this testbench does not close: constrained-random coverage. Our directed tests cover the cases we thought of. A constrained-random test covering all 10 operations with random 32-bit inputs would catch cases we did not anticipate — corner cases in unsigned/signed interaction, shift amounts between 16 and 30 that no directed test exercises, etc. That gap will be closed in Posts 23–27 when we add a UVM-SystemC environment with a constrained-random sequence and functional coverage. For now, 30 directed tests covering all boundary cases is a solid and defensible closure for a 10-operation combinational block.
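As a preview of the idea only, and not of the UVM-SystemC environment planned for later posts, the plain C++ sketch below cross-checks two independently written SRA implementations on random inputs. The shift amount is constrained to 0..63 so that values above 31 are exercised too, and a fixed seed keeps the run reproducible. All names (`sra_fill`, `sra_bitwise`, `random_mismatches`) are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Implementation A: logical shift plus an explicit sign fill.
inline uint32_t sra_fill(uint32_t a, uint32_t shamt) {
    shamt &= 31u;
    const uint32_t fill = (a & 0x80000000u) ? ~(0xFFFFFFFFu >> shamt) : 0u;
    return (a >> shamt) | fill;
}

// Implementation B: builds the result one bit at a time.
inline uint32_t sra_bitwise(uint32_t a, uint32_t shamt) {
    shamt &= 31u;
    const uint32_t sign = a >> 31;
    uint32_t r = 0;
    for (uint32_t i = 0; i < 32; ++i) {
        const uint32_t src = i + shamt;   // source bit for result bit i
        const uint32_t bit = (src < 32) ? ((a >> src) & 1u) : sign;
        r |= bit << i;
    }
    return r;
}

// Draws `count` random (a, shamt) pairs and counts disagreements.
inline int random_mismatches(int count, uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint32_t> any_word(0u, 0xFFFFFFFFu);
    std::uniform_int_distribution<uint32_t> shift_amount(0u, 63u);
    int mismatches = 0;
    for (int i = 0; i < count; ++i) {
        const uint32_t a = any_word(rng);
        const uint32_t s = shift_amount(rng);
        if (sra_fill(a, s) != sra_bitwise(a, s)) ++mismatches;
    }
    return mismatches;
}
```

A call such as `random_mismatches(10000, 1u)` should return 0 when both implementations are correct. Any non-zero count points at an input pair that no directed test anticipated.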

What You Can Do Now

Section 1 is complete. Here is what you can do independently with what you have learned:

- Model any combinational RTL block in SystemC: define ports with sc_in/sc_out, implement the logic in an SC_METHOD, add a sensitivity list, connect to signals
- Model any clocked sequential block: use SC_THREAD, wait on clock edges, latch values between rising edges
- Write a structured testbench: separate the stimulus driver, passive monitor, and checker; connect them with sc_fifo; load expected values before applying stimulus
- Generate VCD waveforms: sc_create_vcd_trace_file, sc_trace per signal, sc_close_vcd_trace_file, open in GTKWave
- Understand the evaluate-update simulation model: why delta cycles exist, why sc_signal.write() is not immediately visible, how sc_start(SC_ZERO_TIME) settles combinational logic
- Choose the correct process type: SC_METHOD for combinational logic and decoders, SC_THREAD for sequential behavior, stimulus generators, and monitors
- Distinguish synthesizable from simulation-only code: SC_METHOD is synthesizable; SC_THREAD with time waits is not. sc_uint is synthesizable; std::cout is not.

What We Built — Section 1 Component Chain

graph LR
    PT["pass_through\n(Post 1)\nSC_METHOD\nbool passthrough"] --> DFF
    DFF["dff\n(Post 2)\nSC_THREAD\nD flip-flop"] --> TSC
    TSC["two_stage_chain\n(Post 3)\ndelta cycle demo"] --> AG
    AG["and_gate\n(Post 4)\nSC_METHOD\n2-input AND"] --> ALU
    ALU["RV32I ALU\n(Post 5)\nSC_METHOD\n10 operations"] --> TB
    TB["Monitor+Checker TB\n(Posts 6-7)\nSC_THREAD\nstructured testbench"]
    style ALU fill:#06b6d4,color:#fff
    style TB fill:#10b981,color:#fff

Each block in this chain is a building block for the full CPU. The dff becomes the pipeline stage registers (Post 18). The and_gate pattern is the template for every combinational module. The ALU is wired into the execute stage at Post 18 without modification. The Monitor+Checker TB pattern is applied to every subsequent block.

SystemC Language Reference

The table below covers every construct used in this capstone in a single reference:

Construct	Syntax	SV/Verilog Equivalent	Key Difference
Combinational process	`SC_METHOD(compute); sensitive << a << b << op;`	`always_comb`	SV auto-detects sensitivity list from RHS; SystemC requires manual listing
Timed stimulus thread	`SC_THREAD(run); ... wait(10, SC_NS);`	`initial #10 a = 1;`	Neither is synthesizable; both model timed stimulus
Unsigned 32-bit integer	`sc_uint<32>`	`logic [31:0]`	SystemC width is a template parameter; SV uses bit ranges
Signed 32-bit integer	`sc_int<32>`	`logic signed [31:0]`	Explicit cast required in SystemC; `signed` keyword in SV
Signal (1-bit)	`sc_signal<bool>`	`logic` (scalar)	SV `logic` is always in the event graph; `sc_signal` opts in explicitly
Signal (N-bit)	`sc_signal<sc_uint<32>>`	`logic [31:0]`	Same semantics; SystemC templated, SV range-declared
VCD trace file	`sc_create_vcd_trace_file("name")`	`$dumpfile("name.vcd")`	SystemC returns a handle used for all subsequent `sc_trace` calls
Trace a signal	`sc_trace(tf, sig, "path.name")`	`$dumpvars(0, module)`	SystemC traces individual named signals; SV `$dumpvars` dumps a scope
Start simulation	`sc_start()`	No direct equivalent (an SV simulation starts implicitly; `$finish` only ends it)	`sc_start()` without arguments runs until `sc_stop()` is called or no activity remains
Stop simulation	`sc_stop()`	`$finish`	Both terminate the simulation run
Settle combinational logic	`sc_start(SC_ZERO_TIME)`	`#0` (delta cycle)	Runs one delta cycle without advancing simulation time; repeat while `sc_pending_activity_at_current_time()` is true

Synthesizable vs. Simulation-Only Code — Extended Reference

The synthesizability boundary is fundamental. The table below is the complete reference for the patterns in this capstone:

Pattern	Synthesizable?	SV Equivalent
`SC_METHOD` with sensitivity list	YES	`always_comb` / `always @(...)`
`SC_THREAD` with `wait()`	NO	`initial` block
`sc_uint<N>`, `sc_int<N>`, `bool`	YES	`logic [N-1:0]`, `bit`
`std::string`, `double`, `float`	NO	`string`, `real`
Fixed `for(int i=0;i<32;i++)`	YES	`for` loop in `always_comb`
`while(true) { wait(); }`	NO	Modeled as continuous `always` with clock sensitivity
`sc_trace_file`, `cout`	NO	`$monitor`, `$display`
`new`, `delete`, `std::vector`	NO	No synthesizable equivalent
`sc_start()` / `sc_stop()`	NO	`$finish` — simulation control only
Local variable computed each activation	YES	Local variable in `always_comb`
Module member variable modified across activations (no clock)	NO	Creates latch — must use `always_ff`

What makes this ALU synthesizable — the five criteria:

SC_METHOD (no blocking waits) maps directly to always_comb. A synthesis tool can lower a compute() with no waits to a combinational cloud of gates.
All outputs are functions of current inputs only. res is a local variable recomputed every invocation. There is no persistent state between calls.
Only sc_uint<N>, sc_int<N>, and bool types. No sc_lv (X/Z states), no floating-point, no strings.
No dynamic memory, no file I/O, no OS calls. Hardware structures must be statically determined at compile time.
Fixed-size arrays and fixed-bound loops only. The synthesizer must know structure at elaboration time.
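The "pure function of current inputs" criterion can be illustrated outside SystemC. The sketch below is a plain C++ reference model of ten RV32I-style operations, not the ALU from Post 5. The enum `ref_op`, its enumerator names and order, and the use of b[4:0] as the shift amount are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode names for this sketch; the series' own alu_op_t
// enumerators and their encoding may differ.
enum class ref_op { ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU };

// Pure function: the result depends only on the current arguments, and
// `res` is a local that is recomputed on every call (no retained state).
inline std::uint32_t alu_ref(ref_op op, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sh   = b & 31u;          // assumed: shift amount is b[4:0]
    const std::uint32_t sign = 0x80000000u;
    std::uint32_t res = 0;
    switch (op) {
        case ref_op::ADD:  res = a + b;   break; // wraps modulo 2^32
        case ref_op::SUB:  res = a - b;   break;
        case ref_op::AND:  res = a & b;   break;
        case ref_op::OR:   res = a | b;   break;
        case ref_op::XOR:  res = a ^ b;   break;
        case ref_op::SLL:  res = a << sh; break;
        case ref_op::SRL:  res = a >> sh; break;
        case ref_op::SRA:  // logical shift, then fill vacated bits with the sign
            res = a >> sh;
            if (a & sign) res |= ~(0xFFFFFFFFu >> sh);
            break;
        case ref_op::SLT:  // flipping the sign bit maps signed order onto unsigned order
            res = ((a ^ sign) < (b ^ sign)) ? 1u : 0u;
            break;
        case ref_op::SLTU: res = (a < b) ? 1u : 0u; break;
    }
    return res;
}

inline bool zero_flag(std::uint32_t result) { return result == 0u; }

// Spot checks mirroring rows of the article's test table.
inline void alu_ref_selfcheck() {
    assert(alu_ref(ref_op::ADD, 5u, 3u) == 8u);
    assert(zero_flag(alu_ref(ref_op::ADD, 0xFFFFFFFFu, 1u)));
    assert(alu_ref(ref_op::SLT, 0xFFFFFFFFu, 1u) == 1u);
    assert(alu_ref(ref_op::SLTU, 0xFFFFFFFFu, 1u) == 0u);
}
```

Because the function keeps nothing between calls, the same inputs always give the same output, which is the property that lets a tool map such logic to gates.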

What makes the testbench non-synthesizable:

SC_THREAD with wait(10, SC_NS) — implies time passing inside the process body; no gate equivalent
sc_start() / sc_stop() — simulation lifecycle control; has no hardware meaning
cout output — reaches the operating system; silicon has no OS
sc_trace_file / VCD — verification infrastructure; not hardware

Regression Testing Philosophy — The Test Runner Pattern

The 30-test suite in this capstone is the seed of a regression database. Below is the concrete test runner pattern that scales from 30 tests to 30,000:

// Scalable regression runner pattern.
// Fragment: alu_op_t, chk, and the a/b/op signals are defined earlier in the series.
struct alu_test_case {
    sc_uint<32> a;
    sc_uint<32> b;
    alu_op_t    op;
    sc_uint<32> expected_result;
    bool        expected_zero;
    const char* description;
};

// All test cases live in one static array: a single source of truth
static const alu_test_case test_suite[] = {
    { 5,          3,          ALU_ADD,  8,         false, "ADD: 5+3=8"                  },
    { 0,          0,          ALU_ADD,  0,         true,  "ADD: 0+0=0 (zero flag)"      },
    { 0xFFFFFFFF, 1,          ALU_ADD,  0,         true,  "ADD: overflow wraps"         },
    { 0xFFFFFFFF, 1,          ALU_SLT,  1,         false, "SLT: -1 < 1 (signed)"        },
    { 0xFFFFFFFF, 1,          ALU_SLTU, 0,         true,  "SLTU: MAX_U > 1 (unsigned)"  },
    // ... remaining cases
};

// A single loop runs ALL tests; adding a test means adding one struct entry.
// The loop belongs inside the stimulus driver's SC_THREAD, because wait()
// is only legal in a thread process.
int total_pass = 0, total_fail = 0;
for (const auto& tc : test_suite) {
    // Load the expectation first, then apply the stimulus
    chk->load_expected(tc.expected_result, tc.expected_zero, tc.description);
    a.write(tc.a);
    b.write(tc.b);
    op.write(tc.op);
    wait(10, SC_NS);   // outputs settle before the next vector is applied
}

What a regression means in hardware:

Every test that once passed must continue to pass on every code change, forever
Hardware verification is about confidence, not proof — you test until you trust the coverage
This ALU test suite is the start of a regression database that grows with every bug found
RISC-V architectural conformance (riscv-arch-test): approximately 400 assembly tests that every RV32I implementation must pass to claim architectural compliance — they are precisely a regression suite for the ISA specification
ARM's CPU validation teams run millions of tests before each tapeout; a single escaped bug in shipped RTL can require metal-layer respins costing millions of dollars and months of delay
OpenTitan (Google's open-source secure microcontroller) runs a full regression on every commit, including RTL simulation, formal property checks, and FPGA emulation

The single-loop-over-struct-array pattern is the same architecture whether you have 30 tests or 300,000. In UVM, the "struct array" becomes a sequence of seq_item objects; the "loop" becomes the sequencer driving the driver; the "expected value" becomes the scoreboard's predict() method. The pattern scales.
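To make the scaling claim concrete, here is a minimal, SystemC-free sketch of the same single-loop-over-a-table idea that also keeps a pass/fail tally. Everything in it is a stand-in: `op_t`, `dut_fn`, `run_suite`, `good_dut`, and `buggy_dut` are hypothetical names, and the "DUT" is an ordinary function rather than the ALU module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Stand-in opcodes and DUT signature for this sketch (assumptions).
enum class op_t { ADD, SLTU };
using dut_fn = std::uint32_t (*)(op_t, std::uint32_t, std::uint32_t);

struct test_case {
    std::uint32_t a, b;
    op_t          op;
    std::uint32_t expected;
    const char*   description;
};

struct tally { int pass; int fail; };

// One table is the single source of truth; adding a test adds one row.
static const test_case demo_suite[] = {
    { 5u,          3u, op_t::ADD,  8u, "ADD: 5+3=8"           },
    { 0u,          0u, op_t::ADD,  0u, "ADD: 0+0=0"           },
    { 0xFFFFFFFFu, 1u, op_t::ADD,  0u, "ADD: overflow wraps"  },
    { 0xFFFFFFFFu, 1u, op_t::SLTU, 0u, "SLTU: MAX_U > 1"      },
    { 3u,          3u, op_t::SLTU, 0u, "SLTU: equal operands" },
};
static const std::size_t demo_count = sizeof(demo_suite) / sizeof(demo_suite[0]);

// One loop runs every row and tallies the outcome.
inline tally run_suite(dut_fn dut, const test_case* suite, std::size_t n) {
    assert(dut != nullptr);
    tally t{0, 0};
    for (std::size_t i = 0; i < n; ++i) {
        const test_case& tc = suite[i];
        if (dut(tc.op, tc.a, tc.b) == tc.expected) {
            ++t.pass;
        } else {
            ++t.fail;
            std::printf("FAIL: %s\n", tc.description);
        }
    }
    return t;
}

// A correct stand-in DUT and a deliberately broken one (<= instead of <).
inline std::uint32_t good_dut(op_t op, std::uint32_t a, std::uint32_t b) {
    return op == op_t::ADD ? a + b : (a < b ? 1u : 0u);
}
inline std::uint32_t buggy_dut(op_t op, std::uint32_t a, std::uint32_t b) {
    return op == op_t::ADD ? a + b : (a <= b ? 1u : 0u);
}
```

Running `run_suite(buggy_dut, demo_suite, demo_count)` fails only the equal-operands row, which shows why a boundary case earns its place in the table: the other four rows cannot tell the two implementations apart.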

Common Pitfalls for SV Engineers — Extended

The original pitfalls section covered five common mistakes. Here are five additional pitfalls that arise specifically when SystemC RTL is used as a synthesizable design language:

Pitfall 6: SC_METHOD used for what should be SC_CTHREAD — feedback loop, not register.
If you put sequential logic (a register that remembers its previous value) inside an SC_METHOD without a clock, you create a feedback path in combinational logic. The output feeds back to the input, which re-triggers the method, which updates the output — a combinational loop. In a synthesis tool this produces a latch or a timing loop violation. The rule: if a process needs to remember a value from one invocation to the next, use SC_CTHREAD with a clock, not SC_METHOD.

Pitfall 7: sc_uint<32> vs. uint32_t in golden values — mixing types in assertions.
In the checker's expected struct, use sc_uint<32> consistently, OR always compare via .to_uint(). Mixing types can produce unexpected C++ implicit conversion behavior. The safest pattern: (uint32_t)t.result == (uint32_t)exp.result — cast both sides to the same plain C++ type before comparing. This eliminates any ambiguity about operator overloading between sc_uint and uint32_t.
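The hazard has a plain C++ analogue, sketched below with built-in integer types only. It does not attempt to reproduce the exact overload resolution of sc_uint or sc_int; the helper names `widened_equal`, `same_bits`, and `check_result` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Trap: each side is widened according to its own signedness before the
// compare, so identical 32-bit patterns can look different.
inline bool widened_equal(std::int32_t got, std::uint32_t expected) {
    return static_cast<std::int64_t>(got) == static_cast<std::int64_t>(expected);
}

// Safer: force both sides to the same 32-bit unsigned type first.
inline bool same_bits(std::int32_t got, std::uint32_t expected) {
    return static_cast<std::uint32_t>(got) == expected;
}

// Checker-style assertion built on the same-type comparison.
inline void check_result(std::int32_t got, std::uint32_t expected) {
    assert(same_bits(got, expected));
}
```

For the pattern 0xFFFFFFFF, `widened_equal(-1, 0xFFFFFFFFu)` is false because -1 and 4294967295 differ as 64-bit values, while `same_bits(-1, 0xFFFFFFFFu)` is true. Casting both sides to one plain type removes the dependence on how each side happens to be widened.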

Pitfall 8: Forgetting sc_start(SC_ZERO_TIME) between test vectors.
If you change inputs and immediately read back the output in a test script without calling sc_start, the SC_METHOD has not re-run yet, so you are reading the old value. Per IEEE 1666-2011, each sc_start(SC_ZERO_TIME) call runs one delta cycle (evaluate, update, delta notification) without advancing simulation time. A single call is therefore not guaranteed to be enough: a write made from sc_main must first be committed in an update phase before the sensitive SC_METHOD can run and publish its result. Call sc_start(SC_ZERO_TIME) repeatedly while sc_pending_activity_at_current_time() returns true. Our capstone avoids this because the driver uses wait(10, SC_NS), which lets all delta cycles at the current time complete before time advances, but if you write a non-SC_THREAD test harness (e.g., in sc_main directly), this pitfall is common.
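The underlying reason is the evaluate/update split: a write only requests a new value, and a read returns the last committed one. The toy model below imitates that split in plain C++. It is not the SystemC kernel, it re-evaluates unconditionally instead of being event-driven, and `toy_signal` and `settle` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy two-phase signal: write() only schedules a value, update() commits it.
class toy_signal {
public:
    explicit toy_signal(std::uint32_t init = 0) : cur_(init), next_(init) {}
    void write(std::uint32_t v) { next_ = v; }    // evaluate phase: request only
    std::uint32_t read() const { return cur_; }   // always the committed value
    bool update() {                               // update phase: commit
        const bool changed = (cur_ != next_);
        cur_ = next_;
        return changed;
    }
private:
    std::uint32_t cur_;
    std::uint32_t next_;
};

// Combinational "process" out = a + b, re-evaluated once per toy delta.
// Returns how many deltas changed something before the network went quiet.
inline int settle(toy_signal& a, toy_signal& b, toy_signal& out) {
    int deltas = 0;
    bool changed = true;
    while (changed) {
        out.write(a.read() + b.read());           // evaluate with committed inputs
        changed = a.update();
        changed = b.update() || changed;          // update() runs before the ||
        changed = out.update() || changed;
        if (changed) ++deltas;
        assert(deltas < 1000);                    // a true feedback loop would never settle
    }
    return deltas;
}
```

After `a.write(5)` and `b.write(3)`, `out.read()` still returns 0. In this toy, `settle` needs two passes: one to commit the inputs and one to propagate the sum of 8 to the output.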

Pitfall 9: Synthesis tool flow with SystemC — not all tools accept SC_METHOD RTL directly.
Cadence Stratus HLS and Siemens (formerly Mentor) Catapult HLS accept synthesizable SystemC as a synthesis input. Xilinx Vivado HLS did as well, but SystemC design entry was dropped in its successor, Vitis HLS. Standard ASIC flows (Synopsys Design Compiler, Cadence Genus) typically require RTL export first: the HLS tool produces Verilog/VHDL, which then enters the traditional flow. Do not assume that "synthesizable SystemC" means you can point dc_shell directly at a .cpp file. Always check which entry point your tool expects.

Pitfall 10: sc_uint vs. sc_bv for opcode encoding.
The ALU opcode is typed as sc_uint<4>, which is correct for arithmetic encoding. sc_bv<4> is a two-valued bit vector with no arithmetic operators and no implicit conversion to a C++ integer, so using it as a switch operand requires an explicit conversion such as .to_uint(). Four-valued logic (0, 1, X, Z) is provided by sc_lv<4>, not sc_bv. For control signals that are compared against integer constants in a switch statement, always use sc_uint. Reserve sc_lv for ports that genuinely need X/Z modeling (e.g., bus tri-state drivers in a testbench).

What's Next: Section 2 — The Register File

Post 8: Building the RV32I Register File

Section 2 begins. With the ALU complete, the next core compute element is the Register File — 32 general-purpose registers (x0–x31) that hold the RISC-V program state. The register file is read twice per instruction (to get the two operand values for the ALU) and written once per instruction (to store the result back).

The register file introduces a new challenge: write-then-read ordering. If an instruction writes to register x1 and the next instruction reads from x1, does the read see the new value or the old value? The answer depends on whether forwarding is implemented. In the single-cycle CPU (Posts 15–17), there is no pipeline — write and read happen in the same clock cycle in a carefully ordered sequence. In the pipelined CPU (Posts 18–21), forwarding paths from the EX and MEM stages provide the new value directly, bypassing the register file read.
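The ordering question can be made concrete with a toy model. The plain C++ sketch below is not the Post 8 design: `toy_regfile`, `cycle`, and the `write_first` knob are hypothetical names chosen for this example. It also applies the RISC-V rule that x0 always reads as zero.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy 32 x 32-bit register file. `write_first` selects what a read sees
// when it targets the register being written in the same cycle.
class toy_regfile {
public:
    struct reads { std::uint32_t rs1; std::uint32_t rs2; };

    // One clock cycle: two reads and an optional write.
    reads cycle(unsigned rs1, unsigned rs2, bool wen, unsigned rd,
                std::uint32_t wdata, bool write_first) {
        assert(rs1 < 32u && rs2 < 32u && rd < 32u);
        if (write_first) commit(wen, rd, wdata);   // read sees the new value
        const reads r{regs_[rs1], regs_[rs2]};
        if (!write_first) commit(wen, rd, wdata);  // read saw the old value
        return r;
    }

private:
    void commit(bool wen, unsigned rd, std::uint32_t wdata) {
        if (wen && rd != 0u) regs_[rd] = wdata;    // writes to x0 are discarded
    }
    std::array<std::uint32_t, 32> regs_{};         // all registers start at zero
};
```

Writing 42 to x1 while reading x1 returns 0 with `write_first == false` and 42 with `write_first == true`. That difference is exactly what the forwarding discussion has to resolve.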

With the ALU (Post 5) and Register File (Post 8) complete, we have the two core compute elements. Every subsequent post adds control logic that connects them.

Post 8 → Building the RV32I Register File
