7. SystemC Tutorial - Capstone 1

Introduction

Over the past six posts, we covered modules, clocks, delta cycles, process types, RTL design, and testbench structure. Now we prove it all works together.

Post 7 is the Section 1 capstone. No new concepts. No new SystemC syntax. What this post does is close the verification loop on the ALU — the first real hardware block in our RISC-V CPU — with a complete, edge-case-covering test suite built on the structured testbench pattern from Post 6.

This matters because "it works on basic cases" is not the same as "it is correct." The most dangerous bugs in hardware implementations are the ones that only appear at boundaries: overflow, underflow, sign extension with the most-negative value, shift by zero, shift by the maximum allowed amount. A testbench that only covers the simple middle of the valid range is a testbench that provides false confidence. The goal of a capstone is to provide actual confidence — a test suite comprehensive enough that if it passes, you can reasonably say the block is functionally correct for the operations it implements.

In real chip teams, this is called functional verification closure for a block. Before RTL freeze, every block must achieve a defined coverage metric — functional coverage points all hit, assertions passing, regression clean. For a production ALU, that might mean thousands of directed tests and millions of random tests. For our purposes — a 10-operation combinational block with no state — 30 carefully chosen directed tests covering all boundary cases is legitimate closure.

When this test suite passes cleanly, you have verified the RV32I ALU. Not "it seems to work." Verified.


Section 1 Recap

Before the code, a brief map of what Section 1 built and how the pieces connect:

graph LR
    P1["Post 1\nModules\nPorts\nSignals\npass_through"] --> P2["Post 2\nSimulation\nTime & Clocks\ndff module"]
    P2 --> P3["Post 3\nDelta Cycles\nsc_event\ntwo_stage_chain"]
    P3 --> P4["Post 4\nSC_METHOD\nvs SC_THREAD\nand_gate"]
    P4 --> P5["Post 5\nRV32I ALU\n10 operations\nsc_uint/sc_int"]
    P5 --> P6["Post 6\nTestbench\nArchitecture\nMonitor/Checker"]
    P6 --> P7["Post 7\nCapstone 1\nFull ALU Test\nVCD+Summary"]
    style P5 fill:#06b6d4,color:#fff
    style P6 fill:#10b981,color:#fff
    style P7 fill:#f59e0b,color:#fff

Each post added one foundational layer that every subsequent post builds on:
- Post 1 established the module/port/signal vocabulary — every module in this series uses it
- Post 2 established simulation time control — every timed testbench uses sc_start() and sc_time_stamp()
- Post 3 established the evaluate-update model — why delta cycles matter for the forwarding unit in Post 20
- Post 4 established SC_METHOD vs SC_THREAD — why the ALU is SC_METHOD and every testbench driver is SC_THREAD
- Post 5 built the first real RTL block — the ALU that will sit at the center of our CPU's execute stage
- Post 6 built the testbench pattern — monitor, checker, driver — that every block in this series will use


What This Capstone Tests

The test suite targets four categories of cases that simple tests miss:

Boundary arithmetic: overflow wraps to zero, underflow wraps to the maximum unsigned value, the most-negative signed 32-bit integer (0x80000000 = -2147483648) behaves correctly in signed operations.
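The wrap-around values above can be reproduced with plain fixed-width C++ integers, which is a handy way to double-check an expected value before putting it in a test vector. This is a minimal sketch, not part of the capstone code; ref_add, ref_sub, and as_signed are hypothetical helper names introduced only for illustration.

```cpp
#include <cstdint>

// Hypothetical reference helpers (not from the article's code).
// Unsigned 32-bit arithmetic in C++ is modular, like the sc_uint<32> ALU result.
inline std::uint32_t ref_add(std::uint32_t a, std::uint32_t b) { return a + b; }
inline std::uint32_t ref_sub(std::uint32_t a, std::uint32_t b) { return a - b; }

// Interpret a 32-bit pattern as a two's-complement signed value.
// Returned as int64_t so the conversion is portable and cannot overflow.
inline std::int64_t as_signed(std::uint32_t x) {
    return (x & 0x80000000u)
               ? static_cast<std::int64_t>(x) - 4294967296LL   // subtract 2^32
               : static_cast<std::int64_t>(x);
}
```

Here ref_add(0xFFFFFFFF, 1) gives 0 and ref_sub(0, 1) gives 0xFFFFFFFF, matching the overflow and underflow vectors in the ADD and SUB groups, and as_signed(0x80000000) gives -2147483648.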

Signed vs. unsigned correctness: the same bit pattern 0xFFFFFFFF means -1 in signed context and 4294967295 in unsigned context. SLT and SLTU must produce different results for the same input pair. SRA must sign-extend (fill with 1s), SRL must zero-extend (fill with 0s).
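To make the signed/unsigned split concrete, here is a small reference model in plain C++, assuming two's-complement 32-bit operands. The function names (ref_slt, ref_sltu, ref_srl, ref_sra) are hypothetical and this is not how the SystemC ALU is written; it only shows what each operation should return for the same bit pattern.

```cpp
#include <cstdint>

// Hypothetical reference helpers for illustration only.

// Signed less-than: flipping the sign bit maps signed order onto unsigned order.
inline std::uint32_t ref_slt(std::uint32_t a, std::uint32_t b) {
    return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
}

// Unsigned less-than: compare the raw bit patterns.
inline std::uint32_t ref_sltu(std::uint32_t a, std::uint32_t b) {
    return (a < b) ? 1u : 0u;
}

// Logical right shift: vacated upper bits are filled with 0s.
inline std::uint32_t ref_srl(std::uint32_t a, std::uint32_t shamt) {
    return a >> (shamt & 31u);
}

// Arithmetic right shift: vacated upper bits are filled with the sign bit.
inline std::uint32_t ref_sra(std::uint32_t a, std::uint32_t shamt) {
    const std::uint32_t s = shamt & 31u;
    const bool negative = (a & 0x80000000u) != 0u;
    const std::uint32_t fill = (negative && s != 0u) ? ~(0xFFFFFFFFu >> s) : 0u;
    return (a >> s) | fill;
}
```

With a = 0xFFFFFFFF and b = 1, ref_slt returns 1 (since -1 < 1) while ref_sltu returns 0 (since 4294967295 > 1). With a = 0x80000000 and a shift of 1, ref_sra returns 0xC0000000 while ref_srl returns 0x40000000.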

Shift boundary conditions: shift by 0 is identity (output equals input), shift by 31 moves the LSB to the MSB position (or vice versa), shift amount above 31 uses only the lower 5 bits per the RISC-V spec.
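The shift-amount rule is easy to state in code. The sketch below, using hypothetical helper names (shamt5, sll_ref, srl_ref), masks the second operand to its low 5 bits before shifting. In plain C++ the mask is also needed for a second reason: shifting a 32-bit value by 32 or more is undefined behavior.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not taken from the article's ALU.

// RV32I register-register shifts use only the low 5 bits of the second operand.
inline std::uint32_t shamt5(std::uint32_t b) { return b & 0x1Fu; }

inline std::uint32_t sll_ref(std::uint32_t a, std::uint32_t b) { return a << shamt5(b); }
inline std::uint32_t srl_ref(std::uint32_t a, std::uint32_t b) { return a >> shamt5(b); }
```

So sll_ref(1, 0) is 1 (identity), sll_ref(1, 31) is 0x80000000 (LSB to MSB), srl_ref(0x80000000, 31) is 1 (MSB to LSB), and sll_ref(1, 32) is again 1 because the effective shift amount is 32 & 31 = 0.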

Zero flag coverage: the zero flag must be true exactly when result is zero, and false otherwise. Both conditions must be explicitly tested — a broken implementation that always outputs zero = true or always zero = false should fail.
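A short sketch shows why both polarities are needed. It checks observed (result, zero) pairs against the definition zero == (result == 0) and applies that check to two deliberately broken flag generators. All names here (zf_obs, zero_flag_mismatches, zero_stuck_true, zero_stuck_false) are hypothetical fault models for illustration, not part of the capstone testbench.

```cpp
#include <cstdint>
#include <vector>

// One observed (result, zero) pair. Hypothetical type for this sketch.
struct zf_obs {
    std::uint32_t result;
    bool          zero;
};

// Count observations where the flag disagrees with zero == (result == 0).
inline int zero_flag_mismatches(const std::vector<zf_obs>& obs) {
    int bad = 0;
    for (const zf_obs& o : obs) {
        if (o.zero != (o.result == 0u)) ++bad;
    }
    return bad;
}

// Two deliberately broken flag generators (fault models).
inline bool zero_stuck_true(std::uint32_t)  { return true;  }
inline bool zero_stuck_false(std::uint32_t) { return false; }
```

If every vector in a suite produces result == 0, the stuck-true fault yields no mismatches and escapes; the same happens for stuck-false when every result is nonzero. A suite needs at least one zero and one nonzero result to catch both faults.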


Complete Capstone Code

The capstone reuses the alu module from Post 5 and the testbench components from Post 6. The main additions are an extended test sequence in the driver and a final === Section 1 Complete === banner.

// File: capstone1_alu.cpp
// Section 1 Capstone — Complete ALU verification closure
// Reuses: alu (Post 5), alu_monitor, alu_checker, alu_driver pattern (Post 6)
#include <systemc.h>
#include <iostream>
#include <iomanip>
#include <deque>

// ─── ALU operation encoding ───────────────────────────────────────────────────
enum alu_op_t {
    ALU_ADD=0, ALU_SUB=1, ALU_AND=2, ALU_OR=3, ALU_XOR=4,
    ALU_SLT=5, ALU_SLTU=6, ALU_SLL=7, ALU_SRL=8, ALU_SRA=9
};

// ─── ALU (from Post 5 — no changes) ──────────────────────────────────────────
SC_MODULE(alu) {
    sc_in<sc_uint<32>>  a, b;
    sc_in<sc_uint<4>>   op;
    sc_out<sc_uint<32>> result;
    sc_out<bool>        zero;

    void compute() {
        sc_uint<32> res = 0;
        switch ((int)op.read()) {
            case ALU_ADD:  res = a.read() + b.read(); break;
            case ALU_SUB:  res = a.read() - b.read(); break;
            case ALU_AND:  res = a.read() & b.read(); break;
            case ALU_OR:   res = a.read() | b.read(); break;
            case ALU_XOR:  res = a.read() ^ b.read(); break;
            case ALU_SLT:  res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0; break;
            case ALU_SLTU: res = (a.read() < b.read()) ? 1 : 0; break;
            case ALU_SLL:  res = a.read() << b.read().range(4,0); break;
            case ALU_SRL:  res = a.read() >> b.read().range(4,0); break;
            case ALU_SRA:  res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4,0)); break;
        }
        result.write(res);
        zero.write(res == 0);
    }

    SC_CTOR(alu) { SC_METHOD(compute); sensitive << a << b << op; }
};

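// ─── Transaction record ───────────────────────────────────────────────────────
struct alu_txn {
    sc_uint<32> a, b, result;
    sc_uint<4>  op;
    bool        zero;
    sc_time     timestamp;
};

// sc_fifo<T> requires operator<< for T (used by its print() and dump() methods).
inline std::ostream& operator<<(std::ostream& os, const alu_txn& t) {
    return os << "a=" << t.a << " b=" << t.b << " op=" << t.op
              << " result=" << t.result << " zero=" << t.zero
              << " @" << t.timestamp;
}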

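// ─── Monitor: passive observer of DUT inputs and outputs ──────────────────────
SC_MODULE(alu_monitor) {
    sc_in<sc_uint<32>>  a, b, result;
    sc_in<sc_uint<4>>   op;
    sc_in<bool>         zero;
    sc_fifo_out<alu_txn> txn_out;

    void observe() {
        while (true) {
            wait();              // new stimulus: a, b, or op changed
            wait(SC_ZERO_TIME);  // one delta later the DUT outputs have settled
            alu_txn t;
            t.a = a.read(); t.b = b.read(); t.op = op.read();
            t.result = result.read(); t.zero = zero.read();
            t.timestamp = sc_time_stamp();
            txn_out.write(t);
        }
    }

    // Triggering on the inputs (not on result) records one transaction per
    // stimulus even when consecutive tests produce the same result, because
    // sc_signal raises no event when the written value is unchanged. As an
    // SC_THREAD, the monitor also emits nothing during initialization.
    SC_CTOR(alu_monitor) { SC_THREAD(observe); sensitive << a << b << op; }
};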

// ─── Checker — compares observed vs expected ──────────────────────────────────
SC_MODULE(alu_checker) {
    sc_fifo_in<alu_txn> txn_in;

    struct expected_t {
        sc_uint<32> result;
        bool        zero;
        const char* description;
    };

    std::deque<expected_t> expected_queue;
    int pass_count = 0;
    int fail_count = 0;

    void load_expected(sc_uint<32> result, bool zero, const char* desc) {
        expected_queue.push_back({result, zero, desc});
    }

    void check_loop() {
        while (true) {
            alu_txn t = txn_in.read();
            if (expected_queue.empty()) { fail_count++; continue; }

            expected_t exp = expected_queue.front();
            expected_queue.pop_front();

            bool ok = (t.result == exp.result) && (t.zero == exp.zero);
            std::cout << "  " << (ok ? "[PASS]" : "[FAIL]") << "  "
                      << std::left << std::setw(46) << exp.description;
            if (!ok) {
                std::cout << "  got=0x" << std::hex << std::setw(8) << std::setfill('0')
                          << (uint32_t)t.result << " z=" << t.zero
                          << "  exp=0x" << std::setw(8) << (uint32_t)exp.result
                          << " z=" << std::dec << exp.zero
                          << std::setfill(' ');
            }
            std::cout << std::endl;
            if (ok) pass_count++; else fail_count++;
        }
    }

    void report(int total) {
        std::cout << std::endl;
        std::cout << "╔══════════════════════════════════════════════════╗" << std::endl;
        std::cout << "║  === Section 1 Complete: "
                  << pass_count << "/" << total << " tests passed ===    ║" << std::endl;
        std::cout << "╚══════════════════════════════════════════════════╝" << std::endl;
        if (fail_count > 0)
            std::cout << "  FAIL: " << fail_count << " test(s) failed" << std::endl;
        if (!expected_queue.empty())
            std::cout << "  WARN: " << expected_queue.size()
                      << " expected transaction(s) never observed" << std::endl;
    }

    SC_CTOR(alu_checker) { SC_THREAD(check_loop); }
};

// ─── Driver — drives all 30 capstone test vectors ────────────────────────────
SC_MODULE(alu_driver) {
    sc_out<sc_uint<32>> a, b;
    sc_out<sc_uint<4>>  op;
    alu_checker* chk;

    void drive(sc_uint<32> av, sc_uint<32> bv, alu_op_t opv,
               sc_uint<32> exp_res, bool exp_zero, const char* desc) {
        chk->load_expected(exp_res, exp_zero, desc);
        a.write(av); b.write(bv); op.write(opv);
        wait(10, SC_NS);
    }

    void run() {
        std::cout << std::endl << "=== Section 1 Capstone: RV32I ALU ===" << std::endl;

        // ── ADD: normal, overflow, zero ───────────────────────────────────────
        std::cout << std::endl << "  ── ADD ──" << std::endl;
        drive(5,          3,          ALU_ADD, 8,          false, "ADD:  5 + 3 = 8");
        drive(0,          0,          ALU_ADD, 0,          true,  "ADD:  0 + 0 = 0  (zero flag)");
        drive(0xFFFFFFFF, 1,          ALU_ADD, 0,          true,  "ADD:  0xFFFFFFFF + 1 = 0  (overflow wraps)");
        drive(0x7FFFFFFF, 1,          ALU_ADD, 0x80000000, false, "ADD:  max_positive + 1 → MSB set");

        // ── SUB: normal, zero, underflow ──────────────────────────────────────
        std::cout << std::endl << "  ── SUB ──" << std::endl;
        drive(10,         3,          ALU_SUB, 7,          false, "SUB:  10 - 3 = 7");
        drive(5,          5,          ALU_SUB, 0,          true,  "SUB:  5 - 5 = 0  (BEQ zero check)");
        drive(5,          4,          ALU_SUB, 1,          false, "SUB:  5 - 4 = 1  (zero must be false)");
        drive(0,          1,          ALU_SUB, 0xFFFFFFFF, false, "SUB:  0 - 1 = 0xFFFFFFFF  (underflow wraps)");

        // ── AND, OR, XOR ──────────────────────────────────────────────────────
        std::cout << std::endl << "  ── AND / OR / XOR ──" << std::endl;
        drive(0xFF00FF00, 0x0F0F0F0F, ALU_AND, 0x0F000F00, false, "AND:  bit masking");
        drive(0xFFFFFFFF, 0,          ALU_AND, 0,          true,  "AND:  all_ones & 0 = 0");
        drive(0xFF000000, 0x00FF0000, ALU_OR,  0xFFFF0000, false, "OR:   combine two fields");
        drive(0,          0,          ALU_OR,  0,          true,  "OR:   0 | 0 = 0");
        drive(0xAAAAAAAA, 0x55555555, ALU_XOR, 0xFFFFFFFF, false, "XOR:  alternating bits → all ones");
        drive(0xDEADBEEF, 0xDEADBEEF, ALU_XOR, 0,         true,  "XOR:  x ^ x = 0  (zero idiom)");

        // ── SLT: signed comparisons ───────────────────────────────────────────
        std::cout << std::endl << "  ── SLT (signed) ──" << std::endl;
        drive(0xFFFFFFFF, 1,          ALU_SLT, 1,          false, "SLT:  -1 < 1 → 1  (signed)");
        drive(1,          0xFFFFFFFF, ALU_SLT, 0,          true,  "SLT:  1 < -1 → 0  (signed, zero)");
        drive(5,          3,          ALU_SLT, 0,          true,  "SLT:  5 < 3 → 0");
        drive(0x80000000, 0,          ALU_SLT, 1,          false, "SLT:  INT_MIN < 0 → 1  (signed)");

        // ── SLTU: unsigned comparisons ────────────────────────────────────────
        std::cout << std::endl << "  ── SLTU (unsigned) ──" << std::endl;
        drive(0xFFFFFFFF, 1,          ALU_SLTU, 0,         true,  "SLTU: 0xFFFFFFFF > 1 → 0  (unsigned)");
        drive(1,          0xFFFFFFFF, ALU_SLTU, 1,         false, "SLTU: 1 < 0xFFFFFFFF → 1  (unsigned)");

        // ── SLL: logical left shift ───────────────────────────────────────────
        std::cout << std::endl << "  ── SLL ──" << std::endl;
        drive(1,          0,          ALU_SLL, 1,          false, "SLL:  1 << 0 = 1  (shift by 0 is identity)");
        drive(1,          4,          ALU_SLL, 16,         false, "SLL:  1 << 4 = 16");
        drive(1,          31,         ALU_SLL, 0x80000000, false, "SLL:  1 << 31 = 0x80000000  (MSB)");
        drive(0xFFFFFFFF, 1,          ALU_SLL, 0xFFFFFFFE, false, "SLL:  0xFFFFFFFF << 1 (MSB shifted out)");

        // ── SRL: logical right shift ──────────────────────────────────────────
        std::cout << std::endl << "  ── SRL ──" << std::endl;
        drive(0x80000000, 1,          ALU_SRL, 0x40000000, false, "SRL:  0x80000000 >> 1 = 0x40000000  (zero fill)");
        drive(0xFFFFFFFF, 4,          ALU_SRL, 0x0FFFFFFF, false, "SRL:  0xFFFFFFFF >> 4 = 0x0FFFFFFF");
        drive(0x80000000, 63,         ALU_SRL, 1,          false, "SRL:  shamt 63 → low 5 bits = 31  (MSB to LSB)");

        // ── SRA: arithmetic right shift ───────────────────────────────────────
        std::cout << std::endl << "  ── SRA ──" << std::endl;
        drive(0x80000000, 1,          ALU_SRA, 0xC0000000, false, "SRA:  0x80000000 >> 1 = 0xC0000000  (sign ext)");
        drive(0xFFFFFFFF, 4,          ALU_SRA, 0xFFFFFFFF, false, "SRA:  -1 >> 4 = -1  (sign fills with 1s)");
        drive(0x40000000, 1,          ALU_SRA, 0x20000000, false, "SRA:  positive: SRA == SRL");

        std::cout << std::endl;
        wait(SC_ZERO_TIME);
        sc_stop();
    }

    SC_CTOR(alu_driver) { SC_THREAD(run); chk = nullptr; }
};

// ─── sc_main ──────────────────────────────────────────────────────────────────
int sc_main(int argc, char* argv[]) {
    sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
    sc_signal<sc_uint<4>>  sig_op;
    sc_signal<bool>        sig_zero;
    sc_fifo<alu_txn>       txn_fifo(64);

    alu         dut("dut");
    alu_monitor mon("monitor");
    alu_checker chk("checker");
    alu_driver  drv("driver");

    dut.a(sig_a); dut.b(sig_b); dut.op(sig_op);
    dut.result(sig_result); dut.zero(sig_zero);

    mon.a(sig_a); mon.b(sig_b); mon.op(sig_op);
    mon.result(sig_result); mon.zero(sig_zero);
    mon.txn_out(txn_fifo);

    chk.txn_in(txn_fifo);

    drv.a(sig_a); drv.b(sig_b); drv.op(sig_op);
    drv.chk = &chk;

    // VCD for the capstone run
    sc_trace_file* tf = sc_create_vcd_trace_file("capstone1_waveform");
    tf->set_time_unit(1, SC_NS);
    sc_trace(tf, sig_a,      "ALU.a");
    sc_trace(tf, sig_b,      "ALU.b");
    sc_trace(tf, sig_op,     "ALU.op");
    sc_trace(tf, sig_result, "ALU.result");
    sc_trace(tf, sig_zero,   "ALU.zero");

    sc_start();
    sc_close_vcd_trace_file(tf);

    const int total = 30;
    chk.report(total);

    return (chk.fail_count > 0) ? 1 : 0;
}

Expected Output

=== Section 1 Capstone: RV32I ALU ===

  ── ADD ──
  [PASS]  ADD:  5 + 3 = 8
  [PASS]  ADD:  0 + 0 = 0  (zero flag)
  [PASS]  ADD:  0xFFFFFFFF + 1 = 0  (overflow wraps)
  [PASS]  ADD:  max_positive + 1 → MSB set

  ── SUB ──
  [PASS]  SUB:  10 - 3 = 7
  [PASS]  SUB:  5 - 5 = 0  (BEQ zero check)
  [PASS]  SUB:  5 - 4 = 1  (zero must be false)
  [PASS]  SUB:  0 - 1 = 0xFFFFFFFF  (underflow wraps)

  ... (all 30 PASS) ...

╔══════════════════════════════════════════════════╗
║  === Section 1 Complete: 30/30 tests passed ===  ║
╚══════════════════════════════════════════════════╝

DV Insight

The ALU testbench you have written is a complete functional verification closure for this block. In a real chip team, this would be part of the block-level regression suite: run nightly in CI, required to pass 100% before RTL freeze, and gated by a senior verification engineer's sign-off.

The principles are identical whether the implementation technology is SystemC, UVM/SystemVerilog, cocotb/Python, or a custom C++ framework. The structure — stimulus driver, passive monitor, independent checker, self-reporting pass/fail — is the universal verification pattern. What changes between frameworks is the boilerplate: in UVM, the driver is a uvm_driver with a TLM seq_item_port, the monitor is a uvm_monitor with a uvm_analysis_port, and the checker is a uvm_scoreboard. The data flow, the passive-observer constraint, and the predict-then-observe pattern are exactly what we built here.

There is one gap this testbench does not close: constrained-random coverage. Our directed tests cover the cases we thought of. A constrained-random test covering all 10 operations with random 32-bit inputs would catch cases we did not anticipate — corner cases in unsigned/signed interaction, shift amounts between 16 and 30 that no directed test exercises, etc. That gap will be closed in Posts 23–27 when we add a UVM-SystemC environment with a constrained-random sequence and functional coverage. For now, 30 directed tests covering all boundary cases is a solid and defensible closure for a 10-operation combinational block.
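To give a feel for what random stimulus adds, here is a minimal plain-C++ sketch of a golden model plus a seeded stimulus generator that tracks which shift amounts were exercised. It is not the UVM-SystemC environment planned for Posts 23–27, and every name in it (ref_op, alu_ref, stimulus_gen, shamt_bins_hit) is an assumption made for illustration. The 25% bias toward corner values is an arbitrary choice.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical names throughout; not the series' actual random environment.
enum ref_op { R_ADD, R_SUB, R_AND, R_OR, R_XOR, R_SLT, R_SLTU, R_SLL, R_SRL, R_SRA };

// Golden model of the ten operations, written independently of the DUT.
inline std::uint32_t alu_ref(ref_op op, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t s = b & 31u;  // low 5 bits = shift amount
    switch (op) {
        case R_ADD:  return a + b;
        case R_SUB:  return a - b;
        case R_AND:  return a & b;
        case R_OR:   return a | b;
        case R_XOR:  return a ^ b;
        // Flipping the sign bit maps signed order onto unsigned order.
        case R_SLT:  return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
        case R_SLTU: return (a < b) ? 1u : 0u;
        case R_SLL:  return a << s;
        case R_SRL:  return a >> s;
        case R_SRA: {
            const bool negative = (a >> 31) != 0u;
            const std::uint32_t fill = (negative && s != 0u) ? ~(0xFFFFFFFFu >> s) : 0u;
            return (a >> s) | fill;
        }
    }
    return 0u;
}

// Seeded random stimulus with a bias toward boundary operands.
struct stimulus_gen {
    std::mt19937 rng;
    explicit stimulus_gen(std::uint32_t seed) : rng(seed) {}

    std::uint32_t operand() {
        static const std::uint32_t corners[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
        if (rng() % 4u == 0u) return corners[rng() % 5u];  // roughly 25% corner values
        return static_cast<std::uint32_t>(rng());
    }
    ref_op op() { return static_cast<ref_op>(rng() % 10u); }
};

// Generate n vectors and return how many of the 32 shift-amount bins were hit.
inline int shamt_bins_hit(std::uint32_t seed, int n) {
    stimulus_gen gen(seed);
    std::array<bool, 32> hit{};
    for (int i = 0; i < n; ++i) {
        const ref_op op = gen.op();
        const std::uint32_t a = gen.operand();
        const std::uint32_t b = gen.operand();
        (void)alu_ref(op, a, b);  // in a real bench: the expected value for the checker
        if (op == R_SLL || op == R_SRL || op == R_SRA) hit[b & 31u] = true;
    }
    int bins = 0;
    for (bool h : hit) bins += h ? 1 : 0;
    return bins;
}
```

In a real bench, the value returned by alu_ref would be loaded into the checker before the same operands are driven into the DUT, and the shift-amount bins would become functional coverage points. Fixing the seed keeps any failure reproducible.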


What You Can Do Now

Section 1 is complete. Here is what you can do independently with what you have learned:

  • Model any combinational RTL block in SystemC — define ports with sc_in/sc_out, implement logic in SC_METHOD, add sensitivity list, connect to signals
  • Model any clocked sequential block — use SC_THREAD, wait on clock edges, latch values between rising edges
  • Write a structured testbench — separate stimulus driver, passive monitor, and checker; connect them with sc_fifo; load expected values before applying stimulus
  • Generate VCD waveforms: sc_create_vcd_trace_file, sc_trace per signal, sc_close_vcd_trace_file, open in GTKWave
  • Understand the evaluate-update simulation model — why delta cycles exist, why sc_signal.write() is not immediately visible, how sc_start(SC_ZERO_TIME) settles combinational logic
  • Choose the correct process type — SC_METHOD for combinational logic and decoders, SC_THREAD for sequential behavior, stimulus generators, and monitors

What We Built — Section 1 Component Chain

graph LR
    PT["pass_through\n(Post 1)\nSC_METHOD\nbool passthrough"] --> DFF
    DFF["dff\n(Post 2)\nSC_THREAD\nD flip-flop"] --> TSC
    TSC["two_stage_chain\n(Post 3)\ndelta cycle demo"] --> AG
    AG["and_gate\n(Post 4)\nSC_METHOD\n2-input AND"] --> ALU
    ALU["RV32I ALU\n(Post 5)\nSC_METHOD\n10 operations"] --> TB
    TB["Monitor+Checker TB\n(Posts 6-7)\nSC_THREAD\nstructured testbench"]
    style ALU fill:#06b6d4,color:#fff
    style TB fill:#10b981,color:#fff

Each block in this chain is a building block for the full CPU. The dff becomes the pipeline stage registers (Post 18). The and_gate pattern is the template for every combinational module. The ALU is wired into the execute stage at Post 18 without modification. The Monitor+Checker TB pattern is applied to every subsequent block.


What's Next: Section 2 — The Register File

Post 8: Building the RV32I Register File

Section 2 begins. With the ALU complete, the next core compute element is the Register File — 32 general-purpose registers (x0–x31) that hold the RISC-V program state. The register file is read twice per instruction (to get the two operand values for the ALU) and written once per instruction (to store the result back).

The register file introduces a new challenge: write-then-read ordering. If an instruction writes to register x1 and the next instruction reads from x1, does the read see the new value or the old value? The answer depends on whether forwarding is implemented. In the single-cycle CPU (Posts 15–17), there is no pipeline — write and read happen in the same clock cycle in a carefully ordered sequence. In the pipelined CPU (Posts 18–21), forwarding paths from the EX and MEM stages provide the new value directly, bypassing the register file read.
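The ordering question can be previewed with a small behavioral model in plain C++. This is only a sketch under stated assumptions: regfile_model, cycle_read_first, and cycle_write_first are hypothetical names, the model is not the register file built in Post 8, and which ordering the series adopts is not decided here. The one architectural fact it relies on is that x0 always reads as zero in RISC-V.

```cpp
#include <array>
#include <cstdint>

// Hypothetical behavioral model for illustration; not the Post 8 design.
class regfile_model {
    std::array<std::uint32_t, 32> regs_{};  // x0..x31, all zero at start
public:
    std::uint32_t read(unsigned idx) const { return regs_[idx & 31u]; }

    // Writes to x0 are ignored: x0 is hardwired to zero in RISC-V.
    void write(unsigned idx, std::uint32_t value) {
        if ((idx & 31u) != 0u) regs_[idx & 31u] = value;
    }
};

// One step where the read happens before the write: the reader sees the OLD value.
inline std::uint32_t cycle_read_first(regfile_model& rf, unsigned rs,
                                      unsigned rd, std::uint32_t wdata) {
    const std::uint32_t seen = rf.read(rs);
    rf.write(rd, wdata);
    return seen;
}

// One step where the write happens before the read: the reader sees the NEW value.
inline std::uint32_t cycle_write_first(regfile_model& rf, unsigned rs,
                                       unsigned rd, std::uint32_t wdata) {
    rf.write(rd, wdata);
    return rf.read(rs);
}
```

Starting from a cleared model, cycle_read_first(rf, 1, 1, 42) returns 0 and leaves 42 in x1, while cycle_write_first(rf, 2, 2, 7) returns 7 immediately. A forwarding path in a pipeline effectively gives the reader the second behavior without waiting for the register file write.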

With the ALU (Post 5) and Register File (Post 8) complete, we have the two core compute elements. Every subsequent post adds control logic that connects them.

Post 8 → Building the RV32I Register File

Author
Mayur Kubavat
VLSI Design and Verification Engineer sharing knowledge about SystemVerilog, UVM, and hardware verification methodologies.
