7. SystemC Tutorial - Capstone 1
Introduction
Over the past six posts, we have covered modules, clocks, delta cycles, process types, RTL design, and testbench structure. Now we show that it all works together.
Post 7 is the Section 1 capstone. No new concepts. No new SystemC syntax. What this post does is close the verification loop on the ALU — the first real hardware block in our RISC-V CPU — with a complete, edge-case-covering test suite built on the structured testbench pattern from Post 6.
This matters because "it works on basic cases" is not the same as "it is correct." The most dangerous bugs in hardware implementations are the ones that only appear at boundaries: overflow, underflow, sign extension with the most-negative value, shift by zero, shift by the maximum allowed amount. A testbench that only covers the simple middle of the valid range is a testbench that provides false confidence. The goal of a capstone is to provide actual confidence — a test suite comprehensive enough that if it passes, you can reasonably say the block is functionally correct for the operations it implements.
In real chip teams, this is called functional verification closure for a block. Before RTL freeze, every block must achieve a defined coverage metric — functional coverage points all hit, assertions passing, regression clean. For a production ALU, that might mean thousands of directed tests and millions of random tests. For our purposes — a 10-operation combinational block with no state — 30 carefully chosen directed tests covering all boundary cases is legitimate closure.
When this test suite passes cleanly, you have verified the RV32I ALU. Not "it seems to work." Verified.
Section 1 Recap
Before the code, a brief map of what Section 1 built and how the pieces connect:
graph LR
P1["Post 1\nModules\nPorts\nSignals\npass_through"] --> P2["Post 2\nSimulation\nTime & Clocks\ndff module"]
P2 --> P3["Post 3\nDelta Cycles\nsc_event\ntwo_stage_chain"]
P3 --> P4["Post 4\nSC_METHOD\nvs SC_THREAD\nand_gate"]
P4 --> P5["Post 5\nRV32I ALU\n10 operations\nsc_uint/sc_int"]
P5 --> P6["Post 6\nTestbench\nArchitecture\nMonitor/Checker"]
P6 --> P7["Post 7\nCapstone 1\nFull ALU Test\nVCD+Summary"]
style P5 fill:#06b6d4,color:#fff
style P6 fill:#10b981,color:#fff
style P7 fill:#f59e0b,color:#fff
Each post added one foundational layer that every subsequent post builds on:
- Post 1 established the module/port/signal vocabulary — every module in this series uses it
- Post 2 established simulation time control — every timed testbench uses sc_start() and sc_time_stamp()
- Post 3 established the evaluate-update model — why delta cycles matter for the forwarding unit in Post 20
- Post 4 established SC_METHOD vs SC_THREAD — why the ALU is SC_METHOD and every testbench driver is SC_THREAD
- Post 5 built the first real RTL block — the ALU that will sit at the center of our CPU's execute stage
- Post 6 built the testbench pattern — monitor, checker, driver — that every block in this series will use
What This Capstone Tests
The test suite targets four categories of cases that simple tests miss:
Boundary arithmetic: overflow wraps to zero, underflow wraps to the maximum unsigned value, the most-negative signed 32-bit integer (0x80000000 = -2147483648) behaves correctly in signed operations.
Signed vs. unsigned correctness: the same bit pattern 0xFFFFFFFF means -1 in signed context and 4294967295 in unsigned context. SLT and SLTU must produce different results for the same input pair. SRA must sign-extend (fill with 1s), SRL must zero-extend (fill with 0s).
Shift boundary conditions: shift by 0 is identity (output equals input), shift by 31 moves the LSB to the MSB position (or vice versa), shift amount above 31 uses only the lower 5 bits per the RISC-V spec.
Zero flag coverage: the zero flag must be true exactly when result is zero, and false otherwise. Both conditions must be explicitly tested — a broken implementation that always outputs zero = true or always zero = false should fail.
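The boundary behaviours listed above can be reproduced outside SystemC with fixed-width C++ integers. The sketch below is not part of the capstone: `ref_alu`, `ref_zero`, `check_boundaries`, and the `Op` enum are hypothetical names chosen for illustration. It assumes the usual two's-complement conversions and an arithmetic right shift for signed values, which mainstream compilers provide and which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model for illustration; not code from this series.
enum class Op { ADD, SUB, SLT, SLTU, SLL, SRL, SRA };

std::uint32_t ref_alu(Op op, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t shamt = b & 0x1Fu;         // RV32I: only the low 5 bits select the shift
    const auto sa = static_cast<std::int32_t>(a);  // same bit pattern, signed view
    const auto sb = static_cast<std::int32_t>(b);
    switch (op) {
        case Op::ADD:  return a + b;               // unsigned arithmetic wraps modulo 2^32
        case Op::SUB:  return a - b;               // wraps modulo 2^32 as well
        case Op::SLT:  return sa < sb ? 1u : 0u;   // signed comparison
        case Op::SLTU: return a < b ? 1u : 0u;     // unsigned comparison
        case Op::SLL:  return a << shamt;
        case Op::SRL:  return a >> shamt;          // vacated bits filled with 0s
        case Op::SRA:  return static_cast<std::uint32_t>(sa >> shamt);  // filled with the sign bit
    }
    return 0u;
}

// Zero flag as the checker expects it: true exactly when the result is zero.
bool ref_zero(std::uint32_t result) { return result == 0u; }

// A few of the boundary cases from the text, expressed as assertions.
void check_boundaries() {
    assert(ref_alu(Op::ADD, 0xFFFFFFFFu, 1u) == 0u);           // overflow wraps to zero
    assert(ref_alu(Op::SUB, 0u, 1u) == 0xFFFFFFFFu);           // underflow wraps to max
    assert(ref_alu(Op::SLT, 0xFFFFFFFFu, 1u) == 1u);           // -1 < 1 (signed)
    assert(ref_alu(Op::SLTU, 0xFFFFFFFFu, 1u) == 0u);          // 4294967295 < 1 is false
    assert(ref_alu(Op::SRA, 0x80000000u, 1u) == 0xC0000000u);  // sign fill
    assert(ref_alu(Op::SRL, 0x80000000u, 1u) == 0x40000000u);  // zero fill
    assert(ref_alu(Op::SLL, 1u, 32u) == 1u);                   // shift amount 32 acts as 0
    assert(ref_zero(ref_alu(Op::SUB, 5u, 5u)));
    assert(!ref_zero(ref_alu(Op::SUB, 5u, 4u)));
}
```

Calling `check_boundaries()` from a small `main` is enough to run it. A model like this could also generate the expected values handed to the checker, although the capstone below hard-codes them in the driver instead.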
Complete Capstone Code
The capstone reuses the alu module from Post 5 unchanged and follows the monitor, checker, and driver pattern from Post 6. The only additions are an extended test sequence in the driver and a final === Section 1 Complete === banner.
// File: capstone1_alu.cpp
// Section 1 Capstone — Complete ALU verification closure
// Reuses: alu (Post 5), alu_monitor, alu_checker, alu_driver pattern (Post 6)
#include <systemc.h>
#include <cstdint>   // uint32_t used when printing mismatches
#include <iostream>
#include <iomanip>
#include <deque>
// ─── ALU operation encoding ───────────────────────────────────────────────────
enum alu_op_t {
ALU_ADD=0, ALU_SUB=1, ALU_AND=2, ALU_OR=3, ALU_XOR=4,
ALU_SLT=5, ALU_SLTU=6, ALU_SLL=7, ALU_SRL=8, ALU_SRA=9
};
// ─── ALU (from Post 5 — no changes) ──────────────────────────────────────────
SC_MODULE(alu) {
sc_in<sc_uint<32>> a, b;
sc_in<sc_uint<4>> op;
sc_out<sc_uint<32>> result;
sc_out<bool> zero;
void compute() {
sc_uint<32> res = 0;
switch ((int)op.read()) {
case ALU_ADD: res = a.read() + b.read(); break;
case ALU_SUB: res = a.read() - b.read(); break;
case ALU_AND: res = a.read() & b.read(); break;
case ALU_OR: res = a.read() | b.read(); break;
case ALU_XOR: res = a.read() ^ b.read(); break;
case ALU_SLT: res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0; break;
case ALU_SLTU: res = (a.read() < b.read()) ? 1 : 0; break;
case ALU_SLL: res = a.read() << b.read().range(4,0); break;
case ALU_SRL: res = a.read() >> b.read().range(4,0); break;
case ALU_SRA: res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4,0)); break;
}
result.write(res);
zero.write(res == 0);
}
SC_CTOR(alu) { SC_METHOD(compute); sensitive << a << b << op; }
};
// ─── Transaction record ───────────────────────────────────────────────────────
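struct alu_txn {
sc_uint<32> a, b, result;
sc_uint<4> op;
bool zero;
sc_time timestamp;
};
// sc_fifo<T> streams its elements in print()/dump(), so a user-defined
// payload type needs an operator<<.
inline std::ostream& operator<<(std::ostream& os, const alu_txn& t) {
    return os << "a=" << t.a << " b=" << t.b << " op=" << t.op
              << " result=" << t.result << " zero=" << t.zero
              << " @" << t.timestamp;
}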
// ─── Monitor — passive observer of DUT outputs ────────────────────────────────
SC_MODULE(alu_monitor) {
sc_in<sc_uint<32>> a, b, result;
sc_in<sc_uint<4>> op;
sc_in<bool> zero;
sc_fifo_out<alu_txn> txn_out;
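// Sampling on `result` alone would miss back-to-back vectors that produce the
// same result (sc_signal only notifies on a value change), so the monitor
// wakes on any input change and samples once the DUT output has settled.
void observe() {
    while (true) {
        wait();            // static sensitivity: a, b, or op changed
        wait(1, SC_NS);    // settle time, well inside the driver's 10 ns slot
        alu_txn t;
        t.a = a.read(); t.b = b.read(); t.op = op.read();
        t.result = result.read(); t.zero = zero.read();
        t.timestamp = sc_time_stamp();
        txn_out.write(t);
    }
}
SC_CTOR(alu_monitor) { SC_THREAD(observe); sensitive << a << b << op; }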
};
// ─── Checker — compares observed vs expected ──────────────────────────────────
SC_MODULE(alu_checker) {
sc_fifo_in<alu_txn> txn_in;
struct expected_t {
sc_uint<32> result;
bool zero;
const char* description;
};
std::deque<expected_t> expected_queue;
int pass_count = 0;
int fail_count = 0;
void load_expected(sc_uint<32> result, bool zero, const char* desc) {
expected_queue.push_back({result, zero, desc});
}
void check_loop() {
while (true) {
alu_txn t = txn_in.read();
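if (expected_queue.empty()) {
    // A transaction with no matching expectation is reported, not silently counted.
    std::cout << " [FAIL] unexpected transaction at " << t.timestamp << std::endl;
    fail_count++;
    continue;
}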
expected_t exp = expected_queue.front();
expected_queue.pop_front();
bool ok = (t.result == exp.result) && (t.zero == exp.zero);
std::cout << " " << (ok ? "[PASS]" : "[FAIL]") << " "
<< std::left << std::setw(46) << exp.description;
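if (!ok) {
    // std::left from the description column is sticky; switch to std::right
    // so the zero fill pads hex values on the left, not the right.
    std::cout << " got=0x" << std::hex << std::right << std::setfill('0')
              << std::setw(8) << (uint32_t)t.result << " z=" << t.zero
              << " exp=0x" << std::setw(8) << (uint32_t)exp.result
              << " z=" << exp.zero
              << std::dec << std::setfill(' ');
}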
std::cout << std::endl;
if (ok) pass_count++; else fail_count++;
}
}
void report(int total) {
std::cout << std::endl;
std::cout << "╔══════════════════════════════════════════════════╗" << std::endl;
std::cout << "║ === Section 1 Complete: "
<< pass_count << "/" << total << " tests passed === ║" << std::endl;
std::cout << "╚══════════════════════════════════════════════════╝" << std::endl;
if (fail_count > 0)
std::cout << " FAIL: " << fail_count << " test(s) failed" << std::endl;
if (!expected_queue.empty())
std::cout << " WARN: " << expected_queue.size()
<< " expected transaction(s) never observed" << std::endl;
}
SC_CTOR(alu_checker) { SC_THREAD(check_loop); }
};
// ─── Driver — drives all 30 capstone test vectors ────────────────────────────
SC_MODULE(alu_driver) {
sc_out<sc_uint<32>> a, b;
sc_out<sc_uint<4>> op;
alu_checker* chk;
void drive(sc_uint<32> av, sc_uint<32> bv, alu_op_t opv,
sc_uint<32> exp_res, bool exp_zero, const char* desc) {
chk->load_expected(exp_res, exp_zero, desc);
a.write(av); b.write(bv); op.write(opv);
wait(10, SC_NS);
}
void run() {
std::cout << std::endl << "=== Section 1 Capstone: RV32I ALU ===" << std::endl;
// ── ADD: normal, overflow, zero ───────────────────────────────────────
std::cout << std::endl << " ── ADD ──" << std::endl;
drive(5, 3, ALU_ADD, 8, false, "ADD: 5 + 3 = 8");
drive(0, 0, ALU_ADD, 0, true, "ADD: 0 + 0 = 0 (zero flag)");
drive(0xFFFFFFFF, 1, ALU_ADD, 0, true, "ADD: 0xFFFFFFFF + 1 = 0 (overflow wraps)");
drive(0x7FFFFFFF, 1, ALU_ADD, 0x80000000, false, "ADD: max_positive + 1 → MSB set");
// ── SUB: normal, zero, underflow ──────────────────────────────────────
std::cout << std::endl << " ── SUB ──" << std::endl;
drive(10, 3, ALU_SUB, 7, false, "SUB: 10 - 3 = 7");
drive(5, 5, ALU_SUB, 0, true, "SUB: 5 - 5 = 0 (BEQ zero check)");
drive(5, 4, ALU_SUB, 1, false, "SUB: 5 - 4 = 1 (zero must be false)");
drive(0, 1, ALU_SUB, 0xFFFFFFFF, false, "SUB: 0 - 1 = 0xFFFFFFFF (underflow wraps)");
// ── AND, OR, XOR ──────────────────────────────────────────────────────
std::cout << std::endl << " ── AND / OR / XOR ──" << std::endl;
drive(0xFF00FF00, 0x0F0F0F0F, ALU_AND, 0x0F000F00, false, "AND: bit masking");
drive(0xFFFFFFFF, 0, ALU_AND, 0, true, "AND: all_ones & 0 = 0");
drive(0xFF000000, 0x00FF0000, ALU_OR, 0xFFFF0000, false, "OR: combine two fields");
drive(0, 0, ALU_OR, 0, true, "OR: 0 | 0 = 0");
drive(0xAAAAAAAA, 0x55555555, ALU_XOR, 0xFFFFFFFF, false, "XOR: alternating bits → all ones");
drive(0xDEADBEEF, 0xDEADBEEF, ALU_XOR, 0, true, "XOR: x ^ x = 0 (zero idiom)");
// ── SLT: signed comparisons ───────────────────────────────────────────
std::cout << std::endl << " ── SLT (signed) ──" << std::endl;
drive(0xFFFFFFFF, 1, ALU_SLT, 1, false, "SLT: -1 < 1 → 1 (signed)");
drive(1, 0xFFFFFFFF, ALU_SLT, 0, true, "SLT: 1 < -1 → 0 (signed, zero)");
drive(5, 3, ALU_SLT, 0, true, "SLT: 5 < 3 → 0");
drive(0x80000000, 0, ALU_SLT, 1, false, "SLT: INT_MIN < 0 → 1 (signed)");
// ── SLTU: unsigned comparisons ────────────────────────────────────────
std::cout << std::endl << " ── SLTU (unsigned) ──" << std::endl;
drive(0xFFFFFFFF, 1, ALU_SLTU, 0, true, "SLTU: 0xFFFFFFFF > 1 → 0 (unsigned)");
drive(1, 0xFFFFFFFF, ALU_SLTU, 1, false, "SLTU: 1 < 0xFFFFFFFF → 1 (unsigned)");
// ── SLL: logical left shift ───────────────────────────────────────────
std::cout << std::endl << " ── SLL ──" << std::endl;
drive(1, 0, ALU_SLL, 1, false, "SLL: 1 << 0 = 1 (shift by 0 is identity)");
drive(1, 4, ALU_SLL, 16, false, "SLL: 1 << 4 = 16");
drive(1, 31, ALU_SLL, 0x80000000, false, "SLL: 1 << 31 = 0x80000000 (MSB)");
drive(0xFFFFFFFF, 1, ALU_SLL, 0xFFFFFFFE, false, "SLL: 0xFFFFFFFF << 1 (MSB shifted out)");
// ── SRL: logical right shift ──────────────────────────────────────────
std::cout << std::endl << " ── SRL ──" << std::endl;
drive(0x80000000, 1, ALU_SRL, 0x40000000, false, "SRL: 0x80000000 >> 1 = 0x40000000 (zero fill)");
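drive(0xFFFFFFFF, 4, ALU_SRL, 0x0FFFFFFF, false, "SRL: 0xFFFFFFFF >> 4 = 0x0FFFFFFF");
// 30th vector: shift amount above 31 uses only b[4:0], so 63 acts as 31 (MSB moves to LSB)
drive(0x80000000, 63, ALU_SRL, 1, false, "SRL: shamt 63 uses low 5 bits (>> 31)");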
// ── SRA: arithmetic right shift ───────────────────────────────────────
std::cout << std::endl << " ── SRA ──" << std::endl;
drive(0x80000000, 1, ALU_SRA, 0xC0000000, false, "SRA: 0x80000000 >> 1 = 0xC0000000 (sign ext)");
drive(0xFFFFFFFF, 4, ALU_SRA, 0xFFFFFFFF, false, "SRA: -1 >> 4 = -1 (sign fills with 1s)");
drive(0x40000000, 1, ALU_SRA, 0x20000000, false, "SRA: positive: SRA == SRL");
std::cout << std::endl;
wait(SC_ZERO_TIME);
sc_stop();
}
SC_CTOR(alu_driver) { SC_THREAD(run); chk = nullptr; }
};
// ─── sc_main ──────────────────────────────────────────────────────────────────
int sc_main(int argc, char* argv[]) {
sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
sc_signal<sc_uint<4>> sig_op;
sc_signal<bool> sig_zero;
sc_fifo<alu_txn> txn_fifo(64);
alu dut("dut");
alu_monitor mon("monitor");
alu_checker chk("checker");
alu_driver drv("driver");
dut.a(sig_a); dut.b(sig_b); dut.op(sig_op);
dut.result(sig_result); dut.zero(sig_zero);
mon.a(sig_a); mon.b(sig_b); mon.op(sig_op);
mon.result(sig_result); mon.zero(sig_zero);
mon.txn_out(txn_fifo);
chk.txn_in(txn_fifo);
drv.a(sig_a); drv.b(sig_b); drv.op(sig_op);
drv.chk = &chk;
// VCD for the capstone run
sc_trace_file* tf = sc_create_vcd_trace_file("capstone1_waveform");
tf->set_time_unit(1, SC_NS);
sc_trace(tf, sig_a, "ALU.a");
sc_trace(tf, sig_b, "ALU.b");
sc_trace(tf, sig_op, "ALU.op");
sc_trace(tf, sig_result, "ALU.result");
sc_trace(tf, sig_zero, "ALU.zero");
sc_start();
sc_close_vcd_trace_file(tf);
const int total = 30;
chk.report(total);
return (chk.fail_count > 0) ? 1 : 0;
}
Expected Output
=== Section 1 Capstone: RV32I ALU ===
── ADD ──
[PASS] ADD: 5 + 3 = 8
[PASS] ADD: 0 + 0 = 0 (zero flag)
[PASS] ADD: 0xFFFFFFFF + 1 = 0 (overflow wraps)
[PASS] ADD: max_positive + 1 → MSB set
── SUB ──
[PASS] SUB: 10 - 3 = 7
[PASS] SUB: 5 - 5 = 0 (BEQ zero check)
[PASS] SUB: 5 - 4 = 1 (zero must be false)
[PASS] SUB: 0 - 1 = 0xFFFFFFFF (underflow wraps)
... (all 30 PASS) ...
╔══════════════════════════════════════════════════╗
║ === Section 1 Complete: 30/30 tests passed === ║
╚══════════════════════════════════════════════════╝
DV Insight
The principles are identical whether the implementation technology is SystemC, UVM/SystemVerilog, cocotb/Python, or a custom C++ framework. The structure — stimulus driver, passive monitor, independent checker, self-reporting pass/fail — is the universal verification pattern. What changes between frameworks is the boilerplate: in UVM, the driver is a uvm_driver with a TLM seq_item_port, the monitor is a uvm_monitor with a uvm_analysis_port, and the checker is a uvm_scoreboard. The data flow, the passive-observer constraint, and the predict-then-observe pattern are exactly what we built here.
There is one gap this testbench does not close: constrained-random coverage. Our directed tests cover the cases we thought of. A constrained-random test covering all 10 operations with random 32-bit inputs would catch cases we did not anticipate: corner cases in unsigned/signed interaction, the many shift amounts no directed test exercises (the shift tests here use only effective amounts of 0, 1, 4, and 31), and so on. That gap will be closed in Posts 23–27 when we add a UVM-SystemC environment with a constrained-random sequence and functional coverage. For now, 30 directed tests covering the boundary cases listed earlier is a solid and defensible closure for a 10-operation combinational block.
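To make the coverage gap concrete, here is a minimal sketch of a shift-amount coverage collector driven by random operands. It is plain C++, not UVM-SystemC, and not the environment planned for Posts 23–27; `ShamtCoverage` and `run_random_shifts` are hypothetical names, and the one-bin-per-shift-amount model is an assumption about what a coverage point for the shifter might look like.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical coverage collector: one bin per effective RV32I shift amount (0..31).
struct ShamtCoverage {
    std::array<unsigned, 32> hits{};

    // Record the effective shift amount, i.e. the low 5 bits of operand b.
    void sample(std::uint32_t b) { ++hits[b & 0x1Fu]; }

    // Number of bins hit at least once.
    unsigned bins_hit() const {
        unsigned n = 0;
        for (unsigned h : hits) {
            if (h != 0) ++n;
        }
        return n;
    }

    bool closed() const { return bins_hit() == hits.size(); }
};

// Draw random 32-bit operands until every bin is hit or the budget runs out.
// Returns the number of vectors generated.
unsigned run_random_shifts(ShamtCoverage& cov, std::uint32_t seed, unsigned budget) {
    std::mt19937 rng(seed);                               // fixed seed keeps runs reproducible
    std::uniform_int_distribution<std::uint32_t> any32;   // default range: 0 .. UINT32_MAX
    unsigned n = 0;
    while (n < budget && !cov.closed()) {
        cov.sample(any32(rng));
        ++n;
    }
    return n;
}
```

Feeding the collector the shift operands used by the directed SLL, SRL, and SRA vectors hits only 4 of the 32 bins, while a random loop with a modest budget closes all of them. In a real environment each sampled vector would also be driven into the DUT and checked against a reference model.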
What You Can Do Now
Section 1 is complete. Here is what you can do independently with what you have learned:
- Model any combinational RTL block in SystemC: define ports with sc_in/sc_out, implement logic in SC_METHOD, add sensitivity list, connect to signals
- Model any clocked sequential block: use SC_THREAD, wait on clock edges, latch values between rising edges
- Write a structured testbench: separate stimulus driver, passive monitor, and checker; connect them with sc_fifo; load expected values before applying stimulus
- Generate VCD waveforms: sc_create_vcd_trace_file, sc_trace per signal, sc_close_vcd_trace_file, open in GTKWave
- Understand the evaluate-update simulation model: why delta cycles exist, why sc_signal.write() is not immediately visible, how sc_start(SC_ZERO_TIME) settles combinational logic
- Choose the correct process type: SC_METHOD for combinational logic and decoders, SC_THREAD for sequential behavior, stimulus generators, and monitors
What We Built — Section 1 Component Chain
graph LR
PT["pass_through\n(Post 1)\nSC_METHOD\nbool passthrough"] --> DFF
DFF["dff\n(Post 2)\nSC_THREAD\nD flip-flop"] --> TSC
TSC["two_stage_chain\n(Post 3)\ndelta cycle demo"] --> AG
AG["and_gate\n(Post 4)\nSC_METHOD\n2-input AND"] --> ALU
ALU["RV32I ALU\n(Post 5)\nSC_METHOD\n10 operations"] --> TB
TB["Monitor+Checker TB\n(Posts 6-7)\nSC_THREAD\nstructured testbench"]
style ALU fill:#06b6d4,color:#fff
style TB fill:#10b981,color:#fff
Each block in this chain is a building block for the full CPU. The dff becomes the pipeline stage registers (Post 18). The and_gate pattern is the template for every combinational module. The ALU is wired into the execute stage at Post 18 without modification. The Monitor+Checker TB pattern is applied to every subsequent block.
What's Next: Section 2 — The Register File
Post 8: Building the RV32I Register File
Section 2 begins. With the ALU complete, the next core compute element is the Register File — 32 general-purpose registers (x0–x31) that hold the RISC-V program state. The register file is read twice per instruction (to get the two operand values for the ALU) and written once per instruction (to store the result back).
The register file introduces a new challenge: write-then-read ordering. If an instruction writes to register x1 and the next instruction reads from x1, does the read see the new value or the old value? The answer depends on whether forwarding is implemented. In the single-cycle CPU (Posts 15–17), there is no pipeline — write and read happen in the same clock cycle in a carefully ordered sequence. In the pipelined CPU (Posts 18–21), forwarding paths from the EX and MEM stages provide the new value directly, bypassing the register file read.
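The write-then-read question can be sketched in plain C++ before any SystemC is written. The model below is an illustration only, not the Post 8 implementation: `RegFile`, `read_bypassed`, and `clock_edge` are hypothetical names, and the single pending write stands in for a result that has been computed but not yet committed. It also assumes the standard RISC-V rule that x0 always reads as zero.

```cpp
#include <array>
#include <cstdint>

// Hypothetical register-file model for illustration; not the Post 8 design.
class RegFile {
public:
    // Plain read: returns the value committed at the last clock edge.
    std::uint32_t read(unsigned idx) const { return regs_[idx & 31u]; }

    // Read with a bypass: a pending write to the same register wins,
    // which is the effect a forwarding path provides.
    std::uint32_t read_bypassed(unsigned idx) const {
        idx &= 31u;
        if (pending_ && wr_idx_ == idx && idx != 0u) return wr_val_;
        return regs_[idx];
    }

    // Schedule a write; it becomes visible to read() after clock_edge().
    void write(unsigned idx, std::uint32_t val) {
        pending_ = true;
        wr_idx_ = idx & 31u;
        wr_val_ = val;
    }

    // Commit the pending write. Writes to x0 are dropped because x0 is
    // hardwired to zero in RISC-V.
    void clock_edge() {
        if (pending_ && wr_idx_ != 0u) regs_[wr_idx_] = wr_val_;
        pending_ = false;
    }

private:
    std::array<std::uint32_t, 32> regs_{};  // all registers start at zero
    bool pending_ = false;
    unsigned wr_idx_ = 0;
    std::uint32_t wr_val_ = 0;
};
```

After `write(1, 42)` and before `clock_edge()`, `read(1)` still returns the old value while `read_bypassed(1)` returns 42. That difference is the ordering question the paragraph above describes.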
With the ALU (Post 5) and Register File (Post 8) complete, we have the two core compute elements. Every subsequent post adds control logic that connects them.