12. SystemC Tutorial - Single-Cycle CPU Integration
Introduction
Integration is where most bugs hide.
Each component you built in Sections 1 and 2 was tested in isolation. The ALU produced correct results for every RV32I operation. The register file wrote and read back values accurately with x0 hardwired to zero. The decoder correctly identified all 42 instruction encodings. The program counter advanced cleanly and branched on command. The data memory read and wrote bytes, halfwords, and words with proper sign extension.
Every single one of those modules looked perfect in its own testbench.
Now you wire them together, and the bugs appear.
Signal names that match in your head but don't match in the port declarations. A control signal that is active-high in one module and active-low in another. An ALU result routed to a memory address input instead of the write-data input. The wrong immediate type decoded for an S-format instruction. A branch condition evaluated one cycle too late because of a sensitivity-list omission.
These are not theoretical risks. They are the specific, predictable failure modes of any manual hardware integration. The SiFive FU540 has 500+ internal signals just in the execution stage. Our version has approximately 30. Both have exactly the same integration challenge: every wire must connect exactly one driver to one or more receivers. One misconnected wire anywhere in those 30 — or 500 — signals produces a bug that may not be obvious until a specific instruction sequence hits a very specific combination of conditions.
This is also why DV engineers exist. The RTL designer who wrote each block knows what signals mean. The integration testbench has no such assumptions — it just observes what comes out. Discrepancies between intent and output are bugs. Finding them before tape-out is the job.
By the end of this post, you will have a complete rv32i_cpu SystemC module containing all six submodules from Section 2, wired together with correct control signals and data paths. The integration testbench will run a seven-instruction program, trace the PC and instruction word each cycle, and verify the final register state. And you will have a connectivity-checking technique that reports unbound ports before simulation ever starts.
Prerequisites
- Post 5 — RV32I ALU (ADD, SUB, AND, OR, XOR, SLT, SLL, SRL, SRA, SLTU, LUI, AUIPC)
- Post 8 — Register File (32×32-bit, x0 hardwired to 0)
- Post 9 — Instruction Decoder (all 42 RV32I instructions → control signals + immediate)
- Post 10 — Program Counter + Instruction Memory (PC register, next-PC mux, imem ROM)
- Post 11 — Data Memory (LB/LH/LW/LBU/LHU/SB/SH/SW, byte enables, sign extension)
- Code for this post: GitHub — section2/post12
SystemC Language Reference
| Construct | Syntax | SV / Verilog Equivalent | Key Difference |
|---|---|---|---|
| Sub-module declaration | alu i_alu; in SC_MODULE header | alu i_alu (...) instantiation | SystemC requires declaration in header as member; SV instantiation is in module body |
| Sub-module constructor init | : i_alu("i_alu") in member initializer list | Implicit; SV names are part of the instantiation syntax | SystemC passes name string at runtime; SV name is compile-time |
| Port binding (named style) | i_alu.a(sig_alu_a); | .a(sig_alu_a) in port list | Both positional and named binding allowed; named is clearer |
| Internal wire declaration | sc_signal<sc_uint<32>> sig_alu_result; as member | logic [31:0] alu_result; local in module | SV logic is a storage type; sc_signal is a channel object with event mechanism |
| Signal lifetime requirement | Must be member variable (persists for simulation duration) | Automatic lifetime management | A local sc_signal in the constructor is destroyed when the constructor returns, leaving its ports bound to a dead object: a silent bug |
| Combinational glue logic | SC_METHOD(alu_src_mux); sensitive << sig_rs2_data << sig_imm << sig_alu_src | always_comb result = alu_src ? imm : rs2 | SV auto-senses; SystemC requires explicit list, and a missing signal means wrong behavior |
| Unbound port detection | sc_port::bind_count() == 0 check | Compile-time error for unconnected required ports; warning for optional | SV catches at elaboration; SystemC catches only at sc_start() runtime |
| Multiple drivers | sc_signal_resolved / sc_signal_rv<W> (rare) | wire with multiple drivers = X (resolved) | SV wire resolves multiple drivers via logic; SystemC sc_signal is single-driver |
| Debug output ports | sc_out<T> dbg_pc | output logic [31:0] dbg_pc | Both expose internals; SV also has bind for non-intrusive monitors |
| Top-level instantiation | rv32i_cpu u_cpu("u_cpu"); in sc_main | rv32i_cpu u_cpu (...) in testbench module | SystemC top is in sc_main function; SV uses a top-level module |
| Simulation hierarchy name | sc_module_name constructor arg "i_alu", giving the full name "tb.u_cpu.i_alu" | Implicit from module nesting | Name used in error messages and waveform dumps |
Translation Table
| Concept | SystemVerilog | SystemC |
|---|---|---|
| Module instantiation | alu u_alu (.clk(clk), .a(alu_a), ...); | alu u_alu{"u_alu"}; u_alu.a(alu_a_sig); |
| Internal wire | logic [31:0] alu_result; | sc_signal<sc_uint<32>> alu_result_sig; |
| Combinational mux | always_comb result = sel ? b : a; | SC_METHOD(mux_proc); sensitive << sel << a << b; |
| Port connection check | Elaboration-time port lint | Custom check_bindings() via sc_port::bind_count() |
| Top-level hierarchy | top.sv instantiates all submodules | sc_main() instantiates top module |
| Simulation start | initial begin ... $finish; end | sc_start(N, SC_NS) in sc_main |
| Signal tracing | $dumpfile/$dumpvars | sc_trace_file + sc_trace() |
The Complete Single-Cycle Datapath
Before writing a single line of SystemC, you need to understand every signal that flows between the six submodules. The diagram below shows the complete RV32I single-cycle datapath with signal names. Study it carefully: every arrow is a wire you must declare and bind.
graph LR
PC["PC\nModule"] -->|pc_out| IMEM["IMEM\n(instr ROM)"]
IMEM -->|instr| DEC["Decoder"]
DEC -->|rs1_addr,rs2_addr| RF["Register\nFile"]
DEC -->|rd_addr,reg_write| RF
RF -->|rs1_data| ALUA["ALU\nSrc A"]
RF -->|rs2_data| MUXB["ALU Src B\nMux"]
DEC -->|imm,alu_src| MUXB
MUXB -->|alu_b| ALU["ALU"]
ALUA -->|alu_a| ALU
DEC -->|alu_op| ALU
ALU -->|alu_result| DMEM["DMEM\n(data mem)"]
RF -->|rs2_data| DMEM
DEC -->|mem_read,mem_write,funct3| DMEM
ALU -->|alu_result| WBMUX["Writeback\nMux"]
DMEM -->|rd_data| WBMUX
PC -->|pc_plus4| WBMUX
DEC -->|wb_sel| WBMUX
WBMUX -->|wr_data| RF
ALU -->|zero,lt,ltu| BRANCH["Branch\nLogic"]
DEC -->|branch,jump,funct3| BRANCH
PC -->|pc_out| BRANCH
DEC -->|imm| BRANCH
BRANCH -->|next_pc| PC
Count the signals. There are over 30 internal wires connecting these blocks. Each one must be declared as an sc_signal, and each one must be bound to the correct port on the correct module. One wrong binding and your CPU produces wrong answers or produces no output at all.
Module Hierarchy
Before code, visualize the containment structure:
sc_main()
└── tb_cpu "tb"
    ├── rv32i_cpu "u_cpu"
    │   ├── pc "i_pc"
    │   ├── imem "i_imem"
    │   ├── decoder "i_dec"
    │   ├── reg_file "i_rf"
    │   ├── alu "i_alu"
    │   ├── dmem "i_dmem"
    │   └── (plus SC_METHOD processes:
    │        alu_src_mux, writeback_mux,
    │        branch_logic, pc_plus4_proc)
    └── run_test SC_THREAD
        (clock and reset generation,
         per-cycle trace, final register check)
rv32i_cpu owns all six hardware submodules and all the glue logic. tb_cpu owns the CPU plus the test stimulus and checking logic. sc_main creates one instance of tb_cpu and calls sc_start.
File Layout
post12/
├── CMakeLists.txt
├── include/
│ ├── alu.h
│ ├── reg_file.h
│ ├── decoder.h
│ ├── pc.h
│ ├── imem.h
│ ├── dmem.h
│ └── rv32i_cpu.h ← new this post
├── src/
│ ├── alu.cpp
│ ├── reg_file.cpp
│ ├── decoder.cpp
│ ├── pc.cpp
│ ├── imem.cpp
│ ├── dmem.cpp
│ └── rv32i_cpu.cpp ← new this post
└── tb/
└── tb_cpu.cpp ← new this post
rv32i_cpu.h — Complete Header
The CPU top module exposes only what the testbench needs. During normal operation that is clock and reset. For debugging and tracing we also expose three observation ports; because they are sc_out ports, the testbench must bind each one to a signal.
// include/rv32i_cpu.h
#pragma once
#include <systemc.h>
#include "pc.h"
#include "imem.h"
#include "decoder.h"
#include "reg_file.h"
#include "alu.h"
#include "dmem.h"
SC_MODULE(rv32i_cpu) {
// ─── External ports ──────────────────────────────────────────────
sc_in<bool> clk;
sc_in<bool> rst;
// Debug observation ports (bind each one to an sc_signal in the testbench)
sc_out<sc_uint<32>> dbg_pc; // current PC each cycle
sc_out<sc_uint<32>> dbg_instr; // current instruction word
sc_out<bool> dbg_halt; // EBREAK detected
// ─── Submodule instances ─────────────────────────────────────────
pc i_pc;
imem i_imem;
decoder i_dec;
reg_file i_rf;
alu i_alu;
dmem i_dmem;
// ─── Internal signals (~30 wires) ────────────────────────────────
// PC / instruction fetch
sc_signal<sc_uint<32>> sig_pc_out; // current PC value
sc_signal<sc_uint<32>> sig_pc_plus4; // PC + 4 (for JAL/JALR writeback)
sc_signal<sc_uint<32>> sig_instr; // instruction word from IMEM
sc_signal<sc_uint<32>> sig_next_pc; // next cycle PC (from branch_logic)
// Decoder outputs — address fields
sc_signal<sc_uint<5>> sig_rs1_addr;
sc_signal<sc_uint<5>> sig_rs2_addr;
sc_signal<sc_uint<5>> sig_rd_addr;
// Decoder outputs — data/control
sc_signal<sc_uint<32>> sig_imm; // sign-extended immediate
sc_signal<sc_uint<4>> sig_alu_op; // ALU operation select
sc_signal<bool> sig_alu_src; // 0=rs2, 1=imm
sc_signal<bool> sig_reg_write; // register file write enable
sc_signal<bool> sig_mem_read; // data memory read enable
sc_signal<bool> sig_mem_write; // data memory write enable
sc_signal<sc_uint<3>> sig_funct3; // width/branch-type selector
sc_signal<sc_uint<2>> sig_wb_sel; // writeback mux: 0=alu,1=mem,2=pc+4
sc_signal<bool> sig_branch; // this instruction is a branch
sc_signal<bool> sig_jump; // this instruction is JAL/JALR
sc_signal<bool> sig_jump_jalr; // 0=JAL (pc-relative), 1=JALR (register)
sc_signal<bool> sig_halt; // EBREAK instruction
// Register file outputs
sc_signal<sc_uint<32>> sig_rs1_data; // operand A (always from RF)
sc_signal<sc_uint<32>> sig_rs2_data; // operand B (before mux)
// ALU
sc_signal<sc_uint<32>> sig_alu_a; // ALU A input (= rs1_data); currently unused, i_alu.a binds to sig_rs1_data
sc_signal<sc_uint<32>> sig_alu_b; // ALU B input (after src mux)
sc_signal<sc_uint<32>> sig_alu_result; // ALU output
sc_signal<bool> sig_alu_zero; // result == 0
sc_signal<bool> sig_alu_lt; // signed less-than
sc_signal<bool> sig_alu_ltu; // unsigned less-than
// Data memory
sc_signal<sc_uint<32>> sig_mem_rd_data; // load result
// Writeback bus
sc_signal<sc_uint<32>> sig_wr_data; // data written to register file
// Branch resolution
sc_signal<bool> sig_branch_taken; // branch condition evaluated true
// ─── SC_METHOD declarations ───────────────────────────────────────
void alu_src_mux(); // selects rs2_data or imm for ALU B
void pc_plus4_proc(); // computes PC + 4
void branch_logic(); // evaluates branch, computes next_pc
void writeback_mux(); // selects ALU result, mem data, or PC+4
// Connectivity checker (call before sc_start in testbench)
void check_bindings();
SC_CTOR(rv32i_cpu)
: i_pc("i_pc"), i_imem("i_imem"), i_dec("i_dec"),
i_rf("i_rf"), i_alu("i_alu"), i_dmem("i_dmem")
{
// ── 1. Program Counter ────────────────────────────────────────
i_pc.clk(clk);
i_pc.rst(rst);
i_pc.next_pc(sig_next_pc); // driven by branch_logic
i_pc.pc_out(sig_pc_out); // drives IMEM addr, BRANCH logic
// ── 2. Instruction Memory ─────────────────────────────────────
i_imem.addr(sig_pc_out); // byte address = current PC
i_imem.instr(sig_instr); // instruction word → decoder
// ── 3. Decoder ────────────────────────────────────────────────
i_dec.instr(sig_instr);
i_dec.rs1_addr(sig_rs1_addr);
i_dec.rs2_addr(sig_rs2_addr);
i_dec.rd_addr(sig_rd_addr);
i_dec.imm(sig_imm);
i_dec.alu_op(sig_alu_op);
i_dec.alu_src(sig_alu_src);
i_dec.reg_write(sig_reg_write);
i_dec.mem_read(sig_mem_read);
i_dec.mem_write(sig_mem_write);
i_dec.funct3(sig_funct3);
i_dec.wb_sel(sig_wb_sel);
i_dec.branch(sig_branch);
i_dec.jump(sig_jump);
i_dec.jump_jalr(sig_jump_jalr);
i_dec.halt(sig_halt);
// ── 4. Register File ─────────────────────────────────────────
i_rf.clk(clk);
i_rf.rst(rst);
i_rf.rs1_addr(sig_rs1_addr);
i_rf.rs2_addr(sig_rs2_addr);
i_rf.rd_addr(sig_rd_addr);
i_rf.rd_data(sig_wr_data); // ← writeback bus feeds rd_data
i_rf.reg_write(sig_reg_write);
i_rf.rs1_data(sig_rs1_data);
i_rf.rs2_data(sig_rs2_data);
// ── 5. ALU ────────────────────────────────────────────────────
i_alu.a(sig_rs1_data); // A is always rs1
i_alu.b(sig_alu_b); // B is muxed (rs2 or imm)
i_alu.op(sig_alu_op);
i_alu.result(sig_alu_result);
i_alu.zero(sig_alu_zero);
i_alu.lt(sig_alu_lt);
i_alu.ltu(sig_alu_ltu);
// ── 6. Data Memory ────────────────────────────────────────────
i_dmem.clk(clk);
i_dmem.rst(rst);
i_dmem.addr(sig_alu_result); // EA = ALU result (base + offset)
i_dmem.wr_data(sig_rs2_data); // store data = rs2
i_dmem.mem_read(sig_mem_read);
i_dmem.mem_write(sig_mem_write);
i_dmem.funct3(sig_funct3); // width + sign select
i_dmem.rd_data(sig_mem_rd_data);
// ── 7. ALU Src Mux (combinational) ────────────────────────────
SC_METHOD(alu_src_mux);
sensitive << sig_rs2_data << sig_imm << sig_alu_src;
// ── 8. PC+4 (combinational) ───────────────────────────────────
SC_METHOD(pc_plus4_proc);
sensitive << sig_pc_out;
// ── 9. Branch / Next-PC Logic (combinational) ─────────────────
SC_METHOD(branch_logic);
sensitive << sig_branch << sig_jump << sig_jump_jalr
<< sig_funct3 << sig_alu_zero << sig_alu_lt
<< sig_alu_ltu << sig_pc_out << sig_imm
<< sig_rs1_data << sig_halt;
// ── 10. Writeback Mux (combinational) ─────────────────────────
SC_METHOD(writeback_mux);
sensitive << sig_alu_result << sig_mem_rd_data
<< sig_pc_plus4 << sig_wb_sel;
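// ── 11. Debug outputs ─────────────────────────────────────────
// Forward internal signals to the debug ports. The internal signals
// are already driven by submodule output ports, so the debug sc_out
// ports cannot bind to them directly; a passthrough method copies
// the values combinationally instead.
SC_METHOD(debug_proc);
sensitive << sig_pc_out << sig_instr << sig_halt;
}
// Passthrough for the debug ports (defined inline in the header).
void debug_proc() {
dbg_pc.write(sig_pc_out.read());
dbg_instr.write(sig_instr.read());
dbg_halt.write(sig_halt.read());
}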
};
The debug ports (dbg_pc, dbg_instr, dbg_halt) follow a pattern called observation hooks. In real chip verification, these are analogous to scan-chain outputs or internal assertion signals: they expose internal state without changing behavior. In SystemVerilog you would use bind or interface ports to attach monitors non-intrusively. In SystemC, debug sc_out ports achieve the same goal: the testbench can observe internals without reaching into the DUT's private members.
rv32i_cpu.cpp — Complete Implementation
The four glue-logic processes are short but critical. Each one implements a mux or logic block that SystemVerilog would express with always_comb.
// src/rv32i_cpu.cpp
#include "rv32i_cpu.h"
#include <iostream>
// ─────────────────────────────────────────────────────────────────────
// Process 1: ALU Source B Mux
//
// alu_src == 0 → B operand is rs2_data (R-type instructions)
// alu_src == 1 → B operand is immediate (I, S, U, J-type)
// ─────────────────────────────────────────────────────────────────────
void rv32i_cpu::alu_src_mux() {
if (sig_alu_src.read()) {
sig_alu_b.write(sig_imm.read());
} else {
sig_alu_b.write(sig_rs2_data.read());
}
}
// ─────────────────────────────────────────────────────────────────────
// Process 2: PC + 4
//
// Used as the link register value for JAL and JALR.
// Also serves as the fall-through next-PC for non-branch instructions
// (though branch_logic handles the actual next_pc selection).
// ─────────────────────────────────────────────────────────────────────
void rv32i_cpu::pc_plus4_proc() {
sig_pc_plus4.write(sig_pc_out.read() + 4);
}
// ─────────────────────────────────────────────────────────────────────
// Process 3: Branch Logic and Next-PC Selection
//
// Priority:
// 1. EBREAK (halt) → freeze PC at current value
// 2. JALR → (rs1_data + imm) & ~1 (clear bit 0)
// 3. JAL → PC + imm (PC-relative)
// 4. Branch taken → PC + imm (PC-relative)
// 5. Default → PC + 4 (sequential)
//
// Branch conditions use funct3 exactly as specified in the
// RISC-V ISA Manual Vol I, Table 2.5:
// 000 BEQ → zero
// 001 BNE → !zero
// 100 BLT → lt (signed)
// 101 BGE → !lt (signed)
// 110 BLTU → ltu (unsigned)
// 111 BGEU → !ltu (unsigned)
// ─────────────────────────────────────────────────────────────────────
void rv32i_cpu::branch_logic() {
sc_uint<32> pc = sig_pc_out.read();
sc_uint<32> imm = sig_imm.read();
sc_uint<32> rs1 = sig_rs1_data.read();
bool zero = sig_alu_zero.read();
bool lt = sig_alu_lt.read();
bool ltu = sig_alu_ltu.read();
// Evaluate branch condition
bool taken = false;
if (sig_branch.read()) {
switch (sig_funct3.read()) {
case 0: taken = zero; break; // BEQ
case 1: taken = !zero; break; // BNE
case 4: taken = lt; break; // BLT (signed)
case 5: taken = !lt; break; // BGE (signed)
case 6: taken = ltu; break; // BLTU (unsigned)
case 7: taken = !ltu; break; // BGEU (unsigned)
default: taken = false; break;
}
}
sig_branch_taken.write(taken);
// Compute next PC
sc_uint<32> next;
if (sig_halt.read()) {
// EBREAK: freeze
next = pc;
} else if (sig_jump.read() && sig_jump_jalr.read()) {
// JALR: target = (rs1 + imm) with bit 0 cleared
// The bit-0 clear prevents misaligned instruction fetches.
// RISC-V ISA Manual Vol I, section 2.5:
// "The target address is obtained by adding the sign-extended
// 12-bit I-immediate to the register rs1, then setting the
// least-significant bit of the result to zero."
next = (rs1 + imm) & sc_uint<32>(0xFFFFFFFEu);
} else if (sig_jump.read()) {
// JAL: PC-relative
next = pc + imm;
} else if (taken) {
// Branch taken: PC-relative
next = pc + imm;
} else {
// Sequential
next = pc + 4;
}
sig_next_pc.write(next);
}
// ─────────────────────────────────────────────────────────────────────
// Process 4: Writeback Mux
//
// wb_sel == 0 → write ALU result (R-type, I-type arithmetic)
// wb_sel == 1 → write mem read data (load instructions)
// wb_sel == 2 → write PC + 4 (JAL, JALR link register)
// ─────────────────────────────────────────────────────────────────────
void rv32i_cpu::writeback_mux() {
sc_uint<32> result;
switch (sig_wb_sel.read()) {
case 0: result = sig_alu_result.read(); break; // arithmetic
case 1: result = sig_mem_rd_data.read(); break; // load
case 2: result = sig_pc_plus4.read(); break; // JAL/JALR link
default: result = sig_alu_result.read(); break;
}
sig_wr_data.write(result);
}
// ─────────────────────────────────────────────────────────────────────
// Connectivity Checker
//
// Call this from the testbench before sc_start() to verify that
// every port on every submodule is actually bound to a signal.
// SystemC will throw sc_report errors at elaboration time for
// completely unbound ports, but this function makes the check
// explicit and produces a readable diagnostic.
// ─────────────────────────────────────────────────────────────────────
void rv32i_cpu::check_bindings() {
bool ok = true;
// sc_port::bind_count() returns the number of signals bound to this port.
// For sc_in/sc_out, exactly 1 binding is required.
auto check_port = [&](const char* name, int count) {
if (count == 0) {
std::cerr << "[BIND ERROR] Port not bound: " << name << "\n";
ok = false;
}
};
// PC module
check_port("i_pc.clk", i_pc.clk.bind_count());
check_port("i_pc.rst", i_pc.rst.bind_count());
check_port("i_pc.next_pc", i_pc.next_pc.bind_count());
check_port("i_pc.pc_out", i_pc.pc_out.bind_count());
// IMEM module
check_port("i_imem.addr", i_imem.addr.bind_count());
check_port("i_imem.instr", i_imem.instr.bind_count());
// Decoder
check_port("i_dec.instr", i_dec.instr.bind_count());
check_port("i_dec.rs1_addr", i_dec.rs1_addr.bind_count());
check_port("i_dec.rs2_addr", i_dec.rs2_addr.bind_count());
check_port("i_dec.rd_addr", i_dec.rd_addr.bind_count());
check_port("i_dec.imm", i_dec.imm.bind_count());
check_port("i_dec.alu_op", i_dec.alu_op.bind_count());
check_port("i_dec.alu_src", i_dec.alu_src.bind_count());
check_port("i_dec.reg_write", i_dec.reg_write.bind_count());
check_port("i_dec.mem_read", i_dec.mem_read.bind_count());
check_port("i_dec.mem_write", i_dec.mem_write.bind_count());
check_port("i_dec.funct3", i_dec.funct3.bind_count());
check_port("i_dec.wb_sel", i_dec.wb_sel.bind_count());
check_port("i_dec.branch", i_dec.branch.bind_count());
check_port("i_dec.jump", i_dec.jump.bind_count());
check_port("i_dec.jump_jalr", i_dec.jump_jalr.bind_count());
check_port("i_dec.halt", i_dec.halt.bind_count());
// Register File
check_port("i_rf.clk", i_rf.clk.bind_count());
check_port("i_rf.rst", i_rf.rst.bind_count());
check_port("i_rf.rs1_addr", i_rf.rs1_addr.bind_count());
check_port("i_rf.rs2_addr", i_rf.rs2_addr.bind_count());
check_port("i_rf.rd_addr", i_rf.rd_addr.bind_count());
check_port("i_rf.rd_data", i_rf.rd_data.bind_count());
check_port("i_rf.reg_write", i_rf.reg_write.bind_count());
check_port("i_rf.rs1_data", i_rf.rs1_data.bind_count());
check_port("i_rf.rs2_data", i_rf.rs2_data.bind_count());
// ALU
check_port("i_alu.a", i_alu.a.bind_count());
check_port("i_alu.b", i_alu.b.bind_count());
check_port("i_alu.op", i_alu.op.bind_count());
check_port("i_alu.result", i_alu.result.bind_count());
check_port("i_alu.zero", i_alu.zero.bind_count());
check_port("i_alu.lt", i_alu.lt.bind_count());
check_port("i_alu.ltu", i_alu.ltu.bind_count());
// Data Memory
check_port("i_dmem.clk", i_dmem.clk.bind_count());
check_port("i_dmem.rst", i_dmem.rst.bind_count());
check_port("i_dmem.addr", i_dmem.addr.bind_count());
check_port("i_dmem.wr_data", i_dmem.wr_data.bind_count());
check_port("i_dmem.mem_read", i_dmem.mem_read.bind_count());
check_port("i_dmem.mem_write", i_dmem.mem_write.bind_count());
check_port("i_dmem.funct3", i_dmem.funct3.bind_count());
check_port("i_dmem.rd_data", i_dmem.rd_data.bind_count());
if (ok) {
std::cout << "[CHECK] All ports bound correctly.\n";
} else {
SC_REPORT_FATAL("rv32i_cpu", "Unbound ports detected. Aborting.");
}
}
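The glue logic above is small enough to mirror as pure functions, which makes it possible to unit-test the selection rules without a SystemC kernel. The following is a minimal sketch in plain C++; the function names alu_b_select, writeback_select, and next_pc_select are hypothetical and are not part of the tutorial's code base.

```cpp
#include <cstdint>

// Plain-C++ stand-ins for the SC_METHOD bodies above (no SystemC needed).

// alu_src_mux: alu_src selects the immediate (true) or rs2_data (false).
constexpr std::uint32_t alu_b_select(bool alu_src, std::uint32_t rs2_data,
                                     std::uint32_t imm) {
    return alu_src ? imm : rs2_data;
}

// writeback_mux: 0 = ALU result, 1 = load data, 2 = PC + 4.
// Any other selector falls back to the ALU result, like the default case.
constexpr std::uint32_t writeback_select(unsigned wb_sel,
                                         std::uint32_t alu_result,
                                         std::uint32_t mem_rd_data,
                                         std::uint32_t pc_plus4) {
    return wb_sel == 1 ? mem_rd_data : wb_sel == 2 ? pc_plus4 : alu_result;
}

// branch_logic next-PC priority: halt, JALR, JAL or taken branch, sequential.
constexpr std::uint32_t next_pc_select(bool halt, bool jump, bool jalr,
                                       bool taken, std::uint32_t pc,
                                       std::uint32_t rs1, std::uint32_t imm) {
    return halt            ? pc
         : (jump && jalr)  ? ((rs1 + imm) & ~std::uint32_t{1})
         : (jump || taken) ? pc + imm
                           : pc + 4;
}
```

Comparing these functions against the SC_METHOD outputs in a testbench is one way to separate "the mux logic is wrong" from "the mux is wired wrong".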
The JALR Bit-0 Clear — Why It Matters
The JALR instruction computes its target as (rs1 + imm) & ~1. That & ~1 — masking out bit 0 — is not optional. The RISC-V ISA Manual Vol I, section 2.5 states explicitly that this bit must be cleared. The reason: RISC-V instructions are at minimum 16-bit aligned (with the C extension) or 32-bit aligned (without it). An odd target address would cause an instruction-address-misaligned exception on any real implementation.
In SystemC with sc_uint<32>:
// Clear bit 0 to ensure instruction-aligned target
next = (rs1 + imm) & sc_uint<32>(0xFFFFFFFEu);
This is equivalent to the Verilog:
next_pc = (rs1_data + imm32) & 32'hFFFFFFFE;
The masking happens after the addition. If rs1 = 0x1000 and imm = 0x5, the raw sum is 0x1005 — an odd address. After masking: 0x1004. This is the correct behavior per the specification.
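The worked example can be checked with a few lines of plain C++. This is a sketch only: jalr_target is a hypothetical helper name, and the immediate is taken as an already sign-extended 32-bit value.

```cpp
#include <cstdint>

// JALR target: add first, then clear bit 0 of the sum.
// imm is the sign-extended I-immediate; unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t jalr_target(std::uint32_t rs1, std::int32_t imm) {
    return (rs1 + static_cast<std::uint32_t>(imm)) & ~std::uint32_t{1};
}

// The example from the text: 0x1000 + 0x5 = 0x1005, masked to 0x1004.
static_assert(jalr_target(0x1000u, 0x5) == 0x1004u, "bit 0 must be cleared");
```

Masking before the addition would give a different answer whenever the immediate itself is odd, which is why the order matters.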
Branch Logic in Detail
All six RV32I branch instructions use the same opcode (BRANCH, opcode 1100011). The funct3 field distinguishes them:
void rv32i_cpu::branch_logic() {
// ...
bool taken = false;
if (sig_branch.read()) {
switch (sig_funct3.read()) {
case 0: taken = sig_alu_zero.read(); break; // BEQ — branch if equal
case 1: taken = !sig_alu_zero.read(); break; // BNE — branch if not equal
case 4: taken = sig_alu_lt.read(); break; // BLT — branch if less-than (signed)
case 5: taken = !sig_alu_lt.read(); break; // BGE — branch if >= (signed)
case 6: taken = sig_alu_ltu.read(); break; // BLTU — branch if less-than (unsigned)
case 7: taken = !sig_alu_ltu.read(); break; // BGEU — branch if >= (unsigned)
default: taken = false; break; // 010 and 011 are reserved
}
}
// ...
}
The ALU must produce zero, lt (signed less-than), and ltu (unsigned less-than) flags. For BEQ/BNE we use subtraction and check the zero flag. For BLT/BGE the ALU performs a signed comparison via SLT. For BLTU/BGEU it uses SLTU. The decoder sets alu_op to trigger the right ALU operation when processing a branch instruction.
Why funct3 == 2 and funct3 == 3 are unused for branches: Those encodings (010 and 011) are reserved in the RISC-V specification. A real processor would raise an illegal-instruction exception. Our model simply drives taken = false via the default case.
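The flag-based decision described above can be sketched in plain C++ as follows. AluFlags, compare_flags, and branch_taken are hypothetical names; the flag definitions follow the text (zero from the subtraction, lt signed, ltu unsigned) and the sketch assumes the usual two's-complement conversion for the signed comparison.

```cpp
#include <cstdint>

// The three comparison flags the ALU exposes to the branch logic.
struct AluFlags {
    bool zero;  // rs1 - rs2 == 0
    bool lt;    // rs1 < rs2, signed
    bool ltu;   // rs1 < rs2, unsigned
};

inline AluFlags compare_flags(std::uint32_t rs1, std::uint32_t rs2) {
    AluFlags f;
    f.zero = (rs1 - rs2) == 0;
    f.lt   = static_cast<std::int32_t>(rs1) < static_cast<std::int32_t>(rs2);
    f.ltu  = rs1 < rs2;
    return f;
}

// funct3 decode for the BRANCH opcode; reserved encodings never branch.
inline bool branch_taken(unsigned funct3, const AluFlags& f) {
    switch (funct3) {
        case 0: return f.zero;    // BEQ
        case 1: return !f.zero;   // BNE
        case 4: return f.lt;      // BLT
        case 5: return !f.lt;     // BGE
        case 6: return f.ltu;     // BLTU
        case 7: return !f.ltu;    // BGEU
        default: return false;    // 010 and 011 are reserved
    }
}
```

A useful directed case is rs1 = 0xFFFFFFFF against rs2 = 1: the signed view is -1 < 1, so BLT is taken, while the unsigned view makes BLTU fall through.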
Integration Testbench (tb_cpu.cpp)
The testbench runs the seven-instruction program below, prints the debug ports after each rising clock edge, and checks the final register-file contents through a read_reg() helper:
// tb/tb_cpu.cpp
#include <systemc.h>
#include "rv32i_cpu.h"
#include <iostream>
#include <iomanip>
SC_MODULE(tb_cpu) {
sc_signal<bool> clk;
sc_signal<bool> rst;
sc_signal<sc_uint<32>> dbg_pc;
sc_signal<sc_uint<32>> dbg_instr;
sc_signal<bool> dbg_halt;
rv32i_cpu u_cpu;
// Seven-instruction test program: a NOP, three ALU operations, a store/load
// round-trip through address 8 (x3 doubles as the base register), and
// EBREAK to stop the run. Each encoding is annotated in the array below.
static constexpr uint32_t prog[] = {
0x00000013u, // addi x0, x0, 0 — NOP
0x00500093u, // addi x1, x0, 5 — x1 = 5
0x00300113u, // addi x2, x0, 3 — x2 = 3
0x002081b3u, // add x3, x1, x2 — x3 = 8
0x0031a023u, // sw x3, 0(x3) — mem[8] = 8
0x0001a183u, // lw x3, 0(x3) — x3 = mem[8] = 8
0x00100073u, // ebreak
};
SC_CTOR(tb_cpu) : u_cpu("u_cpu") {
// Bind CPU ports
u_cpu.clk(clk);
u_cpu.rst(rst);
u_cpu.dbg_pc(dbg_pc);
u_cpu.dbg_instr(dbg_instr);
u_cpu.dbg_halt(dbg_halt);
// Connectivity check before simulation
u_cpu.check_bindings();
// Load program into instruction memory
for (size_t i = 0; i < sizeof(prog)/sizeof(prog[0]); i++)
u_cpu.i_imem.load_word(i * 4, prog[i]);
SC_THREAD(run_test);
}
void run_test() {
// Apply reset for 2 cycles
rst.write(true);
clk.write(false);
wait(10, SC_NS); clk.write(true); wait(10, SC_NS); clk.write(false);
wait(10, SC_NS); clk.write(true); wait(10, SC_NS); clk.write(false);
rst.write(false);
std::cout << "\n=== tb_cpu: 7-instruction integration test ===\n";
std::cout << std::hex << std::setfill('0');
int cycle = 0;
while (!dbg_halt.read()) {
// Rising edge
wait(10, SC_NS); clk.write(true);
wait(1, SC_NS); // let combinational settle
std::cout << "[" << std::dec << std::setw(3) << ++cycle << "] "
<< "PC=0x" << std::hex << std::setw(8) << dbg_pc.read()
<< " instr=0x" << std::setw(8) << dbg_instr.read()
<< "\n";
wait(9, SC_NS); clk.write(false);
if (cycle > 50) {
std::cerr << "TIMEOUT: CPU did not halt after 50 cycles\n";
break;
}
}
// Final register state check
// After the test program:
// x1 should be 5
// x2 should be 3
// x3 should be 8 (from SW+LW round-trip)
bool pass = true;
auto check_reg = [&](int r, uint32_t expected) {
uint32_t got = u_cpu.i_rf.read_reg(r);
if (got != expected) {
std::cerr << "FAIL: x" << r << " = 0x" << std::hex << got
<< ", expected 0x" << expected << "\n";
pass = false;
}
};
check_reg(1, 5);
check_reg(2, 3);
check_reg(3, 8);
if (pass)
std::cout << "\nPASS: All register values correct.\n";
else
std::cout << "\nFAIL: Register mismatch detected.\n";
sc_stop();
}
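};
// C++14 needs an out-of-class definition because prog[i] odr-uses the array;
// without it the link step can fail. (Redundant but still accepted in C++17.)
constexpr uint32_t tb_cpu::prog[];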
int sc_main(int, char**) {
tb_cpu tb("tb");
sc_start();
return 0;
}
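Hand-typed hex words are an easy place for integration bugs to start, so it can help to cross-check the program image against the instruction formats. The sketch below uses hypothetical helper names (encode_i, encode_r, encode_s) and covers only the three formats the test program needs.

```cpp
#include <cstdint>

// I-type: imm[11:0] | rs1 | funct3 | rd | opcode
constexpr std::uint32_t encode_i(std::int32_t imm, unsigned rs1, unsigned funct3,
                                 unsigned rd, unsigned opcode) {
    return ((static_cast<std::uint32_t>(imm) & 0xFFFu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

// R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode
constexpr std::uint32_t encode_r(unsigned funct7, unsigned rs2, unsigned rs1,
                                 unsigned funct3, unsigned rd, unsigned opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
           (rd << 7) | opcode;
}

// S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
constexpr std::uint32_t encode_s(std::int32_t imm, unsigned rs2, unsigned rs1,
                                 unsigned funct3, unsigned opcode) {
    return (((static_cast<std::uint32_t>(imm) >> 5) & 0x7Fu) << 25) |
           (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
           ((static_cast<std::uint32_t>(imm) & 0x1Fu) << 7) | opcode;
}

// Cross-check against the words in prog[].
static_assert(encode_i(5, 0, 0, 1, 0x13) == 0x00500093u, "addi x1, x0, 5");
static_assert(encode_i(3, 0, 0, 2, 0x13) == 0x00300113u, "addi x2, x0, 3");
static_assert(encode_r(0, 2, 1, 0, 3, 0x33) == 0x002081b3u, "add x3, x1, x2");
static_assert(encode_s(0, 3, 3, 2, 0x23) == 0x0031a023u, "sw x3, 0(x3)");
static_assert(encode_i(0, 3, 2, 3, 0x03) == 0x0001a183u, "lw x3, 0(x3)");
```

Because the checks are static_assert, a mistyped word fails the build instead of surfacing as a confusing register mismatch at the end of the run.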
Hierarchical Connectivity Verification
SystemC's port binding model has a useful property and a dangerous gap.
The useful property: If you completely fail to bind a port, SystemC raises an sc_report with severity SC_ERROR when elaboration completes, which happens inside the first call to sc_start() before any process runs. Simulation never starts. You will see a message like:
Error: (E109) complete binding failed: port not bound: port 'tb.u_cpu.i_alu.b' (sc_in)
The dangerous gap: SystemC cannot detect wrong connections. If you bind port B of the ALU to the immediate signal instead of the ALU-B mux output, both are sc_signal<sc_uint<32>>, the types match, binding succeeds, elaboration completes, and simulation runs. The results are simply wrong — and only a testbench that exercises the mux selection path will catch it.
The check_bindings() function above does not close this gap. Checking bind_count() > 0 confirms that ports are bound, but it cannot confirm that they are bound to the right signals. The only defense against wrong connections is a thorough functional testbench that exercises every instruction and every control path.
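To make the point concrete, here is a minimal plain-C++ model of the miswiring described above, where the ALU B input is bound to the immediate instead of the mux output. Both function names are hypothetical, and the model covers only ADD and ADDI.

```cpp
#include <cstdint>

// Execute step for ADD/ADDI only. `miswired` mimics binding i_alu.b to
// sig_imm instead of sig_alu_b: the types match, so nothing complains.
inline std::uint32_t execute_add(std::uint32_t rs1, std::uint32_t rs2,
                                 std::uint32_t imm, bool alu_src,
                                 bool miswired) {
    const std::uint32_t b = miswired ? imm : (alu_src ? imm : rs2);
    return rs1 + b;
}

// True when a test vector produces different results on the correct and
// the miswired model, i.e. when the vector can actually expose the fault.
inline bool vector_detects_miswire(std::uint32_t rs1, std::uint32_t rs2,
                                   std::uint32_t imm, bool alu_src) {
    return execute_add(rs1, rs2, imm, alu_src, false) !=
           execute_add(rs1, rs2, imm, alu_src, true);
}
```

An ADDI-only program passes on both models, so it can never expose this fault. Only an R-type instruction whose rs2 value differs from the decoded immediate does, which is the role add x3, x1, x2 plays in the test program, assuming the decoder does not happen to emit the same value on imm.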
The RISC-V Rocket Chip generator at UC Berkeley addresses part of this with Chisel, a hardware construction language embedded in Scala: connections between incompatible hardware types are rejected when the design is compiled or elaborated, before any Verilog is generated. SystemC gives a comparable guarantee for port types, since an sc_in<sc_uint<5>> cannot be bound to an sc_signal<sc_uint<32>>. No type system, however, catches a semantic misconnection between two signals of the same type. You must catch those with simulation.
CMakeLists.txt
cmake_minimum_required(VERSION 3.18) # REQUIRED in find_library/find_path needs 3.18+
project(rv32i_cpu CXX)
set(CMAKE_CXX_STANDARD 14)
# SystemC installation path — set via environment or cmake -D
if(NOT DEFINED SYSTEMC_HOME)
set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
endif()
find_library(SYSTEMC_LIB systemc
PATHS ${SYSTEMC_HOME}/lib ${SYSTEMC_HOME}/lib-linux64
${SYSTEMC_HOME}/lib-macosx64
REQUIRED)
find_path(SYSTEMC_INCLUDE systemc.h
PATHS ${SYSTEMC_HOME}/include REQUIRED)
include_directories(${SYSTEMC_INCLUDE} include)
# Collect all CPU source files
set(CPU_SOURCES
src/alu.cpp
src/reg_file.cpp
src/decoder.cpp
src/pc.cpp
src/imem.cpp
src/dmem.cpp
src/rv32i_cpu.cpp
)
add_executable(tb_cpu ${CPU_SOURCES} tb/tb_cpu.cpp)
target_link_libraries(tb_cpu ${SYSTEMC_LIB})
Build:
mkdir -p build && cd build
cmake .. -DSYSTEMC_HOME=$SYSTEMC_HOME
make -j4
./tb_cpu
Industry Reference: Rocket Chip and the Cost of Manual Interconnect
The RISC-V Rocket Chip generator (Patterson, Asanović et al., UC Berkeley, 2016) was motivated in part by the observation that connecting 50+ internal signals by hand in the execute stage of a modern in-order pipeline is a major source of bugs. Their solution was Chisel, a hardware description language embedded in Scala that generates Verilog or FIRRTL from type-checked hardware descriptions. The generated Verilog looks much like our manual binding in rv32i_cpu.h, except it was generated from a description where the compiler verified every connection.
SiFive's Freedom E310 (the first commercial RISC-V SoC) is built on this core. It adds JTAG debug transport and PLIC interrupt controller on top of exactly the single-cycle → 5-stage pipeline progression we are following. Their first integration simulation of the full SoC with all peripherals caught 23 connectivity issues in the first run — after Chisel had already eliminated the class of type-mismatch errors that SystemC and Verilog do not catch.
Our manual approach is correct and instructive. You are learning what tools like Chisel automate, which is the only way to understand why those tools exist.
Simulation Semantics: Elaboration vs. Simulation Phases
SystemC simulation has three distinct phases. Understanding them is critical for integration work, because most integration bugs manifest during elaboration — before any simulation clock ticks.
Phase 1: Construction (C++ constructors)
All SC_MODULE constructors run. Sub-module instances are created. sc_signal objects are initialized. Process declarations (SC_METHOD, SC_CTHREAD, etc.) are registered. Port bindings (port(signal)) are called.
This is the SystemC equivalent of Verilog's elaboration. In SV, elaboration is done by the tool at compile/link time. In SystemC, it runs in the user's sc_main function at runtime.
Phase 2: Start-of-Simulation (sc_start / end_of_elaboration callbacks)
sc_start() is called. The kernel completes elaboration and verifies that all ports are bound. An unbound required port raises an SC_ERROR report, which aborts the run by default. check_bindings() should be called before this point, after all constructors have run, so that its readable diagnostic appears first. The end_of_elaboration() callback then fires on all SC_MODULE instances, which makes it a clean hook for any initialization that requires all ports to be connected.
Phase 3: Simulation (delta cycles, timed steps)
The simulation kernel runs the event-driven loop: evaluate processes, update signals, advance time. sc_start(N, SC_NS) runs for N nanoseconds. sc_stop() from within a process requests termination.
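The evaluate/update split is the part of this loop that most often surprises people coming from software. The sketch below imitates it in plain C++; DeltaSignal and settle_pc_plus4 are hypothetical names, and the class is a deliberately simplified stand-in for sc_signal, not its real implementation.

```cpp
#include <cstdint>

// Simplified stand-in for sc_signal: write() only schedules a new value,
// and read() keeps returning the old one until update() runs.
template <typename T>
class DeltaSignal {
public:
    explicit DeltaSignal(T init = T()) : cur_(init), next_(init) {}
    T read() const { return cur_; }
    void write(const T& v) { next_ = v; }
    // Update phase: commit the pending value and report whether it changed.
    bool update() {
        const bool changed = !(next_ == cur_);
        cur_ = next_;
        return changed;
    }
private:
    T cur_;
    T next_;
};

// Re-runs the pc_plus4 process and the update phase until nothing changes.
// Unlike the real kernel, the process runs every iteration rather than
// only when pc changes; the settled values are the same.
inline void settle_pc_plus4(DeltaSignal<std::uint32_t>& pc,
                            DeltaSignal<std::uint32_t>& pc_plus4) {
    bool changed = true;
    while (changed) {
        pc_plus4.write(pc.read() + 4);           // evaluate phase
        const bool pc_changed = pc.update();     // update phase
        const bool p4_changed = pc_plus4.update();
        changed = pc_changed || p4_changed;
    }
}
```

This is also why the testbench waits a short time after each rising edge before sampling the debug ports: the chain of combinational processes needs a few delta cycles to settle.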
Compare to SV:
// SV phases are tool-managed and largely invisible to the user:
// Compilation → Elaboration → Initial blocks → Simulation time 0 → ...
// The equivalent of sc_start() is just the simulator starting
// The equivalent of sc_stop() is $finish or $stop
Why This Matters for Integration
In SV, a missing port connection is caught at compile/elaborate time — you never even start simulation. In SystemC, you reach sc_start() before the error is detected. This means your sc_main code runs, your testbench constructor fires, and your program loads — then simulation aborts with a cryptic port-binding error. The check_bindings() function moves this detection earlier (before sc_start) by proactively checking bind_count() during construction.
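One way to keep such a check readable as the port list grows is to collect every (name, bind count) pair first and report all missing ports together. The plain-C++ sketch below illustrates the idea with hypothetical names (PortRecord, unbound_ports); in the real module the counts would come from each port's bind_count() call.

```cpp
#include <string>
#include <vector>

// One entry per port: hierarchical name plus the number of bindings seen.
struct PortRecord {
    std::string name;
    int bind_count;
};

// Returns the names of all ports with no binding, in declaration order,
// so a single run reports every missing connection at once.
inline std::vector<std::string> unbound_ports(
        const std::vector<PortRecord>& ports) {
    std::vector<std::string> missing;
    for (const auto& p : ports) {
        if (p.bind_count == 0) {
            missing.push_back(p.name);
        }
    }
    return missing;
}
```

Reporting the full list matters in practice: the kernel's own check stops at the first unbound port, so fixing bindings one abort at a time is slow.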
Port Binding Mechanics — The SystemC Connection Model
In SV, module connections are declarative — you write them in the instantiation and the tool checks them at compile time. In SystemC, port binding is imperative — you call port(channel) in the constructor body, and errors are detected at runtime.
// SystemVerilog module instantiation (declarative, compile-time checked)
alu i_alu (
    .clk    (clk),
    .a      (sig_alu_a),
    .b      (sig_alu_b),
    .op     (sig_alu_op),
    .result (sig_alu_result)
);
// Misnamed port → elaboration error; omitted connection → warning in most tools
// SystemC equivalent (imperative, runtime checked at sc_start)
alu i_alu{"i_alu"}; // Constructor: registers name in hierarchy
i_alu.clk(clk); // Bind clk port to clk signal
i_alu.a(sig_alu_a); // Bind a port to sig_alu_a signal
i_alu.b(sig_alu_b);
i_alu.op(sig_alu_op);
i_alu.result(sig_alu_result);
// Unconnected port → sc_start() throws SC_ERROR
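The single-driver rule: with the default writer policy (SC_ONE_WRITER), an sc_signal<T> must have exactly one writer (one output port or one process that calls write()). This corresponds to the SV rule that only one module should drive a wire. Conflicting drivers on a wire resolve to X in SV. In the Accellera reference implementation, binding a second output port to an sc_signal is reported as an error during elaboration, and a second writing process is likewise reported when the kernel's write checking is enabled.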
In rv32i_cpu, every signal has exactly one writer:
- sig_alu_result is written by i_alu.result only
- sig_alu_b is written by alu_src_mux only
- sig_wr_data is written by writeback_mux only
If you accidentally bind two output ports to the same sc_signal, the reference implementation reports a multiple-driver error that names the signal; a signal declared with a different writer policy (for example SC_MANY_WRITERS) accepts the second driver silently. Always trace which process owns each signal.
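The ownership rule can be modeled in plain C++ to make the failure mode concrete. OneWriterWire and is_rejected are hypothetical names for this sketch; it mimics only the "first driver wins, second driver is an error" bookkeeping and is not how any SystemC kernel is implemented.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of a single-driver wire (not a SystemC class).
class OneWriterWire {
public:
    explicit OneWriterWire(std::string name) : name_(std::move(name)) {}

    // Register a driver. A second, different driver is rejected.
    void register_driver(const std::string& driver) {
        if (!driver_.empty() && driver_ != driver) {
            throw std::logic_error(name_ + ": multiple drivers (" +
                                   driver_ + ", " + driver + ")");
        }
        driver_ = driver;
    }

    const std::string& driver() const { return driver_; }

private:
    std::string name_;
    std::string driver_;  // empty until the first driver registers
};

// Returns true if registering `driver` on `wire` is rejected.
inline bool is_rejected(OneWriterWire& wire, const std::string& driver) {
    try {
        wire.register_driver(driver);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Registering alu_src_mux as the driver of sig_alu_b succeeds; registering any other driver on the same wire afterwards is rejected, and the error text carries both driver names, which is exactly the information you need when tracing ownership.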
Port-to-port binding is part of SystemC, but it serves hierarchical connections: a child module's port bound to a compatible port of its parent. It cannot replace the wire between two sibling modules, because ports hold no value and every port must ultimately resolve to a channel. Always use an explicit sc_signal<T> as the wire between sibling ports; it also makes the connection inspectable and traceable in waveform dumps.
Module Hierarchy and sc_module_name
Every SC_MODULE constructor takes an sc_module_name argument, which the sc_module base class uses to register the instance. This string becomes the module's name in the simulation hierarchy and appears in all error messages and waveform annotations.
Simulation hierarchy for our CPU:
sc_main()
└── tb_cpu "u_tb"
    └── rv32i_cpu "u_cpu"
        ├── pc "i_pc"
        ├── imem "i_imem"
        ├── decoder "i_dec"
        ├── reg_file "i_rf"
        ├── alu "i_alu"
        └── dmem "i_dmem"
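When a SystemC error fires, it reports the full hierarchical name of the object involved, for example u_tb.u_cpu.i_alu.b for an unbound port named b (a port constructed without a name gets a generated one such as port_0). This is the same dot-separated notation SV uses for hierarchy paths.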
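How those dotted paths are assembled can be shown with a minimal plain C++ sketch. NamedNode is a hypothetical type for illustration; the real kernel tracks the hierarchy through sc_object, but the composition rule (parent path, a dot, then the instance's own name) is the same idea.

```cpp
#include <cassert>
#include <string>

// Hypothetical node in a naming hierarchy (not a SystemC class).
struct NamedNode {
    std::string basename;     // the string passed at construction, e.g. "i_alu"
    const NamedNode* parent;  // nullptr for a top-level instance

    // Join ancestor basenames with '.', top-most first.
    std::string full_name() const {
        return parent ? parent->full_name() + "." + basename : basename;
    }
};
```

With u_tb at the top, u_cpu below it, and i_alu below that, full_name() on the ALU node yields u_tb.u_cpu.i_alu. A wrong or missing name at any level corrupts the path of everything beneath it, which is why the initializer list deserves attention.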
The constructor member initializer list must initialize sub-modules with their names:
SC_CTOR(rv32i_cpu)
    : i_pc("i_pc"), i_imem("i_imem"), i_dec("i_dec"),
      i_rf("i_rf"), i_alu("i_alu"), i_dmem("i_dmem")
{
    // Port bindings follow...
}
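If you omit a sub-module from the initializer list, the outcome depends on how that sub-module is declared. A module declared with SC_CTOR has no default constructor, so the omission is a compile error. A module whose constructor supplies a default name still builds, but the instance then carries that generic name instead of a meaningful one, making error messages harder to read and debugging harder.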
sc_signal naming follows the same convention:
sc_signal<sc_uint<32>> sig_alu_result{"sig_alu_result"};
// In waveform: shows as "u_cpu.sig_alu_result"
// In error messages: "u_cpu.sig_alu_result: multiple drivers"
Unnamed signals (sc_signal<sc_uint<32>> sig; without a name string) are valid but receive kernel-generated names such as signal_0, which say nothing about their role in waveforms or error messages. Name your signals for production simulation environments.
Common Pitfalls for SV Engineers
Pitfall 1: Sub-module instances must be header members, not constructor-local variables
// WRONG: local variable, destroyed when constructor returns!
SC_CTOR(rv32i_cpu) {
    alu i_alu{"i_alu"};   // Created on the stack
    i_alu.a(sig_alu_a);   // Binding to a stack object
    // i_alu is destroyed here, so the simulation will crash
}
// CORRECT: member variable, persists for simulation lifetime
// In rv32i_cpu.h:
alu i_alu;                // Declared as member
// In constructor:
SC_CTOR(rv32i_cpu) : i_alu("i_alu") { ... }
SV has no equivalent problem — all instances in SV are static after elaboration. This is a C++ object lifetime issue that has no SV counterpart and produces very confusing simulation crashes.
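The lifetime difference can be observed safely in plain C++ by counting live objects instead of dereferencing a dangling pointer. TrackedUnit, LocalOwner, and MemberOwner are hypothetical names; TrackedUnit stands in for a sub-module (or a signal), and the counter plays the role of the simulation asking "does this object still exist?".

```cpp
#include <cassert>

// Counts how many TrackedUnit objects are currently alive.
// TrackedUnit stands in for a sub-module or signal (hypothetical name).
struct TrackedUnit {
    static int& live() { static int n = 0; return n; }
    TrackedUnit() { ++live(); }
    ~TrackedUnit() { --live(); }
    TrackedUnit(const TrackedUnit&) = delete;
    TrackedUnit& operator=(const TrackedUnit&) = delete;
};

// WRONG pattern: the unit is a constructor-local variable.
struct LocalOwner {
    LocalOwner() {
        TrackedUnit unit;  // destroyed at the closing brace of the constructor
        (void)unit;
    }
};

// CORRECT pattern: the unit is a data member.
struct MemberOwner {
    TrackedUnit unit;      // lives exactly as long as MemberOwner
};

// Number of units that are alive while an Owner object exists.
template <typename Owner>
int live_units_during_lifetime() {
    int before = TrackedUnit::live();
    Owner owner;
    (void)owner;
    return TrackedUnit::live() - before;
}
```

live_units_during_lifetime<LocalOwner>() returns 0: by the time the owner's constructor has finished, the unit is already gone, so anything that kept a pointer to it is dangling. The MemberOwner variant returns 1, which is the situation the simulation kernel needs.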
Pitfall 2: sc_signal wires must also be member variables
Same issue as Pitfall 1, but for internal signals. An sc_signal<T> declared as a local variable in the constructor is destroyed when the constructor returns, leaving dangling pointers in the port bindings.
// WRONG: local signal, destroyed after constructor
SC_CTOR(rv32i_cpu) {
    sc_signal<sc_uint<32>> sig_alu_result;  // Stack allocation
    i_alu.result(sig_alu_result);           // Port binds to stack object
} // sig_alu_result destroyed here; simulation accesses freed memory
// CORRECT: member signal, lives for simulation lifetime
// In rv32i_cpu.h:
sc_signal<sc_uint<32>> sig_alu_result;
Pitfall 3: Missing module name in initializer list (compile error, or unhelpful names in error messages)
// Missing i_alu in initializer list:
SC_CTOR(rv32i_cpu) : i_pc("i_pc"), i_imem("i_imem") /* forgot i_alu */ {
    i_alu.a(sig_alu_a);
    // If alu is declared with SC_CTOR: compile error (no default constructor).
    // If alu's constructor supplies a default name: compiles, but error
    // messages show that generic name instead of "i_alu", which is unhelpful.
}
Pitfall 4: Binding order does not affect the connection
In SV, the order of port connections in an instantiation does not matter. In SystemC, binding order also does not matter — all bindings happen during elaboration before simulation starts. You do NOT need to bind ports in the order they appear in the module header. Bind in whatever order makes the code most readable (typically: clock/reset first, then input ports, then output ports).
Pitfall 5: Connecting a sibling output port directly to an input port (no intermediate signal)
// Does not work: ports store no value, so no channel carries the data
// i_alu.result.bind(i_dmem.addr); // sc_out<T> has no bind() overload taking an sc_in<T>
// Correct: use an explicit sc_signal (declared as a member of rv32i_cpu)
sc_signal<sc_uint<32>> sig_alu_result{"sig_alu_result"};
i_alu.result(sig_alu_result); // ALU writes
i_dmem.addr(sig_alu_result);  // DMEM reads
Port-to-port binding exists in SystemC, but for hierarchy: a child's port is bound to a compatible port of its parent, and the chain must still end at a channel. Between siblings, the intermediate signal is what gives you named waveform tracing, multiple readers on one wire, and the ability to monitor the wire independently. Use one for every wire between sub-modules.
What's Next
Post 13 is the Section 2 Capstone: Running Real RV32I Programs.
You will load actual machine code into the instruction memory — Fibonacci, array sum, byte counting — and watch your CPU execute it instruction by instruction. A software reference model (SoftCPU) will run in lockstep, comparing every register after every instruction. This is the methodology that ARM, Intel, and every major CPU vendor uses to validate their implementations. Our version handles 42 instructions; theirs handle 2,000 or more. The methodology is identical.
After Post 13, your single-cycle RV32I CPU is fully verified. Section 3 begins with TLM 2.0 — replacing the cycle-accurate data memory with a transaction-level interface, moving the model one abstraction level closer to a real system simulation.