8. SystemC Tutorial - Register File: 32x32-bit Storage

Introduction

Every RISC-V CPU has 32 general-purpose registers, named x0 through x31. They are the CPU's scratchpad — the only storage every instruction can access in a single clock cycle. Before the ALU adds two numbers, both operands must already live in registers. When the ALU finishes, the result gets written back to a register. The register file is the central hub of every instruction cycle.

The RV32I register file is 32 entries of 32 bits each — exactly 1 kilobit. Modest storage, but the bandwidth demands are high: most instructions require two simultaneous reads (to get both source operands) and one write (to store the result). That means the hardware must support two independent read ports and one write port, all operating within the same clock cycle.

One register is special above all others: x0 is hardwired to zero. This is not a software convention — it is a hardware invariant. Every read of x0 returns 0. Every write to x0 is silently discarded. The hardware enforces this, and so will our SystemC implementation.
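The x0 invariant can be sketched in a few lines of plain C++ before any SystemC is involved. This is a minimal illustration, not the module built later in this post: the class name RegFileModel and its read/write methods are hypothetical, and the only assumption is that storage starts at zero.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical plain-C++ model (not the SystemC module built later).
class RegFileModel {
public:
    // Reads are unconditional: regs_[0] is never written, so x0 reads as 0.
    uint32_t read(unsigned addr) const {
        assert(addr < 32);
        return regs_[addr];
    }

    // Writes to x0 are dropped; every other register stores the value.
    void write(unsigned addr, uint32_t value) {
        assert(addr < 32);
        if (addr != 0) {
            regs_[addr] = value;
        }
    }

private:
    std::array<uint32_t, 32> regs_{};  // value-initialized: all zeros
};
```

Guarding the write rather than the read keeps the read path free of special cases, which is the same choice the SystemC implementation makes.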

Compare to other architectures:
- ARM Cortex-M: 16 registers (r0–r15). r13=SP (Stack Pointer), r14=LR (Link Register), r15=PC (Program Counter lives in the register file and is directly readable). Only 13 truly general-purpose.
- x86-64: 16 GPRs (rax–r15). Compiler register allocation for x86 is considerably harder than for RISC-V; 16 registers is tight for modern optimizing compilers.
- RISC-V: 32 GPRs, PC is separate and not in the register file. The extra 16 registers over ARM make calling conventions and compiler register allocation cleaner.

The RV32I Register ABI

The hardware only knows register numbers. The Application Binary Interface (ABI) assigns names and conventional uses so that code compiled by different toolchains, such as GCC and Clang/LLVM, can interoperate:

| Register | ABI Name | Saver  | Conventional Use                               |
|----------|----------|--------|------------------------------------------------|
| x0       | zero     | -      | Hardwired zero: always reads 0, writes ignored |
| x1       | ra       | Caller | Return address                                 |
| x2       | sp       | Callee | Stack pointer                                  |
| x3       | gp       | -      | Global pointer                                 |
| x4       | tp       | -      | Thread pointer                                 |
| x5       | t0       | Caller | Temporary / alternate link register            |
| x6       | t1       | Caller | Temporary                                      |
| x7       | t2       | Caller | Temporary                                      |
| x8       | s0/fp    | Callee | Saved register / frame pointer                 |
| x9       | s1       | Callee | Saved register                                 |
| x10      | a0       | Caller | Function argument 0 / return value             |
| x11      | a1       | Caller | Function argument 1 / return value             |
| x12      | a2       | Caller | Function argument 2                            |
| x13      | a3       | Caller | Function argument 3                            |
| x14      | a4       | Caller | Function argument 4                            |
| x15      | a5       | Caller | Function argument 5                            |
| x16      | a6       | Caller | Function argument 6                            |
| x17      | a7       | Caller | Function argument 7                            |
| x18      | s2       | Callee | Saved register                                 |
| x19      | s3       | Callee | Saved register                                 |
| x20      | s4       | Callee | Saved register                                 |
| x21      | s5       | Callee | Saved register                                 |
| x22      | s6       | Callee | Saved register                                 |
| x23      | s7       | Callee | Saved register                                 |
| x24      | s8       | Callee | Saved register                                 |
| x25      | s9       | Callee | Saved register                                 |
| x26      | s10      | Callee | Saved register                                 |
| x27      | s11      | Callee | Saved register                                 |
| x28      | t3       | Caller | Temporary                                      |
| x29      | t4       | Caller | Temporary                                      |
| x30      | t5       | Caller | Temporary                                      |
| x31      | t6       | Caller | Temporary                                      |

Caller-saved (a0–a7, t0–t6, ra): the calling function saves these before a call if needed afterward. The callee may freely clobber them.

Callee-saved (s0–s11, sp): the called function must save and restore these if it uses them. The caller can rely on their values being preserved.

Note: In RISC-V, x2=sp is a convention — not hardwired. Software agrees to treat it as the stack pointer. In ARM Cortex-M, r13 is physically the SP register and behaves differently at the hardware level.
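Because the ABI names are pure convention, a disassembler or trace printer only needs a lookup table to apply them. The sketch below shows one way to do that; the helper names abi_name and is_callee_saved are hypothetical, and x8 is listed under its s0 name even though fp refers to the same register.

```cpp
#include <cassert>
#include <string>

// ABI names indexed by register number (x0..x31), following the table above.
// x8 is shown as "s0"; "fp" is an alias for the same register.
static const char* const kAbiNames[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"};

// Hypothetical helper: map a register number to its ABI name.
std::string abi_name(unsigned reg) {
    assert(reg < 32);
    return kAbiNames[reg];
}

// Hypothetical helper: true if the callee must preserve the register
// (sp, s0, s1, and s2..s11 in the Saver column).
bool is_callee_saved(unsigned reg) {
    assert(reg < 32);
    return reg == 2 || reg == 8 || reg == 9 || (reg >= 18 && reg <= 27);
}
```

Nothing in the register file hardware depends on this table; it exists only so that tools and people agree on how each register is used.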


Hardware Concept: Synchronous Write, Asynchronous Read

The timing model of the register file is one of the most important concepts in RTL design.

Asynchronous read (combinational): The two read ports behave like combinational logic. When rs1_addr changes, read_proc runs and rs1_data updates at the same simulation time, one delta cycle later, with no clock edge required. Functionally this is a 32-to-1 multiplexer per port. In real silicon, the critical timing path typically starts at the register-file read port and continues through the ALU. Because the reads are combinational, the read latency sits on that path and limits the maximum clock frequency. As a rough illustration only, think of a read latency on the order of 1 ns for a 32-entry, 32-bit register file; actual figures depend on the process node and the implementation.

Synchronous write (clocked): The write port is clocked on the rising edge of clk. The stored value updates at the clock edge, and the new value is visible to subsequent combinational reads. The read path is combinational because registering the read outputs would delay the operands by a full clock cycle, and a single-cycle CPU must read its operands and execute within the same cycle.
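The split between a clocked write and a combinational read can be sketched as a cycle-level model in plain C++. This is an illustration under stated assumptions: the struct ClockedRegFile and its posedge() method are hypothetical stand-ins for the SystemC processes, and the caller is assumed to invoke posedge() once per rising clock edge.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical cycle-level model: write inputs are sampled at the clock
// edge, reads are a pure function of the currently stored state.
struct ClockedRegFile {
    std::array<uint32_t, 32> regs{};

    // Write-port inputs, as driven during the cycle.
    bool     wr_en   = false;
    unsigned rd_addr = 0;
    uint32_t wr_data = 0;

    // Combinational read: no clock involved.
    uint32_t read(unsigned addr) const {
        assert(addr < 32);
        return regs[addr];
    }

    // Rising clock edge: the only place stored state changes.
    void posedge() {
        assert(rd_addr < 32);
        if (wr_en && rd_addr != 0) {
            regs[rd_addr] = wr_data;
        }
    }
};
```

Driving wr_en, rd_addr, and wr_data has no effect on read() until posedge() is called, which mirrors the way a flip-flop ignores its data input between clock edges.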

Why uint32_t regs[32] and not sc_signal?

sc_signal is designed for inter-module communication: values that other modules observe, that participate in the delta-cycle event mechanism, or that you want to appear in VCD traces as named signals. The register array is internal state of the reg_file module. No external module accesses individual array entries directly; they only see rs1_data and rs2_data. Using sc_signal for internal state would send every value-changing write through the request-update mechanism and notify an event, which slows simulation and wakes any process that is sensitive to that signal. Plain C++ uint32_t regs[32] is sufficient here: it is internal state accessed only by processes within this module.

General principle: use sc_signal for things that participate in the event-driven mechanism (inter-module communication, traced values). Use plain C++ members for internal state.


SC_CTHREAD with Async Reset — The Full Pattern

SC_CTHREAD combined with async_reset_signal_is is the canonical SystemC idiom for synchronous sequential logic with asynchronous reset. Every line of the pattern carries precise meaning that maps directly to hardware.

The Declaration

SC_CTHREAD(write_proc, clk.pos());     // sensitive to rising clock edge only
async_reset_signal_is(rst, true);      // rst=true triggers asynchronous reset

SC_CTHREAD declares a clocked thread: a process that wakes up only on a specific clock edge. clk.pos() means the positive (rising) edge. The process is not sensitive to any other event — not to input data changes, not to other signals. Only the clock (or reset) can activate it.

async_reset_signal_is(rst, true) means: if rst goes high at any time — including mid-cycle, between clock edges — the scheduler immediately preempts the process and jumps to the reset section. The true argument is the active level of reset (active-high). async_reset_signal_is(rst, false) would specify an active-low reset.

The Process Body

void write_proc() {
    // ① RESET SECTION: runs on first activation and whenever rst is asserted
    for (int i = 0; i < 32; i++) regs[i] = 0;
    wait();   // ② synchronize to first clock edge after reset deasserts

    while (true) {  // ③ NORMAL OPERATION — one iteration per clock cycle
        if (wr_en.read() && rd_addr.read() != 0)
            regs[rd_addr.read()] = wr_data.read();
        wait();     // ④ wait for next clock edge
    }
}

Section ①: The reset block. Everything before the first wait() is the reset handler. It runs the first time the process executes and whenever rst is asserted. A clocked thread is not started during the initialization phase, so without a reset pulse its first execution is at the first rising clock edge. When rst goes high at, say, 37.5 ns (between two clock edges), the simulator restarts write_proc from the top instead of resuming it at its current wait(), executes the reset loop immediately, and then suspends at the first wait() again. While rst remains asserted, every rising clock edge restarts the process again, so the main loop is not reached until rst is released. This is the behavior of a flip-flop with asynchronous reset: it does not wait for the next clock edge to clear its state.

Section ②: The synchronizing wait(). After the reset code completes, this wait() suspends until the next rising clock edge. It separates the reset code from the main loop. Without it, the first iteration of the while loop would run in the same activation as the reset code and could perform a write while reset is still being applied.

Section ③: The normal operation loop. while (true) models the perpetual operation of a clocked circuit. Hardware is always on — there is no return from a circuit. The loop body models one clock cycle of work.

Section ④: The per-cycle wait(). At the end of each loop iteration, wait() suspends until the next rising clock edge. The scheduler advances time, and on the next posedge it resumes the process at the top of the while body. This wait() is what makes SC_CTHREAD a clocked thread — it advances one clock cycle per loop iteration.

The Exact SystemVerilog Equivalent

always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
        // ① reset section
        for (int i = 0; i < 32; i++) regs[i] <= '0;
    end else begin
        // ③ normal operation
        if (wr_en && rd_addr != 0)
            regs[rd_addr] <= wr_data;
    end
    // ④ implicit — the always_ff block re-evaluates at the next trigger
end

The correspondence is exact in hardware terms, but the programmer's mental model differs fundamentally:

  • SystemC: sequential C++ code with wait() checkpoints. You read a linear narrative: "do reset, then loop forever doing one cycle of work per iteration."
  • SystemVerilog: a block that re-executes completely at every trigger event. You read a conditional: "on every posedge (or async rst), if reset then do A, else do B."

Both describe identical flip-flop behavior. The choice is stylistic and depends on whether you find sequential narrative or conditional re-evaluation more natural for your problem.
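To make the "re-executed block" reading concrete, the sketch below renders it as a plain C++ function that is called once per trigger. It is an illustration only: WriteBlock and on_trigger are hypothetical names, and the caller is assumed to invoke on_trigger at each rising clock edge and whenever rst becomes active.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical plain-C++ rendering of the "re-evaluated block" style.
struct WriteBlock {
    std::array<uint32_t, 32> regs{};

    // Call once per trigger: a rising clock edge, or rst becoming active.
    void on_trigger(bool rst, bool wr_en, unsigned rd_addr, uint32_t wr_data) {
        assert(rd_addr < 32);
        if (rst) {
            regs.fill(0);                  // reset branch
        } else if (wr_en && rd_addr != 0) {
            regs[rd_addr] = wr_data;       // normal operation
        }
    }
};
```

The function holds no position between calls; everything it needs is re-derived from its inputs each time. The SC_CTHREAD version keeps its position in the code instead, which is why it needs wait() calls to mark where one cycle ends and the next begins.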

When the Reset Section First Runs

Unlike SC_METHOD and SC_THREAD processes, an SC_CTHREAD is not made runnable during the initialization phase that precedes the first delta cycle. The reset section therefore runs the first time the process is activated: either when rst becomes active or at the first rising clock edge, whichever comes first. The reset signal does not strictly need to be asserted for the reset code to run once, but until that first activation the contents of regs[] are whatever the constructor left there.

Asserting rst at time 0 from the testbench is the equivalent of an FPGA's global set/reset (GSR) at power-on: all flip-flops take their reset value before the first clock edge that does real work.


Internal State: uint32_t vs. sc_signal — Both Patterns

The question of whether to use plain C++ member variables or sc_signal for internal register state is worth examining carefully, because both are legitimate and the choice has observable consequences.

Pattern 1: Plain Array (Our Implementation)

// In reg_file module:
uint32_t regs[32];   // plain C++ array — internal state only

void write_proc() {
    // ...
    regs[rd_addr.read()] = wr_data.read();  // direct assignment
    // ...
}

void read_proc() {
    rs1_data.write(regs[rs1_addr.read()]);  // direct read
}

Characteristics:
- Zero overhead: no event notification, no delta-cycle scheduling
- Not visible in VCD traces by default: sc_trace has no overload for a whole array, so each element would have to be registered individually
- read_proc does NOT automatically fire when regs[x] changes — it only fires when the address inputs change
- Correct for synthesis semantics: internal flip-flops are not inter-module signals

Pattern 2: sc_signal Array (For Tracing)

// Alternative: use sc_signal for internal state to enable tracing
sc_signal<sc_uint<32>> regs[32];  // signal array

void write_proc() {
    // ...
    regs[rd_addr.read()].write(wr_data.read());  // signal write — triggers delta cycle
    // ...
}

void read_proc() {
    rs1_data.write(regs[rs1_addr.read()].read());
}

Characteristics:
- Each write to regs[x] triggers a delta-cycle event notification
- sc_trace(tf, regs[3], "regs_3") works — individual registers appear in VCD
- Higher simulation overhead: 32 signal objects with event notification queues
- read_proc still does not auto-fire on regs[x] change because regs is not in the sensitivity list

For debugging a register file implementation, Pattern 2 is useful: you can open GTKWave and watch individual register values change cycle by cycle. For production simulation of a large SoC where the register file is instantiated hundreds of times, Pattern 1 is preferred for performance.

The SV Analog

// SV internal state — private to the module, not a port
logic [31:0] regs [0:31];   // works like uint32_t regs[32]

// SV does not have a direct equivalent to sc_signal for internal state —
// all logic is implicitly "signal-like" in SV for simulation purposes.
// But for synthesis: internal reg/logic is private unless exported via port.
// Same concept as SystemC: sc_signal adds event overhead that uint32_t avoids.

In SystemVerilog, every logic/reg participates in the simulator's event-driven mechanism automatically. There is no distinction between "internal state" and "port-connected signal" at the language level — all are part of the event graph. SystemC makes this distinction explicit: sc_signal opts in to event-driven semantics; plain C++ members opt out. For RTL modeling, this explicit control is a feature.


SC_METHOD for Combinational Read — The Sensitivity Caveat

The read_proc is a pure combinational SC_METHOD:

SC_METHOD(read_proc);
sensitive << rs1_addr << rs2_addr;

It fires when rs1_addr or rs2_addr changes. Critically, regs is not in the sensitivity list. This means:

After a write at clock posedge, read_proc does not automatically re-trigger unless an address changes.

In our single-cycle CPU, this works whenever the next instruction presents a different source address:
- At clock posedge (end of cycle N): write_proc writes the new result to regs[rd]
- Start of cycle N+1: the decoder decodes the new instruction, which changes rs1_addr and rs2_addr
- That change triggers read_proc, which reads the freshly-written value

The corner case is two consecutive instructions that read the register the first one writes, for example addi x3, x3, 1 executed twice. rs1_addr stays at 3, sc_signal raises no event for an unchanged value, and rs1_data keeps the old contents of x3. Real hardware does not have this problem, because the read multiplexer is driven directly by the flip-flop outputs. If the model must handle this case, one common remedy is to have write_proc notify an sc_event and add that event to read_proc's sensitivity list; another is to store the registers in sc_signal objects and make read_proc sensitive to them.

SystemVerilog behaves differently here:

// SV combinational read
assign rs1_data = regs[rs1_addr];
assign rs2_data = regs[rs2_addr];

A continuous assignment re-evaluates whenever any operand on its right-hand side changes. That includes both the address and the selected element of regs, even though regs is a variable rather than a net. If cycle N writes regs[3] = 42 while rs1_addr is already 3, rs1_data updates to 42 right after the clock edge. The SystemC model above reproduces this behavior only when the address changes, so treat the address-only sensitivity as a simulation shortcut rather than an exact equivalent.
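The effect of address-only sensitivity can be explored with a small plain C++ model. This is a sketch under stated assumptions, not SystemC: AddrSensitiveRead and its methods are hypothetical, set_rs1_addr imitates a signal that raises an event only when its value changes, and the notify_readers flag stands in for an explicit notification from the write process. The scenario is two consecutive cycles that present the same read address; whether that can occur depends on the program being run.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a read process that is sensitive to its address only.
struct AddrSensitiveRead {
    std::array<uint32_t, 32> regs{};
    unsigned rs1_addr = 0;   // current value of the address "signal"
    uint32_t rs1_data = 0;   // last value driven by the read process

    // Mirrors read_proc: copy the selected register to the output.
    void read_proc() { rs1_data = regs[rs1_addr]; }

    // Mirrors a signal write: the read process runs only on a value change.
    void set_rs1_addr(unsigned addr) {
        assert(addr < 32);
        if (addr != rs1_addr) {
            rs1_addr = addr;
            read_proc();
        }
    }

    // Mirrors write_proc: updates storage and, if requested, re-runs the
    // read process (the assumed "notify on write" remedy).
    void clocked_write(unsigned rd, uint32_t value, bool notify_readers) {
        assert(rd < 32);
        if (rd != 0) {
            regs[rd] = value;
        }
        if (notify_readers) {
            read_proc();
        }
    }
};
```

With notify_readers set to false, a write to the register currently selected by rs1_addr leaves rs1_data unchanged until the address moves away and back. With it set to true, the output follows the write, which is closer to what a hardware read multiplexer does.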


SystemC Language Reference: Register File Constructs

| Construct | SystemC Syntax | SV/Verilog Equivalent | Key Difference |
|-----------|----------------|-----------------------|----------------|
| Clocked thread | SC_CTHREAD(f, clk.pos()) | always_ff @(posedge clk) | SystemC is sequential C++; SV re-evaluates as a block |
| Async reset declaration | async_reset_signal_is(rst, true) | or posedge rst in sensitivity list | SystemC declares separately from process; SV is inline |
| Reset section | Code before first wait() in SC_CTHREAD | if (rst) begin ... end | SystemC marks reset by position in the code; SV uses an explicit branch |
| Per-cycle advance | wait() at end of loop | Implicit at block boundary | SystemC explicit; SV implicit |
| Internal state | uint32_t regs[32] | logic [31:0] regs [0:31] | sc_signal adds event overhead; plain C++ does not |
| Traceable internal state | sc_signal<sc_uint<32>> regs[32] | logic [31:0] regs [0:31] (always traceable in SV) | SV has no equivalent opt-in; all signals are traceable |
| Combinational read | SC_METHOD(f); sensitive << addr; | assign out = mem[addr] | SV assign also re-evaluates when the selected element changes; the SC_METHOD here re-triggers only on address change |
| x0 hardwire | if (rd_addr.read() != 0) | if (rd_addr != '0) | Identical guard in both languages |
| Dual-port read | Two SC_METHOD calls or one SC_METHOD with both addresses | Two assign statements | Identical hardware; both are combinational muxes |

Translation Table: Classic Verilog and Modern SV

| Concept | Verilog | SystemVerilog | SystemC |
|---------|---------|---------------|---------|
| Register storage | reg [31:0] regs [0:31] | logic [31:0] regs [0:31] | uint32_t regs[32] |
| Sync write | always @(posedge clk) | always_ff @(posedge clk) | SC_CTHREAD(f, clk.pos()) |
| Async reset | always @(posedge clk or posedge rst) | always_ff @(posedge clk or posedge rst) | async_reset_signal_is(rst, true) |
| Async read | assign out = regs[addr] | assign out = regs[addr] | SC_METHOD; sensitive << addr |
| Reset all regs | integer i; for(i=0;i<32;i=i+1) regs[i] <= 0 | for (int i=0;i<32;i++) regs[i] <= '0 | for (int i=0;i<32;i++) regs[i]=0 |
| x0 guard | if (rd_addr != 0) | if (rd_addr != '0) | if (rd_addr.read() != 0) |
| Write-enable | if (wr_en && rd_addr != 0) | if (wr_en && rd_addr != '0) | if (wr_en.read() && rd_addr.read() != 0) |
| Assignment in clocked block | regs[addr] <= data (non-blocking in clocked always) | regs[addr] <= data (non-blocking in always_ff) | regs[addr] = data (plain C++ assignment, takes effect as soon as write_proc executes it) |

The blocking vs. non-blocking distinction is worth examining. In Verilog and SystemVerilog, clocked blocks use <= (non-blocking assignment): all right-hand sides are evaluated before any left-hand side is updated, which prevents read-write order dependencies between blocks triggered by the same edge. In SystemC, the closest counterpart is sc_signal::write(), whose new value becomes visible only in the following delta cycle. A plain C++ assignment such as regs[addr] = data takes effect immediately, like a blocking assignment. That is safe in this module because write_proc is the only process that writes regs, and read_proc typically runs in a later delta cycle, after an address signal has changed. If two clocked processes exchanged data through plain member variables, the result would depend on the order in which the scheduler runs them, so state shared between clocked processes belongs in sc_signal.
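The difference between immediate and deferred assignment is easiest to see with a register swap. The sketch below is plain C++ and only loosely modeled on how a signal defers its writes: DeferredValue, swap_immediate, and swap_deferred are hypothetical names, and update() stands in for the update phase that a simulator would run on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-phase value: write() stores a pending value,
// update() makes it visible to read().
struct DeferredValue {
    uint32_t current = 0;
    uint32_t next    = 0;
    uint32_t read() const { return current; }
    void write(uint32_t v) { next = v; }
    void update() { current = next; }
};

// Swap with immediate assignment: the second statement sees the result
// of the first, so both variables end up holding the original b.
inline void swap_immediate(uint32_t& a, uint32_t& b) {
    a = b;
    b = a;
}

// Swap with deferred writes: both reads happen before either update.
inline void swap_deferred(DeferredValue& a, DeferredValue& b) {
    a.write(b.read());
    b.write(a.read());
    a.update();
    b.update();
}
```

The deferred version exchanges the two values correctly regardless of statement order, which is the property non-blocking assignment provides in a clocked SystemVerilog block.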


Module Block Diagram

graph LR
    subgraph reg_file["reg_file Module"]
        RF["32 x 32-bit\nRegisters\n(x0-x31)"]
    end
    clk["clk"] --> reg_file
    rst["rst"] --> reg_file
    rs1_addr["rs1_addr[4:0]"] --> reg_file
    rs2_addr["rs2_addr[4:0]"] --> reg_file
    rd_addr["rd_addr[4:0]"] --> reg_file
    wr_data["wr_data[31:0]"] --> reg_file
    wr_en["wr_en"] --> reg_file
    reg_file --> rs1_data["rs1_data[31:0]"]
    reg_file --> rs2_data["rs2_data[31:0]"]

Port naming follows the RISC-V instruction field convention:
- rs1 = register source 1 (first ALU operand)
- rs2 = register source 2 (second ALU operand or store data)
- rd = register destination (write-back target)


Complete Implementation

reg_file.h

// reg_file.h
// Register File — 32 x 32-bit storage for RV32I
// Synchronous write on posedge clk, asynchronous (combinational) read
//
// Post 8 — SystemC for RTL & DV Engineers
#pragma once
#include <systemc.h>

SC_MODULE(reg_file) {
    //------------------------------------------------------------------
    // Ports
    //------------------------------------------------------------------
    sc_in<bool>         clk;       // Clock — write port clocks on posedge
    sc_in<bool>         rst;       // Active-high asynchronous reset

    // Read Port 1 — source register 1 (rs1)
    sc_in<sc_uint<5>>   rs1_addr;  // 5-bit address: which register to read
    sc_out<sc_uint<32>> rs1_data;  // 32-bit output: the value in that register

    // Read Port 2 — source register 2 (rs2)
    sc_in<sc_uint<5>>   rs2_addr;  // 5-bit address
    sc_out<sc_uint<32>> rs2_data;  // 32-bit output

    // Write Port — destination register (rd)
    sc_in<sc_uint<5>>   rd_addr;   // 5-bit address: which register to write
    sc_in<sc_uint<32>>  wr_data;   // 32-bit data to write
    sc_in<bool>         wr_en;     // Write enable: only write when asserted

    //------------------------------------------------------------------
    // Internal state
    // Plain C++ array — not sc_signal — because this state is only
    // accessed by processes within this module, not observable externally.
    // sc_signal is for inter-module communication; plain arrays are for
    // internal state.
    //------------------------------------------------------------------
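    // Value-initialized to zero so that the initial run of read_proc,
    // which can happen before the reset block executes, reads defined data.
    uint32_t regs[32] = {};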

    //------------------------------------------------------------------
    // Process declarations
    //------------------------------------------------------------------
    void read_proc();   // SC_METHOD: combinational read (asynchronous)
    void write_proc();  // SC_CTHREAD: clocked write (synchronous)

    //------------------------------------------------------------------
    // Constructor
    //------------------------------------------------------------------
    SC_CTOR(reg_file) {
        // read_proc: runs every time rs1_addr or rs2_addr changes.
        // No clock edge in the sensitivity list — this is combinational logic.
        SC_METHOD(read_proc);
        sensitive << rs1_addr << rs2_addr;

        // write_proc: runs on every rising clock edge.
        // async_reset_signal_is: rst=true causes immediate entry into the
        // reset block (the code before the first wait() in the SC_CTHREAD).
        SC_CTHREAD(write_proc, clk.pos());
        async_reset_signal_is(rst, true);
    }
};

reg_file.cpp

// reg_file.cpp
#include "reg_file.h"

//----------------------------------------------------------------------
// read_proc — Asynchronous (combinational) read
//
// This SC_METHOD fires whenever rs1_addr or rs2_addr changes value.
// It directly indexes the regs[] array and drives the output ports.
//
// Key design point: regs[] is NOT in the sensitivity list. If a write
// happens at posedge clk and changes regs[x], read_proc will NOT
// automatically re-run; it only re-runs when an address changes.
// This is a modeling shortcut. In silicon, the read mux is driven by the
// flip-flop outputs, so the output follows the new value within a gate
// delay even when the address is unchanged.
//
// For the single-cycle CPU: reads happen at the start of a cycle,
// writes happen at the end (posedge). The written value is visible in the
// NEXT cycle provided the address lines change for the new instruction.
//----------------------------------------------------------------------
void reg_file::read_proc() {
    // Read source register 1: index directly into the array.
    // regs[0] == 0 always (enforced by write_proc), so reading x0
    // correctly returns 0 without any special case here.
    rs1_data.write(regs[rs1_addr.read()]);

    // Read source register 2: same logic, independent port.
    rs2_data.write(regs[rs2_addr.read()]);
}

//----------------------------------------------------------------------
// write_proc — Synchronous write
//
// This SC_CTHREAD runs on every rising clock edge.
// The reset block (before wait()) runs when rst is asserted.
//
// Structure of every SC_CTHREAD with async reset:
//   1. Reset block: initialize state, then call wait() once.
//   2. Main loop: the normal per-cycle operation.
//----------------------------------------------------------------------
void reg_file::write_proc() {
    //------------------------------------------------------------------
    // Reset block — executes when rst is asserted (async_reset_signal_is)
    // Zero all 32 registers. This ensures a known, deterministic state
    // at startup and after any reset event.
    //
    // In real silicon, some implementations do NOT reset the register file
    // (it is expensive in area and power). For simulation correctness and
    // software-visible behavior, we always reset to zero.
    //------------------------------------------------------------------
    for (int i = 0; i < 32; i++) {
        regs[i] = 0;
    }
    wait();  // Suspend until the first rising clock edge after reset deasserts

    //------------------------------------------------------------------
    // Main operation loop — one iteration per clock cycle
    //------------------------------------------------------------------
    while (true) {
        // Two conditions must both be true before a write occurs:
        //
        // 1. wr_en.read(): the write enable signal, driven by the decoder.
        //    Only certain instructions write back to a register:
        //    ALU ops (ADD, SUB, AND, ...), LOAD, JAL, JALR, LUI, AUIPC.
        //    STORE and BRANCH instructions do NOT write a register.
        //
        // 2. rd_addr.read() != 0: the x0 hardwire constraint.
        //    Any attempt to write x0 is silently discarded.
        //    A correctly implemented decoder will never generate wr_en=1
        //    with rd_addr=0, but this check enforces the architectural
        //    invariant at the register-file level as defense-in-depth.
        //    In real silicon: the write-enable path for cell 0 is simply
        //    not connected; the cell is tied to ground.
        if (wr_en.read() && rd_addr.read() != 0) {
            regs[rd_addr.read()] = wr_data.read();
        }
        wait();  // Advance to the next rising clock edge
    }
}

Line-by-Line Explanation

SC_METHOD(read_proc) with sensitive << rs1_addr << rs2_addr

This tells the SystemC scheduler: "run read_proc immediately whenever either rs1_addr or rs2_addr changes value." No clock edge is in the sensitivity list. This is the SystemC way to model combinational logic.

Compare to SystemVerilog:

assign rs1_data = regs[rs1_addr];
assign rs2_data = regs[rs2_addr];

Both are combinational — they produce a value immediately when inputs change.

SC_CTHREAD(write_proc, clk.pos())

SC_CTHREAD is a clocked thread — a process that runs on a specific clock edge. clk.pos() means positive (rising) edge. This is the SystemC equivalent of always_ff @(posedge clk) in SystemVerilog.

async_reset_signal_is(rst, true) means: when rst goes high, immediately enter the reset block (the code before the first wait()).

for (int i = 0; i < 32; i++) regs[i] = 0; before wait()

This is the reset initializer. It runs once when rst is asserted, zeroing all 32 registers. After it completes, wait() suspends the process until the next clock edge, at which point the while(true) loop begins normal operation.

if (wr_en.read() && rd_addr.read() != 0)

The combined write guard:
- wr_en.read(): only instructions that produce a register result assert this
- rd_addr.read() != 0: the x0 hardwire — writing x0 is a no-op


Write-Before-Read Discussion

In a single-cycle CPU, the instruction execution order within one clock cycle is:

1. Fetch:   PC → Instruction Memory → instr[31:0]
2. Decode:  instr → decoder → control signals + register addresses
3. Read:    rs1_addr, rs2_addr → reg_file.read_proc() → rs1_data, rs2_data
4. Execute: rs1_data, rs2_data → ALU → alu_result
5. Memory:  alu_result → Data Memory (for LOAD/STORE)
6. WB:      result → reg_file.write_proc() at posedge clk

Steps 1–5 are combinational (modeled with SC_METHOD chains). Step 6 is clocked. Write from instruction N (end of cycle N) is not visible to instruction N's own read (step 3 of cycle N) — but IS visible to instruction N+1's read (step 3 of cycle N+1).

ASCII timing diagram:

clk:     ___---___---___
write:         ^ (cycle N writes rd=x3)
read:              (cycle N+1 reads x3: sees new value)

Full view across two cycles:

         ___     ___     ___
CLK  ___|   |___|   |___|   |___

      <--- Cycle N -----------> <--- Cycle N+1 -------->

      [Fetch][Decode][Read][Exe][WB]  [Fetch][Decode][Read][Exe][WB]
                                  ^                    ^
                                  |                    |
                            write x3=42           read x3 -> 42

The write of x3=42 at the end of cycle N is captured on the rising edge. Cycle N+1's combinational read of x3 immediately sees the new value 42. For pipelined CPUs (Post 18–21), instruction N+1 might be in the Read stage at the same cycle instruction N is in Write-Back — that is a data hazard and requires a forwarding network to resolve.
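The cycle-to-cycle dependency can be sketched as a loop that executes one instruction per iteration, with each iteration standing for one clock cycle. This is a simplified illustration: AddImm is a hypothetical ADDI-like operation (rd = rs1 + imm), run_single_cycle is a hypothetical helper, and fetch, decode, and memory access are left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ADDI-like operation: rd = rs1 + imm.
struct AddImm {
    unsigned rd;
    unsigned rs1;
    uint32_t imm;
};

// Runs one instruction per clock cycle on a 32-entry register array.
inline std::array<uint32_t, 32> run_single_cycle(const std::vector<AddImm>& program) {
    std::array<uint32_t, 32> regs{};
    for (const AddImm& instr : program) {
        assert(instr.rd < 32 && instr.rs1 < 32);
        // Steps 3-4: combinational read and execute within the cycle.
        const uint32_t result = regs[instr.rs1] + instr.imm;
        // Step 6: write-back at the clock edge that ends the cycle.
        if (instr.rd != 0) {
            regs[instr.rd] = result;
        }
    }
    return regs;
}
```

For a program that writes x3 in one cycle and reads it in the next, the second iteration reads the value stored by the first, because the write-back of cycle N completes before the read of cycle N+1 begins.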


Simulation Semantics: How the Register File Executes

The interaction between write_proc (SC_CTHREAD) and read_proc (SC_METHOD) illustrates the SystemC event-driven scheduler in a non-trivial way.

At simulation time 0:
1. The simulator initializes. read_proc runs once during the initialization phase, as every SC_METHOD does unless dont_initialize() is called. It drives rs1_data and rs2_data from regs[0], so the array should already hold zeros at construction.
2. write_proc does not run during initialization, because clocked threads are excluded from that phase. It first executes when rst is asserted (the testbench does this at time 0) or at the first rising clock edge, whichever comes first.
3. When it runs, the reset section sets all 32 entries of regs[] to 0.
4. wait() suspends write_proc until the next rising clock edge.

At the first rising clock edge (e.g., 5 ns):
1. write_proc resumes. It checks wr_en and rd_addr. If both conditions are met, it writes to regs[rd_addr].
2. wait() suspends write_proc until the next posedge.
3. Key point: Writing to regs[x] (a plain C++ array) does NOT trigger any SystemC event. read_proc does NOT automatically fire because of this write.

At a subsequent time (e.g., 6 ns):
4. The testbench changes rs1_addr. This triggers read_proc (it is sensitive to rs1_addr).
5. read_proc reads regs[rs1_addr.read()] — which now returns the freshly-written value.

This ordering is what the single-cycle CPU relies on: write on the clock edge, then read the new value once the address changes. Because regs is plain C++, read_proc does not fire on register content changes. That keeps simulation cheap, but it also means a read whose address is unchanged returns the value from before the write.


Complete Testbench

tb_reg_file.cpp

// tb_reg_file.cpp
// Testbench for reg_file module
//
// Tests:
//   1. Write all 32 registers with value (i*7+1), read back and verify
//   2. x0 hardwire: write 0xDEADBEEF to x0, read back must return 0
//   3. Simultaneous read of x5 and x10 via both ports
//   4. Overwrite x3, verify new value
//   5. Reset assertion clears all registers
//   6. Write enable gating: wr_en=0 must not write
//   7. VCD trace to reg_file.vcd
//
// Build: see CMakeLists.txt
// Run:   ./tb_reg_file
// VCD:   reg_file.vcd  (open in GTKWave)

#include <iostream>
#include <string>
#include <systemc.h>
#include "reg_file.h"

//======================================================================
// Testbench module
//======================================================================
SC_MODULE(tb_reg_file) {
    //------------------------------------------------------------------
    // Signals connecting testbench to DUT
    //------------------------------------------------------------------
    sc_signal<bool>         clk;
    sc_signal<bool>         rst;
    sc_signal<sc_uint<5>>   rs1_addr;
    sc_signal<sc_uint<5>>   rs2_addr;
    sc_signal<sc_uint<5>>   rd_addr;
    sc_signal<sc_uint<32>>  wr_data;
    sc_signal<bool>         wr_en;
    sc_signal<sc_uint<32>>  rs1_data;
    sc_signal<sc_uint<32>>  rs2_data;

    //------------------------------------------------------------------
    // DUT instance
    //------------------------------------------------------------------
    reg_file* dut;

    //------------------------------------------------------------------
    // Tracking
    //------------------------------------------------------------------
    int tests_passed;
    int tests_failed;

    //------------------------------------------------------------------
    // Helper: drive one write to the register file
    // Drives the signals and waits for one rising edge.
    //------------------------------------------------------------------
    void do_write(uint32_t addr, uint32_t data) {
        rd_addr.write(addr);
        wr_data.write(data);
        wr_en.write(true);
        wait(clk.posedge_event());
        wr_en.write(false);
    }

    //------------------------------------------------------------------
    // Helper: read from port 1 (rs1). Read is combinational:
    // change the address and wait 1 ns so the delta cycles can settle.
    // read_proc only re-runs if the address value actually changes.
    //------------------------------------------------------------------
    uint32_t do_read1(uint32_t addr) {
        rs1_addr.write(addr);
        wait(1, SC_NS);
        return rs1_data.read().to_uint();
    }

    uint32_t do_read2(uint32_t addr) {
        rs2_addr.write(addr);
        wait(1, SC_NS);
        return rs2_data.read().to_uint();
    }

    //------------------------------------------------------------------
    // Helper: check and report
    //------------------------------------------------------------------
    void check(const std::string& test_name,
               uint32_t actual, uint32_t expected) {
        if (actual == expected) {
            std::cout << "  PASS  " << test_name
                      << " — got 0x" << std::hex << actual << std::dec
                      << std::endl;
            tests_passed++;
        } else {
            std::cout << "  FAIL  " << test_name
                      << " — expected 0x" << std::hex << expected
                      << ", got 0x" << actual << std::dec
                      << std::endl;
            tests_failed++;
        }
    }

    //------------------------------------------------------------------
    // Clock generator — 10 ns period (100 MHz)
    //------------------------------------------------------------------
    void clock_gen() {
        clk.write(false);
        while (true) {
            wait(5, SC_NS);
            clk.write(!clk.read());
        }
    }

    //------------------------------------------------------------------
    // Main test process
    //------------------------------------------------------------------
    void run_tests() {
        tests_passed = 0;
        tests_failed = 0;

        std::cout << "\n========================================" << std::endl;
        std::cout << "  reg_file Testbench" << std::endl;
        std::cout << "========================================\n" << std::endl;

        //--------------------------------------------------------------
        // Apply reset
        //--------------------------------------------------------------
        rst.write(true);
        wr_en.write(false);
        rs1_addr.write(0);
        rs2_addr.write(0);
        rd_addr.write(0);
        wr_data.write(0);

        wait(clk.posedge_event());
        wait(clk.posedge_event());
        rst.write(false);
        wait(1, SC_NS);

        //==============================================================
        // TEST 1: Write all 32 registers with value (i*7+1) & 0xFFFFFFFF
        //         and read back to verify
        //==============================================================
        std::cout << "TEST 1: Write and read back all 32 registers" << std::endl;

        for (int i = 0; i < 32; i++) {
            uint32_t val = (uint32_t)((i * 7 + 1) & 0xFFFFFFFF);
            do_write(i, val);
        }

        for (int i = 0; i < 32; i++) {
            uint32_t expected = (i == 0) ? 0 : (uint32_t)((i * 7 + 1) & 0xFFFFFFFF);
            uint32_t actual   = do_read1(i);
            std::string name  = "reg[" + std::to_string(i) + "]";
            check(name, actual, expected);
        }

        //==============================================================
        // TEST 2: x0 hardwire invariant
        //         Write 0xDEADBEEF to x0, must always read back as 0
        //==============================================================
        std::cout << "\nTEST 2: x0 hardwire — write 0xDEADBEEF, must read 0" << std::endl;

        do_write(0, 0xDEADBEEF);
        check("x0 after 0xDEADBEEF write", do_read1(0), 0);

        do_write(0, 0xFFFFFFFF);
        check("x0 after 0xFFFFFFFF write", do_read1(0), 0);

        do_write(0, 1);
        check("x0 after write 1", do_read1(0), 0);

        //==============================================================
        // TEST 3: Simultaneous read of x5 and x10 on both ports
        //==============================================================
        std::cout << "\nTEST 3: Simultaneous dual-port read (x5 and x10)" << std::endl;

        do_write(5,  0xAAAAAAAA);
        do_write(10, 0x55555555);

        rs1_addr.write(5);
        rs2_addr.write(10);
        wait(1, SC_NS);

        check("dual read: rs1 port x5",  rs1_data.read().to_uint(), 0xAAAAAAAA);
        check("dual read: rs2 port x10", rs2_data.read().to_uint(), 0x55555555);

        // Same register on both ports
        rs1_addr.write(5);
        rs2_addr.write(5);
        wait(1, SC_NS);
        check("same reg both ports: rs1", rs1_data.read().to_uint(), 0xAAAAAAAA);
        check("same reg both ports: rs2", rs2_data.read().to_uint(), 0xAAAAAAAA);

        //==============================================================
        // TEST 4: Overwrite x3, verify new value
        //==============================================================
        std::cout << "\nTEST 4: Overwrite x3" << std::endl;

        uint32_t initial = do_read1(3);
        check("x3 initial (from TEST 1)", initial, (uint32_t)(3 * 7 + 1));

        do_write(3, 0xCAFEBABE);
        check("x3 after overwrite", do_read1(3), 0xCAFEBABE);

        do_write(3, 0x12345678);
        check("x3 after second overwrite", do_read1(3), 0x12345678);

        //==============================================================
        // TEST 5: Assert rst, verify all registers return 0
        //==============================================================
        std::cout << "\nTEST 5: Reset clears all registers" << std::endl;

        rst.write(true);
        wait(clk.posedge_event());
        wait(clk.posedge_event());
        rst.write(false);
        wait(1, SC_NS);

        for (int i = 0; i < 32; i++) {
            uint32_t val   = do_read1(i);
            std::string nm = "post-reset reg[" + std::to_string(i) + "]";
            check(nm, val, 0);
        }

        //==============================================================
        // TEST 6: Write enable gating — wr_en=0 must not write
        //==============================================================
        std::cout << "\nTEST 6: Write enable gating" << std::endl;

        rd_addr.write(7);
        wr_data.write(0xDEADC0DE);
        wr_en.write(false);
        wait(clk.posedge_event());
        wait(1, SC_NS);

        check("reg[7] unchanged (wr_en=0)", do_read1(7), 0);

        rd_addr.write(7);
        wr_data.write(0xDEADC0DE);
        wr_en.write(true);
        wait(clk.posedge_event());
        wr_en.write(false);
        wait(1, SC_NS);

        check("reg[7] written (wr_en=1)", do_read1(7), 0xDEADC0DE);

        //==============================================================
        // SUMMARY
        //==============================================================
        std::cout << "\n========================================" << std::endl;
        std::cout << "  Results: " << tests_passed << " passed, "
                  << tests_failed << " failed" << std::endl;
        std::cout << "========================================\n" << std::endl;

        if (tests_failed == 0)
            std::cout << "ALL TESTS PASSED" << std::endl;
        else
            std::cout << "FAILURES DETECTED — review output above" << std::endl;

        sc_stop();
    }

    SC_CTOR(tb_reg_file) {
        tests_passed = 0;
        tests_failed = 0;

        // Instantiate DUT
        dut = new reg_file("reg_file_dut");
        dut->clk(clk);
        dut->rst(rst);
        dut->rs1_addr(rs1_addr);
        dut->rs2_addr(rs2_addr);
        dut->rd_addr(rd_addr);
        dut->wr_data(wr_data);
        dut->wr_en(wr_en);
        dut->rs1_data(rs1_data);
        dut->rs2_data(rs2_data);

        SC_THREAD(clock_gen);
        SC_THREAD(run_tests);
    }

    ~tb_reg_file() {
        delete dut;
    }
};

//======================================================================
// sc_main — simulation entry point
//======================================================================
int sc_main(int argc, char* argv[]) {
    // VCD trace — record all signals to a waveform file
    sc_trace_file* tf = sc_create_vcd_trace_file("reg_file");
    tf->set_time_unit(1, SC_NS);

    // Instantiate testbench
    tb_reg_file tb("tb");

    // Register signals for tracing
    sc_trace(tf, tb.clk,      "clk");
    sc_trace(tf, tb.rst,      "rst");
    sc_trace(tf, tb.rs1_addr, "rs1_addr");
    sc_trace(tf, tb.rs2_addr, "rs2_addr");
    sc_trace(tf, tb.rd_addr,  "rd_addr");
    sc_trace(tf, tb.wr_data,  "wr_data");
    sc_trace(tf, tb.wr_en,    "wr_en");
    sc_trace(tf, tb.rs1_data, "rs1_data");
    sc_trace(tf, tb.rs2_data, "rs2_data");

    // Run simulation
    sc_start();

    sc_close_vcd_trace_file(tf);

    std::cout << "\nVCD trace written to reg_file.vcd" << std::endl;
    std::cout << "Open with: gtkwave reg_file.vcd" << std::endl;

    return 0;
}

Common Pitfalls for SV Engineers

1. SC_CTHREAD's wait() always waits for the registered clock edge, not just any event.
In SystemVerilog, @(posedge clk) in an always_ff block suspends until that specific posedge. In an SC_CTHREAD, wait() without arguments likewise waits for the registered clock edge (clk.pos() in our case). A plain SC_THREAD is different: wait(e) waits for the specific event e, and wait() without arguments waits on the thread's static sensitivity list. If you accidentally write SC_THREAD(write_proc) instead of SC_CTHREAD(write_proc, clk.pos()) and declare no sensitivity, the argument-less wait() has nothing to wake it, so the process suspends at its first wait() and never writes a register. The symptom is that writes never land, and the simulator typically gives no obvious error message pointing at the cause.

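2. The reset section before the first wait() runs as soon as the process is first activated, which can be before any clock edge.
This surprises engineers coming from SystemVerilog, where always_ff @(posedge clk or posedge rst) does not execute until the first trigger event. An SC_CTHREAD is not run during the kernel's initialization phase; the code before its first wait() executes the first time the process is triggered. In this testbench rst is driven high at time 0, so async_reset_signal_is triggers the reset section at simulation time 0, before the first rising clock edge at 5 ns. If reset were never asserted, the same code would run once at the first rising clock edge. Either way the initialization is deterministic, comparable to FPGA power-on reset. It also means you should never put testbench stimulus in the reset section: it executes on every reset activation, without waiting for a clock edge.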

3. async_reset_signal_is makes reset truly asynchronous: mid-cycle assertion preempts immediately.
If rst goes high at 37.5 ns (between the rising clock edges at 35 ns and 45 ns), write_proc is interrupted immediately at its current wait() and restarts from the reset section. This corresponds to the or posedge rst in a SystemVerilog always_ff sensitivity list. By contrast, if you omit async_reset_signal_is and instead check if (rst.read()) inside the while loop (or declare reset_signal_is(rst, true)), you get a synchronous reset that only takes effect at the next clock edge. Choosing the wrong reset style for your target hardware is a subtle functional mismatch that only shows up when reset is asserted between clock edges.

4. Writing to regs[0] in the reset loop IS correct — x0's invariant is enforced by the write guard, not by making the array entry const.
A common misconception: since x0 is hardwired zero, surely regs[0] should be const or declared differently. No — the array element is writable from inside the reset loop (which correctly sets it to 0). The architectural invariant "x0 always reads 0" is enforced by the write guard if (rd_addr.read() != 0) in the normal operation path. The reset section sets regs[0] = 0, which is harmless and consistent. If you try to enforce x0 immutability by making it const, you cannot zero-initialize it in the reset loop.

5. Without async_reset_signal_is, reset cannot interrupt a mid-cycle operation.
In SystemVerilog, writing always_ff @(posedge clk) without or posedge rst gives you a synchronous reset: the if/else reset check only runs at posedge clk. The SystemC equivalent is reset_signal_is(rst, true): write_proc will not jump to its reset section until the next clock edge, even if rst is asserted mid-cycle. If you declare neither reset_signal_is nor async_reset_signal_is, the kernel never restarts the process, and the code before the first wait() runs only once. For a register file in a processor reset sequence, asynchronous reset is almost always the correct choice: you want the registers cleared immediately when reset is asserted, not "on the next convenient clock edge."


DV Insight

Register file coverage should track: (1) each of 32 registers written (32 bins), (2) each as rs1 source (32 bins), (3) each as rs2 source (32 bins), (4) attempted write to x0 (1 bin; x0 must never change), (5) simultaneous write and read of the same register number (1 bin; exercises read-after-write). Items 4 and 5 catch silent bugs: a decoder bug might write x0, and a same-cycle write and read of one register can expose register file timing issues.

The x0 attempted write bin is especially important. If the decoder has a bug and generates wr_en=1 with rd_addr=0 for some instruction, the register file silently discards it — behavior is accidentally correct. The test suite may never catch the decoder bug because the output looks fine. Tracking attempted x0 writes forces the testbench to construct that scenario and verify it at the decoder level.

The write-read hazard bin (writing and reading register N in the same cycle) distinguishes a "write-first" register file from a "read-first" one — a distinction that matters enormously for pipeline correctness.

SystemVerilog UVM coverage model:

covergroup reg_file_cg @(posedge clk);
    // Which registers are written
    cp_rd_addr: coverpoint rd_addr iff (wr_en) {
        bins all_regs[] = {[0:31]};
    }

    // Which registers are read on port 1
    cp_rs1_addr: coverpoint rs1_addr {
        bins all_regs[] = {[0:31]};
    }

    // Which registers are read on port 2
    cp_rs2_addr: coverpoint rs2_addr {
        bins all_regs[] = {[0:31]};
    }

    // Attempted write to x0
    cp_x0_write: coverpoint (wr_en && rd_addr == 0) {
        bins attempted = {1'b1};
    }

    // Write-read hazard: writing and reading same register
    cp_wr_rd_hazard: coverpoint (wr_en && rd_addr != 0 &&
                                  (rd_addr == rs1_addr || rd_addr == rs2_addr)) {
        bins hazard = {1'b1};
    }
endgroup

Industry Context

SiFive FU540-C000 (the SoC on the HiFive Unleashed board): each U54 application core is a 64-bit design, so its 32-entry integer register file holds 32 × 64 = 2,048 bits (2 Kbit, or 256 bytes) of architectural state. The register file sits on the critical timing path: read latency from rs1_addr to rs1_data contributes directly to the minimum achievable clock period.

ARM Cortex-M4: 16-entry register file. r13=SP and r14=LR are special-purpose at the hardware level — they have dedicated decode paths. In RISC-V, x2=sp is just a software convention; the hardware treats x2 no differently from x5. This design choice makes the RISC-V register file uniformly simple.

Shakti C-class (IIT Madras): An open-source RISC-V core from academia, written in Bluespec SystemVerilog and configurable for 32-bit or 64-bit operation. The Shakti C-class register file is a direct structural analog of what we are building: 32 entries, combinational read, synchronous write, x0 hardwired to zero. Comparing our SystemC model to the Verilog generated from the Shakti source is a useful exercise for confirming that the SystemC abstraction is faithful.

Server-class CPUs at 7nm: Out-of-order processors maintain a physical register file much larger than 32 entries (AMD Zen 3 has 192 physical integer registers; Zen 4 raised that to 224). A 64-entry 64-bit physical register file fits in approximately 0.01 mm² at TSMC 7nm. Register files use custom 8T or 10T SRAM bitcells (vs 6T for cache) because multi-porting requires additional read wordlines and bitlines per cell.


CMakeLists.txt

# CMakeLists.txt — Post 8: Register File
cmake_minimum_required(VERSION 3.18)  # find_library(... REQUIRED) needs CMake 3.18+
project(post08_register_file CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# SystemC installation — adjust SYSTEMC_HOME to your installation path.
# Typical locations:
#   Linux:  /usr/local/systemc-2.3.4
#   macOS:  /opt/systemc or ~/systemc-2.3.4
if(NOT DEFINED SYSTEMC_HOME)
    set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
endif()

if(NOT SYSTEMC_HOME)
    message(FATAL_ERROR
        "SYSTEMC_HOME not set. "
        "Pass -DSYSTEMC_HOME=/path/to/systemc or set the environment variable.")
endif()

find_library(SYSTEMC_LIB
    NAMES systemc
    PATHS "${SYSTEMC_HOME}/lib-linux64"
          "${SYSTEMC_HOME}/lib-macosx64"
          "${SYSTEMC_HOME}/lib"
    REQUIRED
)

include_directories("${SYSTEMC_HOME}/include")

# Build the testbench executable (links reg_file.cpp + tb_reg_file.cpp)
add_executable(tb_reg_file
    reg_file.cpp
    tb_reg_file.cpp
)

target_link_libraries(tb_reg_file ${SYSTEMC_LIB})

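# Convenience: run the simulation.
# Naming the executable target in COMMAND makes CMake substitute its
# built location and add the build dependency automatically.
add_custom_target(run
    COMMAND tb_reg_file
    COMMENT "Running register file testbench"
)

# Convenience: open GTKWave after a fresh simulation run.
# Target-to-target ordering is expressed with add_dependencies.
add_custom_target(waves
    COMMAND gtkwave reg_file.vcd
    COMMENT "Opening GTKWave"
)
add_dependencies(waves run)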

Build and run:

mkdir build && cd build
cmake .. -DSYSTEMC_HOME=/usr/local/systemc-2.3.4
make
./tb_reg_file
gtkwave reg_file.vcd

SystemC Language Reference

Construct | Syntax | SV/Verilog Equivalent | Key Difference
Clocked thread | SC_CTHREAD(write_proc, clk.pos()) | always_ff @(posedge clk) | SystemC is sequential C++ with wait() checkpoints; SV re-evaluates the block at each trigger
Async reset declaration | async_reset_signal_is(rst, true) | or posedge rst in the sensitivity list | SystemC declares reset separately from the process; SV lists it inline in the always trigger
Wait for clock edge | wait() (inside SC_CTHREAD) | Implicit: the always_ff block re-evaluates at every trigger | SystemC requires explicit wait() to advance one clock cycle; SV has no equivalent call
Combinational read process | SC_METHOD(read_proc); sensitive << rs1_addr << rs2_addr; | assign rs1_data = regs[rs1_addr] | The SV assign re-evaluates when the address or the addressed array element changes; the SC_METHOD re-triggers only on address change because regs is plain C++ state
5-bit register address | sc_uint<5> | logic [4:0] | Same semantics; SystemC template, SV range
Internal 32-bit state | uint32_t regs[32] | logic [31:0] regs [0:31] | Plain C++ opts out of the delta-cycle event mechanism; SV logic always participates
Traceable internal state | sc_signal<sc_uint<32>> regs[32] | logic [31:0] regs [0:31] (always dumpable) | SV internal state is always traceable with $dumpvars; SystemC requires explicit sc_trace opt-in
VCD trace | sc_trace(tf, sig, "name") | $dumpvars(0, module) | SystemC traces individual named signals; SV dumps an entire scope
x0 hardwire | if (rd_addr.read() != 0) guard in write path | if (rd_addr != '0) | Identical guard in both languages; neither makes the array element const

SC_CTHREAD with Async Reset — The Full Pattern

SC_CTHREAD combined with async_reset_signal_is is the canonical SystemC idiom for synchronous sequential logic with asynchronous reset. Every line of the pattern carries precise hardware meaning.

Step-by-step declaration:

SC_CTHREAD(write_proc, clk.pos());    // ① register to posedge clock
async_reset_signal_is(rst, true);     // ② rst=true → asynchronous reset

Step ① declares a clocked thread: a process that wakes up only on the registered clock edge. No other signal can activate it — not data inputs, not other signals. Only the clock (or reset) triggers it.

Step ② means: if rst goes high at any time — including mid-cycle, between clock edges — the scheduler immediately preempts the process and jumps to the reset section. The true argument specifies active-high reset. async_reset_signal_is(rst, false) specifies active-low.

Inside the process — three sections:

void write_proc() {
    // RESET HANDLER — runs at time 0 AND whenever rst asserts
    for (int i = 0; i < 32; i++) regs[i] = 0;
    wait();   // ← synchronize to first clock edge after reset releases

    while (true) {  // NORMAL OPERATION — one iteration per clock cycle
        if (wr_en.read() && rd_addr.read() != 0)
            regs[rd_addr.read()] = wr_data.read();
        wait();     // ← wait for next clock edge
    }
}

The code before the first wait() is the reset handler. It runs at simulation time 0 (before the first clock edge) and whenever rst is asserted asynchronously. The first wait() synchronizes to the first clock edge after reset releases. The while(true) loop models the perpetual operation of a clocked circuit — hardware is always on, there is no return.

Side-by-side with SystemVerilog always_ff:

Action | SystemVerilog | SystemC
Reset condition | if (rst) at top of always_ff | Code before first wait() in SC_CTHREAD
Sync to clock after reset | Implicit: block re-evaluates at next trigger after rst deasserts | Explicit wait() call
Normal operation | else begin ... end | while(true) { ...; wait(); }
Async reset trigger | or posedge rst in sensitivity list | async_reset_signal_is(rst, true)
Per-cycle advance | Implicit at block boundary | Explicit wait() at end of loop body

Both describe identical flip-flop behavior. The programmer's mental model differs: SystemC is a sequential narrative ("do reset, then loop forever"), SystemVerilog is a conditional re-evaluation ("on every trigger, if reset then A, else B").


Why uint32_t Instead of sc_signal<sc_uint<32>>

Internal registers do not need sc_signal. Three reasons:

1. No external observer. No other module reads individual regs[x] entries directly. They are private state. External modules only see rs1_data and rs2_data through ports. sc_signal exists to participate in the inter-module event mechanism — unnecessary here.

2. No delta-cycle protection needed. sc_signal uses the evaluate-update two-value protocol to prevent processes from reading values mid-update. The register array updates synchronously at posedge clk — only one process (write_proc) writes it, and the read process (read_proc) does not need delta-cycle isolation from writes because they are separated in simulation time by the clock edge itself.

3. Simulation performance. Each sc_signal write that changes the value schedules an update request and a delta-cycle event notification, potentially waking dependent processes. Thirty-two signal objects, each carrying its own event machinery, add overhead without benefit for internal state that never needs broadcasting.

Both approaches side by side:

// Approach A: uint32_t — no tracing, minimal overhead (production models)
uint32_t regs[32];
// Assignment: regs[addr] = data;  (direct C++ write, no event)

// Approach B: sc_signal — traceable, slight overhead (debug builds)
sc_signal<sc_uint<32>> regs[32];
// Assignment: regs[addr].write(data);  (triggers delta-cycle event)
// Tracing:    sc_trace(tf, regs[i], name);  (each register appears in VCD)

In professional environments, Approach B is used when debugging a new design — you open GTKWave and watch individual register values change cycle by cycle. Approach A is preferred for production models of large SoCs where the register file is instantiated hundreds of times.

The SystemVerilog comparison:

// SV: internal state — private to the module, not a port
logic [31:0] regs [0:31];
// In SV: always dumpable with $dumpvars — no opt-in required

In SystemVerilog, every logic/reg participates in the simulator's event-driven mechanism and is dumpable automatically. SystemC makes this explicit: sc_signal opts in; plain C++ members opt out. For RTL modeling where you want explicit control over simulation overhead, SystemC's distinction is a feature.


Common Pitfalls for SV Engineers — Extended

The earlier pitfalls section covered five common mistakes. Five additional pitfalls specific to the register file pattern:

Pitfall 6: Missing async_reset_signal_is means no asynchronous reset.
Without async_reset_signal_is, nothing restarts the process the moment rst is asserted. With reset_signal_is(rst, true) instead, the process enters its reset section only at the next clock posedge; with neither declaration, it never re-enters the reset section at all. Both cases differ fundamentally from always_ff @(posedge clk or posedge rst). If your target hardware uses asynchronous reset (most ASIC flows do), omitting this declaration means the simulation model does not match the synthesized hardware's reset behavior. The mismatch only manifests when reset is asserted mid-cycle, a common scenario in power-on and hardware reset sequencing.

Pitfall 7: Forgetting the initial wait() in SC_CTHREAD.
The reset-handler code runs the moment the process is first activated or reset: at simulation time 0 in this testbench, where rst is asserted from the start. Without the wait() that separates the reset section from the main loop, the first iteration of while(true) runs in that same activation, before reset has released and before the next clock edge. "Normal operation" code then executes as part of every reset, so a write can slip into the register array while rst is still high. Always end the reset section with a wait() to synchronize to the first clock edge.

Pitfall 8: SC_METHOD for read_proc will not re-fire when regs[] changes.
Since regs is uint32_t (not sc_signal), writing to regs[x] does not generate a SystemC event. read_proc only fires when rs1_addr or rs2_addr changes. Silicon behaves differently: the combinational read path propagates a newly stored value to the output wire as soon as the cell updates at posedge clk. The model stays close enough when the read address changes after each write, because that address change is what re-triggers read_proc. It diverges when a register is written and then read with the address unchanged, for example a testbench that writes x3 and re-reads x3, or back-to-back instructions that share a source register: read_proc does not re-fire and the output keeps the old value. Calling sc_start(SC_ZERO_TIME) does not help, because no event is pending. To force propagation, change the address (step through another register and back), make read_proc sensitive to an sc_event that write_proc notifies after each write, or store the registers as sc_signal and add them to the sensitivity list.

Pitfall 9: Writing to x0 in the reset loop is correct — the invariant lives in the write guard.
A common misconception: since x0 must always be zero, should regs[0] be const or declared differently? No. The reset loop correctly sets regs[0] = 0. The architectural invariant is enforced by the write guard if (rd_addr.read() != 0) in the normal operation path. If you try to make regs[0] const to enforce the invariant, you cannot zero-initialize it in the reset loop. The defense-in-depth is correct: reset initializes it to zero; the write guard prevents subsequent corruption.

Pitfall 10: SC_METHOD sensitivity list must include BOTH rs1_addr AND rs2_addr.
If you write sensitive << rs1_addr and omit rs2_addr, port 2 (rs2_data) will never update in response to address changes on that port. The output will appear stuck at whatever value it had when rs1_addr last changed. SystemC provides no warning for incomplete sensitivity lists. SystemVerilog's always_comb would auto-detect both addresses; SystemC's SC_METHOD requires explicit listing of every input that should trigger re-evaluation.


What's Next

Post 9 — Instruction Decoder: The decoder receives a 32-bit instruction word and produces all the control signals the datapath needs: which ALU operation to perform, whether to read from memory, whether to write back a result, whether to branch, and what the immediate value is. The decoder is purely combinational — one large SC_METHOD with a switch on the opcode field.

The register file you just built will connect to the decoder outputs (rs1_addr, rs2_addr, rd_addr) in Post 12 when we assemble the complete single-cycle CPU datapath.

Author
Mayur Kubavat
VLSI Design and Verification Engineer sharing knowledge about SystemVerilog, UVM, and hardware verification methodologies.
