Architecture & Design Verification: 5. SystemC Tutorial

Introduction

Every processor, from an embedded microcontroller to a datacenter CPU, has an Arithmetic-Logic Unit at its core. The ALU is the block that actually does the work — it adds, subtracts, compares, shifts, and performs bitwise operations on the data values that the program manipulates. Everything else in the CPU — the fetch unit, the decoder, the register file, the pipeline — exists to feed the ALU the right operands and the right opcode at the right time, and then route its result to the right destination.

The SiFive FU740, the chip inside the HiFive Unmatched development board, contains four U74 application cores that implement RV64GC (the 64-bit base integer ISA plus multiply/divide, atomics, floating point, and compressed instructions) alongside an S7 monitor core. Each core has its own integer execution hardware for the RV64I operations. The Western Digital SweRV EH1, an in-order dual-issue RISC-V core developed by Western Digital for its storage controllers, spreads execution across a deep pipeline so that it can sustain a high clock frequency rather than computing every result in one long cycle. The T-HEAD C906, used in the Allwinner D1 chip that powers many low-cost RISC-V Linux boards, implements an RV64GC integer ALU as well. In every case, the fundamental operations are the same; the differences are in how many operations execute per cycle, how many pipeline stages the computation is split across, and how hazards are handled.

We are building RV32I — the 32-bit base integer instruction set. This is the smallest, cleanest version of the RISC-V ISA: 10 ALU operations, 32-bit operands, no multiply/divide. It is also the version that every RISC-V tutorial and textbook starts with, because the instruction encodings are clean, the operations are simple, and the full pipeline is buildable in reasonable time. The ALU we write here will be wired directly into the Execute stage of our single-cycle CPU (Post 18), driven by the decoded control signals from the Decode stage (Post 9), and fed operands from the Register File (Post 8) and the Forwarding Unit (Post 20).

This is also the first post where sc_uint<32> vs sc_int<32> matters for correctness. RISC-V defines some operations as signed (SLT, SRA) and some as unsigned (SLTU, SRL). Using the wrong type produces subtly wrong results for negative numbers and large unsigned values — results that look plausible but are wrong. The test cases in this post are designed to catch exactly those errors.

Prerequisites

Completed Post 4 — SC_METHOD, sensitivity lists, why combinational logic uses SC_METHOD
Post 4 — SC_METHOD vs SC_THREAD
Code for this post: GitHub — section1/post05

SystemC Language Reference

Quick-reference for the integer type constructs used in this post:

Construct	Syntax	SV Equivalent	Key Difference
Unsigned N-bit port	`sc_in<sc_uint<N>> port;`	`input logic [N-1:0] port`	Template parameter is width; no `[N-1:0]` syntax
Signed N-bit port	`sc_in<sc_int<N>> port;`	`input logic signed [N-1:0] port`	Two's complement by construction
Bit vector (no arith)	`sc_bv<N>`	`logic [N-1:0]`	Cannot add/subtract — compile error
4-state logic	`sc_lv<N>`	`logic [N-1:0]`	Supports X/Z; no arithmetic operators, convert to `sc_uint`/`sc_int` first
Bit select	`val[5]`	`val[5]`	Returns a proxy object; on `sc_uint`/`sc_int` it converts to `bool`, and `.bit(5)` is the named equivalent
Part select (read)	`val.range(7, 4)`	`val[7:4]`	Arguments: `(high, low)` — same order as SV
Part select (write)	`val.range(15, 8) = 0xFF`	`val[15:8] = 8'hFF`	Returns a proxy that can be assigned
Signed cast	`(sc_int<32>)val`	`$signed(val)`	Reinterprets bit pattern; no bits change
Unsigned cast	`(sc_uint<32>)val`	`$unsigned(val)`	Reinterprets bit pattern; no bits change
Convert to C++ int	`val.to_uint()`	`int'(val)`	Needed for printf/cout in some SystemC versions
Convert to string	`val.to_string(SC_BIN)`	`$sformatf("%b", val)`	`sc_uint`/`sc_int` default to decimal; pass `SC_BIN` or `SC_HEX` for other bases
Logical right shift	`sc_uint >> n`	`>> n` on `logic [N-1:0]`	Zero-fills from MSB
Arithmetic right shift	`sc_int >> n`	`>>> n` on signed	Sign-extends from MSB
Overflow behavior	Silent wrapping (mod 2^N)	Silent wrapping (mod 2^N)	Both mirror hardware; no exceptions

SystemC Integer Types — The Full Type System

Before writing the ALU, you must understand the four integer types SystemC provides and when to use each. Getting this wrong is the most common source of simulation-passes-but-hardware-is-wrong bugs in SystemC RTL.

The Four Types

sc_uint<N>  — unsigned integer, N bits, hardware semantics
sc_int<N>   — signed integer, N bits, two's complement
sc_bv<N>    — bit vector, N bits, no arithmetic (use for packed structs)
sc_lv<N>    — 4-valued logic vector (0/1/X/Z), like SV logic [N-1:0]

Side-by-side comparison across all three worlds:

Use Case	Classic Verilog	SystemVerilog	SystemC
Unsigned N-bit register	`reg [N-1:0] r`	`logic [N-1:0] r`	`sc_uint<N> r`
Signed N-bit register	`reg signed [N-1:0] r`	`logic signed [N-1:0] r`	`sc_int<N> r`
4-state logic (X/Z)	`wire [N-1:0] w`	`logic [N-1:0] w`	`sc_lv<N> w`
Packed struct / no arith	`reg [N-1:0] r` (no arith intent)	`logic [N-1:0] r`	`sc_bv<N> r`
Plain 32-bit integer	—	`int`	`int` (C++ int)
Unsigned 32-bit	—	`int unsigned`	`sc_uint<32>` or `uint32_t`
Signed 64-bit	—	`longint`	`sc_int<64>` or `int64_t`

Choosing between sc_bv and sc_uint:

Use sc_bv<N> when the value is a bit pattern you will pack/unpack but never do arithmetic on — for example, an instruction encoding, a CRC register, or a packed status register. Use sc_uint<N> when you need arithmetic (add, subtract, compare, shift). The compiler enforces this: adding two sc_bv<N> values does not compile.

sc_bv<32> instruction = 0x00A50513;   // ADDI x10, x10, 10 — instruction word
sc_uint<7> opcode = instruction.range(6, 0).to_uint();  // extract opcode field
sc_uint<5> rd     = instruction.range(11, 7).to_uint(); // extract rd field

// This would NOT compile:
// sc_bv<32> a, b;
// sc_bv<32> sum = a + b;  ← compiler error: + not defined for sc_bv

4-state logic with sc_lv<N>:

For bus-functional models and interface verification, you need X and Z states, for example to model a tri-state bus or to detect undriven signals. sc_lv<N> holds 0, 1, X, or Z per bit. It supports bitwise operators, shifts, and part selects but, like sc_bv<N>, has no arithmetic operators. Convert to sc_uint<N> or sc_int<N> before doing arithmetic, and check for X/Z first, because a vector that contains X or Z has no meaningful integer value. For RTL simulation where X/Z are not needed, sc_uint<N> is faster and preferred.

Bit Selection and Part Selection

SystemC's bit/part-select syntax differs from Verilog but maps to the same hardware concept:

sc_uint<32> x = 0xABCD1234;

// Bit select: single bit extraction
bool bit5      = x[5];           // SV: x[5]; the bit-select proxy converts to bool
bool bit5_bool = x.bit(5);       // .bit(5) is the named form of x[5]
// sc_logic bit values come from sc_lv<N>; sc_uint<N> bits are 2-state

// Part select — contiguous field extraction
sc_uint<4>  nibble = x.range(7, 4);   // SV: x[7:4]  — note: (high, low)
sc_uint<8>  byte2  = x.range(23, 16); // SV: x[23:16]

// Part select — write
x.range(15, 8) = 0xFF;          // SV: x[15:8] = 8'hFF
// x is now 0xABCDFF34

// Concatenation: built here with part-select writes (SystemC also provides
// concat(high, low) and the overloaded comma operator (high, low))
sc_bv<8> high = 0xAB, low = 0xCD;
sc_bv<16> concat;
concat.range(15, 8) = high;
concat.range(7, 0)  = low;
// Result: 0xABCD

The range(hi, lo) method uses the same argument order as SV. In SV, x[7:4] is read left-to-right as [high:low], and x.range(7, 4) is likewise high=7, low=4. Reversing the arguments does not give you the same slice: on sc_uint and sc_int, range(4, 7) is rejected as an invalid range at run time, while sc_bv and sc_lv return the bits in reversed order.
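To make the (high, low) convention concrete, here is a minimal plain C++ sketch of what a part-select read and write do to a 32-bit word. The helpers `bits` and `set_bits` are hypothetical names introduced for illustration; they are not SystemC functions, and they model only the 32-bit case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SystemC): extract bits [hi:lo] of a
// 32-bit word, using the same argument order as x.range(hi, lo).
inline std::uint32_t bits(std::uint32_t x, unsigned hi, unsigned lo) {
    assert(hi < 32 && lo <= hi);         // reject reversed or out-of-range bounds
    const unsigned width = hi - lo + 1;  // number of bits in the field
    const std::uint32_t mask =
        (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lo) & mask;
}

// Hypothetical helper: overwrite bits [hi:lo] of x with the low bits of
// value, similar in effect to x.range(hi, lo) = value.
inline std::uint32_t set_bits(std::uint32_t x, unsigned hi, unsigned lo,
                              std::uint32_t value) {
    assert(hi < 32 && lo <= hi);
    const unsigned width = hi - lo + 1;
    const std::uint32_t field =
        (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x & ~(field << lo)) | ((value & field) << lo);
}
```

With these, `bits(0xABCD1234, 7, 4)` corresponds to `x.range(7, 4)`, and `bits(0x00A50513, 11, 7)` pulls the rd field out of the ADDI instruction word used earlier.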

Overflow Behavior — Hardware Wrapping

Both sc_uint<N> and sc_int<N> wrap on overflow, exactly like hardware:

sc_uint<4> a = 15;
sc_uint<4> b = a + 1;   // b = 0 (wraps at 16, stays in 4-bit range)

sc_int<4> c = 7;
sc_int<4> d = c + 1;    // d = -8 (wraps from +7 to -8, two's complement)

// SV equivalent:
// logic [3:0] a = 4'hF;
// logic [3:0] b = a + 1;  // b = 4'h0 — same wrapping behavior

This is correct hardware behavior. But in a testbench, silent wrapping can hide bugs. The test suite in this post explicitly tests overflow cases (ADD with 0xFFFFFFFF + 1) to verify that the ALU handles wrapping correctly.
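The wrapping rule can be written down as plain arithmetic. The sketch below is an assumption-labelled model of an N-bit register in ordinary C++; `wrap_unsigned` and `wrap_signed` are hypothetical helper names, not SystemC functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low n bits of value, the way an
// n-bit unsigned register would (value mod 2^n).
inline std::uint64_t wrap_unsigned(std::uint64_t value, unsigned n) {
    assert(n >= 1 && n <= 63);
    return value & ((std::uint64_t{1} << n) - 1);
}

// Hypothetical helper: interpret the low n bits of value as an n-bit
// two's complement number.
inline std::int64_t wrap_signed(std::uint64_t value, unsigned n) {
    const std::uint64_t low  = wrap_unsigned(value, n);
    const std::uint64_t sign = std::uint64_t{1} << (n - 1);
    const std::int64_t  full = static_cast<std::int64_t>(std::uint64_t{1} << n);
    // If the sign bit is set, the value represents low - 2^n.
    return (low & sign) ? static_cast<std::int64_t>(low) - full
                        : static_cast<std::int64_t>(low);
}
```

`wrap_unsigned(15 + 1, 4)` models the `sc_uint<4>` example above, and `wrap_signed(7 + 1, 4)` models the `sc_int<4>` one.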

The SRA / SRL Type Selection — The Core Insight

The single most important type choice in this ALU is whether to use sc_uint<32> or sc_int<32> for shift-right operations. The distinction is between:

Logical right shift (SRL): fills vacated bits with zeros. 0x80000000 >> 1 = 0x40000000
Arithmetic right shift (SRA): fills vacated bits with the sign bit. 0x80000000 >> 1 = 0xC0000000

In SystemC, the type of the value determines which shift is performed:

sc_uint<32> u = 0x80000000;
sc_int<32>  s = 0x80000000;  // same bit pattern, interpreted as -2147483648

u >> 1;  // = 0x40000000  (logical shift — sc_uint always logical)
s >> 1;  // = 0xC0000000  (arithmetic shift — sc_int always arithmetic)

In SystemVerilog, the operator and the signedness of the operand together determine the behavior:

logic [31:0] u = 32'h80000000;
u >> 1;    // = 32'h40000000  (logical)
u >>> 1;   // = 32'h40000000  (u is unsigned, so >>> also zero-fills)
$signed(u) >>> 1;  // = 32'hC0000000  (cast to signed, then arithmetic shift)

The (sc_int<32>)a.read() cast in the SRA case does exactly what $signed(a) does in SV: it reinterprets the bit pattern as signed without changing any bits, then the >> operator applies arithmetic shift semantics. The cast back to sc_uint<32> restores unsigned interpretation for the output port.
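For a testbench reference model it can help to have both shifts written without any dependence on the operand type. The following is a sketch in plain C++ under that assumption; `srl32` and `sra32` are hypothetical names, and the arithmetic shift fills the vacated bits explicitly instead of relying on how the compiler shifts negative integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference for SRL: zero-fill from the left.
inline std::uint32_t srl32(std::uint32_t x, unsigned n) {
    assert(n < 32);
    return x >> n;
}

// Hypothetical reference for SRA: replicate bit 31 into the vacated bits.
inline std::uint32_t sra32(std::uint32_t x, unsigned n) {
    assert(n < 32);
    const std::uint32_t shifted = x >> n;            // logical shift first
    const std::uint32_t fill = ~(0xFFFFFFFFu >> n);  // mask of the top n bits
    const bool negative = (x & 0x80000000u) != 0;
    return negative ? (shifted | fill) : shifted;
}
```

For a positive input the two functions agree; they differ only when bit 31 is set.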

Translation Table

Concept	Classic Verilog	SystemVerilog	SystemC
Unsigned 32-bit value	`reg [31:0]` (unsigned default)	`logic [31:0]`	`sc_uint<32>`
Signed 32-bit value	`reg signed [31:0]`	`logic signed [31:0]`	`sc_int<32>`
Signed comparison	—	`$signed(a) < $signed(b)`	`(sc_int<32>)a < (sc_int<32>)b`
Arithmetic right shift	`>>>` on signed reg	`>>> n`	`sc_int<32> >> n`
Logical right shift	`>> n`	`>> n` on logic	`sc_uint<32> >> n`
Part-select read	`val[7:4]`	`val[7:4]`	`val.range(7, 4)`
Part-select write	`val[7:4] = 4'hF`	`val[7:4] = 4'hF`	`val.range(7, 4) = 0xF`
Signal write	`a = val` (procedural)	`a <= val` (non-blocking)	`a.write(val)`
Signal read	`a` (direct)	`a` (direct)	`a.read()`
SC_METHOD + sensitivity	`always @(a or b or op)`	`always_comb`	`SC_METHOD` + `sensitive << a << b << op`
Cast to signed	—	`$signed(a)`	`(sc_int<32>)a`
Cast to unsigned	—	`$unsigned(a)`	`(sc_uint<32>)a`
Shift amount mask	`b[4:0]`	`b[4:0]`	`b.range(4, 0)` for RV32I 5-bit shift

The critical difference: sc_uint<32> arithmetic is unsigned. -1 stored as sc_uint<32> is 0xFFFFFFFF — the maximum unsigned 32-bit value. When you compare sc_uint<32>(0xFFFFFFFF) < sc_uint<32>(1), the result is false — because 4294967295 is not less than 1. That is correct for SLTU (unsigned less-than). But SLT (signed less-than) requires treating 0xFFFFFFFF as -1, which is less than 1. For SLT, you must cast to sc_int<32> before comparing. This is not an edge case — it is the correct RISC-V specification behavior.
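The same distinction can be captured in a plain C++ reference, which is useful when checking the SystemC ALU against an independent model. This is a sketch, not the post's code: `sltu32` and `slt32` are hypothetical names, and the signed version flips the sign bit of both operands so that an ordinary unsigned comparison gives the signed ordering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference for SLTU: plain unsigned comparison.
inline std::uint32_t sltu32(std::uint32_t a, std::uint32_t b) {
    return (a < b) ? 1u : 0u;
}

// Hypothetical reference for SLT: XOR with 0x80000000 maps the signed
// range [-2^31, 2^31-1] onto [0, 2^32-1] while preserving order.
inline std::uint32_t slt32(std::uint32_t a, std::uint32_t b) {
    return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
}
```

The pair `(0xFFFFFFFF, 1)` is the discriminating input: the two functions return different answers for it, which is exactly what the SLT and SLTU tests later in the post rely on.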

RV32I ALU Operations

The ten RV32I integer ALU operations map directly to RISC-V R-type instructions (and, for every operation except SUB, an I-type immediate variant). Here is each operation, the instruction that exercises it, and its role in real programs:

Op	Instruction	Use Case in Real Programs
ADD	`ADD rd, rs1, rs2`	Address calculation, pointer arithmetic, loop counter increment
SUB	`SUB rd, rs1, rs2`	Loop counter decrement, comparisons (SUB then check zero), array index difference
AND	`AND rd, rs1, rs2`	Bit masking (extract a field: `AND rs1, mask`), alignment check
OR	`OR rd, rs1, rs2`	Set bits, combine flags, pack multiple small values into one register
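XOR	`XOR rd, rs1, rs2`	Toggle bits, compare bit patterns; XOR of a value with itself is 0 (RISC-V code normally zeroes a register with `ADDI rd, x0, 0` rather than this idiom)
SLT	`SLT rd, rs1, rs2`	Signed comparison result as 0 or 1: materializes `a < b` from C when the value is needed in a register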
SLTU	`SLTU rd, rs1, rs2`	Unsigned comparison: pointer comparison, array bounds check
SLL	`SLL rd, rs1, rs2`	Logical left shift — multiply by power of 2, pack fields
SRL	`SRL rd, rs1, rs2`	Logical right shift — unsigned divide by power of 2, extract high bits
SRA	`SRA rd, rs1, rs2`	Arithmetic right shift — signed divide by power of 2, sign-extend

The zero flag (not a RISC-V instruction, but a hardware output we add): the ALU outputs a zero signal that is true when result == 0. The branch unit uses this for BEQ (branch if equal) and BNE (branch if not equal) — it computes SUB(rs1, rs2) and checks the zero flag. If zero, the two registers are equal. This is a hardware trick that eliminates a dedicated comparator — the ALU does the work.
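A small plain C++ sketch of this trick, assuming the branch unit sees only the ALU's result and zero outputs. The names `AluOut`, `alu_sub`, `beq_taken`, and `bne_taken` are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two ALU outputs used by the branch unit.
struct AluOut {
    std::uint32_t result;
    bool zero;  // true when result == 0
};

// SUB as the ALU computes it: unsigned subtraction wraps mod 2^32.
inline AluOut alu_sub(std::uint32_t rs1, std::uint32_t rs2) {
    const std::uint32_t r = rs1 - rs2;
    return {r, r == 0};
}

// BEQ is taken when the subtraction result is zero, BNE when it is not.
inline bool beq_taken(std::uint32_t rs1, std::uint32_t rs2) {
    return alu_sub(rs1, rs2).zero;
}
inline bool bne_taken(std::uint32_t rs1, std::uint32_t rs2) {
    return !alu_sub(rs1, rs2).zero;
}
```

Because subtraction wraps, `rs1 - rs2` is zero only when the two registers hold the same bit pattern, so the zero flag is a valid equality test for every input pair.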

Implementation

Module Diagram

graph LR
    A["a\n(sc_uint<32>)"] --> ALU[ALU\nSC_METHOD]
    B["b\n(sc_uint<32>)"] --> ALU
    OP["op\n(sc_uint<4>)"] --> ALU
    ALU --> R["result\n(sc_uint<32>)"]
    ALU --> Z["zero\n(bool)"]
    style ALU fill:#06b6d4,color:#fff

Complete Code

// File: alu_tb.cpp
// RV32I ALU — 10 operations, directed testbench
#include <systemc.h>
#include <iostream>
#include <iomanip>

// ─── ALU Operation Encoding ───────────────────────────────────────────────────
// These are internal encodings; the decoder in Post 9 will map funct3/funct7 onto them.
// A plain enum: the values convert implicitly to sc_uint<4> when written to the op port.
enum alu_op_t {
    ALU_ADD  = 0,
    ALU_SUB  = 1,
    ALU_AND  = 2,
    ALU_OR   = 3,
    ALU_XOR  = 4,
    ALU_SLT  = 5,   // signed less-than
    ALU_SLTU = 6,   // unsigned less-than
    ALU_SLL  = 7,   // shift left logical
    ALU_SRL  = 8,   // shift right logical (unsigned)
    ALU_SRA  = 9    // shift right arithmetic (signed, sign-extending)
};

// ─── ALU Module ───────────────────────────────────────────────────────────────
// Pure combinational — SC_METHOD with sensitivity to all inputs.
// Maps directly to always_comb in SystemVerilog.
//
// Key type decisions:
//   sc_uint<32> for a, b, result — default unsigned arithmetic
//   Cast to sc_int<32> for SLT, SRA — where signed semantics are required by spec
//   b.range(4,0) for shift amounts — RISC-V only uses the lower 5 bits for RV32I shifts

SC_MODULE(alu) {
    sc_in<sc_uint<32>>  a, b;
    sc_in<sc_uint<4>>   op;
    sc_out<sc_uint<32>> result;
    sc_out<bool>        zero;   // true when result == 0 (used by BEQ/BNE)

    void compute() {
        sc_uint<32> res = 0;

        switch ((int)op.read()) {
            case ALU_ADD:
                res = a.read() + b.read();
                break;
            case ALU_SUB:
                res = a.read() - b.read();
                break;
            case ALU_AND:
                res = a.read() & b.read();
                break;
            case ALU_OR:
                res = a.read() | b.read();
                break;
            case ALU_XOR:
                res = a.read() ^ b.read();
                break;
            case ALU_SLT:
                // Cast to sc_int<32> for correct signed comparison.
                // Without the cast: (sc_uint<32>)0xFFFFFFFF < 1 → false (wrong)
                // With the cast:    (sc_int<32>) 0xFFFFFFFF < 1 → true  (correct, -1 < 1)
                res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0;
                break;
            case ALU_SLTU:
                // No cast — unsigned comparison is sc_uint<32> default
                res = (a.read() < b.read()) ? 1 : 0;
                break;
            case ALU_SLL:
                // Only lower 5 bits of b are the shift amount in RV32I
                res = a.read() << b.read().range(4, 0);
                break;
            case ALU_SRL:
                // Logical right shift — zero-fills from the left
                res = a.read() >> b.read().range(4, 0);
                break;
            case ALU_SRA:
                // Arithmetic right shift — sign-fills from the left.
                // Cast to sc_int<32> so >> preserves the sign bit.
                res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4, 0));
                break;
            default:
                res = 0;  // undefined opcode — output zero
                break;
        }

        result.write(res);
        zero.write(res == 0);
    }

    SC_CTOR(alu) {
        SC_METHOD(compute);
        sensitive << a << b << op;
    }
};

// ─── Test Infrastructure ──────────────────────────────────────────────────────

// sc_signals connecting tb to DUT
sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
sc_signal<sc_uint<4>>  sig_op;
sc_signal<bool>        sig_zero;

static int pass_count = 0;
static int fail_count = 0;

// Apply inputs, advance zero-time simulation step, check result
void run_test(const char* name,
              sc_uint<32> a_val, sc_uint<32> b_val, alu_op_t op_val,
              sc_uint<32> expected_result, bool expected_zero) {
    sig_op.write(op_val);
    sig_a.write(a_val);
    sig_b.write(b_val);
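    // A single sc_start(SC_ZERO_TIME) runs one delta cycle, so repeat until
    // no activity is pending and the combinational result has settled.
    do {
        sc_start(SC_ZERO_TIME);
    } while (sc_pending_activity_at_current_time());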

    bool result_ok = (sig_result.read() == expected_result);
    bool zero_ok   = (sig_zero.read()   == expected_zero);
    bool ok        = result_ok && zero_ok;

    std::cout << (ok ? "PASS" : "FAIL") << "  " << std::left << std::setw(5) << name
              << std::right << std::setfill('0')  // std::left is sticky; restore right-justify for zero padding
              << "  a=0x" << std::hex << std::setw(8) << (uint32_t)a_val
              << "  b=0x" << std::setw(8) << (uint32_t)b_val
              << "  result=0x" << std::setw(8) << (uint32_t)sig_result.read()
              << "  zero=" << std::dec << sig_zero.read();
    if (!ok) {
        std::cout << "  ← expected result=0x"
                  << std::hex << (uint32_t)expected_result
                  << " zero=" << std::dec << expected_zero;
    }
    std::cout << std::setfill(' ') << std::endl;

    if (ok) pass_count++; else fail_count++;
}

// ─── sc_main ──────────────────────────────────────────────────────────────────

int sc_main(int argc, char* argv[]) {
    // Instantiate and connect ALU
    alu dut("dut");
    dut.a(sig_a);
    dut.b(sig_b);
    dut.op(sig_op);
    dut.result(sig_result);
    dut.zero(sig_zero);

    std::cout << "=== RV32I ALU Directed Test ===" << std::endl << std::endl;

    // ── ADD ─────────────────────────────────────────────────────────────────
    std::cout << "--- ADD ---" << std::endl;
    run_test("ADD",  5,          3,          ALU_ADD,  8,          false);
    run_test("ADD",  0xFFFFFFFF, 1,          ALU_ADD,  0,          true);   // overflow wraps to 0
    run_test("ADD",  0,          0,          ALU_ADD,  0,          true);   // zero flag

    // ── SUB ─────────────────────────────────────────────────────────────────
    std::cout << "--- SUB ---" << std::endl;
    run_test("SUB",  10,         3,          ALU_SUB,  7,          false);
    run_test("SUB",  0,          1,          ALU_SUB,  0xFFFFFFFF, false);  // underflow wraps
    run_test("SUB",  5,          5,          ALU_SUB,  0,          true);   // zero flag — used by BEQ

    // ── AND ─────────────────────────────────────────────────────────────────
    std::cout << "--- AND ---" << std::endl;
    run_test("AND",  0xFF00FF00, 0x0F0F0F0F, ALU_AND,  0x0F000F00, false);
    run_test("AND",  0xFFFFFFFF, 0,          ALU_AND,  0,          true);

    // ── OR ──────────────────────────────────────────────────────────────────
    std::cout << "--- OR ---" << std::endl;
    run_test("OR",   0xFF000000, 0x00FF0000, ALU_OR,   0xFFFF0000, false);
    run_test("OR",   0,          0,          ALU_OR,   0,          true);

    // ── XOR ─────────────────────────────────────────────────────────────────
    std::cout << "--- XOR ---" << std::endl;
    run_test("XOR",  0xAAAAAAAA, 0x55555555, ALU_XOR,  0xFFFFFFFF, false);
    run_test("XOR",  0xDEADBEEF, 0xDEADBEEF, ALU_XOR, 0,          true);   // XOR x, x = 0

    // ── SLT (signed less-than) ───────────────────────────────────────────────
    std::cout << "--- SLT ---" << std::endl;
    // -1 (stored as 0xFFFFFFFF) < 1 → signed result: true
    run_test("SLT",  0xFFFFFFFF, 1,          ALU_SLT,  1,          false);
    // 1 < -1 → false
    run_test("SLT",  1,          0xFFFFFFFF, ALU_SLT,  0,          true);
    // 5 < 3 → false
    run_test("SLT",  5,          3,          ALU_SLT,  0,          true);

    // ── SLTU (unsigned less-than) ────────────────────────────────────────────
    std::cout << "--- SLTU ---" << std::endl;
    // 0xFFFFFFFF > 1 unsigned → SLTU(0xFFFF..., 1) = 0 (not less)
    run_test("SLTU", 0xFFFFFFFF, 1,          ALU_SLTU, 0,          true);
    // 1 < 0xFFFFFFFF unsigned → SLTU(1, 0xFFFF...) = 1
    run_test("SLTU", 1,          0xFFFFFFFF, ALU_SLTU, 1,          false);

    // ── SLL (shift left logical) ─────────────────────────────────────────────
    std::cout << "--- SLL ---" << std::endl;
    run_test("SLL",  1,          0,          ALU_SLL,  1,          false);  // shift by 0 = no change
    run_test("SLL",  1,          4,          ALU_SLL,  16,         false);  // 1 << 4 = 16
    run_test("SLL",  1,          31,         ALU_SLL,  0x80000000, false);  // 1 << 31 = MSB set
    run_test("SLL",  0xFFFFFFFF, 1,          ALU_SLL,  0xFFFFFFFE, false);  // shift out MSB

    // ── SRL (shift right logical — unsigned) ─────────────────────────────────
    std::cout << "--- SRL ---" << std::endl;
    run_test("SRL",  0x80000000, 1,          ALU_SRL,  0x40000000, false);  // MSB not preserved
    run_test("SRL",  0xFFFFFFFF, 4,          ALU_SRL,  0x0FFFFFFF, false);  // zero-fills from left
    run_test("SRL",  16,         4,          ALU_SRL,  1,          false);

    // ── SRA (shift right arithmetic — signed) ────────────────────────────────
    std::cout << "--- SRA ---" << std::endl;
    // 0x80000000 = most negative 32-bit signed value (-2147483648)
    // SRA by 1 should give 0xC0000000 (-1073741824) — sign bit is preserved/replicated
    run_test("SRA",  0x80000000, 1,          ALU_SRA,  0xC0000000, false);
    // Positive number: SRA and SRL are identical
    run_test("SRA",  0x40000000, 1,          ALU_SRA,  0x20000000, false);
    // All ones >> 4 stays all ones (sign extension fills with 1s)
    run_test("SRA",  0xFFFFFFFF, 4,          ALU_SRA,  0xFFFFFFFF, false);

    // ── Summary ──────────────────────────────────────────────────────────────
    std::cout << std::endl
              << "=== Results: "
              << pass_count << " PASS, "
              << fail_count << " FAIL ==="
              << std::endl;

    return (fail_count > 0) ? 1 : 0;
}

Simulation Semantics

How sc_start(SC_ZERO_TIME) Works for Combinational Testing

The test harness calls sc_start(SC_ZERO_TIME) after each stimulus write. This is a deliberate choice that requires explanation. Per IEEE 1666-2011, a call to sc_start with a zero time argument runs a single delta cycle: an evaluate phase, an update phase, and a delta notification phase.

1. sig_op.write(op_val)   : stores the value in the signal's pending "new value" buffer
2. sig_a.write(a_val)     : same: pending, not yet visible
3. sig_b.write(b_val)     : same: pending, not yet visible
4. sc_start(SC_ZERO_TIME) : runs one delta cycle:
   a. Update phase: the pending a, b, op values become visible
   b. Delta notification: the ALU's SC_METHOD becomes runnable because a, b, op are in its sensitivity list
5. sc_start(SC_ZERO_TIME) : runs the next delta cycle:
   a. Evaluate phase: the SC_METHOD runs, reads the NEW a, b, op, and writes result and zero (pending)
   b. Update phase: result and zero become visible
6. sig_result.read()      : reads the settled, updated result

On the very first call, the initialization phase applies the pending writes and runs every process once, so one call is enough for the first test only. A robust harness therefore keeps calling sc_start(SC_ZERO_TIME) while sc_pending_activity_at_current_time() returns true.

If you read the signal before the SC_METHOD has had a chance to react to the new inputs, the result is the one from the previous stimulus. The correct simulation discipline for testing combinational logic is: write inputs, step delta cycles until nothing is pending, read outputs.

ASCII timing for one test transaction:

Write a,b,op ──────────────┐
                           │ Δ0: a,b,op updates become visible,
                           │     SC_METHOD is scheduled
                           ├──────────────────────────────────
                           │ Δ1: SC_METHOD runs, writes result;
                           │     result update becomes visible
                           │     (no further changes → stable)
                           │
Read result ───────────────┘ reads settled value

Each call to sc_start(SC_ZERO_TIME) runs one delta cycle and advances no simulated time; repeating it until no activity is pending brings the design to a stable state.
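The write-then-update behavior of signals can be illustrated with a toy two-phase model in plain C++. This is a deliberately simplified sketch and not the SystemC kernel: `ToySignal` and `ToyBench` are hypothetical types, there is one hard-wired adder process, and `settle()` simply repeats delta cycles until nothing is pending.

```cpp
#include <cassert>
#include <cstdint>

// Toy signal: a visible value plus a pending value applied by update().
struct ToySignal {
    std::uint32_t current = 0;
    std::uint32_t next = 0;
    void write(std::uint32_t v) { next = v; }  // not visible yet
    std::uint32_t read() const { return current; }
    bool update() {                            // returns true if the value changed
        const bool changed = (current != next);
        current = next;
        return changed;
    }
};

// Toy bench: one combinational adder process sensitive to a and b.
struct ToyBench {
    ToySignal a, b, sum;
    bool runnable = false;  // is the adder scheduled for the next evaluate phase?

    // One delta cycle. Returns true if the process is pending afterwards.
    bool delta_cycle() {
        if (runnable) {                         // evaluate phase
            sum.write(a.read() + b.read());
            runnable = false;
        }
        const bool a_changed = a.update();      // update phase
        const bool b_changed = b.update();
        sum.update();
        if (a_changed || b_changed) runnable = true;  // delta notification
        return runnable;
    }

    // Repeat delta cycles until no process is pending.
    void settle() { while (delta_cycle()) {} }
};
```

In this model, writing the inputs and running one delta cycle makes the inputs visible but leaves `sum` at its old value; a second cycle (or `settle()`) is what produces the new sum.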

Why SC_ZERO_TIME Instead of sc_start(1, SC_NS)

For purely combinational logic, using sc_start(1, SC_NS) would also work, but it advances simulated time for no reason and can interact with time-triggered processes. SC_ZERO_TIME is the precise tool: it steps the scheduler one delta cycle at a time, so pending changes can propagate to stability while simulated time stays where it is. This is a common pattern for directed combinational tests.

Key Concepts Explained

sc_uint<32> vs sc_int<32> — Why It Matters

SystemC provides two 32-bit integer types with very different arithmetic behavior:

sc_uint<32> — unsigned 32-bit integer
- Range: 0 to 4,294,967,295 (0x00000000 to 0xFFFFFFFF)
- >> operator: logical right shift — fills with zeros from the left
- < operator: unsigned comparison — 0xFFFFFFFF > 0x00000001
- Overflow: wraps silently (0xFFFFFFFF + 1 = 0x00000000)

sc_int<32> — signed 32-bit two's complement integer
- Range: -2,147,483,648 to 2,147,483,647
- >> operator: arithmetic right shift — fills with the sign bit from the left
- < operator: signed comparison — 0xFFFFFFFF (-1) < 0x00000001 (1)
- 0x80000000 is the most negative value (-2,147,483,648)

The RISC-V specification defines which operations are signed and which are unsigned. Getting this wrong produces results that are silently incorrect — the simulation runs, the test passes on easy cases, and the bug surfaces only on negative numbers or values with the MSB set.

Casting rule for this ALU:
- Use sc_uint<32> by default for ports and intermediate values
- Cast to sc_int<32> only for signed operations: (sc_int<32>)a.read() for SLT and SRA
- Cast the SRA result back to sc_uint<32> before writing to the result port

The Zero Flag and Branch Instructions

The zero output port becomes essential at Post 14 when we add branches. RISC-V has no comparison instruction that directly sets a flag register. Instead, the branch unit uses the ALU:

BEQ rs1, rs2, label:
    1. Compute SUB(rs1, rs2) in the ALU
    2. If zero == true → rs1 == rs2 → branch taken
    3. If zero == false → rs1 != rs2 → branch not taken

This is why our ALU includes zero as an output — it is not redundant. It mirrors the hardware design of actual RISC-V processors, where the branch resolution logic uses the ALU result's zero condition.

sc_uint<32>::range(4,0) — The Shift Amount

RISC-V defines that for RV32I, shift operations use only the lower 5 bits of the shift amount register. Bits [31:5] are ignored. This means the maximum shift is 31, which covers the full 32-bit range.

res = a.read() << b.read().range(4, 0);

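b.read().range(4, 0) extracts bits 4 down to 0 from b, giving a 5-bit value (0–31). Without this, the full 32-bit value of b would be used as the shift count. SystemC evaluates sc_uint<32> shifts in a 64-bit native integer, so a count of 32 to 63 shifts every bit out of the 32-bit result instead of wrapping the count as RISC-V requires, and a count of 64 or more is undefined behavior in C++. The range() call enforces the hardware specification and prevents both problems.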
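In a plain C++ reference model the same masking is a single AND. The sketch below assumes 32-bit operands; `sll32` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference for SLL: only bits [4:0] of b select the shift
// amount, mirroring b.range(4, 0) in the SystemC ALU.
inline std::uint32_t sll32(std::uint32_t a, std::uint32_t b) {
    const unsigned shamt = b & 0x1Fu;  // 0..31, so the shift below is well-defined
    return a << shamt;
}
```

A shift amount of 32 therefore behaves like 0 and 33 behaves like 1, which is the RV32I rule rather than "shift everything out".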

Build & Run

# CMakeLists.txt in section1/post05/
cmake_minimum_required(VERSION 3.16)
project(post05_alu)

set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
include_directories(${SYSTEMC_HOME}/include)
link_directories(${SYSTEMC_HOME}/lib-linux64)

add_executable(alu_tb alu_tb.cpp)
target_link_libraries(alu_tb systemc)

mkdir build && cd build
cmake .. && make
./alu_tb

Expected last line:

=== Results: 27 PASS, 0 FAIL ===

The return code from sc_main is 0 on all-pass, 1 if any test fails — so you can use ./alu_tb && echo "CI: PASS" || echo "CI: FAIL" in a CI script.

Common Pitfalls for SV Engineers

Pitfall 1: Signed vs Unsigned Comparisons — The Silent Wrong Answer

The most dangerous pitfall in SystemC integer types is using sc_uint<32> for signed comparisons. Because sc_uint is always unsigned, sc_uint<32>(0xFFFFFFFF) > sc_uint<32>(1) evaluates to true — even though 0xFFFFFFFF represents -1 in two's complement. The wrong type produces a wrong answer with no error, no warning, and no exception.

// Testing SLT: should -1 < 1? Yes.
sc_uint<32> a_unsigned = 0xFFFFFFFF;  // bit pattern for -1

// WRONG — unsigned comparison:
bool slt_wrong = (a_unsigned < sc_uint<32>(1));  // = false (-1 is NOT < 1 in unsigned world)

// CORRECT — signed comparison:
bool slt_correct = ((sc_int<32>)a_unsigned < sc_int<32>(1));  // = true (-1 IS < 1 signed)

In SystemVerilog, $signed() is used to mark a value as signed for a comparison. In SystemC, the type carries the signedness. The cast (sc_int<32>) is the SystemC equivalent of $signed().

Pitfall 2: sc_uint Wraps Silently — 5-bit Shift Amount Matters

sc_uint<5> holds values 0–31. If you pass a shift amount of 32 to an sc_uint<5>, it wraps to 0 silently:

sc_uint<5> shift_amount = 32;  // wraps to 0! (32 mod 32 = 0)
sc_uint<32> result = a.read() << shift_amount;  // shifts by 0, not 32

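For the ALU, this wrapping is exactly what RISC-V specifies: only bits [4:0] of rs2 are used, so a shift amount of 32 behaves as a shift by 0. The pitfall is expecting a shift by 32 to clear the register, as a Verilog a << 32 would. Extracting the field explicitly with .range(4, 0) documents the intent and keeps the shift count in range. Always be explicit about bit-width when extracting fields used as parameters.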

Pitfall 3: Printing sc_uint — Version-Dependent Behavior

Different SystemC versions handle cout << sc_uint_var differently. Some print decimal, some print binary, some require .to_uint(). The safest approach for testbench output is always explicit:

sc_uint<32> result = /* ... */;

// Depends on SystemC's stream operator and the stream's formatting state:
std::cout << result;

// Explicit:
std::cout << result.to_uint();    // plain unsigned int, decimal unless std::hex is set
std::cout << result.to_string();  // decimal string by default; to_string(SC_BIN) gives "0b..."
std::cout << (uint32_t)result;    // cast to plain C++ type, decimal by default
std::cout << std::hex << (uint32_t)result;  // hex with explicit cast
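One way to keep testbench output stable is to route every value through a small formatting helper that sets all the stream state it depends on. This is a sketch in plain C++; `hex32` is a hypothetical helper, and it uses a local string stream so that sticky manipulators such as std::left or std::hex on std::cout cannot leak in or out.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a 32-bit value as 0x followed by eight
// zero-padded lowercase hex digits.
inline std::string hex32(std::uint32_t v) {
    std::ostringstream os;
    os << "0x" << std::hex << std::right << std::setfill('0')
       << std::setw(8) << v;
    return os.str();
}
```

A call such as `std::cout << hex32((uint32_t)result)` then prints the same text regardless of what was written to the stream earlier.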

Pitfall 4: sc_bv Does Not Support Arithmetic — Compile Error

If you accidentally use sc_bv<N> where you need arithmetic, the compiler error is immediate but the message is cryptic (template substitution failure). The fix is always to use sc_uint<N> or sc_int<N> instead.

sc_bv<32> a = 5, b = 3;
sc_bv<32> sum = a + b;   // COMPILE ERROR: operator+ not defined for sc_bv

sc_uint<32> c = 5, d = 3;
sc_uint<32> result = c + d;  // CORRECT: 8

The rule: sc_bv is for packing and unpacking bit fields. sc_uint/sc_int is for computation.

Pitfall 5: Mixing sc_uint<32> and uint32_t in Expressions — Implicit Conversion Surprises

sc_uint<32> and uint32_t are different types. Mixing them in arithmetic expressions can produce surprising results because C++ applies its own implicit promotion rules, not SystemC hardware semantics.

sc_uint<32> sig_val = port.read();
uint32_t    c_val   = 0xFFFF0000;

// This works — sc_uint knows how to convert from uint32_t in write/read
port.write(c_val);         // OK
sc_uint<32> x = c_val;    // OK: implicit conversion

// This can be surprising:
uint32_t diff = sig_val - c_val;  // sig_val is converted to a 64-bit C++ integer,
                                   // so the subtraction is plain C++ arithmetic;
                                   // the value only wraps to 32 bits when it is
                                   // stored, not inside a larger expression

// Safe: keep everything as sc_uint<32> during computation
sc_uint<32> diff_hw = sig_val - sc_uint<32>(c_val);  // hardware semantics

The general rule: do your arithmetic in sc_uint/sc_int types, then convert to C++ types only at print/assert boundaries.
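The effect is easy to reproduce in plain C++ by doing the arithmetic in a 64-bit intermediate, which is how an sc_uint<32> operand is assumed to behave once it has been converted to a native integer. `add_wide` and `add_wrapped` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: the sum held in a 64-bit intermediate,
// before it is stored back into a 32-bit value.
inline std::uint64_t add_wide(std::uint32_t a, std::uint32_t b) {
    return std::uint64_t{a} + b;
}

// The same sum after truncation to 32 bits (hardware semantics).
inline std::uint32_t add_wrapped(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(add_wide(a, b));
}
```

A check such as "is the sum zero" gives different answers for `0xFFFFFFFF + 1` depending on whether it looks at the wide intermediate or the wrapped value, which is why the comparison should happen after the result is back in a 32-bit type.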

DV Insight

Two subtleties trip up engineers writing their first SystemC ALU.

First: sc_uint<32> overflow is silent. 0xFFFFFFFF + 1 produces 0 with no exception, no flag, no warning. This matches hardware behavior — hardware adders wrap around silently too. But it means your test cases must explicitly cover overflow if you want to verify it. The test above includes ADD(0xFFFFFFFF, 1) for exactly this reason. If you only test with small values where overflow is impossible, you have not verified the overflow behavior.

Second: The zero flag is most naturally tested by doing SUB(x, x) — any value minus itself is zero. But you should also verify that SUB(x, y) with x != y produces zero = false. A broken implementation that always outputs zero = true would pass a test suite that only checks SUB(x, x). Test the negative case too.

Broader principle: A directed test that only covers "happy path" inputs is not a test suite — it is an optimistic demo. For the ALU, the interesting cases are: overflow, underflow, most-negative signed value, maximum unsigned value, shift by 0, shift by 31, and XOR with itself. These are the cases where wrong type choices (signed vs unsigned) produce wrong results. Design your directed tests around the boundaries, not the interior of the valid range.
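One way to organize such boundary cases is a table of vectors checked against an independent reference model. The sketch below is plain C++ and does not come from the post's repository: `RefOp`, `alu_ref`, `BoundaryCase`, `boundary_cases`, and `count_mismatches` are hypothetical names, and the expected values were worked out by hand from the RV32I definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RefOp { ADD, SUB, AND, OR, XOR, SLT, SLTU, SLL, SRL, SRA };

// Hypothetical reference model of the ten operations on 32-bit words.
inline std::uint32_t alu_ref(std::uint32_t a, std::uint32_t b, RefOp op) {
    const unsigned sh = b & 0x1Fu;  // RV32I: low 5 bits are the shift amount
    switch (op) {
        case RefOp::ADD:  return a + b;  // wraps mod 2^32
        case RefOp::SUB:  return a - b;
        case RefOp::AND:  return a & b;
        case RefOp::OR:   return a | b;
        case RefOp::XOR:  return a ^ b;
        case RefOp::SLT:  return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
        case RefOp::SLTU: return (a < b) ? 1u : 0u;
        case RefOp::SLL:  return a << sh;
        case RefOp::SRL:  return a >> sh;
        case RefOp::SRA: {
            std::uint32_t r = a >> sh;
            if (a & 0x80000000u) r |= ~(0xFFFFFFFFu >> sh);  // replicate sign bit
            return r;
        }
    }
    assert(false && "invalid RefOp");
    return 0;
}

struct BoundaryCase { std::uint32_t a, b; RefOp op; std::uint32_t expected; };

// Boundary inputs: overflow, underflow, most-negative, max unsigned,
// shift by 0, shift by 31, XOR with itself.
inline std::vector<BoundaryCase> boundary_cases() {
    return {
        {0xFFFFFFFFu, 1u,          RefOp::ADD,  0u},
        {0u,          1u,          RefOp::SUB,  0xFFFFFFFFu},
        {0x80000000u, 0x7FFFFFFFu, RefOp::SLT,  1u},
        {0x80000000u, 0x7FFFFFFFu, RefOp::SLTU, 0u},
        {0xFFFFFFFFu, 0u,          RefOp::SLTU, 0u},
        {1u,          0u,          RefOp::SLL,  1u},
        {1u,          31u,         RefOp::SLL,  0x80000000u},
        {0x80000000u, 31u,         RefOp::SRL,  1u},
        {0x80000000u, 31u,         RefOp::SRA,  0xFFFFFFFFu},
        {0xDEADBEEFu, 0xDEADBEEFu, RefOp::XOR,  0u},
    };
}

// Count how many table entries disagree with the reference model.
inline int count_mismatches() {
    int mismatches = 0;
    for (const BoundaryCase& c : boundary_cases()) {
        if (alu_ref(c.a, c.b, c.op) != c.expected) ++mismatches;
    }
    return mismatches;
}
```

The same table could be fed to the SystemC `run_test` harness, with `alu_ref` supplying the expected result instead of a hand-typed constant.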

Integration

This ALU is now a complete, tested hardware block. Its role in the RISC-V CPU:

Immediate next use (Post 6 and 7): The ALU becomes the DUT for the first structured testbench and the Section 1 capstone. We will add a monitor, a checker, and VCD waveform generation on top of the directed test here.

Post 8 (Register File): The register file produces the a and b operand values that feed into the ALU's inputs. Once both the ALU and the register file are complete, we have the two core compute elements.

Post 9 (Decoder): The decoder takes a 32-bit RV32I instruction word and produces the op value (along with a, b source selects and write-back control). The op encoding we defined here — ALU_ADD = 0, ALU_SUB = 1, etc. — will be what the decoder outputs.

Post 18 (Execute Stage, Single-Cycle CPU): The ALU is wired into the execute stage as-is. The forwarding unit (Post 20) adds multiplexers in front of the a and b inputs to handle pipeline hazards, but the ALU module itself does not change.

Posts 23–27 (UVM Testbench): The ALU is the first block we write a formal UVM-SystemC testbench for. The directed tests here become the starting point for a constrained-random test environment with a reference model.

Series progress:
- Post 1 — Modules, Ports & Signals ✓
- Post 2 — Simulation Time & Clocks ✓
- Post 3 — Delta Cycles & Event-Driven Semantics ✓
- Post 4 — SC_METHOD vs SC_THREAD ✓
- Post 5 — Building the RV32I ALU ✓
- Post 6 — Your First SystemC Testbench (next)

What's Next

Post 6: Your First SystemC Testbench

The ALU test in sc_main above is functional, but it mixes stimulus generation, checking, and result logging into one flat function. That works for 27 tests. It does not scale to 2,700 tests, does not support multiple independent checkers, and does not produce structured output that a CI system can parse.

Post 6 restructures this test into a proper three-component testbench: a Monitor that passively observes the DUT outputs, a Checker that compares observations against expected values, and a Stimulus Driver that generates inputs. It also adds sc_trace to dump VCD waveforms for GTKWave inspection.

This is the testbench pattern we will use for the entire CPU build — and it is the same pattern as a UVM agent, just without the UVM infrastructure overhead.
