5. SystemC Tutorial - Building the RV32I ALU

Introduction

Every processor, from an embedded microcontroller to a datacenter CPU, has an Arithmetic-Logic Unit at its core. The ALU is the block that actually does the work — it adds, subtracts, compares, shifts, and performs bitwise operations on the data values that the program manipulates. Everything else in the CPU — the fetch unit, the decoder, the register file, the pipeline — exists to feed the ALU the right operands and the right opcode at the right time, and then route its result to the right destination.

The SiFive FU740, the chip inside the HiFive Unmatched development board, implements RV64GC: a 64-bit RISC-V ISA with the multiply/divide, atomic, floating-point, and compressed instruction extensions. It contains four U74 application cores (plus an S7 monitor core), and each core has its own integer ALUs for the RV64I operations. The Western Digital SweRV EH1, an in-order dual-issue RISC-V core used inside Western Digital NVMe SSDs, implements a deeply pipelined ALU to sustain one operation per clock at high frequency, splitting the ALU into multiple pipeline stages rather than computing the full result in one cycle. The T-HEAD C906, used in the Allwinner D1 chip that powers many low-cost RISC-V Linux boards, implements a similar RV64GC ALU. In every case, the fundamental operations are the same. The differences are in how many operations execute per cycle, how many pipeline stages the computation is split across, and how hazards are handled.

We are building RV32I — the 32-bit base integer instruction set. This is the smallest, cleanest version of the RISC-V ISA: 10 ALU operations, 32-bit operands, no multiply/divide. It is also the version that every RISC-V tutorial and textbook starts with, because the instruction encodings are clean, the operations are simple, and the full pipeline is buildable in reasonable time. The ALU we write here will be wired directly into the Execute stage of our single-cycle CPU (Post 18), driven by the decoded control signals from the Decode stage (Post 9), and fed operands from the Register File (Post 8) and the Forwarding Unit (Post 20).

This is also the first post where sc_uint<32> vs sc_int<32> matters for correctness. RISC-V defines some operations as signed (SLT, SRA) and some as unsigned (SLTU, SRL). Using the wrong type produces subtly wrong results for negative numbers and large unsigned values — results that look plausible but are wrong. The test cases in this post are designed to catch exactly those errors.


Prerequisites


Translation Table

| Concept | C++ Engineer | SystemVerilog Engineer |
|---|---|---|
| sc_uint<32> | uint32_t (unsigned, wraps on overflow) | logic [31:0] (unsigned by default) |
| sc_int<32> | int32_t (signed, two's complement) | logic signed [31:0] |
| sc_uint<32>::range(4,0) | value & 0x1F (extract bits [4:0]) | value[4:0] |
| (sc_int<32>)a.read() | (int32_t)a.read() | $signed(a) |
| result == 0 | result == 0 | result == 32'h0 |
| SC_METHOD with sensitive << a << b << op | Callback re-run on any input change | always_comb sensitive to a, b, op |

The critical difference: sc_uint<32> arithmetic is unsigned. -1 stored as sc_uint<32> is 0xFFFFFFFF — the maximum unsigned 32-bit value. When you compare sc_uint<32>(0xFFFFFFFF) < sc_uint<32>(1), the result is false — because 4294967295 is not less than 1. That is correct for SLTU (unsigned less-than). But SLT (signed less-than) requires treating 0xFFFFFFFF as -1, which is less than 1. For SLT, you must cast to sc_int<32> before comparing. This is not an edge case — it is the correct RISC-V specification behavior.
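The same distinction can be reproduced without SystemC. The sketch below is illustrative only: it uses native `uint32_t`/`int32_t` as stand-ins for `sc_uint<32>`/`sc_int<32>`, and the helper names `sltu` and `slt` are hypothetical, not part of the ALU module. It relies on the unsigned-to-signed conversion wrapping to two's complement, which mainstream compilers do and C++20 guarantees.

```cpp
#include <cstdint>

// Unsigned less-than (SLTU): compare the raw bit patterns as magnitudes.
constexpr std::uint32_t sltu(std::uint32_t a, std::uint32_t b) {
    return (a < b) ? 1u : 0u;
}

// Signed less-than (SLT): reinterpret both operands as two's complement
// before comparing, the native-type analogue of casting to sc_int<32>.
constexpr std::uint32_t slt(std::uint32_t a, std::uint32_t b) {
    return (static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b)) ? 1u : 0u;
}

// Same bit pattern, opposite answers depending on signedness.
static_assert(sltu(0xFFFFFFFFu, 1u) == 0u, "4294967295 is not below 1");
static_assert(slt(0xFFFFFFFFu, 1u) == 1u, "-1 is below 1");
```

The two `static_assert` lines are checked at compile time, so a wrong cast shows up as a build error rather than a silent simulation mismatch.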


RV32I ALU Operations

The ten RV32I integer ALU operations map directly to RISC-V R-type instructions (and their I-type immediate variants). Here is each operation, the instruction that exercises it, and its role in real programs:

| Op | Instruction | Use Case in Real Programs |
|---|---|---|
| ADD | ADD rd, rs1, rs2 | Address calculation, pointer arithmetic, loop counter increment |
| SUB | SUB rd, rs1, rs2 | Loop counter decrement, comparisons (SUB then check zero), array index difference |
| AND | AND rd, rs1, rs2 | Bit masking (extract a field: AND rs1, mask), alignment check |
| OR | OR rd, rs1, rs2 | Set bits, combine flags, pack multiple small values into one register |
| XOR | XOR rd, rs1, rs2 | Toggle bits, equality check (result is 0 when the operands match). XOR of a register with itself gives 0, but RISC-V code normally zeroes a register from x0 (for example ADDI rd, x0, 0) |
| SLT | SLT rd, rs1, rs2 | Signed comparison: writes the result of a < b in C to rd as 0 or 1 (conditional branches themselves use BLT/BGE) |
| SLTU | SLTU rd, rs1, rs2 | Unsigned comparison: pointer comparison, array bounds check |
| SLL | SLL rd, rs1, rs2 | Logical left shift: multiply by power of 2, pack fields |
| SRL | SRL rd, rs1, rs2 | Logical right shift: unsigned divide by power of 2, extract high bits |
| SRA | SRA rd, rs1, rs2 | Arithmetic right shift: signed divide by power of 2 (rounds toward negative infinity), sign-extend |

The zero flag (not a RISC-V instruction, but a hardware output we add): the ALU outputs a zero signal that is true when result == 0. The branch unit uses this for BEQ (branch if equal) and BNE (branch if not equal) — it computes SUB(rs1, rs2) and checks the zero flag. If zero, the two registers are equal. This is a hardware trick that eliminates a dedicated comparator — the ALU does the work.


Implementation

Module Diagram

graph LR
    A["a\n(sc_uint<32>)"] --> ALU[ALU\nSC_METHOD]
    B["b\n(sc_uint<32>)"] --> ALU
    OP["op\n(sc_uint<4>)"] --> ALU
    ALU --> R["result\n(sc_uint<32>)"]
    ALU --> Z["zero\n(bool)"]
    style ALU fill:#06b6d4,color:#fff

Complete Code

// File: alu_tb.cpp
// RV32I ALU — 10 operations, directed testbench
#include <systemc.h>
#include <iostream>
#include <iomanip>

// ─── ALU Operation Encoding ───────────────────────────────────────────────────
// These are the op codes the decoder in Post 9 will produce from funct3/funct7.
// A plain enum is used: its values fit in 4 bits and convert implicitly to the sc_uint<4> port type.
enum alu_op_t {
    ALU_ADD  = 0,
    ALU_SUB  = 1,
    ALU_AND  = 2,
    ALU_OR   = 3,
    ALU_XOR  = 4,
    ALU_SLT  = 5,   // signed less-than
    ALU_SLTU = 6,   // unsigned less-than
    ALU_SLL  = 7,   // shift left logical
    ALU_SRL  = 8,   // shift right logical (unsigned)
    ALU_SRA  = 9    // shift right arithmetic (signed, sign-extending)
};

// ─── ALU Module ───────────────────────────────────────────────────────────────
// Pure combinational — SC_METHOD with sensitivity to all inputs.
// Maps directly to always_comb in SystemVerilog.
//
// Key type decisions:
//   sc_uint<32> for a, b, result — default unsigned arithmetic
//   Cast to sc_int<32> for SLT, SRA — where signed semantics are required by spec
//   b.range(4,0) for shift amounts — RISC-V only uses the lower 5 bits for RV32I shifts

SC_MODULE(alu) {
    sc_in<sc_uint<32>>  a, b;
    sc_in<sc_uint<4>>   op;
    sc_out<sc_uint<32>> result;
    sc_out<bool>        zero;   // true when result == 0 (used by BEQ/BNE)

    void compute() {
        sc_uint<32> res = 0;

        switch ((int)op.read()) {
            case ALU_ADD:
                res = a.read() + b.read();
                break;
            case ALU_SUB:
                res = a.read() - b.read();
                break;
            case ALU_AND:
                res = a.read() & b.read();
                break;
            case ALU_OR:
                res = a.read() | b.read();
                break;
            case ALU_XOR:
                res = a.read() ^ b.read();
                break;
            case ALU_SLT:
                // Cast to sc_int<32> for correct signed comparison.
                // Without the cast: (sc_uint<32>)0xFFFFFFFF < 1 → false (wrong)
                // With the cast:    (sc_int<32>) 0xFFFFFFFF < 1 → true  (correct, -1 < 1)
                res = ((sc_int<32>)a.read() < (sc_int<32>)b.read()) ? 1 : 0;
                break;
            case ALU_SLTU:
                // No cast — unsigned comparison is sc_uint<32> default
                res = (a.read() < b.read()) ? 1 : 0;
                break;
            case ALU_SLL:
                // Only lower 5 bits of b are the shift amount in RV32I
                res = a.read() << b.read().range(4, 0);
                break;
            case ALU_SRL:
                // Logical right shift — zero-fills from the left
                res = a.read() >> b.read().range(4, 0);
                break;
            case ALU_SRA:
                // Arithmetic right shift — sign-fills from the left.
                // Cast to sc_int<32> so >> preserves the sign bit.
                res = (sc_uint<32>)((sc_int<32>)a.read() >> b.read().range(4, 0));
                break;
            default:
                res = 0;  // undefined opcode — output zero
                break;
        }

        result.write(res);
        zero.write(res == 0);
    }

    SC_CTOR(alu) {
        SC_METHOD(compute);
        sensitive << a << b << op;
    }
};

// ─── Test Infrastructure ──────────────────────────────────────────────────────

// sc_signals connecting tb to DUT
sc_signal<sc_uint<32>> sig_a, sig_b, sig_result;
sc_signal<sc_uint<4>>  sig_op;
sc_signal<bool>        sig_zero;

static int pass_count = 0;
static int fail_count = 0;

// Apply inputs, let the simulation settle, check result
void run_test(const char* name,
              sc_uint<32> a_val, sc_uint<32> b_val, alu_op_t op_val,
              sc_uint<32> expected_result, bool expected_zero) {
    sig_op.write(op_val);
    sig_a.write(a_val);
    sig_b.write(b_val);
    // Run every delta cycle at the current time, then advance 1 ns.
    // A single sc_start(SC_ZERO_TIME) executes only one delta cycle, which can
    // apply the new inputs without re-running compute(), leaving a stale result.
    sc_start(1, SC_NS);

    bool result_ok = (sig_result.read() == expected_result);
    bool zero_ok   = (sig_zero.read()   == expected_zero);
    bool ok        = result_ok && zero_ok;

    std::cout << (ok ? "PASS" : "FAIL") << "  " << std::left << std::setw(5) << name
              // std::left is sticky: switch back to std::right so the hex fields
              // are zero-padded on the left (00000005, not 50000000)
              << std::right
              << "  a=0x" << std::hex << std::setw(8) << std::setfill('0') << (uint32_t)a_val
              << "  b=0x" << std::setw(8) << (uint32_t)b_val
              << "  result=0x" << std::setw(8) << (uint32_t)sig_result.read()
              << "  zero=" << std::dec << sig_zero.read();
    if (!ok) {
        std::cout << "  ← expected result=0x"
                  << std::hex << (uint32_t)expected_result
                  << " zero=" << std::dec << expected_zero;
    }
    std::cout << std::setfill(' ') << std::endl;

    if (ok) pass_count++; else fail_count++;
}

// ─── sc_main ──────────────────────────────────────────────────────────────────

int sc_main(int argc, char* argv[]) {
    // Instantiate and connect ALU
    alu dut("dut");
    dut.a(sig_a);
    dut.b(sig_b);
    dut.op(sig_op);
    dut.result(sig_result);
    dut.zero(sig_zero);

    std::cout << "=== RV32I ALU Directed Test ===" << std::endl << std::endl;

    // ── ADD ─────────────────────────────────────────────────────────────────
    std::cout << "--- ADD ---" << std::endl;
    run_test("ADD",  5,          3,          ALU_ADD,  8,          false);
    run_test("ADD",  0xFFFFFFFF, 1,          ALU_ADD,  0,          true);   // overflow wraps to 0
    run_test("ADD",  0,          0,          ALU_ADD,  0,          true);   // zero flag

    // ── SUB ─────────────────────────────────────────────────────────────────
    std::cout << "--- SUB ---" << std::endl;
    run_test("SUB",  10,         3,          ALU_SUB,  7,          false);
    run_test("SUB",  0,          1,          ALU_SUB,  0xFFFFFFFF, false);  // underflow wraps
    run_test("SUB",  5,          5,          ALU_SUB,  0,          true);   // zero flag — used by BEQ

    // ── AND ─────────────────────────────────────────────────────────────────
    std::cout << "--- AND ---" << std::endl;
    run_test("AND",  0xFF00FF00, 0x0F0F0F0F, ALU_AND,  0x0F000F00, false);
    run_test("AND",  0xFFFFFFFF, 0,          ALU_AND,  0,          true);

    // ── OR ──────────────────────────────────────────────────────────────────
    std::cout << "--- OR ---" << std::endl;
    run_test("OR",   0xFF000000, 0x00FF0000, ALU_OR,   0xFFFF0000, false);
    run_test("OR",   0,          0,          ALU_OR,   0,          true);

    // ── XOR ─────────────────────────────────────────────────────────────────
    std::cout << "--- XOR ---" << std::endl;
    run_test("XOR",  0xAAAAAAAA, 0x55555555, ALU_XOR,  0xFFFFFFFF, false);
    run_test("XOR",  0xDEADBEEF, 0xDEADBEEF, ALU_XOR, 0,          true);   // XOR x, x = 0

    // ── SLT (signed less-than) ───────────────────────────────────────────────
    std::cout << "--- SLT ---" << std::endl;
    // -1 (stored as 0xFFFFFFFF) < 1 → signed result: true
    run_test("SLT",  0xFFFFFFFF, 1,          ALU_SLT,  1,          false);
    // 1 < -1 → false
    run_test("SLT",  1,          0xFFFFFFFF, ALU_SLT,  0,          true);
    // 5 < 3 → false
    run_test("SLT",  5,          3,          ALU_SLT,  0,          true);

    // ── SLTU (unsigned less-than) ────────────────────────────────────────────
    std::cout << "--- SLTU ---" << std::endl;
    // 0xFFFFFFFF > 1 unsigned → SLTU(0xFFFF..., 1) = 0 (not less)
    run_test("SLTU", 0xFFFFFFFF, 1,          ALU_SLTU, 0,          true);
    // 1 < 0xFFFFFFFF unsigned → SLTU(1, 0xFFFF...) = 1
    run_test("SLTU", 1,          0xFFFFFFFF, ALU_SLTU, 1,          false);

    // ── SLL (shift left logical) ─────────────────────────────────────────────
    std::cout << "--- SLL ---" << std::endl;
    run_test("SLL",  1,          0,          ALU_SLL,  1,          false);  // shift by 0 = no change
    run_test("SLL",  1,          4,          ALU_SLL,  16,         false);  // 1 << 4 = 16
    run_test("SLL",  1,          31,         ALU_SLL,  0x80000000, false);  // 1 << 31 = MSB set
    run_test("SLL",  0xFFFFFFFF, 1,          ALU_SLL,  0xFFFFFFFE, false);  // shift out MSB

    // ── SRL (shift right logical — unsigned) ─────────────────────────────────
    std::cout << "--- SRL ---" << std::endl;
    run_test("SRL",  0x80000000, 1,          ALU_SRL,  0x40000000, false);  // MSB not preserved
    run_test("SRL",  0xFFFFFFFF, 4,          ALU_SRL,  0x0FFFFFFF, false);  // zero-fills from left
    run_test("SRL",  16,         4,          ALU_SRL,  1,          false);

    // ── SRA (shift right arithmetic — signed) ────────────────────────────────
    std::cout << "--- SRA ---" << std::endl;
    // 0x80000000 = most negative 32-bit signed value (-2147483648)
    // SRA by 1 should give 0xC0000000 (-1073741824) — sign bit is preserved/replicated
    run_test("SRA",  0x80000000, 1,          ALU_SRA,  0xC0000000, false);
    // Positive number: SRA and SRL are identical
    run_test("SRA",  0x40000000, 1,          ALU_SRA,  0x20000000, false);
    // All ones >> 4 stays all ones (sign extension fills with 1s)
    run_test("SRA",  0xFFFFFFFF, 4,          ALU_SRA,  0xFFFFFFFF, false);

    // ── Summary ──────────────────────────────────────────────────────────────
    std::cout << std::endl
              << "=== Results: "
              << pass_count << " PASS, "
              << fail_count << " FAIL ==="
              << std::endl;

    return (fail_count > 0) ? 1 : 0;
}

Key Concepts Explained

sc_uint<32> vs sc_int<32> — Why It Matters

SystemC provides two 32-bit integer types with very different arithmetic behavior:

sc_uint<32> — unsigned 32-bit integer
- Range: 0 to 4,294,967,295 (0x00000000 to 0xFFFFFFFF)
- >> operator: logical right shift — fills with zeros from the left
- < operator: unsigned comparison — 0xFFFFFFFF > 0x00000001
- Overflow: wraps silently (0xFFFFFFFF + 1 = 0x00000000)

sc_int<32> — signed 32-bit two's complement integer
- Range: -2,147,483,648 to 2,147,483,647
- >> operator: arithmetic right shift — fills with the sign bit from the left
- < operator: signed comparison — 0xFFFFFFFF (-1) < 0x00000001 (1)
- 0x80000000 is the most negative value (-2,147,483,648)

The RISC-V specification defines which operations are signed and which are unsigned. Getting this wrong produces results that are silently incorrect — the simulation runs, the test passes on easy cases, and the bug surfaces only on negative numbers or values with the MSB set.

Casting rule for this ALU:
- Use sc_uint<32> by default for ports and intermediate values
- Cast to sc_int<32> only for signed operations: (sc_int<32>)a.read() for SLT and SRA
- Cast the SRA result back to sc_uint<32> before writing to the result port
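The casting rule can be checked in isolation with native types. This is a minimal sketch, assuming `uint32_t`/`int32_t` behave like `sc_uint<32>`/`sc_int<32>` for this purpose. The function names `srl` and `sra` are hypothetical. Right-shifting a negative signed value is arithmetic on mainstream compilers and is guaranteed to be so from C++20 onward.

```cpp
#include <cstdint>

// Logical right shift: the operand stays unsigned, so zeros enter from the left.
constexpr std::uint32_t srl(std::uint32_t a, std::uint32_t shamt) {
    return a >> (shamt & 0x1Fu);
}

// Arithmetic right shift: cast to signed so the sign bit is replicated,
// then cast the result back to unsigned, mirroring the rule listed above.
constexpr std::uint32_t sra(std::uint32_t a, std::uint32_t shamt) {
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(a) >> (shamt & 0x1Fu));
}

static_assert(srl(0x80000000u, 1u) == 0x40000000u, "SRL zero-fills");
static_assert(sra(0x80000000u, 1u) == 0xC0000000u, "SRA sign-fills");
static_assert(sra(0x40000000u, 1u) == srl(0x40000000u, 1u), "identical for non-negative values");
```

The third assertion shows why this class of bug hides so well: SRL and SRA agree on every input whose MSB is clear.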

The Zero Flag and Branch Instructions

The zero output port becomes essential at Post 14 when we add branches. RISC-V has no comparison instruction that directly sets a flag register. Instead, the branch unit uses the ALU:

BEQ rs1, rs2, label:
    1. Compute SUB(rs1, rs2) in the ALU
    2. If zero == true → rs1 == rs2 → branch taken
    3. If zero == false → rs1 != rs2 → branch not taken

This is why our ALU includes zero as an output — it is not redundant. It mirrors the hardware design of actual RISC-V processors, where the branch resolution logic uses the ALU result's zero condition.
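As a rough illustration of the steps above, the following sketch models the branch decision with native types. `AluOut`, `alu_sub`, `beq_taken`, and `bne_taken` are hypothetical names chosen for this example. The branch unit built later in the series may be organized differently.

```cpp
#include <cstdint>

// What the ALU exports for a SUB: the wrapped difference and the zero flag.
struct AluOut {
    std::uint32_t result;
    bool zero;
};

// SUB with 32-bit wraparound. zero is true exactly when rs1 == rs2.
constexpr AluOut alu_sub(std::uint32_t rs1, std::uint32_t rs2) {
    return AluOut{rs1 - rs2, (rs1 - rs2) == 0u};
}

// BEQ is taken when the zero flag is set. BNE is taken when it is clear.
constexpr bool beq_taken(std::uint32_t rs1, std::uint32_t rs2) {
    return alu_sub(rs1, rs2).zero;
}
constexpr bool bne_taken(std::uint32_t rs1, std::uint32_t rs2) {
    return !alu_sub(rs1, rs2).zero;
}

static_assert(beq_taken(5u, 5u), "equal operands: BEQ taken");
static_assert(!beq_taken(5u, 6u), "unequal operands: BEQ not taken");
static_assert(bne_taken(0u, 0xFFFFFFFFu), "unequal operands: BNE taken");
```

Because subtraction wraps modulo 2^32, the difference is zero only when the two operands are bit-for-bit identical, so the flag is a valid equality test even when the subtraction underflows.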

sc_uint<32>::range(4,0) — The Shift Amount

RISC-V defines that for RV32I, shift operations use only the lower 5 bits of the shift amount register. Bits [31:5] are ignored. This means the maximum shift is 31, which covers the full 32-bit range.

res = a.read() << b.read().range(4, 0);

b.read().range(4, 0) extracts bits 4 down to 0 from b, giving a 5-bit value (0–31). Without it, a shift amount of 32 or more would not follow the RISC-V rule (a shift by 32 must behave like a shift by 0), and on a native 32-bit C++ type such a shift is undefined behavior. The range() call enforces the hardware specification and keeps the shift amount in the defined range.
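The effect of the 5-bit mask is easiest to see at the boundary values. The sketch below uses native `uint32_t` and a mask of `0x1F` as the plain C++ equivalent of `range(4, 0)`. The helper names `shamt5` and `sll` are hypothetical.

```cpp
#include <cstdint>

// Keep only bits [4:0] of the shift operand, as RV32I specifies.
constexpr std::uint32_t shamt5(std::uint32_t b) {
    return b & 0x1Fu;
}

// Shift left logical with the masked amount. The shift is always 0..31,
// so it stays within the range C++ defines for a 32-bit operand.
constexpr std::uint32_t sll(std::uint32_t a, std::uint32_t b) {
    return a << shamt5(b);
}

static_assert(shamt5(31u) == 31u, "31 is the largest effective shift");
static_assert(shamt5(32u) == 0u, "32 wraps to a shift by 0");
static_assert(sll(1u, 31u) == 0x80000000u, "1 << 31 sets the MSB");
static_assert(sll(1u, 32u) == 1u, "shift by 32 leaves the value unchanged");
static_assert(sll(1u, 33u) == 2u, "shift by 33 acts as a shift by 1");
```

A directed test with `b = 32` or `b = 33` is a cheap way to confirm that the mask is actually in place.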


Build & Run

# CMakeLists.txt in section1/post05/
cmake_minimum_required(VERSION 3.16)
project(post05_alu)

set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
include_directories(${SYSTEMC_HOME}/include)
link_directories(${SYSTEMC_HOME}/lib-linux64)

add_executable(alu_tb alu_tb.cpp)
target_link_libraries(alu_tb systemc)

# Build and run (shell)
mkdir build && cd build
cmake .. && make
./alu_tb

Expected last line:

=== Results: 27 PASS, 0 FAIL ===

The return code from sc_main is 0 on all-pass, 1 if any test fails — so you can use ./alu_tb && echo "CI: PASS" || echo "CI: FAIL" in a CI script.


DV Insight

Two subtleties trip up many engineers writing their first SystemC ALU.

First: sc_uint<32> overflow is silent. 0xFFFFFFFF + 1 produces 0 with no exception, no flag, no warning. This matches hardware behavior — hardware adders wrap around silently too. But it means your test cases must explicitly cover overflow if you want to verify it. The test above includes ADD(0xFFFFFFFF, 1) for exactly this reason. If you only test with small values where overflow is impossible, you have not verified the overflow behavior.
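The wraparound itself is easy to pin down with native types. This is a small sketch in which `uint32_t` stands in for `sc_uint<32>`, and `add32` and `sub32` are hypothetical helper names.

```cpp
#include <cstdint>

// 32-bit add and subtract with modulo-2^32 wraparound, like a hardware adder.
constexpr std::uint32_t add32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}
constexpr std::uint32_t sub32(std::uint32_t a, std::uint32_t b) {
    return a - b;
}

// No exception and no flag: the carry out of bit 31 is simply dropped.
static_assert(add32(0xFFFFFFFFu, 1u) == 0u, "overflow wraps to 0");
static_assert(sub32(0u, 1u) == 0xFFFFFFFFu, "underflow wraps to all ones");
static_assert(add32(0x7FFFFFFFu, 1u) == 0x80000000u, "signed overflow is just a bit pattern");
```

The last case is worth adding to a test plan as well: the largest positive signed value plus one becomes the most negative value, and the ALU reports nothing unusual.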

Second: The zero flag is most naturally tested by doing SUB(x, x) — any value minus itself is zero. But you should also verify that SUB(x, y) with x != y produces zero = false. A broken implementation that always outputs zero = true would pass a test suite that only checks SUB(x, x). Test the negative case too.

Broader principle: A directed test that only covers "happy path" inputs is not a test suite — it is an optimistic demo. For the ALU, the interesting cases are: overflow, underflow, most-negative signed value, maximum unsigned value, shift by 0, shift by 31, and XOR with itself. These are the cases where wrong type choices (signed vs unsigned) produce wrong results. Design your directed tests around the boundaries, not the interior of the valid range.
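One way to keep boundary cases from being forgotten is to list them in a table and check each one against a small reference model. The sketch below is a plain C++ illustration under stated assumptions, not the testbench from this post: `Op`, `ref_alu`, `TestVector`, `kBoundaryVectors`, and `count_mismatches` are hypothetical names, and native 32-bit types stand in for the SystemC types.

```cpp
#include <cstdint>

// Hypothetical operation tags for this sketch (not the alu_op_t encoding).
enum class Op { Add, Sub, Xor, Slt, Sltu, Sll, Srl, Sra };

// Plain C++ reference model using native 32-bit types.
inline std::uint32_t ref_alu(Op op, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t shamt = b & 0x1Fu;  // RV32I uses only the low 5 bits
    switch (op) {
        case Op::Add:  return a + b;
        case Op::Sub:  return a - b;
        case Op::Xor:  return a ^ b;
        case Op::Slt:  return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b) ? 1u : 0u;
        case Op::Sltu: return a < b ? 1u : 0u;
        case Op::Sll:  return a << shamt;
        case Op::Srl:  return a >> shamt;
        case Op::Sra:  return static_cast<std::uint32_t>(static_cast<std::int32_t>(a) >> shamt);
    }
    return 0u;  // not reached for valid Op values
}

struct TestVector {
    Op op;
    std::uint32_t a, b, expected;
    bool expected_zero;
};

// One row per boundary named in the paragraph above.
const TestVector kBoundaryVectors[] = {
    {Op::Add,  0xFFFFFFFFu, 1u,          0u,          true },  // overflow wraps
    {Op::Sub,  0u,          1u,          0xFFFFFFFFu, false},  // underflow wraps
    {Op::Sub,  7u,          7u,          0u,          true },  // zero flag set
    {Op::Sub,  7u,          8u,          0xFFFFFFFFu, false},  // zero flag clear
    {Op::Slt,  0x80000000u, 0u,          1u,          false},  // most negative < 0
    {Op::Sltu, 0x80000000u, 0u,          0u,          true },  // large unsigned is not < 0
    {Op::Sll,  1u,          0u,          1u,          false},  // shift by 0
    {Op::Sll,  1u,          31u,         0x80000000u, false},  // shift by 31
    {Op::Srl,  0x80000000u, 31u,         1u,          false},  // zero fill
    {Op::Sra,  0x80000000u, 31u,         0xFFFFFFFFu, false},  // sign fill
    {Op::Xor,  0xDEADBEEFu, 0xDEADBEEFu, 0u,          true },  // XOR with itself
};

// Returns how many rows disagree with the reference model.
inline int count_mismatches() {
    int mismatches = 0;
    for (const TestVector& v : kBoundaryVectors) {
        const std::uint32_t got = ref_alu(v.op, v.a, v.b);
        if (got != v.expected || (got == 0u) != v.expected_zero) ++mismatches;
    }
    return mismatches;
}
```

In a real testbench the same table could be looped over to drive the DUT, with `ref_alu` supplying the expected value instead of hand-written constants. Keeping both lets the table and the model check each other.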


Integration

This ALU is now a complete, tested hardware block. Its role in the RISC-V CPU:

Immediate next use (Post 6 and 7): The ALU becomes the DUT for the first structured testbench and the Section 1 capstone. We will add a monitor, a checker, and VCD waveform generation on top of the directed test here.

Post 8 (Register File): The register file produces the a and b operand values that feed into the ALU's inputs. Once both the ALU and the register file are complete, we have the two core compute elements.

Post 9 (Decoder): The decoder takes a 32-bit RV32I instruction word and produces the op value (along with a, b source selects and write-back control). The op encoding we defined here — ALU_ADD = 0, ALU_SUB = 1, etc. — will be what the decoder outputs.

Post 18 (Execute Stage, Single-Cycle CPU): The ALU is wired into the execute stage as-is. The forwarding unit (Post 20) adds multiplexers in front of the a and b inputs to handle pipeline hazards, but the ALU module itself does not change.

Posts 23–27 (UVM Testbench): The ALU is the first block we write a formal UVM-SystemC testbench for. The directed tests here become the starting point for a constrained-random test environment with a reference model.

Series progress:
- Post 1 — Modules, Ports & Signals ✓
- Post 2 — Simulation Time & Clocks ✓
- Post 3 — Delta Cycles & Event-Driven Semantics ✓
- Post 4 — SC_METHOD vs SC_THREAD ✓
- Post 5 — Building the RV32I ALU ✓
- Post 6 — Your First SystemC Testbench (next)


What's Next

Post 6: Your First SystemC Testbench

The ALU test in sc_main above is functional, but it mixes stimulus generation, checking, and result logging into one flat function. That works for 27 tests. It does not scale to 2,700 tests, does not support multiple independent checkers, and does not produce structured output that a CI system can parse.

Post 6 restructures this test into a proper three-component testbench: a Monitor that passively observes the DUT outputs, a Checker that compares observations against expected values, and a Stimulus Driver that generates inputs. It also adds sc_trace to dump VCD waveforms for GTKWave inspection.

This is the testbench pattern we will use for the entire CPU build — and it is the same pattern as a UVM agent, just without the UVM infrastructure overhead.


Author
Mayur Kubavat
VLSI Design and Verification Engineer sharing knowledge about SystemVerilog, UVM, and hardware verification methodologies.
