9. SystemC Tutorial - Instruction Decoder: RV32I Encoding

Introduction

The decoder is the brain of dispatch. Every instruction that enters the pipeline passes through it first, and everything downstream — the ALU, memory, register file, branch logic — acts on what the decoder says. Get the decoder wrong and the processor silently computes the wrong answer. No exception, no crash, just bad data.

RV32I uses fixed 32-bit instructions. The lower 7 bits (the opcode field) identify the instruction format; the remaining 25 bits carry operands, function codes, and immediates. This fixed-width property is a deliberate hardware-friendly choice. Compare it to x86, where instructions can be 1 to 15 bytes long. An x86 front end must first determine instruction length before it can decode operands — this requires a complex variable-length pre-decode stage. RISC-V eliminates that problem entirely: you always know before the first bit arrives that you have exactly 32 bits to decode.

More importantly, RV32I places rs1, rs2, and rd at the same bit positions in every instruction format that uses them. The register address fields are:

  • rs1 always at bits [19:15]
  • rs2 always at bits [24:20]
  • rd always at bits [11:7]

This means hardware can extract register addresses and drive the register file read ports before the instruction type is known. The decode and register-read stages can overlap. The bit scrambling in B-type and J-type instructions (described below) follows from this invariant: the immediate has to fit around the fixed register fields, and its bits are arranged so that the sign bit stays at bit 31 and most other bits line up with positions already used by the I-, S-, and U-type immediates.
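To make the fixed-position property concrete, here is a minimal sketch in plain C++ (no SystemC). The `RegFields` struct and the `extract_reg_fields` and `reg_fields_demo` functions are names invented for this illustration, not part of the decoder built later. The extraction never looks at the opcode: the same three slices apply to every word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this example only.
struct RegFields {
    uint8_t rs1;  // bits [19:15]
    uint8_t rs2;  // bits [24:20]
    uint8_t rd;   // bits [11:7]
};

// Slice the three register-address fields out of a raw instruction word.
// Whether a field is meaningful depends on the format; its position does not.
inline RegFields extract_reg_fields(uint32_t raw) {
    RegFields f;
    f.rs1 = static_cast<uint8_t>((raw >> 15) & 0x1F);
    f.rs2 = static_cast<uint8_t>((raw >> 20) & 0x1F);
    f.rd  = static_cast<uint8_t>((raw >>  7) & 0x1F);
    return f;
}

// Self-check with two hand-encoded words of different formats.
inline void reg_fields_demo() {
    RegFields add = extract_reg_fields(0x002081B3u);  // ADD x3, x1, x2 (R-type)
    assert(add.rs1 == 1 && add.rs2 == 2 && add.rd == 3);

    RegFields sw = extract_reg_fields(0x0020A423u);   // SW x2, 8(x1) (S-type)
    assert(sw.rs1 == 1 && sw.rs2 == 2);
    assert(sw.rd == 8);  // S-type has no rd: this slot carries imm[4:0] = 8
}
```

For the store, the rd slot comes back as 8 because S-type reuses bits [11:7] for imm[4:0]. That is harmless as long as downstream logic ignores rd whenever reg_write is low.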

This post builds the complete combinational decoder for RV32I in SystemC. It is a pure SC_METHOD: no state, no clock, no simulated-time latency. Instruction in, all control signals out, settled one delta cycle later at the same simulation time.


RV32I Instruction Formats

RV32I defines six instruction formats. Every 32-bit instruction belongs to exactly one of them.

Format  [31:25]       [24:20]   [19:15]  [14:12]  [11:7]       [6:0]
R-type  funct7        rs2       rs1      funct3   rd           opcode
I-type  imm[11:5]     imm[4:0]  rs1      funct3   rd           opcode
S-type  imm[11:5]     rs2       rs1      funct3   imm[4:0]     opcode
B-type  imm[12,10:5]  rs2       rs1      funct3   imm[4:1,11]  opcode
U-type  imm[31:12] (occupies [31:12])             rd           opcode
J-type  imm[20,10:1,11,19:12] (occupies [31:12])  rd           opcode

Detailed Bit Positions

R-type (register-register arithmetic):

[31:25] funct7   [24:20] rs2   [19:15] rs1   [14:12] funct3   [11:7] rd   [6:0] opcode

I-type (immediate arithmetic, loads, JALR):

[31:20] imm[11:0]   [19:15] rs1   [14:12] funct3   [11:7] rd   [6:0] opcode

S-type (stores):

[31:25] imm[11:5]   [24:20] rs2   [19:15] rs1   [14:12] funct3   [11:7] imm[4:0]   [6:0] opcode

B-type (branches) — note the bit scrambling:

[31] imm[12]   [30:25] imm[10:5]   [24:20] rs2   [19:15] rs1
[14:12] funct3   [11:8] imm[4:1]   [7] imm[11]   [6:0] opcode

U-type (LUI, AUIPC):

[31:12] imm[31:12]   [11:7] rd   [6:0] opcode

J-type (JAL) — scrambled again:

[31] imm[20]   [30:21] imm[10:1]   [20] imm[11]   [19:12] imm[19:12]   [11:7] rd   [6:0] opcode
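
One way to tie the opcode to the layouts above is a small classifier. The following is a sketch in plain C++. The `Format` enum and the `format_of` and `format_demo` functions are hypothetical names for this example, and only the RV32I opcodes discussed in this post are covered.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative enum; not part of the tutorial's decoder.
enum class Format { R, I, S, B, U, J, Unknown };

// Map the 7-bit opcode field of an RV32I word to its encoding format.
inline Format format_of(uint32_t raw) {
    switch (raw & 0x7F) {
        case 0x33: return Format::R;   // register-register ALU
        case 0x13:                     // immediate ALU
        case 0x03:                     // loads
        case 0x67:                     // JALR
        case 0x0F:                     // FENCE (MISC-MEM)
        case 0x73: return Format::I;   // ECALL / EBREAK (SYSTEM)
        case 0x23: return Format::S;   // stores
        case 0x63: return Format::B;   // branches
        case 0x37:                     // LUI
        case 0x17: return Format::U;   // AUIPC
        case 0x6F: return Format::J;   // JAL
        default:   return Format::Unknown;
    }
}

// Self-check against a few hand-encoded words.
inline void format_demo() {
    assert(format_of(0x002081B3u) == Format::R);        // ADD x3, x1, x2
    assert(format_of(0x0020A423u) == Format::S);        // SW x2, 8(x1)
    assert(format_of(0x00000073u) == Format::I);        // ECALL
    assert(format_of(0x00000000u) == Format::Unknown);  // all-zero word
}
```

Only the immediate generator needs this classification. The register-address slices are taken before the format is known.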

Why B-type and J-type Are Scrambled

The scrambling is not arbitrary. B-type must keep rs1 at [19:15] and rs2 at [24:20], and J-type must keep rd at [11:7], so that the register file ports are driven by the same physical wires in every format. The immediate gets whatever bits are left over.

For B-type: the immediate encodes a 13-bit signed byte offset (bit 0 is always 0 and is not stored). B-type has no rd, so the 12 stored bits are spread across [31:25] and the [11:7] slot that holds rd in other formats, leaving rs1, rs2, and funct3 undisturbed. imm[11] lands at bit 7 (the LSB of the slot where rd would be), and imm[12] (the sign bit) lands at bit 31. This keeps sign extension simple: bit 31 is the sign bit in every format that carries an immediate. It also leaves imm[10:5] and imm[4:1] exactly where S-type puts them.

For J-type: the immediate is a 21-bit signed byte offset (bit 0 is not stored). The 20 stored bits are distributed in a pattern that again keeps rd at [11:7] and the sign bit at [31]. imm[10:1] sits at [30:21], matching I-type, and imm[19:12] sits at [19:12], matching U-type.

The result: the wires that tap register addresses see the same bit positions for every instruction. No format detection is needed before the register file is read. This is RISC-V's encoding philosophy: spend complexity in the assembler, save it in the hardware.
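
As a worked example of the B-type layout, the sketch below scatters a branch offset into its instruction bit positions and gathers it back. It is plain C++ with no SystemC. The names `scatter_b_imm` and `gather_b_imm` are made up for this illustration, and sign extension is done with an explicit mask rather than the `sc_int` casts used later in the post.

```cpp
#include <cassert>
#include <cstdint>

// Scatter a branch byte offset into the B-type immediate bit positions.
// Register, funct3, and opcode fields are left as zero.
// The offset must be even and within [-4096, 4094].
inline uint32_t scatter_b_imm(int32_t offset) {
    assert((offset & 1) == 0 && offset >= -4096 && offset <= 4094);
    uint32_t u = static_cast<uint32_t>(offset);
    return ((u >> 12) & 0x1)  << 31   // imm[12]   -> bit 31
         | ((u >>  5) & 0x3F) << 25   // imm[10:5] -> bits [30:25]
         | ((u >>  1) & 0xF)  <<  8   // imm[4:1]  -> bits [11:8]
         | ((u >> 11) & 0x1)  <<  7;  // imm[11]   -> bit 7
}

// Gather the immediate back out of an instruction word and sign-extend it.
inline int32_t gather_b_imm(uint32_t raw) {
    uint32_t u = ((raw >> 31) & 0x1)  << 12
               | ((raw >>  7) & 0x1)  << 11
               | ((raw >> 25) & 0x3F) <<  5
               | ((raw >>  8) & 0xF)  <<  1;
    if (u & 0x1000u) u |= 0xFFFFE000u;  // copy imm[12] into bits [31:13]
    return static_cast<int32_t>(u);
}
```

For an offset of -8, `scatter_b_imm` yields 0xFE000C80: bit 31, bits [30:25], bits [11:10], and bit 7 are set, while bits [24:12] and [6:0] stay clear. Those untouched ranges are exactly where rs2, rs1, funct3, and the opcode live.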

Reference: RISC-V ISA Specification v20191213, Section 2.3, "Immediate Encoding Variants."


Control Signals Reference

For each instruction group, the decoder asserts these control signals:

Instruction Group    alu_op  alu_src  mem_read  mem_write  reg_write  branch  jump  wb_sel
R-type (ADD/SUB…)    varies  REG      0         0          1          0       0     ALU
I-type ALU (ADDI…)   varies  IMM      0         0          1          0       0     ALU
LOAD (LW/LH/LB…)     ADD     IMM      1         0          1          0       0     MEM
STORE (SW/SH/SB)     ADD     IMM      0         1          0          0       0     ALU
BRANCH (BEQ/BNE…)    SUB     REG      0         0          0          1       0     ALU
JAL                  ADD     IMM      0         0          1          0       1     PC4
JALR                 ADD     IMM      0         0          1          0       1     PC4
LUI                  LUI     IMM      0         0          1          0       0     ALU
AUIPC                AUIPC   IMM      0         0          1          0       0     ALU
FENCE/ECALL/EBREAK   ADD     REG      0         0          0          0       0     ALU

alu_src: REG = use rs2, IMM = use sign-extended immediate
wb_sel: ALU = write ALU result, MEM = write memory read data, PC4 = write PC+4 (return address)


Decoder Architecture

flowchart TD
    INSTR["instr[31:0]"]

    INSTR -->|"[6:0]"| OPCODE["opcode[6:0]"]
    INSTR -->|"[14:12]"| FUNCT3["funct3[14:12]"]
    INSTR -->|"[31:25]"| FUNCT7["funct7[31:25]"]
    INSTR -->|"[19:15]"| RS1["rs1_addr[19:15]"]
    INSTR -->|"[24:20]"| RS2["rs2_addr[24:20]"]
    INSTR -->|"[11:7]"| RD["rd_addr[11:7]"]

    OPCODE --> CTRL["Control Logic\n(switch on opcode)"]
    FUNCT3 --> CTRL
    FUNCT7 --> CTRL

    INSTR --> IMMGEN["Immediate Generator\n(format-specific extraction\n+ sign extension)"]
    OPCODE --> IMMGEN

    CTRL --> SIG1["alu_op[3:0]"]
    CTRL --> SIG2["alu_src"]
    CTRL --> SIG3["mem_read"]
    CTRL --> SIG4["mem_write"]
    CTRL --> SIG5["reg_write"]
    CTRL --> SIG6["branch"]
    CTRL --> SIG7["jump"]
    CTRL --> SIG8["wb_sel[1:0]"]
    CTRL --> SIG9["funct3[2:0] passthrough"]

    IMMGEN --> IMM["imm[31:0]"]

The decoder is purely combinational. All outputs update whenever instr changes. There is no registered state — the module is a wiring harness with combinational logic.


Complete Implementation

decoder.h

// decoder.h — RV32I Instruction Decoder
// SystemC 2.3.x compatible
#pragma once

#include <systemc.h>
#include <cstdint>

// ALU operation encoding — 4 bits, fits sc_uint<4>
enum class AluOp : uint8_t {
    ADD   = 0,   // rs1 + rs2 (or rs1 + imm)
    SUB   = 1,   // rs1 - rs2
    AND   = 2,   // rs1 & rs2
    OR    = 3,   // rs1 | rs2
    XOR   = 4,   // rs1 ^ rs2
    SLL   = 5,   // rs1 << rs2[4:0]
    SRL   = 6,   // rs1 >> rs2[4:0]  (logical)
    SRA   = 7,   // rs1 >> rs2[4:0]  (arithmetic)
    SLT   = 8,   // signed less-than
    SLTU  = 9,   // unsigned less-than
    LUI   = 10,  // pass imm directly (LUI)
    AUIPC = 11,  // PC + imm (AUIPC)
    PASS_B= 12   // pass second operand (for moves)
};

// Write-back source selection
enum class WbSel : uint8_t {
    ALU = 0,  // write ALU result to rd
    MEM = 1,  // write memory read data to rd
    PC4 = 2   // write PC+4 to rd (JAL/JALR link)
};

SC_MODULE(decoder) {
    // ---- Inputs ----
    sc_in<sc_uint<32>> instr;   // Raw 32-bit instruction word

    // ---- Register address outputs ----
    sc_out<sc_uint<5>> rs1_addr;  // Source register 1 address
    sc_out<sc_uint<5>> rs2_addr;  // Source register 2 address
    sc_out<sc_uint<5>> rd_addr;   // Destination register address

    // ---- Immediate output ----
    sc_out<sc_uint<32>> imm;      // Sign-extended immediate (all formats)

    // ---- ALU control ----
    sc_out<sc_uint<4>> alu_op;    // ALU operation (encoded as AluOp)
    sc_out<bool>       alu_src;   // false=use rs2, true=use imm

    // ---- Memory control ----
    sc_out<bool> mem_read;        // Assert for LOAD instructions
    sc_out<bool> mem_write;       // Assert for STORE instructions

    // ---- Register file control ----
    sc_out<bool> reg_write;       // Assert when rd should be written

    // ---- Branch/Jump control ----
    sc_out<bool> branch;          // Assert for conditional branches
    sc_out<bool> jump;            // Assert for JAL/JALR

    // ---- Write-back source ----
    sc_out<sc_uint<2>> wb_sel;    // Write-back mux select

    // ---- Function code passthrough ----
    sc_out<sc_uint<3>> funct3;    // Passed to memory unit (byte/halfword/word)

    SC_CTOR(decoder) {
        SC_METHOD(decode_proc);
        sensitive << instr;
    }

    void decode_proc();
};

decoder.cpp

// decoder.cpp — RV32I Instruction Decoder Implementation
// SystemC 2.3.x compatible
#include "decoder.h"

// ---- Opcode constants (bits [6:0]) ----
static constexpr uint8_t OP_R      = 0x33; // R-type: ADD, SUB, SLL, SLT, SLTU, XOR, SRL, SRA, OR, AND
static constexpr uint8_t OP_I_ALU  = 0x13; // I-type ALU: ADDI, SLTI, SLTIU, XORI, ORI, ANDI, SLLI, SRLI, SRAI
static constexpr uint8_t OP_LOAD   = 0x03; // Loads: LB, LH, LW, LBU, LHU
static constexpr uint8_t OP_STORE  = 0x23; // Stores: SB, SH, SW
static constexpr uint8_t OP_BRANCH = 0x63; // Branches: BEQ, BNE, BLT, BGE, BLTU, BGEU
static constexpr uint8_t OP_JAL    = 0x6F; // JAL
static constexpr uint8_t OP_JALR   = 0x67; // JALR
static constexpr uint8_t OP_LUI    = 0x37; // LUI
static constexpr uint8_t OP_AUIPC  = 0x17; // AUIPC
static constexpr uint8_t OP_SYSTEM = 0x73; // ECALL, EBREAK (FENCE is MISC-MEM, opcode 0x0F)

// ---- funct3 codes for R-type and I-type ALU ----
static constexpr uint8_t F3_ADD_SUB = 0x0; // ADD/SUB (R), ADDI (I)
static constexpr uint8_t F3_SLL     = 0x1; // SLL, SLLI
static constexpr uint8_t F3_SLT     = 0x2; // SLT, SLTI
static constexpr uint8_t F3_SLTU    = 0x3; // SLTU, SLTIU
static constexpr uint8_t F3_XOR     = 0x4; // XOR, XORI
static constexpr uint8_t F3_SR      = 0x5; // SRL/SRA, SRLI/SRAI (instruction bit 30, i.e. funct7[5], distinguishes)
static constexpr uint8_t F3_OR      = 0x6; // OR, ORI
static constexpr uint8_t F3_AND     = 0x7; // AND, ANDI

void decoder::decode_proc() {
    // ---- Extract raw instruction ----
    uint32_t raw = (uint32_t)instr.read();

    // ---- Field extraction ----
    uint8_t  opcode = (uint8_t)(raw & 0x7F);          // [6:0]
    uint8_t  f3     = (uint8_t)((raw >> 12) & 0x7);   // [14:12]
    uint8_t  f7     = (uint8_t)((raw >> 25) & 0x7F);  // [31:25]
    uint8_t  rs1    = (uint8_t)((raw >> 15) & 0x1F);  // [19:15]
    uint8_t  rs2    = (uint8_t)((raw >> 20) & 0x1F);  // [24:20]
    uint8_t  rd     = (uint8_t)((raw >>  7) & 0x1F);  // [11:7]

    // ---- Register addresses — always valid, drive register file early ----
    rs1_addr.write(rs1);
    rs2_addr.write(rs2);
    rd_addr.write(rd);

    // ---- funct3 passthrough (memory unit uses this for width selection) ----
    funct3.write(f3);

    // ---- Default control signal values (safe/NOP-like) ----
    bool     s_alu_src   = false;
    bool     s_mem_read  = false;
    bool     s_mem_write = false;
    bool     s_reg_write = false;
    bool     s_branch    = false;
    bool     s_jump      = false;
    AluOp    s_alu_op    = AluOp::ADD;
    WbSel    s_wb_sel    = WbSel::ALU;
    uint32_t s_imm       = 0;

    // ---- Helper: decode ALU op from funct3/funct7 for R-type ----
    auto decode_r_alu = [&]() -> AluOp {
        switch (f3) {
            case F3_ADD_SUB: return (f7 & 0x20) ? AluOp::SUB  : AluOp::ADD;
            case F3_SLL:     return AluOp::SLL;
            case F3_SLT:     return AluOp::SLT;
            case F3_SLTU:    return AluOp::SLTU;
            case F3_XOR:     return AluOp::XOR;
            case F3_SR:      return (f7 & 0x20) ? AluOp::SRA  : AluOp::SRL;
            case F3_OR:      return AluOp::OR;
            case F3_AND:     return AluOp::AND;
            default:         return AluOp::ADD;
        }
    };

    // ---- Helper: decode ALU op from funct3 for I-type ALU ----
    // Note: SRAI vs SRLI distinguished by instruction bit 30, i.e. imm[10] (the same bit as funct7[5] in R-type)
    auto decode_i_alu = [&]() -> AluOp {
        switch (f3) {
            case F3_ADD_SUB: return AluOp::ADD;   // ADDI
            case F3_SLL:     return AluOp::SLL;   // SLLI
            case F3_SLT:     return AluOp::SLT;   // SLTI
            case F3_SLTU:    return AluOp::SLTU;  // SLTIU
            case F3_XOR:     return AluOp::XOR;   // XORI
            case F3_SR:      return (raw & (1u << 30)) ? AluOp::SRA : AluOp::SRL; // SRAI vs SRLI
            case F3_OR:      return AluOp::OR;    // ORI
            case F3_AND:     return AluOp::AND;   // ANDI
            default:         return AluOp::ADD;
        }
    };

    // ---- Immediate extraction with sign extension ----
    // I-type: 12-bit signed, bits [31:20]
    auto imm_i = [&]() -> uint32_t {
        return (uint32_t)(int32_t)(sc_int<12>)((int32_t)raw >> 20);
    };

    // S-type: 12-bit signed, bits [31:25] and [11:7]
    auto imm_s = [&]() -> uint32_t {
        uint32_t upper = (raw >> 25) & 0x7F; // [11:5]
        uint32_t lower = (raw >>  7) & 0x1F; // [4:0]
        uint32_t raw12 = (upper << 5) | lower;
        return (uint32_t)(int32_t)(sc_int<12>)(int32_t)raw12;
    };

    // B-type: 13-bit signed (bit 0 implicit zero), scrambled encoding
    //   imm[12]    = bit[31]
    //   imm[11]    = bit[7]
    //   imm[10:5]  = bits[30:25]
    //   imm[4:1]   = bits[11:8]
    //   imm[0]     = 0 (implicit, branch targets are 2-byte aligned)
    auto imm_b = [&]() -> uint32_t {
        uint32_t imm12 = ((raw >> 31) & 0x1) << 12
                       | ((raw >>  7) & 0x1) << 11
                       | ((raw >> 25) & 0x3F) << 5
                       | ((raw >>  8) & 0xF)  << 1;
        return (uint32_t)(int32_t)(sc_int<13>)(int32_t)imm12;
    };

    // U-type: 20-bit upper immediate, zero-padded at [11:0], no sign extension
    auto imm_u = [&]() -> uint32_t {
        return raw & 0xFFFFF000u;
    };

    // J-type: 21-bit signed (bit 0 implicit zero), scrambled encoding
    //   imm[20]    = bit[31]
    //   imm[10:1]  = bits[30:21]
    //   imm[11]    = bit[20]
    //   imm[19:12] = bits[19:12]
    //   imm[0]     = 0 (implicit)
    auto imm_j = [&]() -> uint32_t {
        uint32_t imm20 = ((raw >> 31) & 0x1)   << 20
                       | ((raw >> 21) & 0x3FF)  << 1
                       | ((raw >> 20) & 0x1)    << 11
                       | ((raw >> 12) & 0xFF)   << 12;
        return (uint32_t)(int32_t)(sc_int<21>)(int32_t)imm20;
    };

    // ---- Main decode switch ----
    switch (opcode) {

        case OP_R:
            s_alu_op   = decode_r_alu();
            s_alu_src  = false;   // use rs2
            s_reg_write= true;
            s_wb_sel   = WbSel::ALU;
            s_imm      = 0;       // R-type has no immediate
            break;

        case OP_I_ALU:
            s_alu_op   = decode_i_alu();
            s_alu_src  = true;    // use imm
            s_reg_write= true;
            s_wb_sel   = WbSel::ALU;
            s_imm      = imm_i();
            break;

        case OP_LOAD:
            s_alu_op   = AluOp::ADD;  // effective address = rs1 + imm
            s_alu_src  = true;
            s_mem_read = true;
            s_reg_write= true;
            s_wb_sel   = WbSel::MEM;
            s_imm      = imm_i();
            break;

        case OP_STORE:
            s_alu_op    = AluOp::ADD;  // effective address = rs1 + imm
            s_alu_src   = true;
            s_mem_write = true;
            s_reg_write = false;       // stores do not write rd
            s_wb_sel    = WbSel::ALU;
            s_imm       = imm_s();
            break;

        case OP_BRANCH:
            s_alu_op   = AluOp::SUB;  // subtract to compare
            s_alu_src  = false;        // compare rs1 vs rs2
            s_branch   = true;
            s_reg_write= false;
            s_wb_sel   = WbSel::ALU;
            s_imm      = imm_b();     // branch target offset
            break;

        case OP_JAL:
            s_alu_op   = AluOp::ADD;
            s_alu_src  = true;
            s_jump     = true;
            s_reg_write= true;
            s_wb_sel   = WbSel::PC4;  // rd = PC+4 (link register)
            s_imm      = imm_j();
            break;

        case OP_JALR:
            s_alu_op   = AluOp::ADD;
            s_alu_src  = true;
            s_jump     = true;
            s_reg_write= true;
            s_wb_sel   = WbSel::PC4;
            s_imm      = imm_i();
            break;

        case OP_LUI:
            s_alu_op   = AluOp::LUI;
            s_alu_src  = true;
            s_reg_write= true;
            s_wb_sel   = WbSel::ALU;
            s_imm      = imm_u();
            break;

        case OP_AUIPC:
            s_alu_op   = AluOp::AUIPC;
            s_alu_src  = true;
            s_reg_write= true;
            s_wb_sel   = WbSel::ALU;
            s_imm      = imm_u();
            break;

        case OP_SYSTEM:
            // ECALL, EBREAK: treat as NOP for basic implementation
            s_alu_op   = AluOp::ADD;
            s_alu_src  = false;
            s_reg_write= false;
            s_wb_sel   = WbSel::ALU;
            s_imm      = imm_i();     // SYSTEM uses I-type encoding
            break;

        default:
            // Illegal opcode, or FENCE (opcode 0x0F, not decoded explicitly here):
            // all signals remain at safe defaults
            s_alu_op   = AluOp::ADD;
            s_alu_src  = false;
            s_reg_write= false;
            s_imm      = 0;
            break;
    }

    // ---- Drive all outputs ----
    alu_op.write((uint8_t)s_alu_op);
    alu_src.write(s_alu_src);
    mem_read.write(s_mem_read);
    mem_write.write(s_mem_write);
    reg_write.write(s_reg_write);
    branch.write(s_branch);
    jump.write(s_jump);
    wb_sel.write((uint8_t)s_wb_sel);
    imm.write(s_imm);
}

Testbench

// tb_decoder.cpp — RV32I Decoder Testbench
// Tests all 40 base RV32I instructions
// SystemC 2.3.x compatible
#include <systemc.h>
#include <iostream>
#include <cstdint>
#include <vector>
#include <string>
#include "decoder.h"

// ---- Test result tracking ----
static int pass_count = 0;
static int fail_count = 0;

// ---- Golden reference structure ----
struct DecoderExpected {
    uint32_t instr;          // Encoded 32-bit instruction
    std::string name;        // Mnemonic for error messages
    AluOp    alu_op;         // Expected ALU operation
    bool     alu_src;        // false=REG, true=IMM
    bool     mem_read;
    bool     mem_write;
    bool     reg_write;
    bool     branch;
    bool     jump;
    WbSel    wb_sel;
    int32_t  expected_imm;   // INT32_MIN means "do not check"
};

// ---- Instruction Encoders ----
// These replicate the assembler's encoding logic — useful to have
// locally so test vectors are self-documenting.

// R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode
static uint32_t enc_r(uint8_t f7, uint8_t rs2, uint8_t rs1, uint8_t f3, uint8_t rd, uint8_t op) {
    return ((uint32_t)f7 << 25) | ((uint32_t)rs2 << 20) | ((uint32_t)rs1 << 15)
         | ((uint32_t)f3 << 12) | ((uint32_t)rd << 7) | op;
}

// I-type: imm[11:0] | rs1 | funct3 | rd | opcode
static uint32_t enc_i(int32_t imm, uint8_t rs1, uint8_t f3, uint8_t rd, uint8_t op) {
    return ((uint32_t)(imm & 0xFFF) << 20) | ((uint32_t)rs1 << 15)
         | ((uint32_t)f3 << 12) | ((uint32_t)rd << 7) | op;
}

// S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
static uint32_t enc_s(int32_t imm, uint8_t rs2, uint8_t rs1, uint8_t f3, uint8_t op) {
    uint32_t upper = ((uint32_t)(imm >> 5) & 0x7F) << 25;
    uint32_t lower = ((uint32_t)(imm & 0x1F)) << 7;
    return upper | ((uint32_t)rs2 << 20) | ((uint32_t)rs1 << 15) | ((uint32_t)f3 << 12) | lower | op;
}

// B-type: imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode
static uint32_t enc_b(int32_t imm, uint8_t rs2, uint8_t rs1, uint8_t f3, uint8_t op) {
    uint32_t b12  = ((uint32_t)(imm >> 12) & 0x1) << 31;
    uint32_t b11  = ((uint32_t)(imm >> 11) & 0x1) << 7;
    uint32_t b105 = ((uint32_t)(imm >>  5) & 0x3F) << 25;
    uint32_t b41  = ((uint32_t)(imm >>  1) & 0xF)  << 8;
    return b12 | b105 | ((uint32_t)rs2 << 20) | ((uint32_t)rs1 << 15) | ((uint32_t)f3 << 12) | b41 | b11 | op;
}

// U-type: imm[31:12] | rd | opcode
static uint32_t enc_u(int32_t imm, uint8_t rd, uint8_t op) {
    return ((uint32_t)(imm & 0xFFFFF000u)) | ((uint32_t)rd << 7) | op;
}

// J-type: imm[20|10:1|11|19:12] | rd | opcode
static uint32_t enc_j(int32_t imm, uint8_t rd, uint8_t op) {
    uint32_t b20   = ((uint32_t)(imm >> 20) & 0x1)   << 31;
    uint32_t b101  = ((uint32_t)(imm >>  1) & 0x3FF)  << 21;
    uint32_t b11   = ((uint32_t)(imm >> 11) & 0x1)   << 20;
    uint32_t b1912 = ((uint32_t)(imm >> 12) & 0xFF)  << 12;
    return b20 | b101 | b11 | b1912 | ((uint32_t)rd << 7) | op;
}

// ---- Check helper ----
static void check(const std::string& name, bool cond, const std::string& what) {
    if (!cond) {
        std::cerr << "  FAIL [" << name << "] " << what << "\n";
        fail_count++;
    }
}

static void run_test(decoder& dut,
                     sc_signal<sc_uint<32>>& s_instr,
                     sc_signal<sc_uint<5>>&  s_rs1, sc_signal<sc_uint<5>>& s_rs2,
                     sc_signal<sc_uint<5>>&  s_rd,
                     sc_signal<sc_uint<32>>& s_imm,
                     sc_signal<sc_uint<4>>&  s_alu_op,
                     sc_signal<bool>&        s_alu_src,
                     sc_signal<bool>&        s_mem_read,
                     sc_signal<bool>&        s_mem_write,
                     sc_signal<bool>&        s_reg_write,
                     sc_signal<bool>&        s_branch,
                     sc_signal<bool>&        s_jump,
                     sc_signal<sc_uint<2>>&  s_wb_sel,
                     const DecoderExpected& e)
{
    s_instr.write(e.instr);
    sc_start(1, SC_NS);

    bool ok = true;
    auto fail = [&](const std::string& msg) { check(e.name, false, msg); ok = false; };

    if ((uint8_t)s_alu_op.read() != (uint8_t)e.alu_op)
        fail("alu_op mismatch: got " + std::to_string((int)s_alu_op.read())
           + " want " + std::to_string((int)e.alu_op));
    if (s_alu_src.read()   != e.alu_src)   fail("alu_src mismatch");
    if (s_mem_read.read()  != e.mem_read)  fail("mem_read mismatch");
    if (s_mem_write.read() != e.mem_write) fail("mem_write mismatch");
    if (s_reg_write.read() != e.reg_write) fail("reg_write mismatch");
    if (s_branch.read()    != e.branch)    fail("branch mismatch");
    if (s_jump.read()      != e.jump)      fail("jump mismatch");
    if ((uint8_t)s_wb_sel.read() != (uint8_t)e.wb_sel)
        fail("wb_sel mismatch");
    if (e.expected_imm != INT32_MIN) {
        int32_t got_imm = (int32_t)(uint32_t)s_imm.read();
        if (got_imm != e.expected_imm)
            fail("imm mismatch: got " + std::to_string(got_imm)
               + " want " + std::to_string(e.expected_imm));
    }

    if (ok) {
        std::cout << "  PASS [" << e.name << "]\n";
        pass_count++;
    }
}

int sc_main(int argc, char* argv[]) {
    // ---- DUT signals ----
    sc_signal<sc_uint<32>> s_instr("instr");
    sc_signal<sc_uint<5>>  s_rs1("rs1_addr"), s_rs2("rs2_addr"), s_rd("rd_addr");
    sc_signal<sc_uint<32>> s_imm("imm");
    sc_signal<sc_uint<4>>  s_alu_op("alu_op");
    sc_signal<bool>        s_alu_src("alu_src");
    sc_signal<bool>        s_mem_read("mem_read"), s_mem_write("mem_write");
    sc_signal<bool>        s_reg_write("reg_write");
    sc_signal<bool>        s_branch("branch"), s_jump("jump");
    sc_signal<sc_uint<2>>  s_wb_sel("wb_sel");
    sc_signal<sc_uint<3>>  s_funct3("funct3");

    // ---- Instantiate DUT ----
    decoder dut("dut");
    dut.instr(s_instr);
    dut.rs1_addr(s_rs1); dut.rs2_addr(s_rs2); dut.rd_addr(s_rd);
    dut.imm(s_imm);
    dut.alu_op(s_alu_op);
    dut.alu_src(s_alu_src);
    dut.mem_read(s_mem_read); dut.mem_write(s_mem_write);
    dut.reg_write(s_reg_write);
    dut.branch(s_branch); dut.jump(s_jump);
    dut.wb_sel(s_wb_sel);
    dut.funct3(s_funct3);

    // ---- Register addresses used in tests ----
    // rs1=x1(ra), rs2=x2(sp), rd=x3(gp) for most tests
    const uint8_t R1=1, R2=2, R3=3;

    // ---- Golden reference table ----
    // Opcode constants mirrored here for readability
    const uint8_t OP_R=0x33, OP_I=0x13, OP_LD=0x03, OP_ST=0x23;
    const uint8_t OP_BR=0x63, OP_JAL=0x6F, OP_JALR=0x67;
    const uint8_t OP_LUI=0x37, OP_AUIPC=0x17, OP_SYS=0x73;

    //                         instr                 name     alu_op         src    mr     mw     rw     br     jmp    wbsel      imm
    std::vector<DecoderExpected> tests = {

        // ---- R-type (10 instructions) ----
        // ADD x3, x1, x2  → 0x002081B3  (funct7=0, rs2=2, rs1=1, f3=0, rd=3, op=0x33)
        { enc_r(0x00,R2,R1,0x0,R3,OP_R), "ADD",  AluOp::ADD,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SUB x3, x1, x2  → 0x402081B3  (funct7=0x20, distinguishes from ADD)
        { enc_r(0x20,R2,R1,0x0,R3,OP_R), "SUB",  AluOp::SUB,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SLL x3, x1, x2  → 0x002091B3
        { enc_r(0x00,R2,R1,0x1,R3,OP_R), "SLL",  AluOp::SLL,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SLT x3, x1, x2
        { enc_r(0x00,R2,R1,0x2,R3,OP_R), "SLT",  AluOp::SLT,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SLTU x3, x1, x2
        { enc_r(0x00,R2,R1,0x3,R3,OP_R), "SLTU", AluOp::SLTU, false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // XOR x3, x1, x2
        { enc_r(0x00,R2,R1,0x4,R3,OP_R), "XOR",  AluOp::XOR,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SRL x3, x1, x2  → funct7=0, funct3=5
        { enc_r(0x00,R2,R1,0x5,R3,OP_R), "SRL",  AluOp::SRL,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // SRA x3, x1, x2  → funct7=0x20, funct3=5 — CRITICAL: must differ from SRL
        { enc_r(0x20,R2,R1,0x5,R3,OP_R), "SRA",  AluOp::SRA,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // OR x3, x1, x2
        { enc_r(0x00,R2,R1,0x6,R3,OP_R), "OR",   AluOp::OR,   false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // AND x3, x1, x2
        { enc_r(0x00,R2,R1,0x7,R3,OP_R), "AND",  AluOp::AND,  false, false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // ---- I-type ALU (9 instructions) ----
        // ADDI x3, x1, 100   → imm=100=0x64
        { enc_i(100, R1,0x0,R3,OP_I),   "ADDI",  AluOp::ADD,  true,  false, false, true,  false, false, WbSel::ALU, 100       },

        // SLTI x3, x1, -1
        { enc_i(-1,  R1,0x2,R3,OP_I),   "SLTI",  AluOp::SLT,  true,  false, false, true,  false, false, WbSel::ALU, -1        },

        // SLTIU x3, x1, 1
        { enc_i(1,   R1,0x3,R3,OP_I),   "SLTIU", AluOp::SLTU, true,  false, false, true,  false, false, WbSel::ALU, 1         },

        // XORI x3, x1, 0xFF
        { enc_i(0xFF,R1,0x4,R3,OP_I),   "XORI",  AluOp::XOR,  true,  false, false, true,  false, false, WbSel::ALU, 0xFF      },

        // ORI x3, x1, 0x0F
        { enc_i(0x0F,R1,0x6,R3,OP_I),   "ORI",   AluOp::OR,   true,  false, false, true,  false, false, WbSel::ALU, 0x0F      },

        // ANDI x3, x1, 0x3F
        { enc_i(0x3F,R1,0x7,R3,OP_I),   "ANDI",  AluOp::AND,  true,  false, false, true,  false, false, WbSel::ALU, 0x3F      },

        // SLLI x3, x1, 4  → shamt in [24:20], imm[10]=0
        { enc_i(4,   R1,0x1,R3,OP_I),   "SLLI",  AluOp::SLL,  true,  false, false, true,  false, false, WbSel::ALU, 4         },

        // SRLI x3, x1, 4  → instruction bit 30 = 0
        { enc_i(4,   R1,0x5,R3,OP_I),   "SRLI",  AluOp::SRL,  true,  false, false, true,  false, false, WbSel::ALU, 4         },

        // SRAI x3, x1, 4  → 0x4040D193 (instruction bit 30 set, so the I-type imm field is 0x404)
        { enc_i(0x404,R1,0x5,R3,OP_I),  "SRAI",  AluOp::SRA,  true,  false, false, true,  false, false, WbSel::ALU, INT32_MIN },

        // ---- Loads (5 instructions) ----
        // LB  x3, 8(x1)
        { enc_i(8,   R1,0x0,R3,OP_LD),  "LB",    AluOp::ADD,  true,  true,  false, true,  false, false, WbSel::MEM, 8         },

        // LH  x3, 4(x1)
        { enc_i(4,   R1,0x1,R3,OP_LD),  "LH",    AluOp::ADD,  true,  true,  false, true,  false, false, WbSel::MEM, 4         },

        // LW  x3, 0(x1)
        { enc_i(0,   R1,0x2,R3,OP_LD),  "LW",    AluOp::ADD,  true,  true,  false, true,  false, false, WbSel::MEM, 0         },

        // LBU x3, -4(x1)
        { enc_i(-4,  R1,0x4,R3,OP_LD),  "LBU",   AluOp::ADD,  true,  true,  false, true,  false, false, WbSel::MEM, -4        },

        // LHU x3, 16(x1)
        { enc_i(16,  R1,0x5,R3,OP_LD),  "LHU",   AluOp::ADD,  true,  true,  false, true,  false, false, WbSel::MEM, 16        },

        // ---- Stores (3 instructions) ----
        // SB  x2, 8(x1)
        { enc_s(8,   R2,R1,0x0,OP_ST),  "SB",    AluOp::ADD,  true,  false, true,  false, false, false, WbSel::ALU, 8         },

        // SH  x2, 4(x1)
        { enc_s(4,   R2,R1,0x1,OP_ST),  "SH",    AluOp::ADD,  true,  false, true,  false, false, false, WbSel::ALU, 4         },

        // SW  x2, -8(x1)
        { enc_s(-8,  R2,R1,0x2,OP_ST),  "SW",    AluOp::ADD,  true,  false, true,  false, false, false, WbSel::ALU, -8        },

        // ---- Branches (6 instructions) ----
        // BEQ  x1, x2, +16
        { enc_b(16,  R2,R1,0x0,OP_BR),  "BEQ",   AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, 16        },

        // BNE  x1, x2, +16
        { enc_b(16,  R2,R1,0x1,OP_BR),  "BNE",   AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, 16        },

        // BLT  x1, x2, -8
        { enc_b(-8,  R2,R1,0x4,OP_BR),  "BLT",   AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, -8        },

        // BGE  x1, x2, +32
        { enc_b(32,  R2,R1,0x5,OP_BR),  "BGE",   AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, 32        },

        // BLTU x1, x2, +64
        { enc_b(64,  R2,R1,0x6,OP_BR),  "BLTU",  AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, 64        },

        // BGEU x1, x2, +128
        { enc_b(128, R2,R1,0x7,OP_BR),  "BGEU",  AluOp::SUB,  false, false, false, false, true,  false, WbSel::ALU, 128       },

        // ---- JAL ----
        // JAL x3, +1024
        { enc_j(1024,R3,OP_JAL),         "JAL",   AluOp::ADD,  true,  false, false, true,  false, true,  WbSel::PC4, 1024      },

        // ---- JALR ----
        // JALR x3, 8(x1)
        { enc_i(8,   R1,0x0,R3,OP_JALR),"JALR",  AluOp::ADD,  true,  false, false, true,  false, true,  WbSel::PC4, 8         },

        // ---- LUI ----
        // LUI x3, 0x12345000 — upper 20 bits = 0x12345
        { enc_u(0x12345000, R3, OP_LUI), "LUI",   AluOp::LUI,  true,  false, false, true,  false, false, WbSel::ALU, (int32_t)0x12345000 },

        // ---- AUIPC ----
        // AUIPC x3, 0x00001000
        { enc_u(0x00001000, R3, OP_AUIPC),"AUIPC",AluOp::AUIPC,true, false, false, true,  false, false, WbSel::ALU, 0x00001000},

        // ---- MISC-MEM and SYSTEM (FENCE, ECALL, EBREAK) ----
        // FENCE: opcode 0x0F (MISC-MEM) with all other fields zero → 0x0000000F
        { 0x0000000Fu,                   "FENCE", AluOp::ADD,  false, false, false, false, false, false, WbSel::ALU, INT32_MIN },

        // ECALL — 0x00000073
        { 0x00000073u,                   "ECALL", AluOp::ADD,  false, false, false, false, false, false, WbSel::ALU, INT32_MIN },

        // EBREAK — 0x00100073
        { 0x00100073u,                   "EBREAK",AluOp::ADD,  false, false, false, false, false, false, WbSel::ALU, INT32_MIN },
    };

    // ---- Run all tests ----
    std::cout << "=== RV32I Decoder Testbench ===\n";
    std::cout << "Testing " << tests.size() << " instruction encodings...\n\n";

    for (const auto& t : tests) {
        run_test(dut, s_instr, s_rs1, s_rs2, s_rd, s_imm,
                 s_alu_op, s_alu_src, s_mem_read, s_mem_write,
                 s_reg_write, s_branch, s_jump, s_wb_sel, t);
    }

    std::cout << "\n=== Summary ===\n";
    std::cout << "PASS: " << pass_count << "  FAIL: " << fail_count << "\n";

    if (fail_count > 0) {
        std::cout << "RESULT: FAIL — decoder has bugs\n";
        return 1;
    }
    std::cout << "RESULT: PASS — all " << pass_count << " instructions correct\n";
    return 0;
}

DV Insight

Decoder bugs are silent: a wrong control signal does not crash, it quietly produces the wrong result. The two most dangerous confusions in the RV32I decoder are:

SRAI vs SRLI: Both use opcode 0x13 and funct3 0x5. They differ only in bit 30 of the instruction word (imm[10] in the immediate field). A testbench that only tests SRLI will never catch a decoder that asserts AluOp::SRL for both. You must encode both SRLI x1, x1, 4 (0x0040D093) and SRAI x1, x1, 4 (0x4040D093) and verify that alu_op differs between them.

SUB vs ADD: Both use opcode 0x33 and funct3 0x0. They differ only in bit 30 of the instruction word (funct7 bit 5). A testbench that only tests ADD will never expose a broken SUB path. Encode ADD x3, x1, x2 (0x002081B3) and SUB x3, x1, x2 (0x402081B3) as adjacent test cases, and explicitly assert AluOp::ADD vs AluOp::SUB.
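To make the bit-30 distinction concrete, here is a minimal sketch in plain C++ (no SystemC required) that assembles the four instruction words from their fields. The helper names enc_r and enc_shift_imm are hypothetical and are not the testbench's own encoders; they only follow the standard R-type field layout.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not the testbench's own encoders.
// R-type layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
constexpr uint32_t enc_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                         uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

// Shift-immediates reuse the same layout with shamt in the rs2 slot
// and opcode 0x13 (OP-IMM).
constexpr uint32_t enc_shift_imm(uint32_t funct7, uint32_t shamt, uint32_t rs1,
                                 uint32_t funct3, uint32_t rd) {
    return enc_r(funct7, shamt, rs1, funct3, rd, 0x13);
}

constexpr uint32_t BIT30 = 1u << 30;

constexpr uint32_t ADD_X3_X1_X2 = enc_r(0x00, 2, 1, 0x0, 3, 0x33);
constexpr uint32_t SUB_X3_X1_X2 = enc_r(0x20, 2, 1, 0x0, 3, 0x33);
constexpr uint32_t SRLI_X1_X1_4 = enc_shift_imm(0x00, 4, 1, 0x5, 1);
constexpr uint32_t SRAI_X1_X1_4 = enc_shift_imm(0x20, 4, 1, 0x5, 1);

// Each pair differs in exactly one bit: bit 30.
static_assert((ADD_X3_X1_X2 ^ SUB_X3_X1_X2) == BIT30, "ADD/SUB differ only in bit 30");
static_assert((SRLI_X1_X1_4 ^ SRAI_X1_X1_4) == BIT30, "SRLI/SRAI differ only in bit 30");
```

Evaluated by hand, the four words are 0x002081B3 (ADD), 0x402081B3 (SUB), 0x0040D093 (SRLI) and 0x4040D093 (SRAI). A decoder that ignores bit 30 cannot tell the members of either pair apart.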

The golden reference struct pattern used in this testbench forces you to pre-commit expected values before writing the DUT. This is the same principle as reference model-driven verification in UVM — the golden model is defined independently of the implementation.


SystemVerilog to SystemC Translation

| Concept | SystemVerilog | SystemC |
|---|---|---|
| Combinational decode | always_comb case(opcode) | SC_METHOD sensitive << instr |
| Immediate sign-extend | {{20{instr[31]}}, instr[31:20]} | (sc_int<32>)(sc_int<12>)(raw >> 20) |
| Named constants | localparam OP_R = 7'h33 | constexpr uint8_t OP_R = 0x33 |
| Enum for control | typedef enum logic [3:0] {...} AluOp | enum class AluOp : uint8_t {...} |
| Bit slice | instr[19:15] | (raw >> 15) & 0x1F |
| Output drive | assign ctrl.alu_op = R_OP | alu_op.write((uint8_t)s_alu_op) |
| Default case | default: alu_op = ADD | default: s_alu_op = AluOp::ADD |

The most important translation is the immediate sign extension. In SystemVerilog, {{20{instr[31]}}, instr[31:20]} is a concatenation of 20 copies of the sign bit followed by the 12-bit field — the language handles sign extension as a structural operation on bits. In SystemC, (sc_int<32>)(sc_int<12>)(raw >> 20) chains two casts: the shift places the 12-bit field at the LSBs, the sc_int<12> cast interprets it as a signed 12-bit value, and the outer sc_int<32> cast sign-extends it to 32 bits. Both approaches produce identical results; the SystemC version is arguably more explicit about the two-step nature of the operation.


Industry Reference

Krste Asanović's RISC-V design rationale documents that fixed-width encoding reduces pre-decode logic by approximately 40% compared to x86 on equivalent manufacturing nodes. The saving comes entirely from eliminating the instruction-length detection stage. The instruction stream is word-aligned by construction, so there is no scanning for prefix bytes.

For comparison, the ARM Thumb-2 ISA used in Cortex-M cores is mixed 16/32-bit. The decode stage must first examine bits [15:11] of each halfword to determine whether that halfword is a complete 16-bit instruction or the first half of a 32-bit one. This "instruction width detect" pass adds a pipeline stage in some implementations or requires look-ahead buffering in others. Cortex-M3 handles this with a prefetch buffer and decode pipeline; the complexity is real but manageable for the code-density benefit.

The Western Digital SweRV EH1 core (RV32IMC) adds Compressed instruction support (the C extension) to the base RV32I decode. Compressed instructions are 16-bit with a different encoding. The decode extension is straightforward: check bits [1:0] of the instruction word. If both are 1 the instruction is 32-bit standard encoding; any other combination indicates a 16-bit compressed instruction. In the SweRV implementation this becomes an additional outer switch case before the opcode decode shown above. The decoder described in this post handles the standard 32-bit subset; adding C-extension support would require:

if ((raw & 0x3) != 0x3) {
    // 16-bit compressed instruction — decode from lower 16 bits
    decode_compressed(raw & 0xFFFF);
} else {
    // 32-bit standard instruction — existing switch(opcode)
    switch (opcode) { ... }
}

The full compressed encoding table maps 16-bit encodings to their 32-bit equivalents; the control signal outputs are the same — the decoder expands the instruction before generating control.


SystemC Language Reference

| Construct | Syntax | SV/Verilog Equivalent | Key Difference |
|---|---|---|---|
| Combinational process | SC_METHOD(decode_proc); sensitive << instr; | always_comb | SV auto-detects sensitivity list from RHS; SystemC requires every input listed manually |
| 32-bit instruction word | sc_in<sc_uint<32>> instr | input logic [31:0] instr | Identical semantics; SystemC template, SV range declaration |
| Typed enum for control signals | enum class AluOp : uint8_t { ADD=0, ... } | typedef enum logic [3:0] { ADD=4'd0, ... } alu_op_t | enum class requires AluOp::ADD prefix; SV enum allows bare ADD |
| Named opcode constants | static constexpr uint8_t OP_R = 0x33 | localparam logic [6:0] OP_R = 7'h33 | Both compile-time constants; SystemC is C++ constexpr, SV is localparam |
| Sign-extend 12-bit to 32-bit | (sc_int<32>)(sc_int<12>)(raw >> 20) | {{20{instr[31]}}, instr[31:20]} | SystemC uses chained casts; SV uses bit replication and concatenation |
| Sign-extend 13-bit (B-type) | (sc_int<32>)(sc_int<13>)(int32_t)imm12 | {{19{imm[12]}}, imm[12:1], 1'b0} | Same two-step: narrow cast to interpret sign, wide cast to extend |
| Sign-extend 21-bit (J-type) | (sc_int<32>)(sc_int<21>)(int32_t)imm20 | {{11{imm[20]}}, imm[20:1], 1'b0} | Same pattern at 21-bit width |
| 5-bit register address | sc_out<sc_uint<5>> rs1_addr | output logic [4:0] rs1_addr | Same semantics; template vs. range |
| Writing an enum value to a port | alu_op.write((uint8_t)s_alu_op) | alu_op = s_alu_op (if widths match) | C++ enum class requires explicit cast; SV enum can assign to matching logic width |

Pure Combinational SC_METHOD — Formal Rules

A correctly written combinational SC_METHOD is the SystemC equivalent of always_comb. Five rules that, if violated, produce incorrect simulation or synthesis results:

Rule 1: ALL inputs must appear in the sensitivity list.
The sensitivity list tells the SystemC scheduler which signal changes should trigger this process. Any input omitted from the list means the process does not re-run when that input changes — the output will be stale. SystemC provides no warning for incomplete lists. For decode_proc, only instr is in the list because all other fields are derived from instr inside the function.

Rule 2: ALL outputs must be assigned in EVERY code path.
If a particular opcode path fails to assign mem_read, mem_read retains its value from the previous invocation — equivalent to an inferred latch. The decoder avoids this with the "set safe defaults first" pattern: all control signals are initialized to safe values (false/ADD/ALU) before the switch statement, so every path through the switch is guaranteed to have a complete output assignment.

Rule 3: NEVER call wait() inside an SC_METHOD.
wait() suspends the calling process until an event or timeout occurs. An SC_METHOD cannot be suspended: it must run to completion in one scheduler activation. Calling wait() inside an SC_METHOD is reported as a runtime error (the Accellera reference simulator's message is "wait() is only allowed in SC_THREADs and SC_CTHREADs"). If your combinational function needs to wait, you have misidentified the process type and should use SC_THREAD.

Rule 4: Must be a pure function of current inputs.
No side effects, no reads from member variables that change between calls (other than through ports). The decoder reads only instr.read() and derives everything else from it in local variables. This is equivalent to the synthesis tool's assumption: the output at time T is determined entirely by the inputs at time T.

Rule 5: Must complete in finite, statically-bounded time.
No loops whose bound depends on runtime input values. for (int i = 0; i < opcode_defined_count; i++) is not synthesizable — the bound must be a compile-time constant. All loops in decode_proc are either absent or have fixed bounds.

Comparison with SV always_comb:

| Property | SV always_comb | SystemC SC_METHOD |
|---|---|---|
| Sensitivity list | Auto-computed from RHS expressions | Must be manually listed |
| Warning on incomplete list | Tool warning (or error) | Silently incorrect |
| Inferred latch detection | Tool warning if output not assigned on all paths | Output holds last value, no warning |
| wait() / #delay allowed | No (simulation error) | No (runtime abort) |
| State (memory of previous call) | Not allowed (latch if present) | Not prevented; will cause incorrect synthesis |

The "forgotten output = latch" trap:

// WRONG: mem_read not assigned in OP_R case
// mem_read holds its previous value → inferred latch
switch (opcode) {
    case OP_R:
        alu_op.write((uint8_t)AluOp::ADD);
        reg_write.write(true);
        // mem_read: not touched — takes previous value
        break;
    case OP_LOAD:
        mem_read.write(true);
        break;
}

// CORRECT: set safe defaults before the switch
bool s_mem_read = false;  // ← safe default for all paths
switch (opcode) {
    case OP_R:  /* mem_read stays false — correct */   break;
    case OP_LOAD: s_mem_read = true;                   break;
}
mem_read.write(s_mem_read);  // ← always driven

The "set defaults first" pattern in decode_proc is exactly this fix applied to all 9 control outputs simultaneously.
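The same idea can be sketched outside SystemC as a pure function that returns a struct. This is a simplified illustration, not the decoder from this post: the Ctrl bundle and decode_ctrl name are hypothetical, and only three of the control outputs are modelled. The opcode values are the standard RV32I ones.

```cpp
#include <cstdint>

// Standard RV32I opcode values for the three formats used in this sketch.
constexpr uint8_t OP_R     = 0x33;
constexpr uint8_t OP_LOAD  = 0x03;
constexpr uint8_t OP_STORE = 0x23;

// Hypothetical, reduced control bundle (the real decoder drives more outputs).
struct Ctrl {
    bool reg_write = false;
    bool mem_read  = false;
    bool mem_write = false;
};

// Pure function of the instruction word: every field of Ctrl starts at its
// safe default, so no path through the switch can leave an output unassigned.
inline Ctrl decode_ctrl(uint32_t raw) {
    Ctrl c;  // defaults first
    switch (raw & 0x7Fu) {
        case OP_R:     c.reg_write = true;                    break;
        case OP_LOAD:  c.reg_write = true; c.mem_read = true; break;
        case OP_STORE: c.mem_write = true;                    break;
        default:       /* unknown opcode: keep the all-false defaults */ break;
    }
    return c;
}
```

Because the result is built from scratch on every call, there is nothing for a previous invocation to leak into, which is the property Rule 2 and Rule 4 ask for.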


enum class vs. SV typedef enum

SystemC (C++11 strongly-typed enum):

enum class AluOp : uint8_t { ADD=0, SUB=1, AND=2, OR=3, XOR=4,
                              SLL=5, SRL=6, SRA=7, SLT=8, SLTU=9 };
// Strongly typed: AluOp::ADD cannot be compared with plain 0 without a cast

SystemVerilog (weakly-typed enum):

typedef enum logic [3:0] { ADD=4'd0, SUB=4'd1, AND_OP=4'd2 } alu_op_t;
// Weakly typed: ADD can be assigned to logic [3:0] directly

Practical differences:

| Property | C++ enum class | SV typedef enum |
|---|---|---|
| Prefix requirement | AluOp::ADD mandatory | Bare ADD works anywhere |
| Integer comparison | Compile error without explicit cast | ADD == 0 compiles and works |
| Underlying type | Explicit: : uint8_t | Implied by logic [3:0] width |
| Port write | alu_op.write((uint8_t)s_alu_op), cast required | alu_op = s_alu_op, direct if widths match |
| Name collision | Safe: AluOp::ADD never conflicts with other ADD | Potential collision with other enum members in scope |

The strong typing of enum class is a safety feature: if (op == 0) is a compile error, forcing you to write if (op == AluOp::ADD). This catches the common mistake of comparing against a raw integer constant when you should be comparing against a named enum value. In SystemVerilog, ADD == 0 compiles silently; whether that is a feature or a footgun depends on the context.

The underlying type uint8_t fixes the storage size of the enum; the port type decides how many bits travel downstream. A port of type sc_uint<4> holds 4 bits, and the AluOp values used here all fit in 4 bits. The explicit cast (uint8_t)s_alu_op when writing the port is required precisely because enum class does not implicitly convert to its underlying type.
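If the C-style cast feels too loose, one option is a small conversion helper built on std::underlying_type. This is a sketch only: to_port_value is a hypothetical name, not part of the decoder in this post, and the enum is the abbreviated listing shown above.

```cpp
#include <cstdint>
#include <type_traits>

enum class AluOp : uint8_t { ADD = 0, SUB = 1, AND = 2, OR = 3, XOR = 4,
                             SLL = 5, SRL = 6, SRA = 7, SLT = 8, SLTU = 9 };

// Hypothetical helper: convert any enum class value to its underlying
// integer type without spelling that type out at each call site.
template <typename E>
constexpr typename std::underlying_type<E>::type to_port_value(E e) {
    return static_cast<typename std::underlying_type<E>::type>(e);
}

// The largest value in this listing must fit the 4-bit port.
static_assert(to_port_value(AluOp::SLTU) < 16, "AluOp values fit in 4 bits");

// AluOp op = AluOp::ADD;
// if (op == 0) {}   // would not compile: no implicit enum-to-int conversion
```

With such a helper the port write would read alu_op.write(to_port_value(s_alu_op)), and the static_assert flags the day someone adds a seventeenth ALU operation without widening the port.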


Immediate Sign Extension — The SystemC Cast Chain

The sign-extension idiom appears four times in the decoder (I, S, B, J types). Here is the step-by-step breakdown for I-type:

// I-type: sign-extend bits[31:20] to 32 bits
uint32_t raw = (uint32_t)instr.read();
int32_t i_imm = (int32_t)(sc_int<12>)((int32_t)raw >> 20);

Or equivalently, using the lambda in the implementation:

auto imm_i = [&]() -> uint32_t {
    return (uint32_t)(int32_t)(sc_int<12>)((int32_t)raw >> 20);
};

What the compiler does, step by step:

  1. (int32_t)raw >> 20 — arithmetic right shift of the raw 32-bit word by 20 positions, placing the 12-bit immediate field in bits [11:0] with the sign bit replicated into [31:12]
  2. (sc_int<12>) — reinterpret the lower 12 bits as a signed 12-bit integer, discarding the upper 20 bits
  3. (int32_t) — sign-extend the 12-bit signed value to 32 bits: if bit 11 of the immediate was 1 (negative), bits [31:12] of the result are all 1s; if bit 11 was 0 (positive), they are all 0s
  4. (uint32_t) — reinterpret the signed 32-bit value as unsigned for port writing (the bit pattern is unchanged)

Comparison with SystemVerilog:

logic signed [31:0] i_imm = {{20{instr[31]}}, instr[31:20]};
// Step 1: {instr[31:20]} — extract 12-bit field
// Step 2: {20{instr[31]}} — replicate sign bit 20 times
// Step 3: concatenate — produces 32-bit sign-extended value

Both produce identical 32-bit two's complement values. The SystemC cast chain is shorter to write but requires understanding the intermediate types. The SystemVerilog replication syntax is longer but self-documenting — you can read "replicate bit 31 twenty times" directly from the code.

The same pattern for B-type (13-bit) and J-type (21-bit):

// B-type: after assembling 13-bit value in imm12:
return (uint32_t)(int32_t)(sc_int<13>)(int32_t)imm12;

// J-type: after assembling 21-bit value in imm20:
return (uint32_t)(int32_t)(sc_int<21>)(int32_t)imm20;

The width in the sc_int<N> cast matches the number of significant bits before sign extension. Change this width and you get a different sign bit — a subtle bug that only manifests on negative immediates.
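For readers who want to check the arithmetic without a SystemC install, the cast chain can be imitated in standard C++. The sketch below is an assumption-laden stand-in, not the decoder's code: sign_extend, imm_i and imm_b are hypothetical names, and the XOR-and-subtract trick replaces the sc_int<N> cast.

```cpp
#include <cstdint>

// Sign-extend the low N bits of value to 32 bits using only standard C++.
// XOR with the sign bit and then subtracting its weight maps the N-bit
// two's complement pattern onto the matching signed integer.
template <unsigned N>
constexpr int32_t sign_extend(uint32_t value) {
    static_assert(N >= 1 && N <= 31, "width must be 1..31");
    return static_cast<int32_t>(
        static_cast<int64_t>((value & ((1u << N) - 1u)) ^ (1u << (N - 1)))
        - static_cast<int64_t>(1u << (N - 1)));
}

// I-type: the 12-bit immediate sits in raw[31:20].
constexpr int32_t imm_i(uint32_t raw) {
    return sign_extend<12>(raw >> 20);
}

// B-type: imm[12|10:5] in raw[31:25], imm[4:1|11] in raw[11:7], imm[0] = 0.
constexpr int32_t imm_b(uint32_t raw) {
    return sign_extend<13>((((raw >> 31) & 0x1u)  << 12) |
                           (((raw >> 7)  & 0x1u)  << 11) |
                           (((raw >> 25) & 0x3Fu) << 5)  |
                           (((raw >> 8)  & 0xFu)  << 1));
}

static_assert(imm_i(0xFFC00093u) == -4, "ADDI x1, x0, -4");
static_assert(imm_b(0xFE208CE3u) == -8, "BEQ x1, x2, -8");
// Wrong width: the 12-bit pattern for -4 read as a 13-bit value is +4092.
static_assert(sign_extend<13>(0xFFCu) == 4092, "width mismatch changes the sign");
```

The last assertion is the bug described above in miniature: positive immediates survive a wrong width untouched, so only a negative test value exposes it.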


Common Pitfalls for SV Engineers — Extended

Five pitfalls that arise specifically in the instruction decoder:

Pitfall 1: SC_METHOD sensitivity only has instr — adding more module inputs requires updating the list.
If you add a port to the decoder module (e.g., a pc input for AUIPC computation), that new port must be added to sensitive << instr << pc. SV always_comb handles this automatically. SystemC requires you to remember to update the list every time the module interface changes. A missing signal in the sensitivity list means the combinational function does not re-run when that signal changes — silent incorrect output.

Pitfall 2: SRAI vs. SRLI — only funct7 bit 5 distinguishes them.
Both instructions use opcode 0x13 and funct3 = 0x5. Only bit 30 of the instruction word (imm[10] in the I-type encoding) distinguishes them. A testbench that only tests SRLI will never expose a bug where SRAI is decoded as SRLI. The test suite includes both SRLI x1, x1, 4 (0x0040D093) and SRAI x1, x1, 4 (0x4040D093) as adjacent tests with different expected alu_op outputs.

Pitfall 3: S-type immediate bits must be assembled from two separate fields.
S-type immediate: bits [11:5] from raw[31:25], bits [4:0] from raw[11:7]. A common bug is reading raw[31:20] (the I-type extraction) for S-type stores — this gives the wrong immediate because the S-type splits the immediate around the rs2 field. Always implement S-type as two separate extractions assembled into one value before sign extension:

uint32_t upper = (raw >> 25) & 0x7F;  // [11:5]
uint32_t lower = (raw >>  7) & 0x1F;  // [4:0]
uint32_t raw12 = (upper << 5) | lower;
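A worked example shows how far off the I-type extraction lands. The sketch below is plain C++ with hypothetical helper names (s_raw12, sext12, imm_s, imm_s_wrong); the test word 0xFE20AE23 is SW x2, -4(x1), assembled by hand from the S-type field layout.

```cpp
#include <cstdint>

// Assemble the 12-bit S-type immediate from its two fields.
constexpr uint32_t s_raw12(uint32_t raw) {
    return (((raw >> 25) & 0x7Fu) << 5)   // imm[11:5] from raw[31:25]
         |  ((raw >> 7)  & 0x1Fu);        // imm[4:0]  from raw[11:7]
}

// Sign-extend a 12-bit pattern to a signed 32-bit value.
constexpr int32_t sext12(uint32_t v) {
    return static_cast<int32_t>((v & 0xFFFu) ^ 0x800u) - 0x800;
}

// Correct S-type immediate.
constexpr int32_t imm_s(uint32_t raw) { return sext12(s_raw12(raw)); }

// The buggy variant: treats the word as I-type and reads raw[31:20],
// which mixes the rs2 field into the low immediate bits.
constexpr int32_t imm_s_wrong(uint32_t raw) { return sext12(raw >> 20); }

constexpr uint32_t SW_X2_M4_X1 = 0xFE20AE23u;  // SW x2, -4(x1)

static_assert(imm_s(SW_X2_M4_X1) == -4, "split extraction recovers -4");
static_assert(imm_s_wrong(SW_X2_M4_X1) == -30, "I-type extraction yields -30");
```

The wrong value is still a plausible-looking small negative offset, so a store test that does not check imm against an exact expected value can pass with the bug in place.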

Pitfall 4: Missing default: case in the opcode switch leaves outputs at their previous values.
If the switch has no default: case, the outputs are not pre-initialized before the switch, and an unknown opcode arrives (e.g., from a corrupted instruction stream), all output signals retain their values from the last invocation. In simulation this is a latch; in synthesis it is a source of uninitialized path warnings. Always include a default: that zeros or NOPs all outputs, even if "illegal opcode" should never occur in a correct program.

Pitfall 5: Passthrough fields are as important as computed control signals.
The decoder does not just compute control signals — it also passes raw instruction fields directly downstream: rs1_addr, rs2_addr, rd_addr, and funct3. These are not decoded; they are wired through. Missing any one of them means a downstream module (register file, memory unit) cannot access a field it needs. A common mistake is implementing the full control signal decode but forgetting to drive funct3 — the memory unit uses funct3 to select byte/halfword/word access width, and it will silently use the wrong width if funct3 is not driven.
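A quick way to keep the passthrough fields honest is to extract them in one place and test them alongside the control signals. The following is a minimal plain C++ sketch; the Fields struct and extract_fields name are hypothetical, and the example word 0x00832283 is LW x5, 8(x6), encoded by hand.

```cpp
#include <cstdint>

// Hypothetical bundle of the raw fields the decoder forwards unchanged.
struct Fields {
    uint8_t rs1;     // raw[19:15]
    uint8_t rs2;     // raw[24:20]
    uint8_t rd;      // raw[11:7]
    uint8_t funct3;  // raw[14:12]
};

// The field positions are the same for every format, so no opcode check
// is needed before extracting them.
constexpr Fields extract_fields(uint32_t raw) {
    return Fields{
        static_cast<uint8_t>((raw >> 15) & 0x1Fu),
        static_cast<uint8_t>((raw >> 20) & 0x1Fu),
        static_cast<uint8_t>((raw >> 7)  & 0x1Fu),
        static_cast<uint8_t>((raw >> 12) & 0x7u)
    };
}

constexpr uint32_t LW_X5_8_X6 = 0x00832283u;  // LW x5, 8(x6)

static_assert(extract_fields(LW_X5_8_X6).rs1 == 6, "base register x6");
static_assert(extract_fields(LW_X5_8_X6).rd == 5, "destination x5");
static_assert(extract_fields(LW_X5_8_X6).funct3 == 2, "funct3 = 2 selects word access");
```

For a load, the rs2 slot holds immediate bits rather than a register number (here it reads 8); the value is still driven, and downstream logic simply ignores it when the format has no rs2.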


What's Next

Post 10 adds the Program Counter and Instruction Memory — the modules that fetch one instruction per cycle and drive this decoder's instr input. The PC increments by 4 every cycle for sequential execution; branches and jumps redirect it using the imm and jump/branch signals this decoder produces. With the register file (Post 8), the ALU (Post 5), and this decoder all built, we have every functional unit needed for a single-cycle CPU — which comes together in Post 12.

Author
Mayur Kubavat
VLSI Design and Verification Engineer sharing knowledge about SystemVerilog, UVM, and hardware verification methodologies.
