Architecture & Design Verification: 3. SystemC Tutorial - Delta Cycles & Event-Driven Semantics

Introduction

"Why doesn't my output update immediately after I write a signal?"

This is the single most common question from engineers who are new to SystemC. You call sig_in.write(true), you immediately read sig_out.read(), and you get the old value back. You add a sc_start(SC_ZERO_TIME) call. Now the first stage updates — but the second stage is still wrong. You add another sc_start(SC_ZERO_TIME). Now everything looks correct, but you're not sure why, and you're worried you're masking a bug rather than understanding the model.

The answer is delta cycles — and understanding them is not optional. It is the foundation of every multi-process simulation you will ever write in SystemC.

Delta cycles exist because SystemC uses the same evaluate-update model as real hardware. In silicon, a signal driven by one flip-flop does not arrive at the next flip-flop instantaneously — it propagates through wires and combinational logic over a finite (if tiny) time. In a simulator, zero-time propagation through multiple stages still requires an ordering mechanism. If stage 1 writes a signal and stage 2 reads that signal in the same instant, which process runs first? The evaluate-update model answers this by separating the act of computing a new value (evaluate phase) from the act of making that value visible to other processes (update phase). A delta cycle is one pass through that pair of phases.

This is not a SystemC quirk. Every hardware simulator you have used — ModelSim, VCS, Xcelium, Questa, Verilator — uses exactly this model. The IEEE 1364 Verilog standard defines it. The IEEE 1800 SystemVerilog standard defines it. IEEE 1666 (SystemC) defines it. They all converge on the same evaluate-update loop because they are all modeling the same physical reality.

Consider the forwarding logic in a real RISC-V pipeline. When the EX stage computes a result and the MEM stage needs that value in the same clock cycle, the forwarding path must propagate through several levels of combinational logic — a multiplexer, a comparator, routing wires — before settling at the MEM stage input. In silicon this takes maybe 200–400ps. In a simulator it takes delta cycles: each level of logic that re-evaluates is one delta. The simulator produces the correct stable value before it advances the clock, just as silicon produces the correct stable value before the next clock edge samples it.

Post 3 builds two concrete models. First, a two-stage combinational chain that makes delta cycle propagation visible in your terminal output — you will see stage 1 and stage 2 both fire within a single sc_start() call. Second, a producer-consumer handshake using sc_event, which is the explicit event mechanism that underlies transaction-level communication. The producer-consumer pattern we build here is structurally identical to the fetch→decode handshake in our RISC-V CPU at Post 10.

Prerequisites

Completed Post 1 — the pass_through module must be working. SC_METHOD, sc_signal, and port binding are used throughout this post.
Post 1 — Modules, Ports & Signals
Completed Post 2 — the dff module must be working. Familiarity with sc_start(), SC_ZERO_TIME, and sc_time_stamp() is assumed.
Post 2 — Simulation Time & Clocks
SystemC 2.3.x installed — see the install guides linked in Post 1
Code for this post: GitHub — section1/post03

Translation Table

Before the concept explanations, here is the mapping from what you already know:

Concept	C++ Engineer	SystemVerilog Engineer
Delta cycle	Sub-timestep evaluation round	Delta delay (same concept, same name)
`sc_event`	`std::condition_variable`	`event` keyword
`event.notify()`	`cv.notify_one()`	`-> event_name`
`wait(event)`	`cv.wait()` (blocking in sim, not in OS)	`@(event_name)`
Evaluate phase	"run all ready functions"	`always` block execution
Update phase	"apply buffered writes"	Non-blocking assignment update (`#0` semantics)
`sc_signal.write()`	Buffered write — not visible until update phase	Non-blocking assignment `<=`

The key insight for C++ engineers: sc_signal.write() is not like a normal variable assignment. It does not change the visible value immediately. It enqueues the new value for the update phase. If you want the value you just wrote to be visible to another process, a delta cycle must complete first.

The key insight for SystemVerilog engineers: delta cycles in SystemC work exactly as they do in SystemVerilog. If you have used non-blocking assignments and wondered why a chain of always blocks triggered by the same event all see the old value until the #0 update, this is the same mechanism. SystemC makes it more explicit — you can see it in the sc_start() call boundaries — but the semantics are identical.

Concept Explanation

SystemC Language Reference

Every delta-cycle and event construct used in this post:

Construct	Syntax	SV Equivalent	Key Difference
`sc_event`	`sc_event my_ev;`	`event my_ev;`	No associated value — pure occurrence signal. Unlike `sc_signal`, it does not hold state between firings
`notify()` (immediate)	`my_ev.notify()`	`-> my_ev`	Fires in the CURRENT evaluate phase; triggered processes run in the same evaluation cycle as the notifier
`notify(SC_ZERO_TIME)`	`my_ev.notify(SC_ZERO_TIME)`	`#0 -> my_ev` (approximately)	Fires in the NEXT delta — one evaluate-update cycle later; avoids immediate notification races
`notify(T, unit)`	`my_ev.notify(10, SC_NS)`	`fork #10 -> my_ev; join_none`	Fires at a future simulation timestamp; no direct single-statement SV equivalent
`wait(event)`	`wait(my_ev)`	`@(my_ev)`	Suspends SC_THREAD until event fires; no timeout — waits indefinitely
`value_changed_event()`	`sig.value_changed_event()`	`@(sig)` (any change)	Returns the event that fires whenever `sig`'s committed value changes
`posedge_event()`	`sig.posedge_event()`	`@(posedge sig)`	Returns the event for 0→1 transition of a boolean signal
`negedge_event()`	`sig.negedge_event()`	`@(negedge sig)`	Returns the event for 1→0 transition
`sc_delta_count()`	`sc_delta_count()`	No equivalent	Returns the current delta counter (implementation-specific; for debugging only)
Evaluate phase	Kernel internal	`always` block execution round	Phase where all runnable processes execute and call `write()`
Update phase	Kernel internal	NBA update region	Phase where all queued `write()` values are committed to signals
Delta cycle	One evaluate+update pair	"delta delay"	Occurs at constant simulation time T; simulation time does not advance during delta cycles

1. The Evaluate-Update Cycle — Fully Explained

The SystemC simulation kernel operates as a loop. At each simulation time step, it does not simply "run all processes." It runs two distinct phases, potentially many times, before advancing the clock:

Evaluate phase: All processes (SC_METHOD or SC_THREAD) whose sensitivity lists have triggered run to completion. During this phase, a process may call signal.write(new_value). That write is buffered — the signal's visible value does not change yet. Other processes reading the signal in the same evaluate phase see the old value.

Update phase: All buffered writes are committed atomically. Signals now reflect their new values. The kernel checks: did any signal change? If yes, some process may be sensitive to that change and needs to run — so the kernel triggers another evaluate phase. This is one delta cycle.

Stability: When an evaluate-update pair completes with no new signal changes, the system has reached a stable state at this time step. The kernel advances to the next scheduled time event.

The kernel's inner loop as pseudocode:

while (simulation_not_finished):

  // ── EVALUATE PHASE ──────────────────────────────────────────────────────
  while (runnable_processes_exist):
    process = ready_queue.pop()
    process.execute()
    // Inside execute():
    //   signal.read()  → returns signal.m_cur_val (current committed value)
    //   signal.write() → stores into signal.m_new_val ONLY (buffered)
    //   event.notify() → adds triggered processes to ready_queue

  // ── UPDATE PHASE ────────────────────────────────────────────────────────
  changed_signals = []
  for each signal in written_signals_this_phase:
    if signal.m_new_val != signal.m_cur_val:
      signal.m_cur_val = signal.m_new_val      // commit the write
      changed_signals.append(signal)
      // fire signal.value_changed_event()
      // → adds processes sensitive to this signal to ready_queue

  // ── DELTA DECISION ──────────────────────────────────────────────────────
  if ready_queue is not empty:
    delta_count++
    // Loop back to EVALUATE (still at timestamp T)
  else:
    // System is stable at timestamp T
    // Advance simulation time to next_scheduled_event
    T = event_queue.pop_earliest_time()

This is not a SystemC invention. IEEE 1666 (SystemC), IEEE 1364 (Verilog), and IEEE 1800 (SystemVerilog) all specify essentially the same loop. The evaluate-update model exists because hardware physics requires it: gates propagate signals in finite time, and a simulator must provide an ordering mechanism that respects that causality even when modeling zero-delay ideal gates.

Relationship to Verilog's scheduling regions:

Verilog defines multiple scheduling regions within a single timestep: Active, NBA (non-blocking assignment), Observed, Reactive, Postponed. SystemC collapses this to evaluate + update, which maps directly to Verilog's Active + NBA regions. The correspondence is:

Verilog Region	SystemC Phase	What happens
Active	Evaluate	Blocking assignments, `always @(*)` execution, function calls
NBA	Update	Non-blocking assignment (`<=`) commits
Observed	—	SV assertions (no direct SystemC equivalent)
Reactive	—	SV program blocks (no direct SystemC equivalent)

SystemC also has a concept of "delta notification" vs "immediate notification" for sc_event, which maps roughly to the distinction between scheduling events in the Active vs. NBA region.

2. Delta Cycles — Concrete Example with Step-by-Step Numbering

Consider a three-process chain: A drives B, B drives C, C drives the output.

Input → [Process A] → sig_b → [Process B] → sig_c → [Process C] → Output

In hardware, this represents three levels of combinational logic. Each level adds propagation delay. In the simulator, each level takes one delta cycle.

Why delta cycles are necessary:

Without delta cycles, the simulator would need to run processes in strict topological order — A before B before C — based on the signal dependency graph. For combinational logic this is computable (Verilator does it). But for feedback loops, mutual dependencies, and dynamically-created process graphs, strict topological ordering is impossible. The evaluate-update model solves this by never requiring topological ordering: all processes can run in any order during evaluate, and the update phase makes writes visible atomically.

Step-by-step execution of the three-stage chain:

Initial state at timestamp T:
  sig_b.m_cur_val = old_b
  sig_c.m_cur_val = old_c
  output.m_cur_val = old_out

Input signal changes to new_input (from testbench write):
  input.m_cur_val = new_input  (testbench writes take effect after sc_start)
  Process A is in ready_queue (sensitive to input)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EVALUATE Δ0 at timestamp T
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Process A runs:
    reads input.m_cur_val = new_input
    calls sig_b.write(transform(new_input))
    sig_b.m_new_val = transform(new_input)  ← BUFFERED, not yet visible
  Process A returns.

  Process B does NOT run — sig_b.m_cur_val hasn't changed yet.

UPDATE Δ0:
  sig_b.m_cur_val ← sig_b.m_new_val = transform(new_input)
  sig_b changed → value_changed_event fires
  Process B added to ready_queue (sensitive to sig_b)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EVALUATE Δ1 at timestamp T
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Process B runs:
    reads sig_b.m_cur_val = transform(new_input)  ← sees NEW value
    calls sig_c.write(transform2(sig_b))
    sig_c.m_new_val = transform2(sig_b)  ← BUFFERED

UPDATE Δ1:
  sig_c.m_cur_val ← sig_c.m_new_val
  sig_c changed → value_changed_event fires
  Process C added to ready_queue

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EVALUATE Δ2 at timestamp T
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Process C runs:
    reads sig_c.m_cur_val = transform2(sig_b)  ← sees NEW value
    calls output.write(final_value)

UPDATE Δ2:
  output.m_cur_val ← final_value
  output changed → value_changed_event fires
  No processes sensitive to output → ready_queue is empty

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STABLE at timestamp T
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  output.m_cur_val = final_value (correct, fully propagated)
  Simulation time advances to next scheduled event.

The key observation: All three delta cycles happen inside a single sc_start(SC_ZERO_TIME) call. From the caller's perspective, one function call → system is stable → all values are correct. You do not call sc_start once per delta cycle.

ASCII timeline — three-stage chain:

Timestamp T (sc_time_stamp() returns T throughout all deltas):

          Δ0             Δ1             Δ2          Stable
          ├──────────────┼──────────────┼──────────────┤
EVALUATE  A runs         B runs         C runs
          write(sig_b)   write(sig_c)   write(output)
UPDATE    sig_b commits  sig_c commits  output commits
          B→ready        C→ready        no one→ready

sig_b:    old_b──────────new_b──────────────────────────
sig_c:    old_c──────────────────new_c──────────────────
output:   old_out─────────────────────────final_value───

Time:     ────────────────────────────── T ─────────────►
          (sc_time_stamp() == T for ALL of these delta cycles)

Equivalence to Verilog's always-chain:

// Three-stage chain in SystemVerilog
always @(input)  sig_b <= transform(input);
always @(sig_b)  sig_c <= transform2(sig_b);
always @(sig_c)  output <= transform3(sig_c);

When input changes in Verilog, the first always block fires (Active region), computes sig_b's new value via NBA. After the NBA region, sig_b is committed — the second always fires, computes sig_c, and so on. This is delta-cycle propagation in Verilog's terminology. Each always firing is one delta. The Verilog simulator handles N levels of always blocks in N delta cycles — exactly the SystemC behavior.

flowchart TD
    A[Simulation time T] --> B[Evaluate Phase\nRun all ready processes]
    B --> C{Any signals\nchanged?}
    C -->|Yes — delta cycle| B
    C -->|No — stable| D[Advance to next\ntime event]
    D --> A
    style B fill:#06b6d4,color:#fff
    style C fill:#f59e0b,color:#fff
    style D fill:#10b981,color:#fff

Notice that the evaluate phase loops back to itself, not forward to a new time step. That loop is the delta cycle. Each iteration is "delta N" at the same simulation timestamp T. From the perspective of sc_time_stamp(), the time does not change during delta cycles — they all occur at time T. The delta counter is an internal kernel bookkeeping number that is not directly exposed to user code in standard SystemC (though some implementations provide sc_delta_count() as a debugging aid).

3. Two-Stage Chain: Where Delta Cycles Become Visible

Consider a combinational chain: A → stage1 → B → stage2 → C.

Stage 1 is an SC_METHOD sensitive to signal A. Stage 2 is an SC_METHOD sensitive to signal B.

When A changes at time T:

Delta	What happens
Evaluate Δ0	Stage 1 runs. It reads `A` (new value) and calls `B.write(new_value)`. Write is buffered. Stage 2 does NOT run yet — `B` has not officially changed.
Update Δ0	`B`'s new value is committed. The kernel sees `B` changed. Stage 2 is sensitive to `B`.
Evaluate Δ1	Stage 2 runs. It reads `B` (new value) and calls `C.write(new_value)`. Write is buffered.
Update Δ1	`C`'s new value is committed. No more sensitive processes.
Stable	System is stable at time `T`. Both `B` and `C` hold the correct propagated value.

From your perspective as a user calling sc_start(SC_ZERO_TIME), both delta cycles happen inside that one call. When sc_start returns, the system is stable — C already has the correct value. You do not need to call sc_start once per delta cycle. The kernel handles all internal delta iterations automatically.

This is the most important thing to understand before reading Example 1 below.

flowchart LR
    A([A\nchanges at T]) --> |Δ0 evaluate| S1[Stage 1\nreads A\nwrites B ← buffered]
    S1 --> |Δ0 update| B([B\ncommitted])
    B --> |Δ1 evaluate| S2[Stage 2\nreads B\nwrites C ← buffered]
    S2 --> |Δ1 update| C([C\ncommitted\nstable])
    style A fill:#f59e0b,color:#fff
    style B fill:#06b6d4,color:#fff
    style C fill:#10b981,color:#fff
    style S1 fill:#334155,color:#fff
    style S2 fill:#334155,color:#fff

4. sc_event Theory — Explicit Event Signaling

sc_signal<T> carries an implicit event: the signal fires its value_changed_event automatically whenever its committed value changes. This is what makes sensitive << my_signal work.

sc_event is a separate, explicit event object with no associated value. It is a pure occurrence — a moment in time that something happened. It has no history, no current state, no "has this event fired recently" query. Use sc_event when you need to signal occurrence, not value.

sc_signal vs. sc_event — key behavioral differences:

Property	`sc_signal<T>`	`sc_event`
Carries a value	Yes — `read()` returns current value	No — pure occurrence
Persists between firings	Yes — value holds until next write	No — event is instantaneous
Automatic notification	Yes — fires on value change only	No — must call `notify()` explicitly
Can notify unconditionally	No — write same value = no event	Yes — `notify()` always fires
Multiple readers	Yes — any process can `read()`	Via `wait(event)` — process suspends until event
Hardware analogy	Wire / net	Interrupt pulse / strobe

The three forms of notify() — semantics matter:

sc_event e;

// FORM 1: Immediate notification
e.notify();
// Fires NOW, in the CURRENT evaluate phase.
// Any process calling wait(e) that is ALREADY suspended on this event
// is added to the ready queue and runs in THIS delta cycle's evaluate phase.
// Processes that call wait(e) AFTER this notify() in a later delta will
// NOT see this notification — it has already passed.

// FORM 2: Delta notification
e.notify(SC_ZERO_TIME);
// Fires in the NEXT delta cycle.
// This is equivalent to: schedule a value_changed_event for e
// at the next evaluate phase. Processes waiting on e will wake up
// in the next evaluate phase after the current one.

// FORM 3: Timed notification
e.notify(10, SC_NS);
// Fires 10 simulation nanoseconds from now.
// The kernel places this notification in the future event queue.
// No direct single-statement equivalent in SV.

Mapping to SystemVerilog event semantics:

Notification form	SystemC	SystemVerilog
Immediate	`e.notify()`	`-> e`
Next delta	`e.notify(SC_ZERO_TIME)`	`#0 -> e` (approximately)
Timed	`e.notify(10, SC_NS)`	`fork #10 -> e; join_none`
Wait for event	`wait(e)` in SC_THREAD	`@(e)`
Multiple events (OR)	`wait(e1 \\| e2)`	`@(e1 or e2)`
Multiple events (AND)	No direct primitive — use flags	`@(e1) @(e2)` sequentially

When immediate notify() can cause an infinite loop:

SC_MODULE(bad_example) {
  sc_event my_event;

  void process_a() {
    // This process is triggered by my_event
    // ...some work...
    my_event.notify();   // BUG: immediate notification re-triggers this same process!
    // next delta: process_a runs again → notifies again → infinite delta loop
    // Simulator detects this and aborts with "delta cycle limit exceeded"
  }

  SC_CTOR(bad_example) {
    SC_THREAD(process_a);
    // process_a waits on my_event somewhere inside
  }
};

If a process notifies the same event it is sensitive to using immediate notify(), the kernel will keep re-triggering the process indefinitely. Use notify(SC_ZERO_TIME) to break the cycle — the process completes first, then the event fires in the next delta, giving the kernel a chance to check stability:

void process_a() {
  // ...some work...
  my_event.notify(SC_ZERO_TIME);  // CORRECT: fires in next delta, not current
}

sc_event vs sc_signal — which to use:

// Use sc_signal<bool> when the value matters:
sc_signal<bool> ready;
ready.write(true);   // other modules can read ready.read() at any time

// Use sc_event when the occurrence matters, not the value:
sc_event data_valid;
data_valid.notify(); // signals "data is ready NOW" — no persistent state

The practical rule: if any module needs to query "is the signal currently high?" at an arbitrary time (polling), use sc_signal<bool>. If modules only need to react to the moment of occurrence (event-driven), use sc_event. In our RISC-V CPU, instruction-ready handshakes between fetch and decode use sc_event (react to occurrence), while the pipeline flush flag uses sc_signal<bool> (decode stage polls whether flush is currently active).

Simulation Semantics — The Scheduler from First Principles

The SystemC scheduler is the most important piece of infrastructure in this entire tutorial series. Every behavioral nuance — why values appear when they do, why processes run in a certain order, why sc_start(SC_ZERO_TIME) settles everything — flows from understanding the scheduler.

Here is the scheduler written as complete pseudocode, including both the evaluate-update loop and the event queue management:

// ── GLOBAL SCHEDULER STATE ───────────────────────────────────────────────
event_queue:    sorted list of (time, sc_event) pairs — future events
ready_queue:    list of processes to run in current evaluate phase
delta_count:    integer counter, resets at each new timestamp
current_time:   sc_time — current simulation timestamp

// ── INITIALIZATION ────────────────────────────────────────────────────────
// (runs before first sc_start() call)
for each SC_METHOD registered:
  add to ready_queue  // all methods run once at t=0 for initialization

// ── MAIN SIMULATION LOOP ─────────────────────────────────────────────────
while (simulation_not_stopped):

  // EVALUATE PHASE
  while ready_queue is not empty:
    process = ready_queue.dequeue()
    process.execute()
    // execute() calls:
    //   signal.read()   → reads signal.m_cur_val
    //   signal.write(v) → stores v in signal.m_new_val; marks signal as dirty
    //   event.notify()  → immediate: adds waiting processes to ready_queue
    //                     SC_ZERO_TIME: adds event to delta_event_queue
    //                     timed: adds event to future event_queue

  // UPDATE PHASE
  delta_events_fired = []
  for each dirty_signal:
    old_val = dirty_signal.m_cur_val
    dirty_signal.m_cur_val = dirty_signal.m_new_val
    if dirty_signal.m_cur_val != old_val:
      // Signal value actually changed — notify waiters
      for each process in dirty_signal.waiters:
        ready_queue.enqueue(process)
      delta_events_fired.append(dirty_signal.value_changed_event)

  // DELTA DECISION
  if ready_queue is not empty or delta_event_queue is not empty:
    // Process delta events
    for each event in delta_event_queue:
      for each process waiting on event:
        ready_queue.enqueue(process)
    delta_event_queue.clear()
    delta_count++
    // Loop back to EVALUATE at same current_time

  else:
    // System is stable at current_time
    // Advance to next future event
    if event_queue is empty:
      break  // simulation ends — no more events

    next_time, next_event = event_queue.pop_minimum()
    current_time = next_time
    delta_count = 0

    // Wake all processes waiting on next_event
    for each process waiting on next_event:
      ready_queue.enqueue(process)

This pseudocode is faithful to IEEE 1666. Real implementations add thread scheduling, process priorities, and optimization (Verilator-style static scheduling), but the observable behavior matches this model.

Why this model is correct for hardware:

In real CMOS gates, when an input changes, the output changes after the gate's propagation delay. Gates that are inputs to the changed gate then change after their propagation delays. This creates a wave of changes propagating forward through the combinational logic cloud. In ideal (zero-delay) simulation, all those propagation delays collapse to zero real time but must still be ordered. Delta cycles provide that ordering: each "wave front" of changes is one delta cycle. The simulator runs waves until no new changes occur — exactly as silicon settles to a new stable state after an input transition.

Expanded Translation: Verilog, SystemVerilog, and SystemC Side by Side

Concept	Classic Verilog	Modern SystemVerilog	SystemC
Deferred signal update	`b <= a + 1` (non-blocking)	`b <= a + 1` (same)	`sig_b.write(sig_a.read() + 1)` always deferred
Immediate update	`b = a + 1` (blocking, within one always)	`b = a + 1` (same, procedural)	Local C++ variable: `int b = sig_a.read() + 1`
Sensitivity list	`always @(a or b)`	`always_comb` (auto-sensitivity)	`sensitive << a << b` (manual, always)
Auto-sensitivity	Not in Verilog-95; `@*` in Verilog-2001	`always_comb` infers automatically	Not supported — must explicitly list
Named events	`event done; -> done; @(done)`	Same, plus `triggered` property	`sc_event done; done.notify(); wait(done)`
Timed event	`#10 -> done`	`fork #10 -> done; join_none`	`done.notify(10, SC_NS)`
Delta cycle count	Not user-accessible	`$get_time_precision` (time only)	`sc_delta_count()` (impl-specific)
Combinational chain depth	N always blocks = N delta cycles	Same	N SC_METHODs = N delta cycles
Circular comb dependency	Oscillates until delta limit	Same (simulation error)	Same — kernel detects and aborts
Wait for condition	`wait (sig == 1)` (Verilog task)	`wait (sig == 1)` (native)	`while(sig.read()!=1) wait(sig.value_changed_event())`
Concurrent threads	`fork ... join_none`	`fork ... join_none` + `class` + `task`	Multiple `SC_THREAD` registrations

Implementation

Example 1: Two-Stage Combinational Chain

This example makes the delta cycle propagation visible in the terminal. Read the output carefully — both stages fire within the single sc_start(SC_ZERO_TIME) call.

// File: two_stage_chain.cpp
#include <systemc.h>

SC_MODULE(two_stage_chain) {
  sc_in<bool>  in_sig;
  sc_out<bool> mid_sig;
  sc_out<bool> out_sig;

  sc_signal<bool> internal; // intermediate signal between the two stages

  // Stage 1: combinational — propagates in_sig → internal
  void stage1() {
    internal.write(in_sig.read());
    std::cout << "  stage1 fired: internal will be " << in_sig.read()
              << " (write buffered, not yet visible)" << std::endl;
  }

  // Stage 2: combinational — propagates internal → out_sig
  void stage2() {
    out_sig.write(internal.read());
    mid_sig.write(internal.read());
    std::cout << "  stage2 fired: out_sig will be " << internal.read()
              << " (write buffered, not yet visible)" << std::endl;
  }

  SC_CTOR(two_stage_chain) {
    SC_METHOD(stage1); sensitive << in_sig;
    SC_METHOD(stage2); sensitive << internal;
  }
};

int sc_main(int argc, char* argv[]) {
  sc_signal<bool> sig_in, sig_mid, sig_out;

  two_stage_chain dut("dut");
  dut.in_sig(sig_in);
  dut.mid_sig(sig_mid);
  dut.out_sig(sig_out);

  // Initialize to known state
  sig_in.write(false);
  sc_start(SC_ZERO_TIME); // settle — both stages fire once during this call
  std::cout << "[t=0 settled] in=" << sig_in.read()
            << " mid=" << sig_mid.read()
            << " out=" << sig_out.read() << std::endl;

  // Drive input high — BOTH stages will fire inside this single sc_start call
  std::cout << "\n--- Driving in=true ---\n";
  sig_in.write(true);
  sc_start(SC_ZERO_TIME); // stage1 fires in delta 1, stage2 fires in delta 2 — all inside here
  std::cout << "[after sc_start] in=" << sig_in.read()
            << " mid=" << sig_mid.read()
            << " out=" << sig_out.read() << std::endl;

  return 0;
}

Expected output:

  stage1 fired: internal will be 0 (write buffered, not yet visible)
  stage2 fired: out_sig will be 0 (write buffered, not yet visible)
[t=0 settled] in=0 mid=0 out=0

--- Driving in=true ---
  stage1 fired: internal will be 1 (write buffered, not yet visible)
  stage2 fired: out_sig will be 1 (write buffered, not yet visible)
[after sc_start] in=1 mid=1 out=1

PEDAGOGICAL NOTE — this is the most important thing to take away from this example:

Look at the output for the in=true drive. Both stage1 fired and stage2 fired print before the final state line. This means both stages ran within the single sc_start(SC_ZERO_TIME) call. Stage 1 fired in one internal delta, stage 2 fired in the next internal delta, and by the time sc_start returned, the system had settled through both deltas automatically.

You might expect that you need to call sc_start(SC_ZERO_TIME) once to propagate through stage 1, and again to propagate through stage 2. This is incorrect. One sc_start(SC_ZERO_TIME) call drives the SystemC kernel until stability — which means it runs as many internal evaluate-update cycles (delta cycles) as necessary. For a two-stage chain, it runs two internal deltas. For a ten-stage chain, it would run ten. You never need to call sc_start once per delta cycle.

The labels "delta 1" and "delta 2" that you might write in comments or diagrams are pedagogical labels — conceptual markers that help you reason about propagation order. They are not values you can read from a counter in standard user code. The SystemC kernel tracks them internally. What you observe as a user is: one sc_start call → system converges → correct values everywhere.

Example 2: sc_event Producer-Consumer Handshake

This example shows sc_event as an explicit synchronization mechanism between two modules. The producer writes data to a signal and fires an event. The consumer blocks on that event and reads the data when it fires. No polling. No shared state beyond the event object and the signal.

These RISC-V instruction encodings are real — 0x00500113 is ADDI x2, x0, 5 and 0x00A00193 is ADDI x3, x0, 10. We will parse these exact encodings in our decode stage at Post 11.

// File: producer_consumer.cpp
#include <systemc.h>

SC_MODULE(producer) {
  sc_out<sc_uint<32>> data;
  sc_event            data_ready; // explicit event — fires when data is valid

  void produce() {
    // Transaction 1: ADDI x2, x0, 5  (RV32I encoding)
    data.write(0x00500113);
    data_ready.notify();    // fire event immediately — data is on the bus now
    wait(20, SC_NS);

    // Transaction 2: ADDI x3, x0, 10  (RV32I encoding)
    data.write(0x00A00193);
    data_ready.notify();
    wait(20, SC_NS);

    sc_stop();
  }

  SC_CTOR(producer) {
    SC_THREAD(produce);
  }
};

SC_MODULE(consumer) {
  sc_in<sc_uint<32>> data;
  sc_event&          data_ready; // reference to producer's event — no copy

  void consume() {
    while (true) {
      wait(data_ready); // suspend here until producer calls data_ready.notify()
      std::cout << "[" << sc_time_stamp() << "] Received instruction: 0x"
                << std::hex << data.read() << std::dec << std::endl;
    }
  }

  SC_HAS_PROCESS(consumer);
  consumer(sc_module_name name, sc_event& ev)
    : sc_module(name), data_ready(ev) {
    SC_THREAD(consume);
  }
};

int sc_main(int argc, char* argv[]) {
  sc_signal<sc_uint<32>> instr_bus;

  producer prod("prod");
  prod.data(instr_bus);

  // Pass producer's event to consumer at construction time
  consumer cons("cons", prod.data_ready);
  cons.data(instr_bus);

  sc_start(); // run until sc_stop() is called
  return 0;
}

Expected output:

[0 ns] Received instruction: 0x500113
[20 ns] Received instruction: 0xa00193

The 20ns gap between the two transactions is exactly the wait(20, SC_NS) in the producer. The consumer fires immediately when the event arrives — there is no polling delay, no wasted cycles. This is the efficiency advantage of event-driven simulation over polling loops.

Note: This producer-consumer handshake is a preview of the instruction fetch mechanism in Post 10. The data_ready event is structurally identical to the handshake signal between the fetch unit and the decode stage in our RISC-V CPU. When the fetch unit retrieves an instruction from memory, it will fire an event to wake the decode stage — exactly the pattern shown here.

Build & Run

Both examples use the same CMake structure established in Posts 1 and 2. Create a CMakeLists.txt in section1/post03/:

cmake_minimum_required(VERSION 3.16)
project(post03_delta_cycles)

set(SYSTEMC_HOME $ENV{SYSTEMC_HOME})
include_directories(${SYSTEMC_HOME}/include)
link_directories(${SYSTEMC_HOME}/lib-linux64)

add_executable(two_stage_chain two_stage_chain.cpp)
target_link_libraries(two_stage_chain systemc)

add_executable(producer_consumer producer_consumer.cpp)
target_link_libraries(producer_consumer systemc)

mkdir build && cd build
cmake .. && make

./two_stage_chain
./producer_consumer

If stage 2 does not fire in your two_stage_chain output, check that internal is declared as sc_signal<bool>, not a plain bool. A plain bool has no event mechanism — stage 2's sensitivity list will never trigger.

Verification

The producer-consumer example is its own timing verification. The 20ns gap between the two output lines confirms that:

The consumer blocked correctly on wait(data_ready) and did not spin
The producer's wait(20, SC_NS) advanced simulation time exactly as expected
The event fired at the correct time (immediately after data.write() + notify())

For a more rigorous checker, you can attach a monitor SC_THREAD that also waits on data_ready and asserts that the received instruction matches an expected sequence:

SC_MODULE(checker) {
  sc_in<sc_uint<32>> data;
  sc_event& data_ready;

  sc_uint<32> expected[2] = {0x00500113, 0x00A00193};
  int count = 0;

  void check() {
    while (count < 2) {
      wait(data_ready);
      sc_assert(data.read() == expected[count]);
      std::cout << "[PASS] Transaction " << count
                << ": 0x" << std::hex << data.read() << std::dec << std::endl;
      count++;
    }
  }

  SC_HAS_PROCESS(checker);
  checker(sc_module_name name, sc_event& ev)
    : sc_module(name), data_ready(ev) {
    SC_THREAD(check);
  }
};

sc_assert terminates the simulation with an error message if the condition is false. This is the lightweight equivalent of a UVM uvm_error — no framework overhead, just a condition check that halts on failure.

sc_event is the foundation of transaction-level modeling (TLM). When we reach Posts 14–17, TLM ports and sockets are built on this same event mechanism, extended with standardized payload types and timing protocols. Understanding sc_event now makes TLM-2.0 straightforward later.

DV Insight

Signal vs. Event — Choosing the Right Tool

sc_signal and sc_event are both synchronization mechanisms, but they serve different purposes:

Use sc_signal<T> when:
- You are modeling a physical wire or bus that carries a value
- Multiple modules need to read the current state at any time
- The value persists between events (it holds its last-written value)
- Example: data buses, address lines, control flags

Use sc_event when:
- You are signaling that something happened, not communicating a value
- The event is transient — "pulse" semantics, not "level" semantics
- You need a handshake or acknowledgment between two specific modules
- Example: "transaction complete", "pipeline flush requested", "interrupt asserted"

In our RISC-V CPU, most inter-stage communication uses sc_signal for data (the instruction word, the PC value, the ALU result) and sc_event for control (pipeline stall requests, flush signals, instruction-ready handshakes). This mirrors how real CPU designs separate data paths from control paths — a clean architectural boundary that makes verification much easier.

The Three Forms of notify() — Getting It Right

The timing of notify() matters:

event.notify();              // Immediate: fires in current evaluate phase
event.notify(SC_ZERO_TIME);  // Next delta: fires at next delta boundary
event.notify(10, SC_NS);     // Timed: fires 10ns from now

For the producer-consumer pattern shown in Example 2, immediate notify() after a data write is correct. The sequence is:

data.write(0x00500113) — buffered write to signal
data_ready.notify() — event fires immediately
Update phase runs — data signal is committed
Consumer's wait(data_ready) wakes up — reads committed data value

If you used notify(SC_ZERO_TIME) instead, the event fires one delta after the immediate case, but the data write commits in the same delta as the signal update — so the consumer still sees the correct value. Both are safe here.

Where it becomes dangerous: if you call notify(SC_ZERO_TIME) and then the consumer reads the signal before the delta where the write commits, it will see stale data. This is a classic race condition in SystemC that notify(SC_ZERO_TIME) can introduce. When in doubt: write the data, notify immediately, let the evaluate-update cycle handle the ordering.

Common Pitfalls for SV Engineers

Delta cycles and sc_event are the #1 source of hard-to-find bugs for engineers coming from SystemVerilog. These five pitfalls cover the most common failure modes.

Pitfall 1: Immediate notify() inside a process triggered by that same event — infinite delta loop.

void process_a() {
  // ...
  my_event.notify();   // fires immediately → this process re-triggers immediately
                       // → fires again → re-triggers → ...
                       // Simulator detects delta cycle limit and aborts:
                       // "delta cycle limit of 10000 exceeded"
}

SC_CTOR(bad) {
  SC_METHOD(process_a);
  sensitive << my_event;  // BUG: same event triggers and is notified here
}

Fix: use notify(SC_ZERO_TIME) to fire the event in the next delta. The process completes, the update phase runs, then the event fires in the next evaluate phase — giving the kernel a cycle to check stability.

Pitfall 2: sc_signal only fires value_changed_event when the VALUE changes — writing the same value does nothing.

sc_signal<bool> flag;
flag.write(true);    // fires value_changed_event (false → true)
flag.write(true);    // DOES NOT fire value_changed_event (value unchanged)
flag.write(false);   // fires value_changed_event (true → false)

This surprises engineers used to SV's always @(posedge clk) where every posedge fires regardless. With sc_signal<bool>, if you write true to an already-true signal, no event fires and no sensitive process re-runs. If you need unconditional notification, use sc_event::notify() — it always fires regardless of any value.

This distinction matters for reset signals that stay asserted: if reset is true and you write true again (e.g., during testbench initialization), processes sensitive to the reset signal will not re-run.

Pitfall 3: Delta cycle depth with deeply cascaded combinational logic.

A chain of N SC_METHOD processes connected by N-1 intermediate sc_signal objects takes N delta cycles to propagate a change from input to output. For a 10-stage pipeline hazard detection unit with 10 levels of combinational checks, this is 10 delta cycles — all happening inside one sc_start(SC_ZERO_TIME) call, and all correct.

The danger: circular combinational dependencies. If stage A writes to sig_b and stage B writes to sig_a, and both are sensitive to each other's outputs:

A → writes sig_b → B wakes up → writes sig_a → A wakes up → writes sig_b → ...

The kernel will run delta cycles until it hits the maximum (default 10,000) and aborts. In hardware, circular combinational logic is a design error (latch inferred or oscillation). In SystemC, the kernel catches it and reports the delta limit exceeded. This is a design bug, not a SystemC bug.

Pitfall 4: Don't confuse posedge_event() with value_changed_event() for boolean signals.

sc_signal<bool> sig;

// value_changed_event fires on ANY transition (0→1 OR 1→0):
wait(sig.value_changed_event());   // wakes on both rising and falling

// posedge_event fires ONLY on 0→1:
wait(sig.posedge_event());         // wakes only on rising edge

// negedge_event fires ONLY on 1→0:
wait(sig.negedge_event());         // wakes only on falling edge

SV engineers translating @(posedge rst_n) (wait for reset deassert) must use wait(rst_n.posedge_event()), not wait(rst_n.value_changed_event()) — the latter would wake on both assertion and deassertion.

Pitfall 5: Reading a signal inside the update phase — it doesn't happen and you can't do it.

User code (process bodies) always runs in the evaluate phase. The update phase is kernel-internal — you never write code that executes during the update phase. The kernel calls your process functions during evaluate, and during evaluate, signal.read() always returns the current committed value (m_cur_val).

The implication: if two processes A and B both run in the same evaluate phase (Δ0), and A calls sig.write(v), B's call to sig.read() returns the value BEFORE A's write, even if A ran first. There is no process ordering within an evaluate phase that makes A's write visible to B in the same delta. B will only see A's write after the update phase — in delta Δ1, when B re-runs due to the value_changed_event.

This is correct and intentional. It prevents process-ordering races: all processes see the same consistent snapshot of signal values within an evaluate phase, regardless of the order they execute.

Integration

Delta cycles are what make the RISC-V forwarding unit in Post 19 correct. The EX stage computes an ALU result at simulation time T (within one clock cycle). Before the clock edge that advances to time T + 10ns, that result must propagate through the forwarding multiplexers to the MEM stage's operand inputs. In silicon, this happens in 200–400ps of combinational propagation delay. In SystemC, it happens in delta cycles — each forwarding path level that re-evaluates is one delta, all within the same sc_time timestamp.

When you model the forwarding unit as a set of SC_METHOD processes sensitive to the EX stage's output signals, the SystemC kernel automatically runs the evaluate-update loop until the forwarded values settle at the MEM stage inputs — before the clock edge SC_METHODs fire. You do not need to manually sequence these. The evaluate-update model handles it, just as propagation delay handles it in silicon.

The same applies to pipeline flush signals. When a branch misprediction is detected in the MEM stage (Post 21), a flush event fires. The fetch and decode stage SC_METHODs that are sensitive to the flush signal re-evaluate within the same clock cycle's delta iterations, clearing the pipeline registers before the next clock edge latches the new state. Delta cycles make this zero-time correction possible.

Series progress:
- Post 1 — Modules, Ports & Signals ✓
- Post 2 — Simulation Time & Clocks ✓
- Post 3 — Delta Cycles & Event-Driven Semantics ✓
- Post 4 — SC_METHOD vs SC_THREAD (next)

Section 1 foundation is taking shape. The two_stage_chain module will be reused directly in Post 7 as the basis for the combinational ALU interconnect, and the producer-consumer event pattern reappears in Post 10 as the fetch-unit handshake. The concepts in this post are not introductory scaffolding — they are the mechanisms we will rely on through the entire CPU build.

What's Next

Post 4: SC_METHOD vs SC_THREAD

Now that we understand when processes fire (sensitivity lists, delta cycles, event notifications), Post 4 covers which process type to use and why getting this choice wrong causes either deadlocks or missed events.

The rules are not arbitrary. SC_METHOD processes must run to completion without blocking — they model combinational logic and clocked registers. SC_THREAD processes can call wait() and suspend — they model sequential behavior, testbench drivers, and monitors. Using SC_METHOD where you need blocking behavior causes a runtime error. Using SC_THREAD where you need strict sensitivity-list semantics causes subtle missed-event bugs that are very hard to find.

Post 4 builds a concrete example of both mistakes and shows the correct version of each, with the rules stated explicitly enough to apply to any new module you write.

Post 4 → SC_METHOD vs SC_THREAD

← Part 2: Simulation Time & Clocks Part 3 of 13 Part 4: SC_METHOD vs SC_THREAD →

3. SystemC Tutorial - Delta Cycles & Event-Driven Semantics