Optimization Tuning

Celox provides two layers of optimization control: SIRT passes (Celox's own SIR-level optimizer) and backend options. On x86-64 the native backend is used by default; on other architectures Cranelift JIT is the fallback.

TL;DR

The default settings (O1, native backend on x86-64) are the best general-purpose choice. Only tune if you have a specific compile-time or simulation-speed bottleneck, and always benchmark your actual design.

Optimization Levels

Celox uses a GCC-style optimization model: preset levels set defaults, and per-pass overrides allow fine-grained control.

| Level | SIR Passes | DSE | Cranelift |
|---|---|---|---|
| O0 | TailCallSplit only | Off | fast_compile() |
| O1 (default) | All 18 passes | Off | Speed / Backtracking |
| O2 | All 18 passes | PreserveTopPorts | Speed / Backtracking |
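
The "preset level plus per-pass overrides" model can be sketched as a small resolver. This is an illustrative model only, not Celox's implementation: `resolvePasses` is a hypothetical helper, and the `+sir:` enable prefix is an assumption (only the `-sir:` disable form appears in this guide). DSE is listed in its own column rather than as a SIR pass, so the sketch leaves it out.

```ts
// Hypothetical model of preset levels with per-pass overrides (not Celox's code).
type OptLevel = "O0" | "O1" | "O2";

// The 18 pass names as they appear in the SIRT pass table.
const ALL_PASSES: readonly string[] = [
  "store_load_forwarding", "hoist_common_branch_loads", "bit_extract_peephole",
  "optimize_blocks", "split_wide_commits", "commit_sinking",
  "inline_commit_forwarding", "eliminate_dead_working_stores", "reschedule",
  "coalesce_stores", "gvn", "concat_folding", "xor_chain_folding",
  "vectorize_concat", "split_coalesced_stores", "partial_forward",
  "identity_store_bypass", "tail_call_split",
];

function resolvePasses(level: OptLevel, overrides: string[] = []): Set<string> {
  // The preset picks the starting set: O0 keeps only tail_call_split,
  // O1 and O2 start from all 18 passes.
  const enabled = new Set<string>(level === "O0" ? ["tail_call_split"] : ALL_PASSES);
  for (const o of overrides) {
    const disable = o.startsWith("-sir:");
    const enable = o.startsWith("+sir:"); // assumed syntax, not confirmed by the docs
    const name = o.slice(5);
    if ((!disable && !enable) || ALL_PASSES.indexOf(name) === -1) {
      throw new Error(`unknown pass override: ${o}`);
    }
    // Overrides are applied in order on top of the preset.
    if (disable) enabled.delete(name);
    else enabled.add(name);
  }
  return enabled;
}
```

Under this model, `resolvePasses("O1", ["-sir:reschedule", "-sir:commit_sinking"])` yields the 16 remaining passes, and `resolvePasses("O0")` yields only `tail_call_split`.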

Quick Start

```ts
import { Simulator } from '@celox-sim/celox';

// Default (O1): all optimizations enabled
const sim = await Simulator.create(module);

// O0: minimal optimization (fast compile, slower simulation)
const simO0 = await Simulator.create(module, { optLevel: "O0" });

// O2: all optimizations + dead store elimination
const simO2 = await Simulator.create(module, { optLevel: "O2" });

// O1 with specific passes disabled
const simTuned = await Simulator.create(module, {
    optLevel: "O1",
    passOverrides: ["-sir:reschedule", "-sir:commit_sinking"],
});

// Legacy (still supported):
const simLegacy = await Simulator.create(module, { optimize: false });
```

SIRT Optimization Passes

SIRT (Simulator IR Transform) passes optimize the intermediate representation before handing it to the backend for code generation. All 18 passes are individually controllable via SirPass (Rust) or passOverrides (TypeScript).

| Pass | What it does |
|---|---|
| store_load_forwarding | Reuses a stored value directly instead of reloading it from memory |
| hoist_common_branch_loads | When both branches of a conditional start with the same load, moves it before the branch |
| bit_extract_peephole | Converts (value >> shift) & mask into a single ranged load |
| optimize_blocks | Dead block removal, block merging, load coalescing |
| split_wide_commits | Splits wide commit operations into narrower ones |
| commit_sinking | Moves commit operations closer to where their values are used |
| inline_commit_forwarding | Writes directly to the destination region, removing the intermediate commit copy |
| eliminate_dead_working_stores | Removes stores to working memory that are never read |
| reschedule | Reorders instructions to reduce register pressure |
| coalesce_stores | Merges consecutive narrow stores into wider Concat+Store operations |
| gvn | Global value numbering / dead code elimination |
| concat_folding | Folds redundant Concat operations |
| xor_chain_folding | Folds XOR chains |
| vectorize_concat | Vectorizes Concat patterns in combinational blocks |
| split_coalesced_stores | Splits wide coalesced stores back after reschedule to reduce register pressure |
| partial_forward | Partial store-load forwarding in combinational blocks |
| identity_store_bypass | Detects identity copies and registers address aliases for layout sharing |
| tail_call_split | Splits large functions into tail-call chains (enabled even at O0) |
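
To make the first row concrete, here is a minimal sketch of store-load forwarding over a toy instruction list. The `Instr` type and `forwardStores` function are hypothetical and far simpler than Celox's SIR: the sketch assumes a single straight-line block, SSA-style registers that are never reassigned, and exact, non-overlapping addresses.

```ts
// Toy IR for illustration only; Celox's SIR is not shown in this guide.
type Instr =
  | { op: "store"; addr: number; src: string }
  | { op: "load"; addr: number; dst: string }
  | { op: "copy"; dst: string; src: string };

function forwardStores(block: Instr[]): Instr[] {
  // Maps an address to the register whose value was last stored there.
  const lastStored = new Map<number, string>();
  const out: Instr[] = [];
  for (const ins of block) {
    if (ins.op === "store") {
      lastStored.set(ins.addr, ins.src);
      out.push(ins);
    } else if (ins.op === "load") {
      const src = lastStored.get(ins.addr);
      // A load from an address we just stored to becomes a register copy;
      // any other load is left untouched.
      out.push(src !== undefined ? { op: "copy", dst: ins.dst, src } : ins);
    } else {
      out.push(ins);
    }
  }
  return out;
}
```

Given `store [0] <- a; load b <- [0]; load c <- [8]`, the second instruction becomes `copy b <- a`, while the load from address 8 stays a load because nothing in the block stored to it.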

Post-Merge Passes (Native Backend)

When multiple execution units are merged at the SIR level, additional passes run on the merged EU:

| Pass | What it does |
|---|---|
| Working memory elimination | Redirects WORKING region accesses to STABLE for independent variables |
| Cross-EU commit forwarding | Forwards stored values across EU boundaries, eliminating redundant commits |
| Coalesced store splitting | Breaks wide Concat+Store back into 64-bit stores interleaved with computation, reducing register pressure |

MIR Optimization Passes (Native Backend)

The native backend has its own MIR-level optimization pipeline that runs after instruction selection (ISel):

| Pass | What it does |
|---|---|
| Constant folding | Evaluates operations with constant operands at compile time |
| Algebraic simplification | Identity, annihilation, strength reduction (e.g. mul x, 2^n → shl x, n) |
| Redundant mask elimination | Removes AND masks when known-bits analysis proves they are unnecessary |
| Global value numbering (GVN) | Dominator-tree scoped CSE with alias-aware Load invalidation |
| If-conversion | Converts diamond-shaped branches into Select (cmov) for small arms |
| Cmp+Branch fusion | Emits cmp + jcc directly instead of setcc + movzx + test + jne |
| 32-bit emit | Uses 32-bit registers when values are known ≤ 32 bits (auto zero-extend) |
| Branch fall-through | Eliminates jmp when the target is the next block in layout order |
| CFG simplification | Threads jumps through empty blocks |
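
The strength-reduction example in the table (mul x, 2^n → shl x, n) can be sketched as a single rewrite rule. The `MirOp` type and `reduceMul` function below are hypothetical stand-ins, not the native backend's actual MIR, and the rule is limited to positive constants below 2^31 to keep the bit tricks within 32-bit integer range.

```ts
// Hypothetical two-instruction MIR, for illustration only.
type MirOp =
  | { op: "mul"; lhs: string; rhs: number }
  | { op: "shl"; lhs: string; amount: number };

function reduceMul(ins: MirOp): MirOp {
  if (ins.op !== "mul") return ins;
  const c = ins.rhs;
  // A power of two has exactly one bit set, so c & (c - 1) clears it to zero.
  const isPow2 =
    Number.isInteger(c) && c > 0 && c <= 0x7fffffff && (c & (c - 1)) === 0;
  if (!isPow2) return ins;
  // The index of the single set bit is the shift amount n, where c = 2^n.
  const n = 31 - Math.clz32(c);
  return { op: "shl", lhs: ins.lhs, amount: n };
}
```

With this rule, `mul x, 8` becomes `shl x, 3`, while `mul x, 6` is returned unchanged because 6 is not a power of two.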

Pass Interactions

The passes are not independent. They form a pipeline where earlier passes prepare the IR for later ones:

store_load_forwarding ─────┐
                           ├─► clean IR ──► commit_sinking ──► inline_commit_forwarding ──► ...
hoist_common_branch_loads ─┘

store_load_forwarding and hoist_common_branch_loads simplify the IR so that inline_commit_forwarding can match commit patterns more reliably. Disabling any one of them may appear harmless, but disabling them together degrades the quality of the IR fed to the backend, which increases compile time and slows simulation.

WARNING

Do not disable store_load_forwarding, hoist_common_branch_loads, and inline_commit_forwarding together. In benchmarks, this combination increased combinational compile time by 69% and eval time by 17%.
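
When reproducing this kind of comparison on your own design, one simple approach is to time each configuration several times, take the median, and report the relative change against the default. The helpers below are a generic sketch under that assumption; they are not part of Celox and do not describe how the figures above were measured.

```ts
// Generic timing-summary helpers (hypothetical, not part of Celox).

// Median of a list of timings; less sensitive to outlier runs than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative change of a candidate timing against a baseline, in percent.
// Positive values mean the candidate is slower.
function percentChange(baselineMs: number, candidateMs: number): number {
  return ((candidateMs - baselineMs) * 100) / baselineMs;
}
```

For example, a baseline median of 100 ms and a candidate median of 169 ms give `percentChange(100, 169) === 69`, which would be reported as a 69% increase.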

Backend Selection

| Platform | Default Backend | Notes |
|---|---|---|
| x86-64 | Native | Custom x86-64 codegen, fastest |
| ARM / RISC-V | Cranelift JIT | Automatic fallback |
| WASM | WASM codegen | For Playground |

Cranelift Backend Options

These apply only when using the Cranelift backend (non-x86-64 platforms, or explicit build_cranelift()):

| Option | Default | Description |
|---|---|---|
| craneliftOptLevel | "speed" | "none" / "speed" / "speedAndSize" |
| regallocAlgorithm | "backtracking" | "backtracking" (better code) / "singlePass" (faster compile) |
| enableAliasAnalysis | true | Alias analysis in egraph pass |
| enableVerifier | true | IR correctness verifier |
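
This guide does not show how these options are passed to the simulator, so the sketch below only models the documented names, allowed values, and defaults as a TypeScript type. `CraneliftOptionsShape`, `CRANELIFT_DEFAULTS`, and `withCraneliftDefaults` are hypothetical names introduced for illustration.

```ts
// Hypothetical type mirroring the option table above; not Celox's exported API.
interface CraneliftOptionsShape {
  craneliftOptLevel: "none" | "speed" | "speedAndSize";
  regallocAlgorithm: "backtracking" | "singlePass";
  enableAliasAnalysis: boolean;
  enableVerifier: boolean;
}

// Defaults as listed in the table.
const CRANELIFT_DEFAULTS: CraneliftOptionsShape = {
  craneliftOptLevel: "speed",
  regallocAlgorithm: "backtracking",
  enableAliasAnalysis: true,
  enableVerifier: true,
};

// Fills in any option the caller did not specify.
function withCraneliftDefaults(
  overrides: Partial<CraneliftOptionsShape> = {},
): CraneliftOptionsShape {
  return { ...CRANELIFT_DEFAULTS, ...overrides };
}
```

A compile-time-oriented configuration would then be `withCraneliftDefaults({ craneliftOptLevel: "none", regallocAlgorithm: "singlePass" })`, leaving alias analysis and the verifier at their defaults.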

Rust API

```rust
use celox::{Simulator, OptLevel, SirPass, CraneliftOptions};

// Default (O1, native on x86-64):
let sim = Simulator::builder(code, "Top").build()?;

// O0 (fast compile):
let sim = Simulator::builder(code, "Top")
    .opt_level(OptLevel::O0)
    .build()?;

// O1 with a specific pass disabled:
let sim = Simulator::builder(code, "Top")
    .opt_level(OptLevel::O1)
    .disable_pass(SirPass::Reschedule)
    .build()?;

// Explicit Cranelift backend:
let sim = Simulator::builder(code, "Top").build_cranelift()?;
```

TypeScript API

```ts
import { Simulator } from '@celox-sim/celox';

// Default: O1, native backend on x86-64
const sim = await Simulator.create(module);

// O2: all optimizations + DSE
const simO2 = await Simulator.create(module, { optLevel: "O2" });

// O1 with per-pass overrides:
const simTuned = await Simulator.create(module, {
    optLevel: "O1",
    passOverrides: ["-sir:reschedule", "-sir:commit_sinking"],
});

// O0 (fast compile, slower simulation):
const simO0 = await Simulator.create(module, { optLevel: "O0" });
```