A-SOUL is a modular out-of-order RISC-V CPU inspired by the MIPS R10K, featuring n-way superscalar execution, speculative LSQ and cache optimizations, advanced branch prediction, and a full simulation and profiling framework for microarchitectural exploration and performance tuning.
A-SOUL is a fully functional out-of-order RISC-V CPU inspired by the MIPS R10K microarchitecture, built as a unified RTL design with a highly modular pipeline. The project implements n-way superscalar execution, register renaming, speculative load scheduling, precise state, and early branch resolution in synthesizable SystemVerilog.
The processor pipeline exploits instruction-level parallelism with a multi-issue dispatch stage, a Reservation Station (RS) and Reorder Buffer (ROB), and a Load-Store Queue (LSQ) capable of byte-level forwarding and non-blocking memory access through an MSHR-based D-Cache. The instruction side incorporates GShare and Tournament branch predictors, a return address stack, and a configurable adaptive instruction prefetcher with a victim cache to reduce I-Cache misses.
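As a rough illustration of the GShare scheme mentioned above, the following is a minimal behavioral sketch in Python, not the project's SystemVerilog. It assumes a table of 2-bit saturating counters indexed by the PC XORed with a global history register. The class name, the `index_bits` parameter, the `pc >> 2` alignment shift, and the weakly-not-taken initial state are all assumptions made for the example.

```python
class GSharePredictor:
    """Behavioral gshare model: (PC XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits: int = 10):
        # index_bits is a hypothetical parameter; table has 2**index_bits entries.
        self.mask = (1 << index_bits) - 1
        self.history = 0                         # global branch history register
        self.counters = [1] * (1 << index_bits)  # 1 = weakly not-taken (assumed reset value)

    def _index(self, pc: int) -> int:
        # Drop the two low PC bits, assuming 4-byte-aligned instructions.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Train the counter that produced the prediction, then shift the
        # actual outcome into the global history.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Usage: a branch that is always taken becomes predicted-taken after training.
bp = GSharePredictor(index_bits=4)
for _ in range(8):
    bp.update(0x1000, taken=True)
print(bp.predict(0x1000))
```

A tournament predictor would typically add a second component predictor and a chooser table on top of this; that part is omitted here.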
To improve performance, we designed a speculative LSQ that issues loads before the older stores they may depend on are fully resolved, reducing pipeline stalls caused by load/store dependency chains. This optimization alone reduced CPI by 0.15 (≈ 8.1%) and eliminated 21% of issue-stage stalls, bringing the overall average CPI down to ~1.7 at a 7.7 ns clock period after timing closure.
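The difference between conservative and speculative load issue can be sketched with a small behavioral model. This is a generic illustration under stated assumptions, not A-SOUL's actual mechanism: the entry types, the `seq` program-order tags, and the replay-on-violation check performed when a store address resolves are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    seq: int                     # program-order sequence number (hypothetical tag)
    addr: Optional[int] = None   # None until the store address is computed
    size: int = 4                # access size in bytes


@dataclass
class LoadEntry:
    seq: int
    addr: int
    size: int = 4
    issued: bool = False


def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if the two byte ranges share at least one byte."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def can_issue_conservative(load: LoadEntry, stores: List[StoreEntry]) -> bool:
    """Non-speculative policy: wait until every older store address is known."""
    return all(s.addr is not None for s in stores if s.seq < load.seq)


def resolve_store(store: StoreEntry, addr: int, loads: List[LoadEntry]) -> List[int]:
    """Record the store address and return the seq numbers of younger loads
    that already issued to an overlapping range and must be replayed."""
    store.addr = addr
    return [l.seq for l in loads
            if l.issued and l.seq > store.seq
            and overlaps(l.addr, l.size, addr, store.size)]


# Usage: one unresolved store followed by two loads.
store = StoreEntry(seq=1)
loads = [LoadEntry(seq=2, addr=0x100), LoadEntry(seq=3, addr=0x200)]
print([can_issue_conservative(l, [store]) for l in loads])  # both blocked
for l in loads:
    l.issued = True              # speculative policy: issue without waiting
violations = resolve_store(store, 0x102, loads)
print(violations)                # loads that read stale data
```

In this model the speculative policy wins whenever the store turns out not to overlap, and pays a replay only for the loads returned by `resolve_store`.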
Beyond RTL design, the project features a robust testing and profiling infrastructure. We developed a top-down, counter-based performance simulator and an automated experiment pipeline for architectural design-space exploration. The framework systematically sweeps key microarchitectural parameters, collects CPI, occupancy, and stall breakdowns, and generates detailed visualizations for bottleneck identification and performance analysis. These tools enabled iterative optimization of critical paths (e.g., ROB/RS sizing, pipeline partitioning) and empirical validation across 20+ C benchmarks against a verified in-order golden model.
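A parameter sweep of the kind described above can be sketched as follows. This is a minimal illustration, assuming the simulator is exposed as a callable that maps a configuration to a metrics dictionary. The names `sweep`, `best`, `rob_size`, and `rs_size` are hypothetical, and `toy_run` is a placeholder formula that does not reflect A-SOUL measurements.

```python
import itertools
from typing import Callable, Dict, Iterable, List


def sweep(grid: Dict[str, Iterable[int]],
          run: Callable[[Dict[str, int]], Dict[str, float]]) -> List[Dict[str, float]]:
    """Run `run` on every combination of parameter values in `grid` and
    return one record per configuration (parameters merged with metrics)."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        results.append({**config, **run(config)})
    return results


def best(results: List[Dict[str, float]], key: str = "cpi") -> Dict[str, float]:
    """Pick the configuration with the lowest value of `key`."""
    return min(results, key=lambda r: r[key])


def toy_run(config: Dict[str, int]) -> Dict[str, float]:
    # Placeholder standing in for a real simulator invocation; the formula
    # is made up purely so the example runs.
    return {"cpi": 1.0 + 16 / config["rob_size"] + 8 / config["rs_size"]}


# Usage: sweep two hypothetical parameters and report the lowest-CPI point.
grid = {"rob_size": [16, 32], "rs_size": [8, 16]}
results = sweep(grid, toy_run)
print(len(results), best(results))
```

Passing the simulator in as a callable keeps the sweep logic independent of how a run is launched, so the same loop could drive an RTL simulation or a faster performance model.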