APS/Aquas

Holistic MLIR-based ASIP hardware-software co-design framework

Aquas is a holistic MLIR-based framework for automated ASIP (Application-Specific Instruction-Set Processor) hardware-software co-design. On the hardware side, it extends synthesis with a burst-capable DMA engine and HLS optimizations. On the software side, it introduces an e-graph-based retargetable compiler that automatically adopts custom instruction set extensions (ISAXs) in application code.

┌────────────────────────────────────────────────────────────────────┐
│                      Aquas Framework (MLIR)                         │
├────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   CADL                              C (App)                         │
│     │                                  │                            │
│     ▼                                  ▼                            │
│  ┌──────────────────────┐    ┌─────────────────────────────────┐   │
│  │  Hardware Synthesizer │    │   Retargetable Compiler         │   │
│  │  ┌────────────────┐  │    │  ┌───────────┐   ┌───────────┐  │   │
│  │  │ aquas dialect  │  │    │  │   MLIR    │◄─►│  e-graph  │  │   │
│  │  │ + affine/scf   │  │    │  └─────┬─────┘   └─────┬─────┘  │   │
│  │  └───────┬────────┘  │    │        │               │        │   │
│  │          │ optimize  │    │   Internal    External          │   │
│  │          ▼           │    │   Rewrites    Rewrites          │   │
│  │  ┌────────────────┐  │    │        │               │        │   │
│  │  │ HECTOR (tor)   │  │    │        └───────┬───────┘        │   │
│  │  │ + scheduling   │  │    │                ▼                │   │
│  │  └───────┬────────┘  │    │  ┌─────────────────────────┐    │   │
│  │          │           │    │  │ Skeleton-Component      │    │   │
│  │          ▼           │    │  │ Pattern Matching        │    │   │
│  │  ┌────────────────┐  │    │  └───────────┬─────────────┘    │   │
│  │  │ RTL (CIRCT)    │  │    │              ▼                  │   │
│  │  └────────────────┘  │    │         LLVM IR → Binary        │   │
│  └──────────────────────┘    └─────────────────────────────────┘   │
│              │                              │                       │
│              ▼                              ▼                       │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │              Rocket/BOOM Core + RoCC Adapter                  │  │
│  │  ┌─────────┐  ┌─────────────────┐  ┌───────────────────────┐ │  │
│  │  │ L1I/D$  │  │ Burst DMA Engine│  │ Banked Scratchpad Mem │ │  │
│  │  └─────────┘  │ (TileLink-UH)   │  │ (partition-aware)     │ │  │
│  │               └─────────────────┘  └───────────────────────┘ │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘

CADL with Optimization Directives

Aquas extends CADL with blockwise memory access and synthesis directives:

#[partition_array([0],[4],"C")]      // Cyclic partition into 4 banks
static mat: [i32; 16];
#[partition_array([0],[4],"C")]
static vec: [i32; 4];

rtype gemv(rs1: u5, rs2: u5, rd: u5) {
    let ia: u32 = _irf[rs1];
    let oa: u32 = _irf[rs2];
    mat[0+:] = _blockld[ia +:16];    // Burst load 16 elements
    vec[0+:] = _blockld[ia+64 +:4];  // Burst load 4 elements
    with i: u32 = (0, i+1) do {
        acc = 0;
        #[unroll(4)]                  // Full unroll inner loop
        with j: u32 = (0, j+1) do {
            acc += mat[i*4+j] * vec[j];
        } while (j + 1 < 4);
        res[i] = acc;
    } while (i + 1 < 4);
    _irf[rd] = 0;
}
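
As a reading aid, the computation described by the gemv listing can be mirrored in a few lines of Python. This is only a software reference sketch, not part of Aquas: the names gemv_reference and cyclic_bank are hypothetical, the bank mapping assumes that the "C" directive means element k lives in bank k mod 4, and 32-bit overflow is not modeled.

```python
def gemv_reference(mat, vec):
    """4x4 row-major matrix times length-4 vector, mirroring the CADL loop nest."""
    assert len(mat) == 16 and len(vec) == 4
    res = []
    for i in range(4):
        acc = 0
        for j in range(4):  # this is the loop marked #[unroll(4)] in the CADL source
            acc += mat[i * 4 + j] * vec[j]
        res.append(acc)
    return res


def cyclic_bank(index, num_banks=4):
    """Assumed cyclic partitioning: returns (bank, offset inside that bank)."""
    return index % num_banks, index // num_banks
```

Under this assumed mapping, the four elements mat[i*4+0] through mat[i*4+3] of any row fall into four different banks, which is what lets a fully unrolled inner loop read them in parallel.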

Fast Memory Access via DMA

Aquas synthesizes a burst-capable DMA engine to overcome memory bottlenecks:

Access Method             Latency           Throughput              Use Case
RoCC port (single-shot)   2-3 cycles/elem   Low                     Small transfers
Burst DMA (TileLink-UH)   15 cycles init    1 elem/cycle sustained  Large blocks

Implementation selection via ILP optimization:

min  Σ_b t_bur(b)·x_{bur,b} + t_ss·x_ss
s.t. Σ_b b·x_{bur,b} + d_ss·x_ss ≥ D
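
To make the selection problem concrete, the sketch below solves a small instance exhaustively with dynamic programming instead of an ILP solver. Everything numeric here is an assumption for illustration: candidate burst sizes of 4, 8, and 16 elements, a burst cost of t_bur(b) = 15 + b cycles (the 15-cycle initialization plus one element per cycle from the table above), and a single-shot transfer that moves d_ss = 1 element in t_ss = 3 cycles. The function name select_transfers is hypothetical.

```python
def select_transfers(D, burst_sizes=(4, 8, 16), init=15, t_ss=3, d_ss=1):
    """Return (min_cycles, plan) that covers at least D elements.

    plan maps a transfer kind to how many times it is used, i.e. the
    x_{bur,b} and x_ss values of the integer program.
    """
    # Each option is (name, elements moved, cycles spent).
    options = [(f"burst{b}", b, init + b) for b in burst_sizes]
    options.append(("single", d_ss, t_ss))

    # best[d] = cheapest way to cover at least d elements.
    best = [(0, {})] + [(float("inf"), None)] * D
    for d in range(1, D + 1):
        for name, size, cost in options:
            prev_cost, prev_plan = best[max(0, d - size)]
            if prev_cost + cost < best[d][0]:
                plan = dict(prev_plan)
                plan[name] = plan.get(name, 0) + 1
                best[d] = (prev_cost + cost, plan)
    return best[D]
```

With these assumed costs, select_transfers(2) picks two single-shot transfers and select_transfers(16) picks one 16-element burst, matching the "small transfers" versus "large blocks" guidance in the table.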

Partition-aware access: DMA distributes each 64-bit word across multiple banks in one cycle.
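
A small software model may help picture this. It assumes 32-bit elements packed two per 64-bit word with the lower-indexed element in the low half, and a cyclic partition over 4 banks; scatter_words is a hypothetical name, and the real engine does this in hardware rather than in a loop.

```python
def scatter_words(words, num_banks=4, elem_bits=32, word_bits=64):
    """Split each word into elements and place element k in bank k % num_banks."""
    per_word = word_bits // elem_bits
    mask = (1 << elem_bits) - 1
    banks = [[] for _ in range(num_banks)]
    for w_idx, word in enumerate(words):
        # All elements of one word go to different banks (when per_word <= num_banks),
        # so hardware can write them in the same cycle.
        for lane in range(per_word):
            elem = (word >> (lane * elem_bits)) & mask  # assumed little-endian lanes
            index = w_idx * per_word + lane
            banks[index % num_banks].append(elem)
    return banks
```

Feeding four words that hold elements 0 through 7 yields banks [[0, 4], [1, 5], [2, 6], [3, 7]]: the two halves of every word land in adjacent banks.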

E-Graph-Based Retargetable Compiler

Bidirectional MLIR ↔ E-graph Translation

  • MLIR → e-graph: Operations become e-nodes; blocks become tuple(...) of roots
  • E-graph → MLIR: Witness extraction reconstructs SSA form
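
The two directions can be sketched on a toy SSA program. This is a simplified stand-in, not the Aquas translator: ops are plain (result, op, operands) triples rather than MLIR operations, e-classes are omitted (each e-node is its own class), and the reverse direction simply prints nodes in creation order instead of performing witness extraction. The function names are hypothetical.

```python
def ssa_to_enodes(ops, roots):
    """ops: list of (result_name, op, operand_names). Returns (table, block_root_id)."""
    table = {}  # hash-consing: (op, child ids) -> node id
    env = {}    # SSA value name -> node id
    for result, op, operands in ops:
        key = (op, tuple(env[o] for o in operands))
        env[result] = table.setdefault(key, len(table))
    # The block itself becomes a tuple(...) node over its root values.
    root_key = ("tuple", tuple(env[r] for r in roots))
    return table, table.setdefault(root_key, len(table))


def enodes_to_ssa(table):
    """Emit one SSA-style line per node; ids are already in topological order."""
    by_id = {nid: key for key, nid in table.items()}
    lines = []
    for nid in sorted(by_id):
        op, kids = by_id[nid]
        args = ", ".join(f"%{k}" for k in kids)
        lines.append(f"%{nid} = {op}({args})")
    return lines
```

Because of hash-consing, two structurally identical operations in the input collapse into one node, so the reconstructed SSA form can be smaller than the original.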

Hybrid Rewriting

Rewrite Type   Mechanism                    Purpose
Internal       Egglog fixpoint reasoning    Dataflow equivalences (e.g., x<<2 ⇝ x*4)
External       MLIR passes via e-graph      Control-flow transforms (tiling, unrolling)
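
The internal rewrite from the table, x<<2 ⇝ x*4, can be illustrated with a deliberately tiny e-graph. This sketch does not use Egglog: it is a plain Python stand-in with a union-find and a hash-consed node table, it skips congruence closure and rebuilding, and the class and function names are hypothetical. Integer constants are represented as nodes whose op is the integer itself.

```python
class EGraph:
    def __init__(self):
        self.parent = []  # union-find over e-class ids
        self.nodes = {}   # e-node (op, child classes) -> e-class id

    def find(self, c):
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add(self, op, *children):
        node = (op, tuple(self.find(c) for c in children))
        if node not in self.nodes:
            self.parent.append(len(self.parent))
            self.nodes[node] = len(self.parent) - 1
        return self.find(self.nodes[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        return True


def shl_to_mul(eg):
    """Apply (shl x, k) => (mul x, 2**k) for constant k; True if any class merged."""
    changed = False
    for (op, kids), cls in list(eg.nodes.items()):
        if op != "shl":
            continue
        x, k = kids
        for (cop, ckids), ccls in list(eg.nodes.items()):
            if isinstance(cop, int) and not ckids and eg.find(ccls) == eg.find(k):
                mul = eg.add("mul", x, eg.add(2 ** cop))
                changed |= eg.union(cls, mul)
    return changed


def saturate(eg):
    """Run the rule until nothing changes (a fixpoint)."""
    while shl_to_mul(eg):
        pass
```

After saturation, the shift and the multiplication sit in the same e-class, so a later matcher can pick whichever form an ISAX pattern expects without the original being destroyed.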

Skeleton-Component Pattern Matching

ISAXs are decomposed into:

  • Skeleton: Control structure (loop nesting, trip counts)
  • Components: Dataflow patterns rooted at side-effect nodes

ISAX gemv:
  Skeleton: for i { for j*2 { ... } }
  Components: [yield(acc1,acc2), store(c_ptr)]

Matching engine: tag components via Egglog rules → skeleton matcher validates structure → emit ISAX node.
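
The validation step can be pictured as a structural comparison between an ISAX skeleton and a candidate loop nest whose components have already been tagged. The sketch below is an assumption-heavy illustration: the Loop type, the matches function, the trip counts, and the placement of each component tag on a particular loop level are all hypothetical and not taken from the Aquas implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    trip_count: int
    body: tuple = ()                     # nested Loop objects
    components: frozenset = frozenset()  # tags attached earlier by component rules


def matches(isax, candidate):
    """Loop nesting and trip counts must agree, and every component required
    by the ISAX must have been tagged on the corresponding candidate loop."""
    if isax.trip_count != candidate.trip_count:
        return False
    if not isax.components <= candidate.components:
        return False
    if len(isax.body) != len(candidate.body):
        return False
    return all(matches(a, b) for a, b in zip(isax.body, candidate.body))


# Hypothetical encoding of the gemv ISAX: an outer loop over i, an inner loop over j
# stepping by 2 (so 2 iterations for 4 columns), with the two listed components.
GEMV_ISAX = Loop(
    trip_count=4,
    components=frozenset({"store(c_ptr)"}),
    body=(Loop(trip_count=2, components=frozenset({"yield(acc1,acc2)"})),),
)
```

A candidate with extra tags still matches, while a candidate whose inner loop has a different trip count is rejected before any ISAX node is emitted.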

Hardware Synthesis Flow

CADL ──► Pre-Opt MLIR ──► Optimize ──► Schedule ──► FIRRTL ──► Verilog
              │              │            │
              │   (affine    │  (modulo   └──► CIRCT
              │    raises)   │   sched)
              │              │
              └── aquas dialect ops: readrf, writerf, blockload, memstore

Dynamic pipeline elaboration: Each stage is a transaction with valid-ready handshakes. Loops decompose into entry/body/next transactions.
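
The handshake behavior can be approximated with a cycle-level toy model. It assumes each stage holds at most one item and forwards it only when the next stage is free, with the sink's ready signal given per cycle. It does not model the entry/body/next decomposition of loops, and simulate is a hypothetical name.

```python
def simulate(inputs, num_stages, sink_ready):
    """Return (cycle, value) pairs for items accepted by the sink."""
    regs = [None] * num_stages  # None means the stage holds no valid data
    pending = list(inputs)
    out = []
    for cycle, ready in enumerate(sink_ready):
        # Walk from sink to source so a slot freed this cycle can be refilled.
        if regs[-1] is not None and ready:
            out.append((cycle, regs[-1]))
            regs[-1] = None
        for s in range(num_stages - 2, -1, -1):
            if regs[s] is not None and regs[s + 1] is None:  # valid and ready
                regs[s + 1], regs[s] = regs[s], None
        if pending and regs[0] is None:
            regs[0] = pending.pop(0)
    return out
```

When the sink deasserts ready for two cycles, every stage upstream stalls in place and the items later leave in their original order, which is the property the valid-ready protocol is meant to guarantee.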

Related Publications

2025

  1. Preprint
    Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR
    Yuyang Zou, Youwei Xiao, Yansong Xu, and 6 more authors
    2025

  2. ICCAD
    Invited Paper: APS: Open-Source Hardware-Software Co-Design Framework for Agile Processor Specialization
    Youwei Xiao, Yuyang Zou, Yansong Xu, and 6 more authors
    In Proceedings of the 44th IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025

  3. ICCAD
    Clay: High-level ASIP Framework for Flexible Microarchitecture-Aware Instruction Customization
    Weijie Peng*, Youwei Xiao*, Yuyang Zou, and 2 more authors
    In Proceedings of the 44th IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025