ISAMORE

ISAMORE is an end-to-end framework for discovering reusable custom instructions from domain applications using e-graph anti-unification.

+---------------------------------------------------------------+
|                      ISAMORE Framework                        |
+---------------------------------------------------------------+
|                                                               |
|  Domain Programs                    Ruleset Generation        |
|       |                                   |                   |
|       v LLVM                              |                   |
|  +---------------------------------------------------------+  |
|  |                         RII                             |  |
|  |  +---------+   +---------+   +----------+               |  |
|  |  |Structured|-->| E-graph |-->|  Phase   |<-- Ruleset   |  |
|  |  |   DSL   |   | encode  |   | Scheduler|               |  |
|  |  +---------+   +---------+   +-----+----+               |  |
|  |                                    |                    |  |
|  |      +-----------------------------+------------+       |  |
|  |      |        Per-Phase Loop       |            |       |  |
|  |      |  +--------+  +--------+  +--v-------+    |       |  |
|  |      |  | EqSat  |->|Smart AU|->|Vectorize |    |       |  |
|  |      |  +--------+  +--------+  +-----+----+    |       |  |
|  |      |                    +-----------+         |       |  |
|  |      |  +--------+  +-----v---+                 |       |  |
|  |      |  |Extract |<-| Select  |<-- Cost Model   |       |  |
|  |      |  +--------+  +---------+    (HLS+PDK)    |       |  |
|  |      +------------------------------------------+       |  |
|  +---------------------------------------------------------+  |
|       |                                                       |
|       v                                                       |
|  CI_0, CI_1, ... --> Verilog (via HLS/CIRCT)                  |
+---------------------------------------------------------------+

The Reusability Problem

Existing instruction customization approaches suffer from low reusability:

Approach	Problem
Fine-grained [enumeration]	Enumerate convex subgraphs, miss larger patterns
Coarse-grained [NOVIA]	Merge basic blocks, syntactic-only, hard to reuse
Both	Overlook semantic equivalence, no vectorization

ISAMORE’s insight: Use e-graph anti-unification to identify semantically equivalent common patterns across applications, fundamentally considering reusability.

Reusable Instruction Identification (RII)

RII is the core methodology, combining e-graph techniques with hardware-aware optimization:

1. Structured DSL

ISAMORE introduces a DSL to encode programs with control flow as e-graph terms:

e ::= Literal | UnaryOp(e) | BinaryOp(e1, e2)
    | If(e_in, e_then, e_else)   // Conditional
    | Loop(e_in, e_body)         // Do-while loop
    | Vec(e1, ...) | VecOp(...)  // Vector operations
    | ?x | App(e_pat, e*)        // Pattern application

2. Phase-Oriented Iteration

Instead of monolithic equality saturation (which explodes), RII applies phases:

Phase 1: int-sat rules    -> saturate integer equivalences
Phase 2: float-sat rules  -> saturate float equivalences
Phase 3+: nonsat rules    -> selective, restrained application

Each phase identifies patterns over previously identified ones, discovering increasingly reusable patterns.

3. Smart AU Identification

Two heuristics overcome exponential AU complexity:

Similarity-based e-class pairing: Only pair e-classes with:

Consistent result types (via type inference)
High structural similarity (via 64-bit structural hashing)

Heuristic pattern sampling: Instead of all AU patterns:

Boundary: Keep only min/max by feature function
KD-tree: Partition by feature vector, sample evenly

4. Pattern Vectorization

RII exploits data-level parallelism from scalar programs:

+-----------------------------------------------------+
| 1. Seed Packing: Find scalar pattern instances      |
|    (seeds), pack into Vec e-nodes                   |
|                                                     |
| 2. Pack Expansion: Lift rewrites recover vector     |
|    constructors: Vec(Add a b)(Add c d) =>           |
|                  VecAdd(Vec a c)(Vec b d)           |
|                                                     |
| 3. Acyclic Pruning: Greedy extraction + compress    |
|    to eliminate Get->Vec->Get cycles                |
+-----------------------------------------------------+

5. Hardware-Aware Selection

Cost model estimates performance and area impact:

dL(p) = Sum_{u in use(p)} (Sum_{o in p} CPO(bb(o,u))) - L_HLS(p)
S(P)  = L_cpu / (L_cpu - Sum_{p in P} dL(p))
A(P)  = Sum_{p in P} A_HLS(p)

CPO: Cycles per operation from GEM5 profiling
L_HLS: Hardware latency from ASAP scheduling
Pareto-optimal selection: Trade off speedup vs. area

E-Graph Anti-Unification

Given two e-classes, AU finds their least general generalization:

AU(a*2+b, 1+i<<1) via e-graph:
  1. EqSat: 1+i<<1 == (1+i)*2
  2. AU: (a+b)*2, (1+i)*2 -> (?x+?y)*2

Formalization:

AU(a, b) = Union_{n_a in M(a), n_b in M(b)} AU(n_a, n_b)
AU(s(a1,...,ak), s(b1,...,bk)) = {s(p1,...,pk) | pi in AU(ai,bi)}
AU(s1(...), s2(...)) = ?x   if s1 != s2