Youwei Xiao

Youwei Xiao (肖有为)

School of Integrated Circuits

Peking University

Beijing, China

I am a Ph.D. candidate at the School of Integrated Circuits, Peking University, advised by Prof. Yun Liang. My research focuses on software techniques for MLSys/Architecture/EDA, with emphasis on domain-specific languages (DSLs) and compiler techniques. Before that, I received my Bachelor of Science in EECS at Peking University in 2022.

My research centers on developing EDA software techniques that bridge the gap between high-level architectural specifications and register-transfer-level (RTL) hardware implementations. I have contributed to and led several projects on multi-level intermediate representations and hardware synthesis. Notable contributions include the open-source hardware description language Cement (FPGA 2024) and the high-level synthesis framework Hector (ICCAD 2022). We built these frameworks with the MLIR infrastructure and the Rust programming language. More recently, I’ve been exploring e-graph techniques for hardware synthesis optimization in the SkyEgg project.

For computer architecture, I explored the automated generation of domain-specific accelerators and custom instructions. I combined application profiling, design space exploration, and dynamic programming to build the Cayman framework (DAC 2025) for automatic accelerator generation with control flow and data access strategies considered. I also proposed reusable instruction customization using e-graph anti-unification techniques, implemented as the ISAMORE framework (ASPLOS 2026).

With my research experiences spanning hardware synthesis and computer architecture, I picked up a goal to create a fully-integrated co-design toolchain - to generate everything (architecture design, hardware implementation, and compiler support) from just ONE agile specification or even only the target applications. For example, one of our ultimate goals is to generate an optimized ML ASIC solution with full ML compiler support given some ML models as acceleration targets, without any human intervention. To achieve this goal, I initiated and led the APS project together with my lab classmates. Actually, we are not far from the dream! I also actively contribute to tutorials at major EDA and architecture conferences, sharing our research on agile hardware specialization and co-design methodologies (see APS tutorials).

Based on my accumulated skills in compilers, DSLs, and architecture, I am actively exploring interesting topics in ML compilers and systems. The software stack for deploying and training LLMs spans multiple levels including system, graph compilation, and kernel generation, with retargetable requirements for different hardware architectures. I believe the potential of compilation across the whole design space (multi-level stack × heterogeneous hardware) has not yet been fully recognized and exploited. Currently, I am actively researching or contributing to e-graph superoptimizers for retargetable tensor compilers, distributed tensor compilation, mega-kernel compilation, and KV-Cache optimization for agentic AI infrastructure. I’m looking forward to sharing these works with everyone.

selected publications

ASPLOS

Finding Reusable Instructions via E-Graph Anti-Unification

Youwei Xiao, Chenyun Yin, Yitian Sun, and 1 more author

2026 (to appear)

PDF
Preprint

Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

Yuyang Zou, Youwei Xiao, Yansong Xu, and 6 more authors

2025

arXiv HTML PDF
Preprint

SkyEgg: Joint Implementation Selection and Scheduling for Hardware Synthesis using E-graphs

Youwei Xiao, Yuyang Zou, and Yun Liang

2025

arXiv HTML PDF
Preprint

Cement2: Temporal Hardware Transactions for High-Level and Efficient FPGA Programming

Youwei Xiao, Zizhang Luo, Weijie Peng, and 2 more authors

2025

arXiv HTML PDF
ICCAD

Invited Paper: APS: Open-Source Hardware-Software Co-Design Framework for Agile Processor Specialization

Youwei Xiao, Yuyang Zou, Yansong Xu, and 6 more authors

In Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025

DOI HTML PDF
ICCAD

Clay: High-level ASIP Framework for Flexible Microarchitecture-Aware Instruction Customization

Weijie Peng^*, Youwei Xiao^*, Yuyang Zou, and 2 more authors

In Proceedings of the 44rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’25), 2025

DOI HTML PDF
DAC

Cayman: Custom Accelerator Generation with Control Flow and Data Access Optimization

Youwei Xiao, Fan Cui, Zizhang Luo, and 2 more authors

In Proceedings of the 62nd ACM/IEEE Design Automation Conference (DAC ’25), 2025

DOI HTML PDF
LATTE

cmt2: Rule-Based Hardware Description in Rust with Temporal Semantics

Youwei Xiao, Zizhang Luo, and Yun Liang

In 5th Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE’25), 2025

HTML PDF
FPGA

An Empirical Comparison of LLM-based Hardware Design and High-level Synthesis

Fan Cui, Youwei Xiao, Kexing Zhou, and 1 more author

In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’25), 2025

DOI HTML PDF
FPGA

Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis

Youwei Xiao, Zizhang Luo, Kexing Zhou, and 1 more author

In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’24), 2024

DOI HTML PDF
ICCAD

HECTOR: A Multi-Level Intermediate Representation for Hardware Synthesis Methodologies

Ruifan Xu, Youwei Xiao, Jin Luo, and 1 more author

In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’22), 2022

DOI HTML PDF