PICACHU: Plug-In CGRA Handling Upcoming Nonlinear Operations in LLMs

Jiajun Qin, Tianhua Xia, Cheng Tan, Jeff Zhang, Sai Qian Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Large language models (LLMs) have revolutionized natural language processing (NLP) domain by achieving state-of-the-art performance across a range of benchmarks. However, nonlinear operations in LLMs significantly contribute to inference latency and present unique challenges that have not been encountered previously. Addressing these challenges requires accelerators that combine efficiency, flexibility, and support for user-defined precision. Our analysis reveals that Coarse-Grained Reconfigurable Arrays (CGRAs) provide an effective solution, offering a balance of performance and flexibility tailored to domain-specific workloads. This paper introduces PICACHU, a plug-in coarse-grained reconfigurable accelerator tailored to efficiently handle nonlinear operations by using custom algorithms and a dedicated compiler toolchain. PICACHU is the first to target all nonlinear operations within LLMs and to consider CGRA as a plug-in accelerator for LLM inference. Our evaluation shows that PICACHU achieves speedups of 1.86× and 1.55× over prior state-of-the-art accelerators in LLM inference.

Original languageEnglish (US)
Title of host publicationASPLOS 2025 - Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
PublisherAssociation for Computing Machinery
Pages845-861
Number of pages17
ISBN (Electronic)9798400710797
DOIs
StatePublished - Mar 30 2025
Event30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025 - Rotterdam, Netherlands
Duration: Mar 30 2025Apr 3 2025

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume2

Conference

Conference30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025
Country/TerritoryNetherlands
CityRotterdam
Period3/30/254/3/25

Keywords

  • coarse-grained reconfigurable array (cgra)
  • domain specific architecture (dsa)
  • large language models (llm)

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'PICACHU: Plug-In CGRA Handling Upcoming Nonlinear Operations in LLMs'. Together they form a unique fingerprint.

Cite this