HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models

Tianfan Peng, Tianhua Xia, Jiajun Qin, Sai Qian Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks. Central to the success of these models is the integration of sophisticated architectural components aimed at improving training stability, convergence speed, and generalization capabilities. Among these components, normalization operations, such as layer normalization (LayerNorm), emerge as a pivotal technique, offering substantial benefits to overall model performance. However, previous studies have indicated that normalization operations can substantially elevate processing latency and energy usage. In this work, we adopt the principles of algorithm and hardware co-design, introducing a holistic normalization acceleration method named HAAN. The evaluation results demonstrate that HAAN achieves significantly better hardware performance than state-of-the-art solutions.
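
For reference, the standard LayerNorm formulation (not specific to this paper) normalizes each token's hidden vector x of dimension d as

\mu = \frac{1}{d}\sum_{i=1}^{d} x_i, \qquad \sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2, \qquad y_i = \gamma_i \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i,

where \gamma and \beta are learned scale and shift parameters and \epsilon is a small constant. The mean and variance reductions across the hidden dimension are the source of the latency and energy overhead the abstract refers to; HAAN's specific algorithmic and hardware techniques for mitigating this cost are described in the paper itself and are not reproduced here.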

Original language: English (US)
Title of host publication: 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9783982674100
State: Published - 2025
Event: 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Lyon, France
Duration: Mar 31 2025 – Apr 2 2025

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
ISSN (Print): 1530-1591

Conference

Conference: 2025 Design, Automation and Test in Europe Conference, DATE 2025
Country/Territory: France
City: Lyon
Period: 3/31/25 – 4/2/25

ASJC Scopus subject areas

  • General Engineering
