JointNF: Enhancing DNN Performance through Adaptive N:M Pruning across both Weight and Activation

Sai Qian Zhang, Thierry Tambe, Gu Yeon Wei, David Brooks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Balancing accuracy and hardware efficiency remains a challenge with traditional pruning methods. N:M sparsity is a recent approach offering a compromise, allowing up to N non-zero weights in a group of M consecutive weights. However, N:M pruning enforces a uniform sparsity level of N/M across all layers, which does not align well with the sparse nature of deep neural networks (DNNs). To achieve a more flexible sparsity pattern and a higher overall sparsity level, we present JointNF, a novel joint N:M and structured pruning algorithm to enable fine-grained structured pruning with adaptive sparsity levels across the DNN layers. Moreover, we show for the first time that N:M pruning can also be applied over the input activations for further performance enhancement.
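As a point of reference for the pattern the abstract describes, below is a minimal magnitude-based N:M pruning sketch in NumPy: each group of M consecutive values keeps its N largest-magnitude entries and zeroes the rest. This illustrates only the generic N:M pattern, not JointNF's adaptive joint algorithm; the function name nm_prune and the tensor shapes are illustrative assumptions.

    import numpy as np

    def nm_prune(tensor: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
        """Keep the n largest-magnitude values in each group of m
        consecutive elements; zero the rest (generic N:M pattern)."""
        flat = tensor.reshape(-1, m)  # group consecutive elements
        # Indices of the (m - n) smallest-magnitude entries per group.
        drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
        pruned = flat.copy()
        np.put_along_axis(pruned, drop, 0.0, axis=1)
        return pruned.reshape(tensor.shape)

    # Example: a 2:4 pattern over a weight matrix and, as the abstract
    # notes is also possible, over an input activation tensor.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((8, 16)).astype(np.float32)
    activations = rng.standard_normal((4, 16)).astype(np.float32)
    sparse_w = nm_prune(weights, n=2, m=4)
    sparse_a = nm_prune(activations, n=2, m=4)
    assert (np.count_nonzero(sparse_w.reshape(-1, 4), axis=1) <= 2).all()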

Original language: English (US)
Title of host publication: Proceedings of the 29th International Symposium on Low Power Electronics and Design, ISLPED 2024
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400706882
DOIs
State: Published - Aug 5 2024
Event: 29th ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED 2024 - Newport Beach, United States
Duration: Aug 5 2024 - Aug 7 2024

Publication series

Name: Proceedings of the 29th International Symposium on Low Power Electronics and Design, ISLPED 2024

Conference

Conference: 29th ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED 2024
Country/Territory: United States
City: Newport Beach
Period: 8/5/24 - 8/7/24

Keywords

  • hardware accelerator
  • pruning
  • transformer

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
