CompAct: On-chip compression of activations for low power systolic array based CNN acceleration

Jeff Zhang, Parul Raj, Shuayb Zarar, Amol Ambardekar, Siddharth Garg

Research output: Contribution to journal › Article

Abstract

This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to total energy consumption for such accelerators; while prior work has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding (RLC) scheme. Second, CompAct improves the compression ratio of the RLC scheme using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing that operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to 62% reduction in activation buffer energy and 34% reduction in total chip energy.
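As a rough illustration of why run-length coding suits sparse activations, the following is a minimal sketch of a generic zero run-length coder. It is not the modified RLC, Sparse-RLC, or Lossy-RLC scheme from the paper, whose details the abstract does not give; the function names, the (zeros_before, value) pair format, and the `max_run` cap are all assumptions made for this example.

```python
def rlc_encode(values, max_run=15):
    """Encode a sequence as (zeros_before, value) pairs.

    Hypothetical format: each pair stores how many zeros precede a value.
    max_run caps the run counter (for example, 15 fits in 4 bits).
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1  # extend the current run of zeros
        else:
            # Either a non-zero value, or a zero that overflows the counter
            # and is therefore stored explicitly as the pair's value.
            pairs.append((run, v))
            run = 0
    if run:
        # Trailing zeros: store run - 1 zeros followed by an explicit zero.
        pairs.append((run - 1, 0))
    return pairs


def rlc_decode(pairs):
    """Invert rlc_encode: expand each pair back into zeros plus a value."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    return out


# Post-ReLU activations are often mostly zero, so few pairs are needed.
activations = [0, 0, 3, 0, 0, 0, 5, 0]
encoded = rlc_encode(activations)
assert rlc_decode(encoded) == activations
```

In this sketch eight activations become three pairs; the actual savings in hardware depend on the bit widths chosen for the run counter and the value, which are not specified here.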

Original language: English (US)
Article number: a47
Journal: ACM Transactions on Embedded Computing Systems
Volume: 18
Issue number: 5s
State: Published - Oct 2019

Keywords

  • Deep neural networks
  • Low-power design
  • Systolic arrays

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
