TY - JOUR
T1 - CompAct: On-chip compression of activations for low power systolic array based CNN acceleration
T2 - ACM Transactions on Embedded Computing Systems
AU - Zhang, Jeff
AU - Raj, Parul
AU - Zarar, Shuayb
AU - Ambardekar, Amol
AU - Garg, Siddharth
N1 - Funding Information:
This work was performed while the first author was an intern at Microsoft Research, and is supported in part by a National Science Foundation CAREER Award. The authors would like to thank the anonymous reviewers for their time, suggestions, and valuable feedback.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/10
Y1 - 2019/10
N2 - This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding (RLC) scheme. Second, CompAct improves the compression ratio of the RLC scheme by using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing, which operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to a 62% reduction in activation buffer energy and a 34% reduction in total chip energy.
AB - This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding (RLC) scheme. Second, CompAct improves the compression ratio of the RLC scheme by using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing, which operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to a 62% reduction in activation buffer energy and a 34% reduction in total chip energy.
KW - Deep neural networks
KW - Low-power design
KW - Systolic arrays
UR - http://www.scopus.com/inward/record.url?scp=85073170953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073170953&partnerID=8YFLogxK
U2 - 10.1145/3358178
DO - 10.1145/3358178
M3 - Article
AN - SCOPUS:85073170953
SN - 1539-9087
VL - 18
JO - ACM Transactions on Embedded Computing Systems
JF - ACM Transactions on Embedded Computing Systems
IS - 5s
M1 - a47
ER -