Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation

Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a full-stack optimization framework for accelerating inference of convolutional neural networks (CNNs) and validate the approach with a field-programmable gate array (FPGA) implementation. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization, and inference accuracy. The FPGA implementation serves as the validation vehicle for our design and achieves an inference latency of 2.28 ms on the ImageNet benchmark. It is 9x more energy efficient than other implementations while achieving comparable latency. A key contributor to this energy efficiency is an efficient Selector-Accumulator (SAC) architecture for implementing CNNs with powers-of-two weights. Compared to an FPGA implementation of a traditional 8-bit multiply-accumulate (MAC) unit, SAC substantially reduces the required hardware resources (4.85x fewer lookup tables) and power consumption (2.48x lower).
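The abstract does not describe the internals of SAC, so the following is only a minimal software sketch of the general idea behind powers-of-two weights: when each weight is a signed power of two, multiplying an integer activation by it reduces to a bit shift, and no multiplier is needed. The function names, the exponent range, and the fixed-point format (`frac_bits`) are illustrative assumptions, not details taken from the paper.

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round a real weight to a signed power of two (nearest in the log domain).

    Returns (sign, exponent); a zero weight is encoded as sign == 0.
    The exponent range is an assumption chosen for illustration.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def shift_accumulate(activations, weights, frac_bits=7):
    """Dot product of integer activations with power-of-two weights.

    Uses shifts and adds only. The result is a fixed-point integer with
    `frac_bits` fractional bits; divide by 2**frac_bits for the real value.
    """
    acc = 0
    for x, (sign, exp) in zip(activations, weights):
        if sign == 0:
            continue  # zero weight contributes nothing
        shift = exp + frac_bits
        assert shift >= 0, "exponent below fixed-point resolution"
        acc += sign * (x << shift)  # x * 2**exp, scaled by 2**frac_bits
    return acc

weights = [quantize_pow2(w) for w in [0.5, -0.25, 0.0, 1.0]]
activations = [4, 8, 3, 2]
result = shift_accumulate(activations, weights) / 2**7
```

Here `result` equals the ordinary dot product 4(0.5) + 8(-0.25) + 3(0) + 2(1). A hardware design would select and accumulate shifted values rather than call a general multiplier, which is the kind of saving the abstract attributes to SAC relative to an 8-bit MAC.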

Original language: English (US)
Title of host publication: ICS 2019 - International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 449-460
Number of pages: 12
ISBN (Electronic): 9781450360791
State: Published - Jun 26 2019
Event: 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019 - Phoenix, United States
Duration: Jun 26 2019 → …

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019
Country/Territory: United States
City: Phoenix
Period: 6/26/19 → …

Keywords

  • Co-design
  • Joint optimization
  • Neural networks
  • Powers-of-two weights
  • Sparsity
  • Systolic arrays

ASJC Scopus subject areas

  • General Computer Science
