Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining under Joint Optimization

H. T. Kung, Bradley McDanel, Sai Qian Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper describes a novel approach of packing sparse convolutional neural networks into a denser format for efficient implementations using systolic arrays. By combining multiple sparse columns of a convolutional filter matrix into a single dense column stored in the systolic array, the utilization efficiency of the systolic array can be substantially increased (e.g., 8x) due to the increased density of nonzero weights in the resulting packed filter matrix. In combining columns, for each row, all filter weights but the one with the largest magnitude are pruned. The remaining weights are retrained to preserve high accuracy.We study the effectiveness of this joint optimization for both high utilization efficiency and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplieraccumulators. We demonstrate that in mitigating data privacy concerns the retraining can be accomplished with only fractions of the original dataset (e.g., 10% for CIFAR-10). We present analysis and empirical evidence on the superior performance of our column combining approach against prior arts under metrics such as energy efficiency (3x) and inference latency (12x).

Original languageEnglish (US)
Title of host publicationASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems
PublisherAssociation for Computing Machinery
Pages821-834
Number of pages14
ISBN (Electronic)9781450362405
DOIs
StatePublished - Apr 4 2019
Event24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019 - Providence, United States
Duration: Apr 13 2019Apr 17 2019

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019
Country/TerritoryUnited States
CityProvidence
Period4/13/194/17/19

Keywords

  • data flow architectures
  • joint optimization
  • neural networks
  • sparsity
  • systolic arrays

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining under Joint Optimization'. Together they form a unique fingerprint.

Cite this