Training for multi-resolution inference using reusable quantization terms

Sai Qian Zhang, Bradley McDanel, H. T. Kung, Xin Dong

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Low-resolution uniform quantization (e.g., 4-bit bitwidth) for both Deep Neural Network (DNN) weights and data has emerged as an important technique for efficient inference. Departing from conventional quantization, we describe a novel training approach to support inference at multiple resolutions by reusing a single set of quantization terms (the same set of nonzero bits in values). The proposed approach streamlines the training and supports dynamic selection of resolution levels during inference. We evaluate the method on a diverse range of applications including multiple CNNs on ImageNet, an LSTM on Wikitext-2, and YOLO-v5 on COCO. We show that models resulting from our multi-resolution training can support up to 10 resolutions with only a moderate performance reduction (e.g., = 1%) compared to training them individually. Lastly, using an FPGA, we compare our multi-resolution multiplier-accumulator (mMAC) against other conventional MAC designs and evaluate the inference performance. We show that the mMAC design broadens the choices in trading off cost, efficiency, and latency across a range of computational budgets.

Original languageEnglish (US)
Title of host publicationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021
PublisherAssociation for Computing Machinery
Pages845-860
Number of pages16
ISBN (Electronic)9781450383172
DOIs
StatePublished - Apr 19 2021
Event26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021 - Virtual, Online, United States
Duration: Apr 19 2021Apr 23 2021

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021
Country/TerritoryUnited States
CityVirtual, Online
Period4/19/214/23/21

Keywords

  • co-design
  • deep neural networks
  • joint-optimization training
  • Multi-resolution inference
  • quantization
  • systolic arrays

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Training for multi-resolution inference using reusable quantization terms'. Together they form a unique fingerprint.

Cite this