Term quantization: Furthering quantization at run time

H. T. Kung, Bradley McDanel, Sai Qian Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a novel technique, called Term Quantization (TQ), that furthers quantization at run time to improve the computational efficiency of deep neural networks (DNNs) already quantized with conventional quantization methods. TQ operates on the power-of-two terms in the binary expressions of values. When computing a dot product, TQ dynamically selects a fixed number of the largest terms from the values of the two vectors. By exploiting the weight and data distributions typically present in DNNs, TQ has a minimal impact on DNN model performance (e.g., accuracy or perplexity). We use TQ to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We evaluate TQ on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2. We demonstrate significant reductions in inference computation costs (approximately 3-10×) compared to conventional uniform quantization for the same level of model performance.
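To make the term-selection idea concrete, below is a minimal Python sketch of group-based term truncation for integer-quantized values: every value is decomposed into its power-of-two terms, and only a fixed budget of the largest terms across each vector is kept before the dot product. The function names (`truncate_group`, `tq_dot`) and the per-vector budget are illustrative assumptions, not the paper's implementation.

```python
def truncate_group(values, budget):
    """Zero out all but the `budget` largest power-of-two terms across a
    group of integer values (a sketch of group-based term selection)."""
    terms = []  # (exponent, value index, sign) for every nonzero bit
    for idx, v in enumerate(values):
        sign = -1 if v < 0 else 1
        mag = abs(int(v))
        for e in range(mag.bit_length()):
            if (mag >> e) & 1:
                terms.append((e, idx, sign))
    terms.sort(key=lambda t: t[0], reverse=True)  # largest exponents first
    out = [0] * len(values)
    for e, idx, sign in terms[:budget]:
        out[idx] += sign * (1 << e)
    return out

def tq_dot(a, b, budget):
    """Dot product after truncating each operand vector to its `budget`
    largest power-of-two terms."""
    ta, tb = truncate_group(a, budget), truncate_group(b, budget)
    return sum(x * y for x, y in zip(ta, tb))

# Example with 8-bit quantized values and a budget of 6 terms per vector.
a = [115, -7, 44, 3]
b = [9, 21, -88, 5]
print("exact dot:   ", sum(x * y for x, y in zip(a, b)))  # -2969
print("TQ-style dot:", tq_dot(a, b, budget=6))            # -2704
```

Because DNN weight and activation distributions concentrate most magnitude in a few values, dropping the smallest terms perturbs the dot product only slightly, which is why a fixed term budget can keep processor arrays fully synchronized at little cost to accuracy.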

Original language: English (US)
Title of host publication: Proceedings of SC 2020
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9781728199986
DOIs
State: Published - Nov 2020
Event: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 - Virtual, Atlanta, United States
Duration: Nov 9, 2020 – Nov 19, 2020

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2020-November
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Country/Territory: United States
City: Virtual, Atlanta
Period: 11/9/20 – 11/19/20

Keywords

  • Accelerator
  • Deep neural network (DNN)
  • Quantization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software
