Arithmetic Coding-Based 5-Bit Weight Encoding and Hardware Decoder for CNN Inference in Edge Devices

Jong Hun Lee, Joonho Kong, Arslan Munir

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional neural networks (CNNs) have gained significant attention for real-world artificial intelligence (AI) applications such as image classification and object detection. However, for better accuracy, the size of CNN parameters (weights) has been increasing, which in turn makes it difficult to run on-device CNN inference in resource-constrained edge devices. Although weight pruning and 5-bit quantization methods have shown promising results, deploying large CNN models in edge devices remains challenging. In this paper, we propose an encoding and hardware-based decoding technique that can be applied to 5-bit quantized weight data for on-device CNN inference in resource-constrained edge devices. Given 5-bit quantized weight data, we employ arithmetic coding with range scaling for lossless weight compression, which is performed offline. When executing on-device inference with underlying CNN accelerators, our hardware decoder enables fast in-situ weight decompression with small latency overhead. According to our evaluation results with five widely used CNN models, our arithmetic coding-based encoding method applied to 5-bit quantized weights achieves a 9.6× better compression ratio while also reducing memory data transfer energy consumption by 89.2%, on average, compared to uncompressed 32-bit floating-point weights. When applying our technique to pruned weights, we obtain 57.5×-112.2× better compression ratios while reducing energy consumption by 98.3%-99.1% compared to 32-bit floating-point weights. In addition, by pipelining weight decoding and transfer with CNN execution, the latency overhead of our weight decoding with 16 decoding unit (DU) hardware is only 0.16%-5.48% for non-pruned weights and 0.16%-0.91% for pruned weights. Moreover, our proposed technique with 4-DU decoder hardware reduces system-level energy consumption by 1.1%-9.3%.
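To illustrate the core idea of losslessly compressing 5-bit quantized weights with arithmetic coding, the following sketch encodes a sequence of 5-bit symbols (values 0-31) into a single number inside a nested sub-interval of [0, 1) and decodes it back. This is a simplified illustration, not the paper's hardware design: it uses Python's exact `Fraction` arithmetic instead of the fixed-precision integer range scaling (renormalization) that a real encoder and hardware decoder would use, and all function names (`build_model`, `encode`, `decode`) are hypothetical.

```python
# Minimal arithmetic-coding sketch for 5-bit weight symbols (0..31).
# Uses exact rational arithmetic for clarity; real implementations use
# fixed-precision integers with range scaling/renormalization.
from fractions import Fraction
from collections import Counter

def build_model(symbols):
    # Frequency model: per-symbol counts and cumulative counts.
    freq = Counter(symbols)
    total = len(symbols)
    cum, c = {}, 0
    for s in sorted(freq):
        cum[s] = c
        c += freq[s]
    return freq, cum, total

def encode(symbols, freq, cum, total):
    # Narrow [low, low + rng) to the sub-interval of each symbol in turn.
    low, rng = Fraction(0), Fraction(1)
    for s in symbols:
        low += rng * Fraction(cum[s], total)
        rng *= Fraction(freq[s], total)
    return low  # any value in [low, low + rng) identifies the sequence

def decode(code, n, freq, cum, total):
    # Reverse the narrowing: locate which symbol's sub-interval holds the code.
    low, rng = Fraction(0), Fraction(1)
    out = []
    order = sorted(freq)
    for _ in range(n):
        t = (code - low) / rng * total
        for s in order:
            if cum[s] <= t < cum[s] + freq[s]:
                out.append(s)
                low += rng * Fraction(cum[s], total)
                rng *= Fraction(freq[s], total)
                break
    return out
```

Because frequent symbols get wider sub-intervals, a skewed 5-bit weight distribution (as produced by quantization and pruning) needs far fewer bits to pin down the final interval than 5 bits per symbol, which is the source of the compression gains reported above.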

Original language: English (US)
Pages (from-to): 166736-166749
Number of pages: 14
Journal: IEEE Access
Volume: 9
State: Published - 2021

Keywords

  • 5-bit quantization
  • arithmetic coding
  • convolutional neural networks
  • edge devices
  • weight compression

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
