MaxNVM: Maximizing DNN storage density and inference efficiency with sparse encoding and error mitigation

Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu Yeon Wei, David Brooks

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Deeply embedded applications require low-power, low-cost hardware that fits within stringent area constraints. Deep learning has many potential uses in these domains, but introduces significant inefficiencies stemming from off-chip DRAM accesses of model weights. Ideally, models would fit entirely on-chip. However, even with compression, memory requirements for state-of-the-art models make on-chip inference impractical. Due to increased density, emerging eNVMs are one promising solution. We present MaxNVM, a principled co-design of sparse encodings, protective logic, and fault-prone MLC eNVM technologies (i.e., RRAM and CTT) to enable highly-efficient DNN inference. We find bit reduction techniques (e.g., clustering and sparse compression) increase weight vulnerability to faults. This limits the capabilities of MLC eNVM. To circumvent this limitation, we improve storage density (i.e., bits-per-cell) with minimal overhead using protective logic. Tradeoffs between density and reliability result in a rich design space. We show that by balancing these techniques, the weights of large networks are able to reasonably fit on-chip. Compared to a naive, single-level-cell eNVM solution, our highly-optimized MLC memory systems reduce weight area by up to 29.We compare our technique against NVDLA, a state-of-the-art industry-grade CNN accelerator, and demonstrate up to 3.2 reduced power and up to 3.5 reduced energy per ResNet50 inference.

Original languageEnglish (US)
Title of host publicationMICRO 2019 - 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
PublisherIEEE Computer Society
Pages769-781
Number of pages13
ISBN (Electronic)9781450369381
DOIs
StatePublished - Oct 12 2019
Event52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019 - Columbus, United States
Duration: Oct 12 2019Oct 16 2019

Publication series

NameProceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print)1072-4451

Conference

Conference52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019
CountryUnited States
CityColumbus
Period10/12/1910/16/19

Keywords

  • CTT
  • ENVM
  • Memory systems
  • Neural networks
  • RRAM

ASJC Scopus subject areas

  • Hardware and Architecture

Fingerprint Dive into the research topics of 'MaxNVM: Maximizing DNN storage density and inference efficiency with sparse encoding and error mitigation'. Together they form a unique fingerprint.

Cite this