On-chip deep neural network storage with multi-level eNVM

Marco Donato, Brandon Reagen, Lillian Pentecost, Udit Gupta, David Brooks, Gu Yeon Wei

Research output: Chapter in Book/Report/Conference proceedingConference contribution


One of the biggest performance bottlenecks of today's neural network (NN) accelerators is off-chip memory accesses [11]. In this paper, we propose a method to use multi-level, embedded nonvolatile memory (eNVM) to eliminate all off-chip weight accesses. The use of multi-level memory cells increases the probability of faults. Therefore, we co-design the weights and memories such that their properties complement each other and the faults result in no noticeable NN accuracy loss. In the extreme case, the weights in fully connected layers can be stored using a single transistor. With weight pruning and clustering, we show our technique reduces the memory area by over an order of magnitude compared to an SRAM baseline. In the case of VGG16 (130M weights), we are able to store all the weights in 4.9 mm2, well within the area allocated to SRAM in modern NN accelerators [6].

Original languageEnglish (US)
Title of host publicationProceedings of the 55th Annual Design Automation Conference, DAC 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Print)9781450357005
StatePublished - Jun 24 2018
Event55th Annual Design Automation Conference, DAC 2018 - San Francisco, United States
Duration: Jun 24 2018Jun 29 2018

Publication series

NameProceedings - Design Automation Conference
VolumePart F137710
ISSN (Print)0738-100X


Other55th Annual Design Automation Conference, DAC 2018
Country/TerritoryUnited States
CitySan Francisco

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation


Dive into the research topics of 'On-chip deep neural network storage with multi-level eNVM'. Together they form a unique fingerprint.

Cite this