TY - GEN
T1 - MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
T2 - 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019
AU - Pentecost, Lillian
AU - Donato, Marco
AU - Reagen, Brandon
AU - Gupta, Udit
AU - Ma, Siming
AU - Wei, Gu-Yeon
AU - Brooks, David
N1 - Funding Information:
This work was partially supported by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/10/12
Y1 - 2019/10/12
N2 - Deeply embedded applications require low-power, low-cost hardware that fits within stringent area constraints. Deep learning has many potential uses in these domains, but introduces significant inefficiencies stemming from off-chip DRAM accesses of model weights. Ideally, models would fit entirely on-chip. However, even with compression, memory requirements for state-of-the-art models make on-chip inference impractical. Due to their increased density, emerging eNVMs are one promising solution. We present MaxNVM, a principled co-design of sparse encodings, protective logic, and fault-prone MLC eNVM technologies (i.e., RRAM and CTT) to enable highly-efficient DNN inference. We find that bit reduction techniques (e.g., clustering and sparse compression) increase weight vulnerability to faults, which limits the capabilities of MLC eNVM. To circumvent this limitation, we improve storage density (i.e., bits-per-cell) with minimal overhead using protective logic. Tradeoffs between density and reliability result in a rich design space. We show that by balancing these techniques, the weights of large networks can reasonably fit on-chip. Compared to a naive, single-level-cell eNVM solution, our highly-optimized MLC memory systems reduce weight area by up to 29x. We compare our technique against NVDLA, a state-of-the-art, industry-grade CNN accelerator, and demonstrate up to 3.2x reduced power and up to 3.5x reduced energy per ResNet50 inference.
AB - Deeply embedded applications require low-power, low-cost hardware that fits within stringent area constraints. Deep learning has many potential uses in these domains, but introduces significant inefficiencies stemming from off-chip DRAM accesses of model weights. Ideally, models would fit entirely on-chip. However, even with compression, memory requirements for state-of-the-art models make on-chip inference impractical. Due to their increased density, emerging eNVMs are one promising solution. We present MaxNVM, a principled co-design of sparse encodings, protective logic, and fault-prone MLC eNVM technologies (i.e., RRAM and CTT) to enable highly-efficient DNN inference. We find that bit reduction techniques (e.g., clustering and sparse compression) increase weight vulnerability to faults, which limits the capabilities of MLC eNVM. To circumvent this limitation, we improve storage density (i.e., bits-per-cell) with minimal overhead using protective logic. Tradeoffs between density and reliability result in a rich design space. We show that by balancing these techniques, the weights of large networks can reasonably fit on-chip. Compared to a naive, single-level-cell eNVM solution, our highly-optimized MLC memory systems reduce weight area by up to 29x. We compare our technique against NVDLA, a state-of-the-art, industry-grade CNN accelerator, and demonstrate up to 3.2x reduced power and up to 3.5x reduced energy per ResNet50 inference.
KW - CTT
KW - eNVM
KW - Memory systems
KW - Neural networks
KW - RRAM
UR - http://www.scopus.com/inward/record.url?scp=85074450905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074450905&partnerID=8YFLogxK
U2 - 10.1145/3352460.3358258
DO - 10.1145/3352460.3358258
M3 - Conference contribution
AN - SCOPUS:85074450905
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 769
EP - 781
BT - MICRO 2019 - 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
PB - IEEE Computer Society
Y2 - 12 October 2019 through 16 October 2019
ER -