TY - JOUR
T1 - ROMANet: Fine-Grained Reuse-Driven Off-Chip Memory Access Management and Data Organization for Deep Neural Network Accelerators
T2 - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
AU - Putra, Rachmad Vidya Wicaksana
AU - Hanif, Muhammad Abdullah
AU - Shafique, Muhammad
N1 - Funding Information:
Manuscript received July 12, 2020; revised October 26, 2020 and December 13, 2020; accepted January 27, 2021. Date of publication March 4, 2021; date of current version April 1, 2021. This work was partly supported by the Indonesia Endowment Fund for Education (IEFE/LPDP) Graduate Scholarship Program, Ministry of Finance, Indonesia, under Grant PRJ-1477/LPDP.3/2017. (Corresponding author: Rachmad Vidya Wicaksana Putra.) Rachmad Vidya Wicaksana Putra and Muhammad Abdullah Hanif are with the Institute of Computer Engineering, Technische Universität Wien (TU Wien), 1040 Vienna, Austria (e-mail: rachmad.putra@tuwien.ac.at; muhammad.hanif@tuwien.ac.at).
Publisher Copyright:
© 1993-2012 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - Enabling high energy efficiency is crucial for embedded implementations of deep learning. Several studies have shown that DRAM-based off-chip memory accesses are among the most energy-consuming operations in deep neural network (DNN) accelerators and thereby prevent the designs from achieving their full potential efficiency gains. DRAM access energy varies depending upon the number of accesses required and the energy consumed per access. Therefore, searching for a solution toward the minimum DRAM access energy is an important optimization problem. Toward this, we propose the ROMANet methodology, which aims at reducing the number of memory accesses by searching for the appropriate data partitioning and scheduling for each layer of a network through a design space exploration, based on the knowledge of the available on-chip memory and the data reuse factors. Moreover, ROMANet also targets decreasing the number of DRAM row buffer conflicts and misses by exploiting the DRAM multibank burst feature to improve the energy per access. Besides providing energy benefits, our proposed DRAM data mapping also results in an increased effective DRAM throughput, which is useful for latency-constrained scenarios. Our experimental results show that ROMANet saves DRAM access energy by 12% for AlexNet, 36% for VGG-16, 46% for MobileNet, and 45% for SqueezeNet, while improving the DRAM throughput by 10% on average across different networks compared to the state of the art, i.e., the bus-width-aware (BWA) technique.
AB - Enabling high energy efficiency is crucial for embedded implementations of deep learning. Several studies have shown that DRAM-based off-chip memory accesses are among the most energy-consuming operations in deep neural network (DNN) accelerators and thereby prevent the designs from achieving their full potential efficiency gains. DRAM access energy varies depending upon the number of accesses required and the energy consumed per access. Therefore, searching for a solution toward the minimum DRAM access energy is an important optimization problem. Toward this, we propose the ROMANet methodology, which aims at reducing the number of memory accesses by searching for the appropriate data partitioning and scheduling for each layer of a network through a design space exploration, based on the knowledge of the available on-chip memory and the data reuse factors. Moreover, ROMANet also targets decreasing the number of DRAM row buffer conflicts and misses by exploiting the DRAM multibank burst feature to improve the energy per access. Besides providing energy benefits, our proposed DRAM data mapping also results in an increased effective DRAM throughput, which is useful for latency-constrained scenarios. Our experimental results show that ROMANet saves DRAM access energy by 12% for AlexNet, 36% for VGG-16, 46% for MobileNet, and 45% for SqueezeNet, while improving the DRAM throughput by 10% on average across different networks compared to the state of the art, i.e., the bus-width-aware (BWA) technique.
KW - Accelerator
KW - DRAM
KW - analysis
KW - deep learning
KW - deep neural networks (DNNs)
KW - energy efficiency
KW - memory access management
KW - modeling
KW - off-chip memory
UR - http://www.scopus.com/inward/record.url?scp=85102267098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102267098&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2021.3060509
DO - 10.1109/TVLSI.2021.3060509
M3 - Article
AN - SCOPUS:85102267098
SN - 1063-8210
VL - 29
SP - 702
EP - 715
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 4
M1 - 9369858
ER -