TY - GEN
T1 - A Fast Design Space Exploration Framework for the Deep Learning Accelerators
T2 - 2020 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2020
AU - Colucci, Alessio
AU - Marchisio, Alberto
AU - Bussolino, Beatrice
AU - Mrazek, Vojtech
AU - Martina, Maurizio
AU - Masera, Guido
AU - Shafique, Muhammad
N1 - Funding Information:
This work has been partially supported by the Doctoral College Resilient Embedded Systems, which is run jointly by TU Wien's Faculty of Informatics and FH-Technikum Wien.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/9/20
Y1 - 2020/9/20
N2 - Capsule Networks (CapsNets) are an advanced form of Convolutional Neural Networks (CNNs), capable of learning spatial relations and being invariant to transformations. CapsNets require complex matrix operations for which current accelerators are not optimized, in both the training and inference passes. Current state-of-the-art simulators and design space exploration (DSE) tools for DNN hardware neglect the modeling of training operations, while requiring long exploration times that slow down the complete design flow. These impediments restrict the real-world applications of CapsNets (e.g., autonomous driving and robotics) as well as the further development of DNNs in life-long learning scenarios that require training on low-power embedded devices. Towards this, we present XploreDL, a novel framework to perform fast yet high-fidelity DSE for both inference and training accelerators, supporting both CNN and CapsNet operations. XploreDL enables a resource-efficient DSE for accelerators, focusing on power, area, and latency, and highlights Pareto-optimal solutions which can be green-lit to expedite the design flow. XploreDL reaches the same fidelity as ARM's SCALE-sim, while providing a 600x speedup and a 50x lower memory footprint. Preliminary results with a deep CapsNet model on MNIST for training accelerators show promising Pareto-optimal architectures with up to 0.4 TOPS/mm² and 800 fJ/op efficiency. With inference accelerators for AlexNet, the Pareto-optimal solutions reach up to 1.8 TOPS/mm² and 200 fJ/op efficiency.
AB - Capsule Networks (CapsNets) are an advanced form of Convolutional Neural Networks (CNNs), capable of learning spatial relations and being invariant to transformations. CapsNets require complex matrix operations for which current accelerators are not optimized, in both the training and inference passes. Current state-of-the-art simulators and design space exploration (DSE) tools for DNN hardware neglect the modeling of training operations, while requiring long exploration times that slow down the complete design flow. These impediments restrict the real-world applications of CapsNets (e.g., autonomous driving and robotics) as well as the further development of DNNs in life-long learning scenarios that require training on low-power embedded devices. Towards this, we present XploreDL, a novel framework to perform fast yet high-fidelity DSE for both inference and training accelerators, supporting both CNN and CapsNet operations. XploreDL enables a resource-efficient DSE for accelerators, focusing on power, area, and latency, and highlights Pareto-optimal solutions which can be green-lit to expedite the design flow. XploreDL reaches the same fidelity as ARM's SCALE-sim, while providing a 600x speedup and a 50x lower memory footprint. Preliminary results with a deep CapsNet model on MNIST for training accelerators show promising Pareto-optimal architectures with up to 0.4 TOPS/mm² and 800 fJ/op efficiency. With inference accelerators for AlexNet, the Pareto-optimal solutions reach up to 1.8 TOPS/mm² and 200 fJ/op efficiency.
KW - Capsule Networks
KW - Convolutional Neural Networks
KW - Design Space Exploration
KW - Hardware Accelerator
KW - Training
UR - http://www.scopus.com/inward/record.url?scp=85097640545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097640545&partnerID=8YFLogxK
U2 - 10.1109/CODESISSS51650.2020.9244038
DO - 10.1109/CODESISSS51650.2020.9244038
M3 - Conference contribution
AN - SCOPUS:85097640545
T3 - Proceedings of the 2020 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2020
SP - 34
EP - 36
BT - Proceedings of the 2020 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2020
A2 - Mitra, Tulika
A2 - Gerstlauer, Andreas
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 September 2020 through 25 September 2020
ER -