TY - JOUR
T1 - FEECA
T2 - Design Space Exploration for Low-Latency and Energy-Efficient Capsule Network Accelerators
AU - Marchisio, Alberto
AU - Mrazek, Vojtech
AU - Hanif, Muhammad Abdullah
AU - Shafique, Muhammad
N1 - Funding Information:
Manuscript received October 21, 2020; revised January 6, 2021; accepted January 31, 2021. Date of publication February 25, 2021; date of current version April 1, 2021. This work was supported in part by the Doctoral College Resilient Embedded Systems, which is run jointly by TU Wien’s Faculty of Informatics and FH-Technikum Wien, and in part by the Czech Science Foundation under Project 19-10137S. (Corresponding author: Alberto Marchisio.) Alberto Marchisio and Muhammad Abdullah Hanif are with the Department of Informatics, Institute of Computer Engineering, Technische Universität Wien (TU Wien), 1040 Vienna, Austria (e-mail: alberto.marchisio@tuwien.ac.at).
Publisher Copyright:
© 1993-2012 IEEE.
PY - 2021/4
Y1 - 2021/4
AB - In the past few years, Capsule Networks (CapsNets) have taken the spotlight over traditional convolutional neural networks (CNNs) for image classification. Unlike CNNs, CapsNets can learn the spatial relationships between image features. However, their complexity grows because of their heterogeneous capsule structure and the dynamic routing, an iterative algorithm that learns the coupling coefficients between two consecutive capsule layers. This necessitates specialized hardware accelerators for CapsNets. Moreover, a high-performance and energy-efficient design of CapsNet accelerators requires exploring different design decisions (such as the size and configuration of the processing array and the structure of the processing elements). Toward this, we make the following key contributions: 1) FEECA, a novel methodology to explore the design space of the (micro)architectural parameters of a CapsNet hardware accelerator, and 2) CapsAcc, the first specialized RTL-level hardware architecture to perform CapsNet inference with high performance and high energy efficiency. Our CapsAcc achieves significant performance improvements over an optimized GPU implementation, due to its efficient implementation of key activation functions, such as squash and softmax, and efficient data reuse for the dynamic routing. The FEECA methodology employs the Non-dominated Sorting Genetic Algorithm (NSGA-II) to find Pareto-optimal design points with respect to area, performance, and energy consumption. It relies on analytical models of the number of clock cycles required by each operation of the CapsNet inference and of the memory accesses, enabling a fast yet accurate design space exploration. We synthesized the complete accelerator architecture in a 45-nm CMOS technology using Synopsys design tools and evaluated it on the MNIST benchmark (as done in the original CapsNet paper from the Google Brain team) and on a more complex data set, the German Traffic Sign Recognition Benchmark (GTSRB).
KW - Capsule network (CapsNet)
KW - Deep Neural Network (DNN)
KW - design
KW - design space exploration (DSE)
KW - hardware accelerator
KW - inference
KW - non-dominated sorting genetic algorithm (NSGA-II)
UR - http://www.scopus.com/inward/record.url?scp=85101800294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101800294&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2021.3059518
DO - 10.1109/TVLSI.2021.3059518
M3 - Article
AN - SCOPUS:85101800294
VL - 29
SP - 716
EP - 729
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SN - 1063-8210
IS - 4
M1 - 9363276
ER -