TY - JOUR
T1 - NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks
T2 - 39th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2020
AU - Marchisio, Alberto
AU - Massa, Andrea
AU - Mrazek, Vojtech
AU - Bussolino, Beatrice
AU - Martina, Maurizio
AU - Shafique, Muhammad
N1 - Funding Information:
This work has been partially supported by the Doctoral College Resilient Embedded Systems, which is run jointly by TU Wien’s Faculty of Informatics and FH-Technikum Wien, and partially by the Czech Science Foundation project GJ20-02328Y. The computational resources were supported by the Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “e-Infrastructure CZ – LM2018140”.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/11/2
Y1 - 2020/11/2
N2 - Deep Neural Networks (DNNs) have made significant improvements in accuracy, enabling their use in a wide variety of Machine Learning (ML) applications. Recently, the Google Brain team demonstrated the ability of Capsule Networks (CapsNets) to encode and learn spatial correlations between different input features, thereby achieving superior learning capabilities compared to traditional (i.e., non-capsule-based) DNNs. However, designing CapsNets using conventional methods is a tedious job and incurs significant training effort. Recent studies have shown that powerful methods to automatically select the best/optimal DNN model configuration for a given application and training dataset are based on Neural Architecture Search (NAS) algorithms. Moreover, due to their extreme computational and memory requirements, DNNs are executed on specialized hardware accelerators in IoT-Edge/CPS devices. In this paper, we propose NASCaps, an automated framework for the hardware-aware NAS of different types of DNNs, covering both traditional convolutional DNNs and CapsNets. We study the efficacy of deploying a multi-objective Genetic Algorithm (based on the NSGA-II algorithm). The proposed framework can jointly optimize the network accuracy and the corresponding hardware efficiency, expressed in terms of energy, memory, and latency of a given hardware accelerator executing the DNN inference. Besides supporting the traditional DNN layers (such as convolutional and fully-connected), our framework is the first to model and support the specialized capsule layers and dynamic routing in the NAS flow. We evaluate our framework on different datasets, generating different network configurations, and demonstrate the tradeoffs between the different output metrics. We will open-source the complete framework and configurations of the Pareto-optimal architectures at https://github.com/ehw-fit/nascaps.
AB - Deep Neural Networks (DNNs) have made significant improvements in accuracy, enabling their use in a wide variety of Machine Learning (ML) applications. Recently, the Google Brain team demonstrated the ability of Capsule Networks (CapsNets) to encode and learn spatial correlations between different input features, thereby achieving superior learning capabilities compared to traditional (i.e., non-capsule-based) DNNs. However, designing CapsNets using conventional methods is a tedious job and incurs significant training effort. Recent studies have shown that powerful methods to automatically select the best/optimal DNN model configuration for a given application and training dataset are based on Neural Architecture Search (NAS) algorithms. Moreover, due to their extreme computational and memory requirements, DNNs are executed on specialized hardware accelerators in IoT-Edge/CPS devices. In this paper, we propose NASCaps, an automated framework for the hardware-aware NAS of different types of DNNs, covering both traditional convolutional DNNs and CapsNets. We study the efficacy of deploying a multi-objective Genetic Algorithm (based on the NSGA-II algorithm). The proposed framework can jointly optimize the network accuracy and the corresponding hardware efficiency, expressed in terms of energy, memory, and latency of a given hardware accelerator executing the DNN inference. Besides supporting the traditional DNN layers (such as convolutional and fully-connected), our framework is the first to model and support the specialized capsule layers and dynamic routing in the NAS flow. We evaluate our framework on different datasets, generating different network configurations, and demonstrate the tradeoffs between the different output metrics. We will open-source the complete framework and configurations of the Pareto-optimal architectures at https://github.com/ehw-fit/nascaps.
KW - Accuracy
KW - Capsule Networks
KW - Deep Neural Networks
KW - Design Space
KW - DNNs
KW - Energy Efficiency
KW - Evolutionary Algorithms
KW - Genetic Algorithms
KW - Hardware Accelerators
KW - Latency
KW - Memory
KW - Multi-Objective
KW - Neural Architecture Search
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=85097942672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097942672&partnerID=8YFLogxK
U2 - 10.1145/3400302.3415731
DO - 10.1145/3400302.3415731
M3 - Conference article
AN - SCOPUS:85097942672
SN - 1092-3152
VL - 2020-November
JO - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
JF - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
M1 - 9256635
Y2 - 2 November 2020 through 5 November 2020
ER -