Abstract
Many hardware accelerators for Convolutional Neural Networks (CNNs) focus on accelerating only the convolutional layers and do not prioritize the fully connected layers. As a result, they lack a synergistic optimization of the hardware architecture and the dataflows across the complete CNN model, which prevents the accelerators from achieving higher performance and energy efficiency. These problems become even more challenging when CNN acceleration targets resource- and energy-constrained embedded systems. To address this, we propose a novel Massively Parallel Neural Processing Array (MPNA) accelerator that integrates two heterogeneous systolic arrays with highly optimized dataflows to expedite both the convolutional and fully connected layers. Our optimized dataflows fully exploit the available off-chip memory bandwidth and the reuse of all data types (i.e., weights, input activations, and output activations), enabling the MPNA to operate at low power while achieving high performance and energy efficiency. We synthesize the MPNA accelerator using an ASIC design flow for a 28-nm technology and perform functional and timing validation using real-world CNNs. Our MPNA achieves 149.7 GOPS/W at 280 MHz while consuming 239 mW. Experimental results show that the MPNA provides up to 2× performance improvement and 51% energy savings compared to the baseline accelerator, making it well suited for embedded systems.
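The systolic-array dataflow mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the MPNA design itself, but a generic output-stationary schedule in which each processing element (PE) accumulates one output element while operands stream past it, skewed by one cycle per row and column; the function name and schedule are illustrative assumptions.

```python
import numpy as np

def systolic_matmul(A, B):
    """Illustrative output-stationary systolic schedule computing C = A @ B.

    PE (i, j) holds output C[i, j]; A streams in from the left and B from
    the top, skewed so that operand pair (A[i, k], B[k, j]) reaches PE (i, j)
    at cycle t = i + j + k. This models the data reuse a systolic array
    achieves: each operand is forwarded between neighboring PEs rather than
    re-fetched from memory.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    total_cycles = K + M + N - 2  # extra cycles drain the skewed wavefront
    for t in range(total_cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j  # operand index arriving at PE (i, j) this cycle
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C
```

The same scheduling idea applies to both convolutional layers (lowered to matrix multiplication) and fully connected layers, which is why an accelerator optimizing only one of the two leaves performance on the table.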
| Original language | English (US) |
| --- | --- |
| Title of host publication | Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing |
| Subtitle of host publication | Hardware Architectures |
| Publisher | Springer International Publishing |
| Pages | 3-24 |
| Number of pages | 22 |
| ISBN (Electronic) | 9783031195686 |
| ISBN (Print) | 9783031195679 |
| DOIs | |
| State | Published - Jan 1 2023 |
ASJC Scopus subject areas
- General Computer Science
- General Engineering
- General Social Sciences