Abstract
Implementations of Deep Neural Networks (DNNs) or Deep Learning (DL) for embedded applications may improve users’ quality of life, as DL has become a prominent solution for many machine learning (ML) tasks, such as personalized healthcare assistance. Such implementations require high energy efficiency, since embedded applications usually operate under tight constraints such as small memory and low power/energy budgets. Specialized hardware accelerators are therefore typically employed to expedite DL inference. However, previous works have shown that DL accelerators still suffer from high energy consumption due to accesses to DRAM-based off-chip memory, which hinders embedded DL implementations. In this chapter, we discuss our design methodology for optimizing the energy consumption of DRAM accesses in DL accelerators targeting embedded applications. Our methodology employs an exploration technique to find the data partitioning and scheduling that minimize DRAM accesses for a given DNN model, and exploits low-latency DRAMs to perform those accesses with minimum energy.
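For intuition, the following is a minimal sketch of the kind of exploration the abstract describes: exhaustively searching tiling (data-partitioning) factors for one convolutional layer and keeping the schedule with the fewest estimated DRAM accesses. This is not the chapter's actual algorithm; the buffer size, candidate tile factors, 3x3 kernel, and the access-count model are all illustrative assumptions.

```python
from itertools import product

# Assumed on-chip buffer capacity (bytes) and bytes per element; illustrative only.
BUFFER_BYTES = 64 * 1024
ELEM_BYTES = 1

def dram_accesses(H, W, C, K, th, tw, tc, tk):
    """Estimate DRAM traffic (in elements) for one conv layer processed in
    (th x tw) output tiles, tc input channels and tk output channels at a time.
    Simplified model: each operand is refetched once per tile that reuses it."""
    n_h = -(-H // th)   # ceiling division: number of tiles per dimension
    n_w = -(-W // tw)
    n_c = -(-C // tc)
    n_k = -(-K // tk)
    ifmap = n_k * (H * W * C)                 # inputs re-read per output-channel tile
    weight = n_h * n_w * (C * K * 3 * 3)      # weights re-read per spatial tile (3x3 kernel assumed)
    ofmap = n_c * (H * W * K)                 # partial sums spilled per input-channel tile
    return ifmap + weight + ofmap

def fits_on_chip(th, tw, tc, tk):
    """Check that one tile of each operand fits in the on-chip buffer."""
    tile_elems = (th * tw * tc) + (tc * tk * 3 * 3) + (th * tw * tk)
    return tile_elems * ELEM_BYTES <= BUFFER_BYTES

def explore(H=56, W=56, C=64, K=64):
    """Exhaustively search candidate tiling factors; return the feasible
    schedule with the fewest estimated DRAM accesses."""
    best = None
    for th, tw, tc, tk in product([7, 14, 28, 56], [7, 14, 28, 56],
                                  [8, 16, 32, 64], [8, 16, 32, 64]):
        if not fits_on_chip(th, tw, tc, tk):
            continue
        cost = dram_accesses(H, W, C, K, th, tw, tc, tk)
        if best is None or cost < best[0]:
            best = (cost, (th, tw, tc, tk))
    return best

if __name__ == "__main__":
    cost, tiling = explore()
    print(f"best tiling (th, tw, tc, tk) = {tiling}, estimated DRAM accesses = {cost}")
```

In a real methodology, the cost model would additionally weight each access by its DRAM energy (e.g., row-buffer hits vs. misses), which is where the low-latency DRAM exploitation mentioned above comes in.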
| Original language | English (US) |
|---|---|
| Title of host publication | Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing |
| Subtitle of host publication | Hardware Architectures |
| Publisher | Springer International Publishing |
| Pages | 175-198 |
| Number of pages | 24 |
| ISBN (Electronic) | 9783031195686 |
| ISBN (Print) | 9783031195679 |
| DOIs | |
| State | Published - Jan 1 2023 |
ASJC Scopus subject areas
- General Computer Science
- General Engineering
- General Social Sciences