TY - GEN
T1 - CAMEL
T2 - 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
AU - Zhang, Sai Qian
AU - Tambe, Thierry
AU - Cuevas, Nestor
AU - Wei, Gu-Yeon
AU - Brooks, David
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI models on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. Compared with static random-access memory (SRAM), eDRAM provides higher storage density and lower leakage power, resulting in reduced access cost and leakage. Nevertheless, the periodic, power-hungry refresh operations required to maintain the integrity of the stored data can degrade system performance. To minimize the occurrence of expensive eDRAM refresh operations, it is beneficial to shorten the lifetime of stored data during training. To achieve this, we adopt the principles of algorithm-hardware co-design, introducing a family of reversible DNN architectures that effectively reduce data lifetime and storage costs throughout training. Additionally, we present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory. This engine enables efficient on-device training with significantly reduced memory usage and off-chip DRAM traffic while maintaining superior training accuracy. We evaluate our CAMEL system on multiple DNNs with different datasets, demonstrating a 2.5× speedup of the training process and 2.8× training energy savings over baseline hardware platforms.
UR - http://www.scopus.com/inward/record.url?scp=85190276585&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190276585&partnerID=8YFLogxK
U2 - 10.1109/HPCA57654.2024.00071
DO - 10.1109/HPCA57654.2024.00071
M3 - Conference contribution
AN - SCOPUS:85190276585
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 861
EP - 875
BT - Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
PB - IEEE Computer Society
Y2 - 2 March 2024 through 6 March 2024
ER -