TY - GEN
T1 - Murmuration
T2 - 53rd International Conference on Parallel Processing, ICPP 2024
AU - Lin, Jieyu
AU - Li, Minghao
AU - Zhang, Sai Qian
AU - Leon-Garcia, Alberto
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/8/12
Y1 - 2024/8/12
N2 - The proliferation of Virtual and Augmented Reality (VR/AR) and Internet of Things (IoT) applications is driving the demand for efficient Deep Neural Network (DNN) inference at the edge. These applications often impose stringent Service Level Objectives (SLOs), such as latency or accuracy targets, that must be met under the constraints of limited resources and dynamic network conditions. In this study, we explore a novel approach to DNN inference across multiple edge devices that dynamically incorporates both model customization and partitioning to better align with these constraints and SLOs. Unlike conventional methods that employ a single fixed DNN, our system, termed Murmuration, combines one-shot Neural Architecture Search (NAS) and Reinforcement Learning (RL) to dynamically customize and partition DNN models. This approach adapts in real time to the capabilities of the edge devices, network conditions, and varying SLO requirements. The design of Murmuration allows it to effectively navigate the large search space defined by DNN models, network delays, and bandwidth, offering a significant improvement in managing trade-offs between accuracy and latency. We implemented and evaluated Murmuration on a variety of edge devices. The results show that our approach outperforms state-of-the-art methods, improving inference accuracy by up to 5% or reducing latency by up to 6.7×. With the flexibility of model customization, Murmuration can meet SLOs under a wider range of network delays and bandwidths, improving the SLO compliance rate by up to 52%.
AB - The proliferation of Virtual and Augmented Reality (VR/AR) and Internet of Things (IoT) applications is driving the demand for efficient Deep Neural Network (DNN) inference at the edge. These applications often impose stringent Service Level Objectives (SLOs), such as latency or accuracy targets, that must be met under the constraints of limited resources and dynamic network conditions. In this study, we explore a novel approach to DNN inference across multiple edge devices that dynamically incorporates both model customization and partitioning to better align with these constraints and SLOs. Unlike conventional methods that employ a single fixed DNN, our system, termed Murmuration, combines one-shot Neural Architecture Search (NAS) and Reinforcement Learning (RL) to dynamically customize and partition DNN models. This approach adapts in real time to the capabilities of the edge devices, network conditions, and varying SLO requirements. The design of Murmuration allows it to effectively navigate the large search space defined by DNN models, network delays, and bandwidth, offering a significant improvement in managing trade-offs between accuracy and latency. We implemented and evaluated Murmuration on a variety of edge devices. The results show that our approach outperforms state-of-the-art methods, improving inference accuracy by up to 5% or reducing latency by up to 6.7×. With the flexibility of model customization, Murmuration can meet SLOs under a wider range of network delays and bandwidths, improving the SLO compliance rate by up to 52%.
KW - Distributed Inference
KW - DNN
KW - Model Partitioning
KW - One-shot NAS
KW - Reinforcement Learning
KW - Service Level Objective (SLO)
UR - http://www.scopus.com/inward/record.url?scp=85202443883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202443883&partnerID=8YFLogxK
U2 - 10.1145/3673038.3673154
DO - 10.1145/3673038.3673154
M3 - Conference contribution
AN - SCOPUS:85202443883
T3 - ACM International Conference Proceeding Series
SP - 792
EP - 801
BT - 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 12 August 2024 through 15 August 2024
ER -