Murmuration: On-the-fly DNN Adaptation for SLO-Aware Distributed Inference in Dynamic Edge Environments

Jieyu Lin, Minghao Li, Sai Qian Zhang, Alberto Leon-Garcia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The proliferation of Virtual and Augmented Reality (VR/AR) and Internet of Things (IoT) applications is driving demand for efficient Deep Neural Network (DNN) inference at the edge. These applications often impose stringent Service Level Objectives (SLOs), such as latency or accuracy targets, that must be met under limited resources and dynamic network conditions. In this study, we explore a novel approach to DNN inference across multiple edge devices that dynamically combines model customization and partitioning to better align with these constraints and SLOs. Unlike conventional methods that employ a single fixed DNN, our system, termed Murmuration, combines one-shot Neural Architecture Search (NAS) and Reinforcement Learning (RL) to dynamically customize and partition DNN models. This approach adapts in real time to the capabilities of the edge devices, network conditions, and varying SLO requirements. The design of Murmuration allows it to effectively navigate the large search space defined by DNN models, network delays, and bandwidth, offering a significant improvement in managing the trade-off between accuracy and latency. We implemented and evaluated Murmuration on a variety of edge devices. The results show that our approach outperforms state-of-the-art methods by up to 5% in inference accuracy or by up to 6.7× in latency. With the flexibility of model customization, Murmuration can meet SLOs under a wider range of network delays and bandwidths, improving the SLO compliance rate by up to 52%.
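The selection problem the abstract describes (choose a customized model and a partitioning that satisfy a latency SLO under the current bandwidth and delay) can be illustrated with a minimal Python sketch. This is not Murmuration's method: the abstract states that the system uses one-shot NAS and RL, whereas the sketch substitutes a brute-force search over a tiny hand-written candidate list. The `Candidate` fields, the additive latency model, and all numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One hypothetical (sub-network, partition) choice. All fields are assumptions."""
    name: str
    accuracy: float       # estimated accuracy in [0, 1]
    compute_ms: float     # total on-device compute time
    transfer_bytes: int   # intermediate data sent between devices
    hops: int             # number of device-to-device transfers


def estimated_latency_ms(c: Candidate, bandwidth_mbps: float, delay_ms: float) -> float:
    """Assumed additive model: compute + serialization time + per-hop network delay."""
    # 1 Mbps is 1000 bits per millisecond.
    transfer_ms = (c.transfer_bytes * 8) / (bandwidth_mbps * 1000.0)
    return c.compute_ms + transfer_ms + c.hops * delay_ms


def pick_candidate(
    candidates: Sequence[Candidate],
    bandwidth_mbps: float,
    delay_ms: float,
    latency_slo_ms: float,
) -> Optional[Candidate]:
    """Return the most accurate candidate that meets the latency SLO, or None."""
    feasible = [
        c for c in candidates
        if estimated_latency_ms(c, bandwidth_mbps, delay_ms) <= latency_slo_ms
    ]
    if not feasible:
        return None  # the SLO cannot be met under these network conditions
    return max(feasible, key=lambda c: c.accuracy)


# Made-up candidates for illustration only.
CANDIDATES = [
    Candidate("large-local", accuracy=0.78, compute_ms=120.0, transfer_bytes=0, hops=0),
    Candidate("large-split", accuracy=0.78, compute_ms=60.0, transfer_bytes=250_000, hops=1),
    Candidate("small-split", accuracy=0.73, compute_ms=35.0, transfer_bytes=50_000, hops=1),
]

if __name__ == "__main__":
    for bw, delay in [(100.0, 5.0), (10.0, 20.0), (1.0, 50.0)]:
        choice = pick_candidate(CANDIDATES, bw, delay, latency_slo_ms=100.0)
        print(bw, delay, choice.name if choice else None)
```

Under these made-up numbers and a 100 ms SLO, a fast link favors the larger partitioned model, a slower link forces the smaller model, and a very poor link leaves no feasible candidate. The space of sub-networks and partition points in a real system is far too large to enumerate this way, which is consistent with the abstract's motivation for a learned policy.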

Original language: English (US)
Title of host publication: 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
Publisher: Association for Computing Machinery
Pages: 792-801
Number of pages: 10
ISBN (Electronic): 9798400708428
State: Published - Aug 12 2024
Event: 53rd International Conference on Parallel Processing, ICPP 2024 - Gotland, Sweden
Duration: Aug 12 2024 to Aug 15 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 53rd International Conference on Parallel Processing, ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 8/12/24 to 8/15/24

Keywords

  • Distributed Inference
  • DNN
  • Model Partitioning
  • One-shot NAS
  • Reinforcement Learning
  • Service Level Objective (SLO)

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
