Aurora: Adaptive Block Replication in Distributed File Systems

Qi Zhang, Sai Qian Zhang, Alberto Leon-Garcia, Raouf Boutaba

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Distributed file systems such as Google File System and Hadoop Distributed File System have been used to store large volumes of data in cloud data centers. These systems divide data sets into blocks of fixed size and replicate them over multiple machines to achieve both reliability and efficiency. Recent studies have shown that data blocks vary widely in popularity. In this context, the naive block replication schemes used by these systems often cause an uneven load distribution across machines, which reduces the overall I/O throughput of the system. While many replication algorithms have been proposed, existing solutions have not carefully studied how to place data blocks so that load is balanced across machines while node and rack-level reliability requirements are satisfied. In this paper, we study the dynamic data replication problem with the goal of balancing machine load while ensuring machine and rack-level reliability requirements are met. We propose several local search algorithms that provide constant approximation guarantees, yet are simple and practical to implement. We further present Aurora, a dynamic block placement mechanism that implements these algorithms in the Hadoop Distributed File System with minimal overhead. Through experiments using workload traces from Yahoo! and Facebook, we show that Aurora reduces machine load imbalance by up to 26.9% compared to existing solutions, while satisfying node and rack-level reliability requirements.
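
The abstract does not spell out the local search algorithms, so the following is only a generic illustration of the idea, not the paper's method. It is a minimal sketch that assumes a machine's load is the sum of each hosted block's popularity divided by that block's replica count, and that reliability means the replicas of a block span at least `min_racks` racks. The names `machine_loads`, `is_reliable`, `local_search`, `rack_of`, and `min_racks` are hypothetical.

```python
def machine_loads(placement, popularity, machines):
    """Assumed load model: each replica carries an equal share of its block's popularity."""
    loads = {m: 0.0 for m in machines}
    for block, replicas in placement.items():
        share = popularity[block] / len(replicas)
        for m in replicas:
            loads[m] += share
    return loads


def is_reliable(replicas, rack_of, min_racks):
    """Assumed reliability rule: replicas must span at least min_racks racks."""
    return len({rack_of[m] for m in replicas}) >= min_racks


def local_search(placement, popularity, rack_of, min_racks=2, max_steps=1000):
    """Repeatedly move one replica off the most loaded machine while it helps."""
    machines = list(rack_of)
    placement = {b: set(r) for b, r in placement.items()}  # work on a copy
    for _ in range(max_steps):
        loads = machine_loads(placement, popularity, machines)
        src = max(machines, key=loads.get)
        best = None  # (resulting load on destination, block, destination)
        for block, replicas in placement.items():
            if src not in replicas:
                continue
            share = popularity[block] / len(replicas)
            if share <= 0:
                continue  # moving an unread block cannot reduce load
            for dst in machines:
                if dst in replicas:
                    continue
                candidate = (replicas - {src}) | {dst}
                if not is_reliable(candidate, rack_of, min_racks):
                    continue
                new_dst_load = loads[dst] + share
                # Accept only moves that leave dst below src's current load.
                if new_dst_load < loads[src] and (best is None or new_dst_load < best[0]):
                    best = (new_dst_load, block, dst)
        if best is None:
            break  # local optimum: no single feasible move improves the balance
        _, block, dst = best
        placement[block].remove(src)
        placement[block].add(dst)
    return placement
```

Each accepted move keeps the replica count of every block unchanged and preserves the rack constraint, so the search trades only load balance, never reliability. The stopping point is a local optimum under this simple move rule; the sketch makes no claim about the approximation guarantees stated in the abstract.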

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 442-451
Number of pages: 10
ISBN (Electronic): 9781467372145
State: Published - Jul 22 2015
Event: 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015 - Columbus, United States
Duration: Jun 29 2015 to Jul 2 2015

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2015-July

Conference

Conference: 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015
Country/Territory: United States
City: Columbus
Period: 6/29/15 to 7/2/15

Keywords

  • Approximation algorithms
  • Distributed file system
  • Hadoop
  • HDFS
  • Local search

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
