The power of choice in data-aware cluster scheduling

Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael J. Franklin, Ion Stoica

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Providing timely results in the face of rapid growth in data volumes has become important for analytical frameworks. For this reason, frameworks increasingly operate on only a subset of the input data. A key property of such sampling is that combinatorially many subsets of the input are present. We present KMN, a system that leverages these choices to perform data-aware scheduling, i.e., minimize time taken by tasks to read their inputs, for a DAG of tasks. KMN not only uses choices to co-locate tasks with their data but also percolates such combinatorial choices to downstream tasks in the DAG by launching a few additional tasks at every upstream stage. Evaluations using workloads from Facebook and Conviva on a 100-machine EC2 cluster show that KMN reduces average job duration by 81% using just 5% additional resources.
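The abstract describes two uses of choice. The first is picking which subset of the input to read so that tasks run where their data already lives. The second is launching a few extra upstream tasks so that the downstream stage can choose among their outputs. The following is a minimal Python sketch of both ideas under stated assumptions. It is not KMN's implementation: the function names `choose_k_of_n` and `choose_balanced_outputs`, the greedy policies, and the data model (block to replica machines, machine to free slots, upstream task to output rack) are all hypothetical and chosen only for illustration.

```python
# Hypothetical illustration of choice-aware scheduling; not KMN's actual code.
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def choose_k_of_n(
    replicas: Dict[str, Set[str]],  # block id -> machines holding a replica
    free_slots: Dict[str, int],     # machine -> free task slots
    k: int,
) -> List[Tuple[str, Optional[str]]]:
    """Greedily pick k of the N input blocks, preferring blocks that can be
    read by a task placed on a machine that already stores a replica.

    Returns (block, machine) pairs; machine is None for a block that must be
    read remotely because too few blocks had a free local slot.
    """
    slots = dict(free_slots)  # copy so the caller's slot counts are untouched
    chosen: List[Tuple[str, Optional[str]]] = []
    non_local: List[str] = []
    for block in sorted(replicas):  # fixed order keeps the sketch deterministic
        if len(chosen) == k:
            break
        local = [m for m in sorted(replicas[block]) if slots.get(m, 0) > 0]
        if local:
            slots[local[0]] -= 1
            chosen.append((block, local[0]))
        else:
            non_local.append(block)
    # Fall back to non-local blocks only if there were not enough local choices.
    for block in non_local[: max(0, k - len(chosen))]:
        chosen.append((block, None))
    return chosen


def choose_balanced_outputs(output_rack: Dict[str, str], k: int) -> List[str]:
    """Given M >= k finished upstream tasks (task id -> rack holding its
    output), pick k outputs for the downstream stage, spreading them across
    racks so that no single rack's uplink carries most of the shuffle."""
    per_rack: Counter = Counter()
    remaining = sorted(output_rack)
    chosen: List[str] = []
    while len(chosen) < k and remaining:
        # Take the output whose rack has contributed the fewest chosen outputs.
        best = min(remaining, key=lambda t: (per_rack[output_rack[t]], t))
        remaining.remove(best)
        per_rack[output_rack[best]] += 1
        chosen.append(best)
    return chosen
```

For example, with blocks `b1` (on `m1`), `b2` (on `m2`), `b3` (on `m1` and `m3`) and `b4` (on `m4`), and free slots only on `m1` and `m3`, asking for two blocks selects `b1` on `m1` and `b3` on `m3`, so both reads are local even though `b2` and `b4` are unreachable locally. The greedy loop is used only for brevity; a maximum bipartite matching between blocks and free slots would maximize the number of local reads. How many extra upstream tasks to launch, and how long to wait for them, is a trade-off this sketch does not model.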

Original language: English (US)
Title of host publication: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014
Publisher: USENIX Association
Pages: 301-316
Number of pages: 16
ISBN (Electronic): 9781931971164
State: Published - Jan 1 2014
Event: 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014 - Broomfield, United States
Duration: Oct 6 2014 → Oct 8 2014

Publication series

Name: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014

Conference

Conference: 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014
Country/Territory: United States
City: Broomfield
Period: 10/6/14 → 10/8/14

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
