On distributing symmetric streaming computations

Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cliff Stein, Zoya Svitkina

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, however, even making a single pass over the data is prohibitive. Therefore, streaming computations must be distributed over many machines. In practice, obtaining significant speedups using distributed computation has numerous challenges including synchronization, load balancing, overcoming processor failures, and data distribution. Successful systems in practice such as Google's MapReduce and Apache's Hadoop address these problems by only allowing a certain class of highly distributable tasks defined by local computations that can be applied in any order to the input. The fundamental question that arises is: How does the class of computational tasks supported by these systems differ from the class for which streaming solutions exist? We introduce a simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by these systems. We show that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms. More precisely, we show that any symmetric (orderinvariant) function that can be computed by a streaming algorithm can also be computed by a mud algorithm, with comparable space and communication complexity. Our simulation uses Savitch's theorem and therefore has superpolynomial time complexity. We extend our simulation result to some natural classes of approximate and randomized streaming algorithms. We also give negative results, using communication complexity arguments to prove that extensions to private randomness, promise problems and indeterminate functions are impossible. We also introduce an extension of the mud model to multiple keys and multiple rounds.

    Original languageEnglish (US)
    Title of host publicationProceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
    Pages710-719
    Number of pages10
    StatePublished - 2008
    Event19th Annual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, CA, United States
    Duration: Jan 20 2008Jan 22 2008

    Publication series

    NameProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

    Other

    Other19th Annual ACM-SIAM Symposium on Discrete Algorithms
    Country/TerritoryUnited States
    CitySan Francisco, CA
    Period1/20/081/22/08

    ASJC Scopus subject areas

    • Software
    • General Mathematics

    Fingerprint

    Dive into the research topics of 'On distributing symmetric streaming computations'. Together they form a unique fingerprint.

    Cite this