Space- and time-efficient deterministic algorithms for biased quantiles over data streams

Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Skew is prevalent in data streams, and should be taken into account by algorithms that analyze the data. The problem of finding "biased quantiles"-that is, approximate quantiles which must be more accurate for more extreme values-is a framework for summarizing such skewed data on data streams. We present the first deterministic algorithms for answering biased quantiles queries accurately with small-sublinear in the input size-space and time bounds in one pass. The space bound is near-optimal, and the amortized update cost is close to constant, making it practical for handling high speed network data streams. We not only demonstrate theoretical properties of the algorithm, but also show it uses less space than existing methods in many practical settings, and is fast to maintain.

    Original languageEnglish (US)
    Title of host publicationProceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
    Pages263-272
    Number of pages10
    DOIs
    StatePublished - 2006
    Event25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 - Chicago, IL, United States
    Duration: Jun 26 2006Jun 28 2006

    Publication series

    NameProceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

    Other

    Other25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
    Country/TerritoryUnited States
    CityChicago, IL
    Period6/26/066/28/06

    Keywords

    • Biased quantiles
    • Data stream algorithms

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Fingerprint

    Dive into the research topics of 'Space- and time-efficient deterministic algorithms for biased quantiles over data streams'. Together they form a unique fingerprint.

    Cite this