Effective computation of biased quantiles over data streams

Graham Cormode, S. Muthukrishnan, Flip Korn, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.

    Original languageEnglish (US)
    Title of host publicationProceedings - 21st International Conference on Data Engineering, ICDE 2005
    Pages20-31
    Number of pages12
    DOIs
    StatePublished - 2005
    Event21st International Conference on Data Engineering, ICDE 2005 - Tokyo, Japan
    Duration: Apr 5 2005Apr 8 2005

    Publication series

    NameProceedings - International Conference on Data Engineering
    ISSN (Print)1084-4627

    Other

    Other21st International Conference on Data Engineering, ICDE 2005
    Country/TerritoryJapan
    CityTokyo
    Period4/5/054/8/05

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems

    Fingerprint

    Dive into the research topics of 'Effective computation of biased quantiles over data streams'. Together they form a unique fingerprint.

    Cite this