Efficient approximation of correlated sums on data streams

Rohit Ananthakrishna, Abhinandan Das, Johannes Gehrke, Flip Korn, S. Muthukrishnan, Divesh Srivastava

    Research output: Contribution to journalArticlepeer-review

    Abstract

    In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on (x, y) pairs and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.

    Original languageEnglish (US)
    Pages (from-to)569-572
    Number of pages4
    JournalIEEE Transactions on Knowledge and Data Engineering
    Volume15
    Issue number3
    DOIs
    StatePublished - May 2003

    Keywords

    • A priori error bounds
    • Approximation
    • Correlated aggregates
    • Data streams
    • IP network management
    • Summary structures

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Computational Theory and Mathematics

    Fingerprint

    Dive into the research topics of 'Efficient approximation of correlated sums on data streams'. Together they form a unique fingerprint.

    Cite this