Fast, small-space algorithms for approximate histogram maintenance

Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, S. Muthukrishnan, Martin J. Strauss

    Research output: Contribution to journalConference articlepeer-review

    Abstract

    A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.

    Original languageEnglish (US)
    Pages (from-to)389-398
    Number of pages10
    JournalConference Proceedings of the Annual ACM Symposium on Theory of Computing
    DOIs
    StatePublished - 2002
    EventProceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada
    Duration: May 19 2002May 21 2002

    ASJC Scopus subject areas

    • Software

    Fingerprint Dive into the research topics of 'Fast, small-space algorithms for approximate histogram maintenance'. Together they form a unique fingerprint.

    Cite this