TY - JOUR

T1 - Fast, small-space algorithms for approximate histogram maintenance

AU - Gilbert, Anna C.

AU - Guha, Sudipto

AU - Indyk, Piotr

AU - Kotidis, Yannis

AU - Muthukrishnan, S.

AU - Strauss, Martin J.

PY - 2002

Y1 - 2002

N2 - A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.

AB - A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.

UR - http://www.scopus.com/inward/record.url?scp=0036038681&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036038681&partnerID=8YFLogxK

U2 - 10.1145/509907.509966

DO - 10.1145/509907.509966

M3 - Conference article

AN - SCOPUS:0036038681

SN - 0734-9025

SP - 389

EP - 398

JO - Conference Proceedings of the Annual ACM Symposium on Theory of Computing

JF - Conference Proceedings of the Annual ACM Symposium on Theory of Computing

T2 - Proceedings of the 34th Annual ACM Symposium on Theory of Computing

Y2 - 19 May 2002 through 21 May 2002

ER -