A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.
|Original language||English (US)|
|Number of pages||10|
|Journal||Conference Proceedings of the Annual ACM Symposium on Theory of Computing|
|State||Published - 2002|
|Event||Proceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada|
Duration: May 19 2002 → May 21 2002
ASJC Scopus subject areas