TY - JOUR
T1 - Fast, small-space algorithms for approximate histogram maintenance
AU - Gilbert, Anna C.
AU - Guha, Sudipto
AU - Indyk, Piotr
AU - Kotidis, Yannis
AU - Muthukrishnan, S.
AU - Strauss, Martin J.
PY - 2002
Y1 - 2002
N2 - A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.
AB - A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - Hopt∥, where the error ∥A - H∥ is either ℓ1 (absolute) or ℓ2 (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.
UR - http://www.scopus.com/inward/record.url?scp=0036038681&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036038681&partnerID=8YFLogxK
U2 - 10.1145/509907.509966
DO - 10.1145/509907.509966
M3 - Conference article
AN - SCOPUS:0036038681
SN - 0734-9025
SP - 389
EP - 398
JO - Conference Proceedings of the Annual ACM Symposium on Theory of Computing
JF - Conference Proceedings of the Annual ACM Symposium on Theory of Computing
T2 - Proceedings of the 34th Annual ACM Symposium on Theory of Computing
Y2 - 19 May 2002 through 21 May 2002
ER -