TY - JOUR
T1 - Estimating entropy and entropy norm on data streams
AU - Chakrabarti, Amit
AU - Ba, Khanh Do
AU - Muthukrishnan, S.
N1 - Publisher Copyright:
© A K Peters, Ltd.
PY - 2006/1/1
Y1 - 2006/1/1
N2 - We consider the problem of computing information-theoretic functions, such as entropy, on a data stream, using sublinear space. Our first result deals with a measure we call the entropy norm of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space, one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results is for estimating the empirical entropy of an input stream. We first present a sublinear-space, one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space Õ(m2α) and provides an approximation of 1/α in the worst case and (1+ε) in “most” cases. We then present a two-pass, polylogarithmic-space, (1+ε)-approximation algorithm. All our algorithms are quite simple.
AB - We consider the problem of computing information-theoretic functions, such as entropy, on a data stream, using sublinear space. Our first result deals with a measure we call the entropy norm of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space, one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results is for estimating the empirical entropy of an input stream. We first present a sublinear-space, one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space Õ(m2α) and provides an approximation of 1/α in the worst case and (1+ε) in “most” cases. We then present a two-pass, polylogarithmic-space, (1+ε)-approximation algorithm. All our algorithms are quite simple.
UR - http://www.scopus.com/inward/record.url?scp=79953650863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953650863&partnerID=8YFLogxK
U2 - 10.1080/15427951.2006.10129117
DO - 10.1080/15427951.2006.10129117
M3 - Article
AN - SCOPUS:79953650863
SN - 1542-7951
VL - 3
SP - 63
EP - 78
JO - Internet Mathematics
JF - Internet Mathematics
IS - 1
ER -