TY - GEN
T1 - Estimating rarity and similarity over data stream windows
AU - Datar, Mayur
AU - Muthukrishnan, S.
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2002.
PY - 2002
Y1 - 2002
N2 - In the windowed data stream model, we observe items coming in over time. At any time t, we consider the window of the last N observations at-(N-1), at-(N-2),…, at, each ai Є{1,…, u}; we are required to support queries about the data in the window. A crucial restriction is that we are only allowed o(N) (often polylogarithmic in N) storage space, so not all items within the window can be archived. We study two basic problems in the windowed data stream model. The first is the estimation of the rarity of items in the window. Our second problem is one of estimating similarity between two data stream windows using the Jacard’s coefficient. The problems of estimating rarity and similarity have many applications in mining massive data sets. We present novel, simple algorithms for estimating rarity and similarity on windowed data streams, accurate up to factor 1 ± e using space only logarithmic in the window size.
AB - In the windowed data stream model, we observe items coming in over time. At any time t, we consider the window of the last N observations at-(N-1), at-(N-2),…, at, each ai Є{1,…, u}; we are required to support queries about the data in the window. A crucial restriction is that we are only allowed o(N) (often polylogarithmic in N) storage space, so not all items within the window can be archived. We study two basic problems in the windowed data stream model. The first is the estimation of the rarity of items in the window. Our second problem is one of estimating similarity between two data stream windows using the Jacard’s coefficient. The problems of estimating rarity and similarity have many applications in mining massive data sets. We present novel, simple algorithms for estimating rarity and similarity on windowed data streams, accurate up to factor 1 ± e using space only logarithmic in the window size.
UR - http://www.scopus.com/inward/record.url?scp=84938057127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938057127&partnerID=8YFLogxK
U2 - 10.1007/3-540-45749-6_31
DO - 10.1007/3-540-45749-6_31
M3 - Conference contribution
AN - SCOPUS:84938057127
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 323
EP - 335
BT - Algorithms - ESA 2002 - 10th Annual European Symposium, Proceedings
A2 - Möhring, Rolf
A2 - Raman, Rajeev
PB - Springer Verlag
T2 - 10th Annual European Symposium on Algorithms, ESA 2002
Y2 - 17 September 2002 through 21 September 2002
ER -