TY - GEN
T1 - Finding hierarchical heavy hitters in data stream
AU - Cormode, Graham
AU - Korn, Flip
AU - Muthukrishnan, S.
AU - Srivastava, Divesh
PY - 2003
Y1 - 2003
N2 - Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support and network management (e.g., IP clustering, denial-of-service attack monitoring). De spite the amount of recent study that has been dedicated to online aggregation on sets (e.g. quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchi cal structure in stream data. The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHH): given a hierarchy and a fraction Φ, we want to find all HHH nodes that have a total number of descendants in the data stream no smaller than Φ of the total number of elements in the data stream, after discounting the descendant nodes that are HHH nodes. The resulting summary gives a topological "cartogram" of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which builds upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over the straightforward approach, which is due to making algorithms hierarchy-aware.
AB - Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support and network management (e.g., IP clustering, denial-of-service attack monitoring). De spite the amount of recent study that has been dedicated to online aggregation on sets (e.g. quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchi cal structure in stream data. The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHH): given a hierarchy and a fraction Φ, we want to find all HHH nodes that have a total number of descendants in the data stream no smaller than Φ of the total number of elements in the data stream, after discounting the descendant nodes that are HHH nodes. The resulting summary gives a topological "cartogram" of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which builds upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over the straightforward approach, which is due to making algorithms hierarchy-aware.
UR - http://www.scopus.com/inward/record.url?scp=85012186650&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012186650&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85012186650
T3 - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
SP - 464
EP - 475
BT - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
A2 - Selinger, Patricia G.
A2 - Carey, Michael J.
A2 - Freytag, Johann Christoph
A2 - Abiteboul, Serge
A2 - Lockemann, Peter C.
A2 - Heuer, Andreas
PB - Morgan Kaufmann
T2 - 29th International Conference on Very Large Data Bases, VLDB 2003
Y2 - 9 September 2003 through 12 September 2003
ER -