Finding hierarchical heavy hitters in data stream

Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceedingConference contribution


    Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support and network management (e.g., IP clustering, denial-of-service attack monitoring). De spite the amount of recent study that has been dedicated to online aggregation on sets (e.g. quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchi cal structure in stream data. The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHH): given a hierarchy and a fraction Φ, we want to find all HHH nodes that have a total number of descendants in the data stream no smaller than Φ of the total number of elements in the data stream, after discounting the descendant nodes that are HHH nodes. The resulting summary gives a topological "cartogram" of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which builds upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over the straightforward approach, which is due to making algorithms hierarchy-aware.

    Original languageEnglish (US)
    Title of host publicationProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
    EditorsPatricia G. Selinger, Michael J. Carey, Johann Christoph Freytag, Serge Abiteboul, Peter C. Lockemann, Andreas Heuer
    PublisherMorgan Kaufmann
    Number of pages12
    ISBN (Electronic)0127224424, 9780127224428
    StatePublished - 2003
    Event29th International Conference on Very Large Data Bases, VLDB 2003 - Berlin, Germany
    Duration: Sep 9 2003Sep 12 2003

    Publication series

    NameProceedings - 29th International Conference on Very Large Data Bases, VLDB 2003


    Other29th International Conference on Very Large Data Bases, VLDB 2003

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture
    • Information Systems and Management
    • Computer Science Applications
    • Computer Networks and Communications


    Dive into the research topics of 'Finding hierarchical heavy hitters in data stream'. Together they form a unique fingerprint.

    Cite this