Finding frequent items in data streams

Moses Charikar, Kevin Chen, Martin Farach-Colton

    Research output: Contribution to journalConference articlepeer-review

    Abstract

    We present a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space. Our method relies on a data structure called a COUNT SKETCH, which allows us to reliably estimate the frequencies of frequent items in the stream. Our algorithm achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this latter problem has not been previously studied in the literature.

    Original languageEnglish (US)
    Pages (from-to)3-15
    Number of pages13
    JournalTheoretical Computer Science
    Volume312
    Issue number1
    DOIs
    StatePublished - Jan 26 2004
    EventAutomata, Languages and Programming - Malaga, Spain
    Duration: Jul 8 2002Jul 13 2002

    Keywords

    • Approximation
    • Frequent items
    • Streaming algorithm

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science

    Fingerprint

    Dive into the research topics of 'Finding frequent items in data streams'. Together they form a unique fingerprint.

    Cite this