Finding frequent items in data streams

Moses Charikar, Kevin Chen, Martin Farach-Colton

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    We present a 1-pass algorithm for estimating the most frequent items in a data stream using very limited storage space. Our method relies on a novel data structure called a count sketch, which allows us to estimate the frequencies of all the items in the stream. Our algorithm achieves better space bounds than the previous best known algorithms for this problem for many natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this problem has not been previously studied in the literature.

    Original languageEnglish (US)
    Title of host publicationAutomata, Languages and Programming - 29th International Colloquium, ICALP 2002, Proceedings
    EditorsPeter Widmayer, Stephan Eidenbenz, Francisco Triguero, Rafael Morales, Ricardo Conejo, Matthew Hennessy
    PublisherSpringer Verlag
    Pages693-703
    Number of pages11
    ISBN (Print)3540438645, 9783540438649
    DOIs
    StatePublished - 2002
    Event29th International Colloquium on Automata, Languages, and Programming, ICALP 2002 - Malaga, Spain
    Duration: Jul 8 2002Jul 13 2002

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume2380 LNCS
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Other

    Other29th International Colloquium on Automata, Languages, and Programming, ICALP 2002
    Country/TerritorySpain
    CityMalaga
    Period7/8/027/13/02

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science

    Fingerprint

    Dive into the research topics of 'Finding frequent items in data streams'. Together they form a unique fingerprint.

    Cite this