What's Hot and What's Not: Tracking Most Frequent Items Dynamically

Graham Cormode, S. Muthukrishnan

    Research output: Contribution to conferencePaperpeer-review

    Abstract

    Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the "hot items" in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in networking applications. We present a new algorithm for dynamically determining the hot items at any time in the relation that is undergoing deletion operations as well as inserts. Our algorithm maintains a small space data structure that monitors the transactions on the relation, and when required, quickly outputs all hot items, without rescanning the relation in the database. With user-specified probability, it is able to report all hot items. Our algorithm relies on the idea of "group testing", is simple to implement, and has provable quality, space and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees can not handle deletions, and those that handle deletions can not make similar guarantees without rescanning the database. Our experiments with real and synthetic data shows that our algorithm is remarkably accurate in dynamically tracking the hot items independent of the rate of insertions and deletions.

    Original languageEnglish (US)
    Pages296-306
    Number of pages11
    StatePublished - 2003
    EventTwenty second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2003 - San Diego, CA, United States
    Duration: Jun 9 2003Jun 11 2003

    Conference

    ConferenceTwenty second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2003
    Country/TerritoryUnited States
    CitySan Diego, CA
    Period6/9/036/11/03

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Fingerprint

    Dive into the research topics of 'What's Hot and What's Not: Tracking Most Frequent Items Dynamically'. Together they form a unique fingerprint.

    Cite this