Estimating entropy and entropy norm on data streams

Amit Chakrabarti, Khanh Do Ba, S. Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the "entropy norm" of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results are for estimating the empirical entropy of an input stream. We first present a sublinear space one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space Õ(m) and provides an approximation of 1/α in the worst case and (1 + ε) in "most" cases. We then present a two-pass polylogarithmic space (1+ε)-approximation algorithm. All our algorithms are quite simple.

    Original languageEnglish (US)
    Title of host publicationSTACS 2006
    Subtitle of host publication23rd Annual Symposium on Theoretical Aspects of Computer Science, Proceedings
    Pages196-205
    Number of pages10
    DOIs
    StatePublished - 2006
    EventSTACS 2006: 23rd Annual Symposium on Theoretical Aspects of Computer Science, Proceedings - Marseille, France
    Duration: Feb 23 2006Feb 25 2006

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume3884 LNCS
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Conference

    ConferenceSTACS 2006: 23rd Annual Symposium on Theoretical Aspects of Computer Science, Proceedings
    CountryFrance
    CityMarseille
    Period2/23/062/25/06

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Fingerprint Dive into the research topics of 'Estimating entropy and entropy norm on data streams'. Together they form a unique fingerprint.

    Cite this