Entropy rate estimation for Markov chains with large state space

Yanjun Han, Jiantao Jiao, Chuan Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu

Research output: Contribution to journal › Conference article › peer-review


Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on S elements with independent samples, the optimal sample complexity scales sublinearly with S as Θ(S/log S), as shown by Valiant and Valiant [4]. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with S states from a sample path of n observations. We show that:

  • Provided the Markov chain mixes not too slowly, i.e., the relaxation time is at most O(S/ln³ S), consistent estimation is achievable when n ≫ S²/log S.
  • Provided the Markov chain has some slight dependency, i.e., the relaxation time is at least 1 + Ω(ln² S/√S), consistent estimation is impossible when n ≲ S²/log S.

Under both assumptions, the optimal estimation accuracy is shown to be Θ(S²/(n log S)). In comparison, the empirical entropy rate requires at least Ω(S²) samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
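The empirical entropy rate mentioned above (the baseline shown to need Ω(S²) samples) is the plug-in estimator: count transitions along the path, form the empirical transition matrix and empirical state frequencies, and average the row entropies. The sketch below is a minimal illustration of that baseline under these assumptions, not the sample-optimal estimators proposed in the paper; the function name and the i.i.d.-uniform example are my own.

```python
import numpy as np

def empirical_entropy_rate(path, S):
    """Plug-in (empirical) entropy rate estimate, in bits, from a sample path.

    Forms the empirical transition matrix P_hat and empirical state
    frequencies pi_hat from transition counts, then returns
    sum_i pi_hat[i] * H(P_hat[i, :]).
    """
    counts = np.zeros((S, S))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    row_totals = counts.sum(axis=1)
    pi_hat = row_totals / row_totals.sum()
    rate = 0.0
    for i in range(S):
        if row_totals[i] == 0:
            continue  # state never left; contributes nothing
        p = counts[i] / row_totals[i]
        p = p[p > 0]  # 0 * log 0 = 0 convention
        rate += pi_hat[i] * -(p * np.log2(p)).sum()
    return rate

# A memoryless chain (i.i.d. uniform on S=4 states) has entropy rate
# log2(4) = 2 bits; the plug-in estimate approaches this as n grows.
rng = np.random.default_rng(0)
path = rng.integers(0, 4, size=200_000)
print(empirical_entropy_rate(path, 4))
```

Even in this memoryless case the plug-in estimator is biased downward for small n, which is the regime where the paper's estimators improve on it by roughly a log S factor in sample complexity.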

Original language: English (US)
Pages (from-to): 9781-9792
Number of pages: 12
Journal: Advances in Neural Information Processing Systems
State: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 - Dec 8 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
