TY - GEN
T1 - Does Dirichlet prior smoothing solve the Shannon entropy estimation problem?
AU - Han, Yanjun
AU - Jiao, Jiantao
AU - Weissman, Tsachy
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/9/28
Y1 - 2015/9/28
N2 - The Dirichlet prior is widely used in estimating discrete distributions and functionals of discrete distributions. For Shannon entropy estimation, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to compute the Bayes estimator for entropy under the Dirichlet prior and squared error loss, which is the conditional expectation. We show that in general neither improves over the maximum likelihood estimator, which plugs the empirical distribution into the entropy functional. No matter how we tune the parameters of the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation, as recently characterized by Jiao, Venkat, Han, and Weissman [1] and by Wu and Yang [2]. The performance of the minimax rate-optimal estimator with n samples is essentially at least as good as that of the Dirichlet smoothed entropy estimators with n ln n samples. We harness the theory of approximation using positive linear operators to analyze the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields, which was thoroughly exploited by Jiao, Venkat, Han, and Weissman [3] in estimating various functionals of discrete distributions. We establish new results in approximation theory and apply them to analyze the bias of the Dirichlet prior smoothed plug-in entropy estimator. This interplay between bias analysis and approximation theory is of relevance and consequence far beyond the specific problem setting of this paper.
AB - The Dirichlet prior is widely used in estimating discrete distributions and functionals of discrete distributions. For Shannon entropy estimation, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to compute the Bayes estimator for entropy under the Dirichlet prior and squared error loss, which is the conditional expectation. We show that in general neither improves over the maximum likelihood estimator, which plugs the empirical distribution into the entropy functional. No matter how we tune the parameters of the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation, as recently characterized by Jiao, Venkat, Han, and Weissman [1] and by Wu and Yang [2]. The performance of the minimax rate-optimal estimator with n samples is essentially at least as good as that of the Dirichlet smoothed entropy estimators with n ln n samples. We harness the theory of approximation using positive linear operators to analyze the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields, which was thoroughly exploited by Jiao, Venkat, Han, and Weissman [3] in estimating various functionals of discrete distributions. We establish new results in approximation theory and apply them to analyze the bias of the Dirichlet prior smoothed plug-in entropy estimator. This interplay between bias analysis and approximation theory is of relevance and consequence far beyond the specific problem setting of this paper.
UR - http://www.scopus.com/inward/record.url?scp=84969796358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969796358&partnerID=8YFLogxK
U2 - 10.1109/ISIT.2015.7282679
DO - 10.1109/ISIT.2015.7282679
M3 - Conference contribution
AN - SCOPUS:84969796358
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1367
EP - 1371
BT - Proceedings - 2015 IEEE International Symposium on Information Theory, ISIT 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Symposium on Information Theory, ISIT 2015
Y2 - 14 June 2015 through 19 June 2015
ER -
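The abstract contrasts the maximum likelihood (empirical) plug-in entropy estimator with its Dirichlet prior smoothed counterpart. The Python sketch below is a minimal illustration of the two plug-ins, not code from the paper; the smoothing parameter a and the sample counts are assumptions chosen purely for demonstration.

import numpy as np

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i ln p_i, in nats.
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mle_plugin(counts):
    # Plug the empirical distribution into the entropy functional.
    counts = np.asarray(counts, dtype=float)
    return entropy(counts / counts.sum())

def dirichlet_plugin(counts, a=1.0):
    # Plug the Dirichlet(a, ..., a) smoothed distribution, i.e. the
    # posterior mean under add-a smoothing, into the entropy functional.
    counts = np.asarray(counts, dtype=float)
    smoothed = (counts + a) / (counts.sum() + a * counts.size)
    return entropy(smoothed)

counts = np.array([5, 3, 0, 0, 2])    # hypothetical histogram, n = 10
print(mle_plugin(counts))             # MLE plug-in estimate
print(dirichlet_plugin(counts, 0.5))  # Dirichlet smoothed plug-in estimate

The paper's result says that no choice of a in dirichlet_plugin (nor the corresponding Bayes estimator) attains the minimax rate achieved by the optimal estimators of [1] and [2].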