TY - GEN

T1 - Adaptive estimation of Shannon entropy

AU - Han, Yanjun

AU - Jiao, Jiantao

AU - Weissman, Tsachy

N1 - Publisher Copyright:
© 2015 IEEE.

PY - 2015/9/28

Y1 - 2015/9/28

N2 - We consider estimating the Shannon entropy of a discrete distribution P from n i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman (JVHW), and Wu and Yang constructed approximation theoretic estimators that achieve the minimax L2 rates in estimating entropy. Their estimators are consistent given n ≫ S over lnS samples, where S is the support size, and it is the best possible sample complexity. In contrast, the Maximum Likelihood Estimator (MLE), which is the empirical entropy, requires n ≫ S samples. In the present paper we significantly refine the minimax results of existing work. To alleviate the pessimism of minimaxity, we adopt the adaptive estimation framework, and show that the JVHW estimator is an adaptive estimator, i.e., it achieves the minimax rates simultaneously over a nested sequence of subsets of distributions P, without knowing the support size S or which subset P lies in. We also characterize the maximum risk of the MLE over this nested sequence, and show, for every subset in the sequence, that the performance of the minimax rate-optimal estimator with n samples is essentially that of the MLE with n ln n samples, thereby further substantiating the generality of 'effective sample size enlargement' phenomenon discovered by Jiao, Venkat, Han, and Weissman. We provide a 'pointwise' explanation of the sample size enlargement phenomenon, which states that for sufficiently small probabilities, the bias function of the JVHW estimator with n samples is nearly that of the MLE with n ln n samples.

AB - We consider estimating the Shannon entropy of a discrete distribution P from n i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman (JVHW), and Wu and Yang constructed approximation theoretic estimators that achieve the minimax L2 rates in estimating entropy. Their estimators are consistent given n ≫ S over lnS samples, where S is the support size, and it is the best possible sample complexity. In contrast, the Maximum Likelihood Estimator (MLE), which is the empirical entropy, requires n ≫ S samples. In the present paper we significantly refine the minimax results of existing work. To alleviate the pessimism of minimaxity, we adopt the adaptive estimation framework, and show that the JVHW estimator is an adaptive estimator, i.e., it achieves the minimax rates simultaneously over a nested sequence of subsets of distributions P, without knowing the support size S or which subset P lies in. We also characterize the maximum risk of the MLE over this nested sequence, and show, for every subset in the sequence, that the performance of the minimax rate-optimal estimator with n samples is essentially that of the MLE with n ln n samples, thereby further substantiating the generality of 'effective sample size enlargement' phenomenon discovered by Jiao, Venkat, Han, and Weissman. We provide a 'pointwise' explanation of the sample size enlargement phenomenon, which states that for sufficiently small probabilities, the bias function of the JVHW estimator with n samples is nearly that of the MLE with n ln n samples.

UR - http://www.scopus.com/inward/record.url?scp=84969851032&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84969851032&partnerID=8YFLogxK

U2 - 10.1109/ISIT.2015.7282680

DO - 10.1109/ISIT.2015.7282680

M3 - Conference contribution

AN - SCOPUS:84969851032

T3 - IEEE International Symposium on Information Theory - Proceedings

SP - 1372

EP - 1376

BT - Proceedings - 2015 IEEE International Symposium on Information Theory, ISIT 2015

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - IEEE International Symposium on Information Theory, ISIT 2015

Y2 - 14 June 2015 through 19 June 2015

ER -