Abstract
We consider the problem of estimating the L1 distance between two discrete probability measures P and Q from empirical data in a nonasymptotic and large alphabet setting. When Q is known and one obtains n samples from P, we show that for every Q, the minimax rate-optimal estimator with n samples achieves performance comparable to that of the maximum likelihood estimator with n ln n samples. When both P and Q are unknown, we construct minimax rate-optimal estimators whose worst-case performance is essentially that of the known Q case with Q being uniform, implying that a uniform Q is essentially the most difficult case. The effective sample size enlargement phenomenon, identified by Jiao et al., holds both in the known Q case, for every Q, and in the unknown Q case. However, the construction of optimal estimators for ‖P - Q‖₁ requires new techniques and insights beyond the approximation-based method of functional estimation of Jiao et al.
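As a point of reference for the abstract's comparison, the maximum likelihood (plug-in) estimator simply substitutes empirical frequencies into the L1 distance. The sketch below is a minimal illustration of that baseline, not the paper's minimax rate-optimal estimator (which relies on more involved approximation-theoretic constructions); the function name, alphabet size, and sampling setup are illustrative assumptions.

```python
import numpy as np

def plugin_l1_distance(samples_p, samples_q, alphabet_size):
    """Plug-in (maximum likelihood) estimate of ||P - Q||_1:
    the L1 distance between the empirical distributions of the
    two samples. This is the naive baseline the abstract compares
    against, not the paper's minimax rate-optimal estimator."""
    p_hat = np.bincount(samples_p, minlength=alphabet_size) / len(samples_p)
    q_hat = np.bincount(samples_q, minlength=alphabet_size) / len(samples_q)
    return np.abs(p_hat - q_hat).sum()

# Illustrative usage: n samples from each distribution on an
# alphabet of size S. Q is taken uniform, which the abstract
# identifies as essentially the hardest case.
rng = np.random.default_rng(0)
S, n = 1000, 500
P = rng.dirichlet(np.ones(S))
Q = np.full(S, 1.0 / S)
x = rng.choice(S, size=n, p=P)
y = rng.choice(S, size=n, p=Q)
print(plugin_l1_distance(x, y, S))
```

In the large-alphabet regime sketched here (S larger than n), this plug-in estimate is badly biased, which is what motivates the paper's construction of estimators whose performance with n samples matches the plug-in estimator with n ln n samples.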
| Original language | English (US) |
| --- | --- |
| Article number | 8379458 |
| Pages (from-to) | 6672-6706 |
| Number of pages | 35 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 64 |
| Issue number | 10 |
| DOIs | |
| State | Published - Oct 2018 |
Keywords
- Divergence estimation
- functional estimation
- high-dimensional statistics
- multivariate approximation theory
- optimal classification error
- total variation distance
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences