Abstract
We consider the problem of discrete distribution estimation under ℓ1 loss. We provide tight upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk, in regimes where the support size S may grow with the number of observations n. We show that among distributions with bounded entropy H, the asymptotic maximum risk for the empirical distribution is 2H/ln n, while the asymptotic minimax risk is H/ln n. Moreover, we show that a hard-thresholding estimator, oblivious to the unknown upper bound H, is essentially minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again 2H/ln n. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance (ℓ1 divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
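As a rough illustration of the quantities discussed in the abstract, the following Python sketch compares the Monte Carlo ℓ1 risk of the empirical distribution with a simple hard-thresholding variant. The threshold c/n and the Zipf test distribution are illustrative assumptions, not the paper's exact construction; the thresholded estimate is deliberately left unnormalized, echoing the abstract's point that forcing estimates back into the probability simplex doubles the asymptotic minimax risk.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical(counts, n):
    """Empirical distribution (the MLE): relative frequencies."""
    return counts / n

def hard_threshold(counts, n, c=2.0):
    """Zero out entries of the empirical distribution below c/n.

    The threshold c/n is an assumption chosen for illustration; the
    paper's estimator uses a specific threshold not reproduced here.
    The result is NOT renormalized, so it may lie outside the simplex.
    """
    p_hat = counts / n
    p_hat[p_hat < c / n] = 0.0
    return p_hat

def l1_risk(estimator, p, n, trials=500):
    """Monte Carlo estimate of the l1 risk E||p_hat - p||_1."""
    losses = []
    for _ in range(trials):
        counts = rng.multinomial(n, p)
        losses.append(np.abs(estimator(counts, n) - p).sum())
    return float(np.mean(losses))

# A large-support example: support size S exceeds the sample size n,
# the regime the paper studies. Zipf-like (heavy-tailed) distribution.
n, S = 1000, 10_000
p = 1.0 / np.arange(1, S + 1)
p /= p.sum()

print("empirical      :", l1_risk(empirical, p, n))
print("hard-threshold :", l1_risk(hard_threshold, p, n))
```

Which estimator fares better depends on the distribution and the threshold; on heavy-tailed distributions with many symbols far rarer than 1/n, zeroing out spurious singletons tends to reduce the ℓ1 loss, consistent with the abstract's claim that hard thresholding improves on the empirical distribution.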
| Original language | English (US) |
| --- | --- |
| Article number | 7268897 |
| Pages (from-to) | 6343-6354 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 61 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 1 2015 |
Keywords
- Distribution estimation
- entropy estimation
- hard-thresholding
- high dimensional statistics
- minimax risk
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences