TY - GEN
T1 - On the competitive analysis and high accuracy optimality of profile maximum likelihood
AU - Han, Yanjun
AU - Shiragur, Kirankumar
N1 - Publisher Copyright:
Copyright © 2021 by SIAM
PY - 2021
Y1 - 2021
N2 - A striking result of Acharya et al. [ADOS17] showed that to estimate symmetric properties of discrete distributions, plugging in the distribution that maximizes the likelihood of observed multiset of frequencies, also known as the profile maximum likelihood (PML) distribution, is competitive compared with any estimators regardless of the symmetric property. Specifically, given n observations from the discrete distribution, if some estimator incurs an error ε with probability at most δ, then plugging in the PML distribution incurs an error 2ε with probability at most δ · exp(3√n). In this paper, we strengthen the above result and show that using a careful chaining argument, the error probability can be reduced to δ1−c · exp(c0n1/3+c) for arbitrarily small constants c > 0 and some constant c0 > 0. The improved competitive analysis leads to the optimality of the PML plug-in approach for estimating various symmetric properties within higher accuracy ε ≫ n−1/3. In particular, we show that the PML distribution is an optimal estimator of the sorted distribution: it is ε-close in sorted l1 distance to the true distribution with support size k for any n = Ω(k/(ε2 log k)) and ε ≫ n−1/3, which are the information-theoretically optimal sample complexity and the largest error regime where the classical empirical distribution is sub-optimal, respectively. In order to strengthen the analysis of the PML, a key ingredient is to employ novel “continuity” properties of the PML distributions and construct a chain of suitable quantized PMLs, or “coverings”. We also construct a novel approximation-based estimator for the sorted distribution with a near-optimal concentration property without any sample splitting, where as a byproduct we obtain better trade-offs between the polynomial approximation error and the maximum magnitude of coefficients in the Poisson approximation of 1-Lipschitz functions.
AB - A striking result of Acharya et al. [ADOS17] showed that to estimate symmetric properties of discrete distributions, plugging in the distribution that maximizes the likelihood of observed multiset of frequencies, also known as the profile maximum likelihood (PML) distribution, is competitive compared with any estimators regardless of the symmetric property. Specifically, given n observations from the discrete distribution, if some estimator incurs an error ε with probability at most δ, then plugging in the PML distribution incurs an error 2ε with probability at most δ · exp(3√n). In this paper, we strengthen the above result and show that using a careful chaining argument, the error probability can be reduced to δ1−c · exp(c0n1/3+c) for arbitrarily small constants c > 0 and some constant c0 > 0. The improved competitive analysis leads to the optimality of the PML plug-in approach for estimating various symmetric properties within higher accuracy ε ≫ n−1/3. In particular, we show that the PML distribution is an optimal estimator of the sorted distribution: it is ε-close in sorted l1 distance to the true distribution with support size k for any n = Ω(k/(ε2 log k)) and ε ≫ n−1/3, which are the information-theoretically optimal sample complexity and the largest error regime where the classical empirical distribution is sub-optimal, respectively. In order to strengthen the analysis of the PML, a key ingredient is to employ novel “continuity” properties of the PML distributions and construct a chain of suitable quantized PMLs, or “coverings”. We also construct a novel approximation-based estimator for the sorted distribution with a near-optimal concentration property without any sample splitting, where as a byproduct we obtain better trade-offs between the polynomial approximation error and the maximum magnitude of coefficients in the Poisson approximation of 1-Lipschitz functions.
UR - http://www.scopus.com/inward/record.url?scp=85104617611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104617611&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85104617611
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 1317
EP - 1336
BT - ACM-SIAM Symposium on Discrete Algorithms, SODA 2021
A2 - Marx, Daniel
PB - Association for Computing Machinery
T2 - 32nd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2021
Y2 - 10 January 2021 through 13 January 2021
ER -