TY - GEN
T1 - Near-Linear Sample Complexity for Lp Polynomial Regression
AU - Meyer, Raphael A.
AU - Musco, Cameron
AU - Musco, Christopher
AU - Woodruff, David P.
AU - Zhou, Samson
N1 - Publisher Copyright:
Copyright © 2023 by SIAM.
PY - 2023
Y1 - 2023
N2 - We study Lp polynomial regression. Given query access to a function f : [−1, 1] → R, the goal is to find a degree-d polynomial q̂ such that, for a given parameter ε > 0, ‖q̂ − f‖_p ≤ (1 + ε) · min_{q : deg(q) ≤ d} ‖q − f‖_p. Here ‖·‖_p denotes the Lp norm, ‖g‖_p = (∫_{−1}^{1} |g(t)|^p dt)^{1/p}. We show that querying f at points randomly drawn from the Chebyshev measure on [−1, 1] is a near-optimal strategy for polynomial regression in all Lp norms. In particular, to find q̂, it suffices to sample O(d · polylog(d) / poly(ε)) points from [−1, 1] with probabilities proportional to this measure. While the optimal sample complexity for polynomial regression was well understood for L2 and L∞, our result is the first that achieves sample complexity linear in d and error (1 + ε) for other values of p without any assumptions. Our result requires two main technical contributions. The first concerns p ≤ 2, for which we provide explicit bounds on the Lp Lewis weight function of the infinite linear operator underlying polynomial regression. Using tools from the orthogonal polynomial literature, we show that this function is bounded by the Chebyshev density. Our second key contribution is to take advantage of the structure of polynomials to reduce the p > 2 case to the p ≤ 2 case. By doing so, we obtain a better sample complexity than what is possible for general p-norm linear regression problems, for which Ω(d^{p/2}) samples are required.
UR - http://www.scopus.com/inward/record.url?scp=85141579223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141579223&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85141579223
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 3959
EP - 4025
BT - 34th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2023
PB - Society for Industrial and Applied Mathematics (SIAM)
T2 - 34th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2023
Y2 - 22 January 2023 through 25 January 2023
ER -