TY - JOUR
T1 - Stochastic linear optimization never overfits with quadratically-bounded losses on general data
AU - Telgarsky, Matus
N1 - Funding Information:
The author thanks Daniel Hsu, Ziwei Ji, Francesco Orabona, Jeroen Rombouts, and Danny Son for valuable discussions, as well as the COLT 2022 reviewers for many helpful comments, specifically regarding readability. The author thanks the NSF for support under grant IIS-1750051.
Publisher Copyright:
© 2022 M. Telgarsky.
PY - 2022
Y1 - 2022
N2 - This work provides test error bounds for iterative fixed-point methods on linear predictors — specifically, stochastic and batch mirror descent (MD), and stochastic temporal difference learning (TD) — with two core contributions: (a) a single proof technique which gives high-probability guarantees despite the absence of projections, regularization, or any equivalents, even when optima have large or infinite norm, for quadratically-bounded losses (e.g., providing unified treatment of squared and logistic losses); (b) locally-adapted rates which depend not on global problem structure (such as condition numbers and maximum margins), but rather on properties of low-norm predictors which may suffer some small excess test error. The proof technique is an elementary and versatile coupling argument, and is demonstrated here in the following settings: stochastic MD under realizability; stochastic MD for general Markov data; batch MD for general IID data; stochastic MD on heavy-tailed data (still without projections); stochastic TD on approximately mixing Markov chains (all prior stochastic TD bounds are in expectation).
AB - This work provides test error bounds for iterative fixed-point methods on linear predictors — specifically, stochastic and batch mirror descent (MD), and stochastic temporal difference learning (TD) — with two core contributions: (a) a single proof technique which gives high-probability guarantees despite the absence of projections, regularization, or any equivalents, even when optima have large or infinite norm, for quadratically-bounded losses (e.g., providing unified treatment of squared and logistic losses); (b) locally-adapted rates which depend not on global problem structure (such as condition numbers and maximum margins), but rather on properties of low-norm predictors which may suffer some small excess test error. The proof technique is an elementary and versatile coupling argument, and is demonstrated here in the following settings: stochastic MD under realizability; stochastic MD for general Markov data; batch MD for general IID data; stochastic MD on heavy-tailed data (still without projections); stochastic TD on approximately mixing Markov chains (all prior stochastic TD bounds are in expectation).
UR - http://www.scopus.com/inward/record.url?scp=85164741463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164741463&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85164741463
SN - 2640-3498
VL - 178
SP - 5453
EP - 5488
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 35th Conference on Learning Theory, COLT 2022
Y2 - 2 July 2022 through 5 July 2022
ER -