TY - JOUR

T1 - Logistic Regression: The Importance of Being Improper

T2 - 31st Annual Conference on Learning Theory, COLT 2018

AU - Foster, Dylan J.

AU - Kale, Satyen

AU - Luo, Haipeng

AU - Mohri, Mehryar

AU - Sridharan, Karthik

N1 - Funding Information:
DF thanks Matus Telgarsky for sparking an interest in logistic regression through a series of talks at the Simons Institute. KS acknowledges support from the NSF under grants CDS&E-MSS 1521544 and NSF CAREER Award 1750575. MM acknowledges support under NSF grants CCF-1535987 and IIS-1618662. DF is supported in part by the NDSEG PhD fellowship.
Publisher Copyright:
© 2018 D.J. Foster, S. Kale, H. Luo, M. Mohri & K. Sridharan.

PY - 2018

Y1 - 2018

N2 - Learning linear predictors with the logistic loss—both in stochastic and online settings—is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing “fast rates” for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on predictor norm is near-optimal. Leveraging this improved dependence on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an Õ(√n) relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings.

UR - http://www.scopus.com/inward/record.url?scp=85162144976&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85162144976&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85162144976

SN - 2640-3498

VL - 75

SP - 167

EP - 208

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

Y2 - 6 July 2018 through 9 July 2018

ER -