Abstract
Gradient descent, when applied to the task of logistic regression, outputs iterates that are biased to follow a unique ray defined by the data. The direction of this ray is the maximum-margin predictor of a maximal linearly separable subset of the data; the gradient descent iterates converge to this ray in direction at the rate O(ln ln t / ln t). The ray does not pass through the origin in general, and its offset is the bounded global optimum of the risk over the remaining data; gradient descent recovers this offset at the rate O((ln t)²/√t).
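As a rough illustration (not from the paper), the Python sketch below runs gradient descent on the empirical logistic risk over a hypothetical separable toy dataset and prints the normalized iterate. The data, step size, and iteration schedule are all illustrative assumptions; in this strictly separable toy case the maximal linearly separable subset is the whole dataset, so the offset component described in the abstract is trivial and only the directional convergence is visible.

```python
import numpy as np

# Illustrative sketch only: hypothetical separable toy data, not the paper's setup.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.sign(X @ np.array([1.0, 0.5]))        # labels from a hypothetical true direction
y[y == 0] = 1.0                              # break ties so every label is +/-1

def risk_gradient(w):
    """Gradient of the empirical logistic risk (1/n) sum_i ln(1 + exp(-y_i <x_i, w>))."""
    margins = np.clip(y * (X @ w), -50, 50)  # clip to avoid overflow in exp
    coef = -y / (1.0 + np.exp(margins))      # per-example derivative of the loss
    return (coef[:, None] * X).mean(axis=0)

w = np.zeros(2)
step = 1.0                                   # assumed step size, small enough for this toy risk
for t in range(1, 100_001):
    w -= step * risk_gradient(w)
    if t in (10, 100, 1_000, 10_000, 100_000):
        # ||w|| grows without bound, but w/||w|| slowly stabilizes in direction.
        print(f"t={t:>6}  direction={w / np.linalg.norm(w)}")
```

The printed direction drifts toward the maximum-margin direction of the toy data, and its slow stabilization is consistent with the logarithmic directional rate O(ln ln t / ln t) stated above.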
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1772-1798 |
| Number of pages | 27 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 99 |
| State | Published - 2019 |
| Event | 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States |
| Duration | Jun 25 2019 → Jun 28 2019 |
Keywords
- gradient descent
- implicit bias
- logistic regression
- maximum margin
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability