The implicit bias of gradient descent on nonseparable data

Ziwei Ji, Matus Telgarsky

Research output: Contribution to journal › Conference article › peer-review


Gradient descent, when applied to the task of logistic regression, outputs iterates which are biased to follow a unique ray defined by the data. The direction of this ray is the maximum margin predictor of a maximal linearly separable subset of the data; the gradient descent iterates converge to this ray in direction at the rate O(ln ln t/ln t). The ray does not pass through the origin in general, and its offset is the bounded global optimum of the risk over the remaining data; gradient descent recovers this offset at a rate O((ln t)^2/√t).
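The behavior described in the abstract can be seen on a small example. The sketch below is not taken from the paper: the five-point dataset, the step size, and the iteration count are all hypothetical choices. The data is built so that the separable subset lies along the first axis and the remaining points lie along the second, which decouples the two components. In that special case the ray has direction (1, 0), and the offset along the second axis has the closed form -ln 2, the minimizer of ln(1 + e^{-v}) + 2 ln(1 + e^{v}). Because of the decoupling, this toy case illustrates the limiting ray only; it says nothing about the general convergence rates.

```python
import numpy as np

# Hypothetical toy data: each row is z_i = y_i * x_i (label folded into the point).
Z = np.array([
    [1.0, 0.0],   # separable subset: positive first coordinate, so (1, 0) separates it
    [2.0, 0.0],
    [0.0, 1.0],   # remaining points: conflicting signs along the second axis
    [0.0, -1.0],
    [0.0, -1.0],
])

def risk_gradient(w, Z):
    """Gradient of the empirical logistic risk mean_i ln(1 + exp(-<w, z_i>))."""
    margins = Z @ w
    weights = 1.0 / (1.0 + np.exp(margins))  # magnitude of the loss derivative
    return -(Z * weights[:, None]).mean(axis=0)

def gradient_descent(Z, step_size=0.5, num_steps=20000):
    """Plain gradient descent from the origin with a constant step size."""
    w = np.zeros(Z.shape[1])
    for _ in range(num_steps):
        w = w - step_size * risk_gradient(w, Z)
    return w

w = gradient_descent(Z)
direction = w / np.linalg.norm(w)

print("iterate:  ", w)
print("direction:", direction)
print("offset:   ", w[1], "vs. -ln 2 =", -np.log(2.0))
```

The first coordinate of the iterate keeps growing (roughly logarithmically in the number of steps), so the normalized direction approaches (1, 0) only slowly, while the second coordinate settles at the bounded minimizer of the risk over the nonseparable points.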

Original language: English (US)
Pages (from-to): 1772-1798
Number of pages: 27
Journal: Proceedings of Machine Learning Research
State: Published - 2019
Event: 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States
Duration: Jun 25 2019 → Jun 28 2019


Keywords

  • gradient descent
  • implicit bias
  • logistic regression
  • maximum margin

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

