IMPLICIT BIAS OF SGD IN L2-REGULARIZED LINEAR DNNS: ONE-WAY JUMPS FROM HIGH TO LOW RANK

Zihan Wang, Arthur Jacot

Research output: Contribution to conference › Paper › peer-review

Abstract

The L2-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can be avoided, since they do not fit the data, GD might get stuck at rank-overestimating minima. We show that with SGD there is always a non-zero probability of jumping from a higher-rank critical point to a lower-rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets B1 ⊂ B2 ⊂ ··· ⊂ BR such that Br contains all critical points of rank r or less (and no higher) that are absorbing for small enough ridge parameters λ and learning rates η: SGD has probability 0 of leaving Br, and from any starting point there is a non-zero probability of entering Br.
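The setting in the abstract is mini-batch SGD on a deep linear network whose end-to-end matrix is penalized implicitly through an L2 (ridge) term on every weight matrix. The following is a minimal Python/NumPy sketch of that setting, not the paper's code: the dimensions, depth, ridge parameter lam, learning rate eta, batch size, step count, and the rank threshold in numerical_rank are all illustrative assumptions, and the printed rank trajectory is only meant to show the kind of high-to-low-rank behavior the paper analyzes.

# Sketch: SGD on an L2-regularized deep linear network, tracking the
# numerical rank of the end-to-end matrix Ws[L-1] @ ... @ Ws[0].
# All hyperparameters below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, L, width, n = 10, 3, 10, 200            # data dim, depth, hidden width, samples
lam, eta, batch, steps = 1e-3, 5e-3, 16, 20000

# Rank-2 teacher: the smallest rank that still fits the data is 2.
W_star = rng.standard_normal((d, 2)) @ rng.standard_normal((2, d))
X = rng.standard_normal((n, d))
Y = X @ W_star.T

# Deep linear network weights W_0, ..., W_{L-1}.
Ws = [rng.standard_normal((d if l == L - 1 else width,
                           d if l == 0 else width)) / np.sqrt(width)
      for l in range(L)]

def product(mats):
    # mats[-1] @ ... @ mats[0]
    M = mats[0]
    for W in mats[1:]:
        M = W @ M
    return M

def numerical_rank(M, tol=1e-3):
    s = np.linalg.svd(M, compute_uv=False)
    if s[0] < 1e-12:
        return 0
    return int((s > tol * s[0]).sum())

for t in range(steps):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, Yb = X[idx], Y[idx]
    M = product(Ws)
    err = Xb @ M.T - Yb                      # mini-batch residual

    # Gradient of (1/2b)||Xb M^T - Yb||^2 + (lam/2) sum_l ||W_l||^2
    # w.r.t. each W_l, using M = above @ W_l @ below.
    grads = []
    for l in range(L):
        above = product(Ws[l + 1:]) if l < L - 1 else np.eye(d)
        below = product(Ws[:l]) if l > 0 else np.eye(d)
        dM = (err.T @ Xb) / batch
        grads.append(above.T @ dM @ below.T + lam * Ws[l])
    for W, g in zip(Ws, grads):
        W -= eta * g                         # in-place SGD step

    if t % 2000 == 0:
        M = product(Ws)
        loss = 0.5 * np.mean((X @ M.T - Y) ** 2)
        print(f"step {t:6d}  rank {numerical_rank(M)}  fit loss {loss:.4f}")

In runs of this kind of sketch the numerical rank of the end-to-end matrix typically stays at or moves toward lower values once reached, while fitting the data rules out rank-underestimating solutions; the exact trajectory depends on the assumed hyperparameters.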

Original language: English (US)
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: May 7, 2024 – May 11, 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 5/7/24 – 5/11/24

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
