Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Arthur Jacot

Research output: Contribution to journal › Conference article › peer-review


Previous work [Jac23] has shown that DNNs with large depth L and L2-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank R^(0)(f) of the learned function f, conjectured to be the Bottleneck rank. We compute finite-depth corrections to this result, revealing a measure R^(1) of regularity which bounds the pseudo-determinant of the Jacobian |J_f(x)|_+ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the 'right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as L → ∞: for large depths, almost all hidden representations are approximately R^(0)(f)-dimensional, and almost all weight matrices W have R^(0)(f) singular values close to 1 while the others are O(L^{-1/2}). Interestingly, the use of large learning rates is required to guarantee an order-O(L) NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
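The pseudo-determinant |J_f(x)|_+ mentioned in the abstract is the product of the nonzero singular values of the Jacobian, so it stays finite and informative even when f maps into a low-dimensional set and the ordinary determinant would vanish. A minimal NumPy sketch (the helper `pseudo_det` and the rank-1 example are illustrative, not taken from the paper):

```python
import numpy as np

def pseudo_det(J, tol=1e-10):
    """Pseudo-determinant |J|_+: product of the nonzero singular values of J."""
    s = np.linalg.svd(J, compute_uv=False)
    return np.prod(s[s > tol])

# Rank-1 linear map f(x) = u (v . x): its Jacobian is the constant outer
# product u v^T, whose only nonzero singular value is ||u|| * ||v||.
u = np.array([3.0, 0.0, 0.0])
v = np.array([0.0, 4.0])
J = np.outer(u, v)
print(pseudo_det(J))  # 12.0 = ||u|| * ||v||
```

Note that `np.linalg.det` is not even defined for this non-square 3×2 Jacobian, whereas the pseudo-determinant captures the volume distortion on the 1-dimensional image.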

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
State: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10, 2023 – Dec 16, 2023

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
