Abstract
Previous work [Jac23] has shown that DNNs with large depth L and L2-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank R^(0)(f) of the learned function f, conjectured to be the Bottleneck rank. We compute finite depth corrections to this result, revealing a measure R^(1) of regularity which bounds the pseudo-determinant of the Jacobian |J_f(x)|_+ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the 'right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as L → ∞: for large depths, almost all hidden representations are approximately R^(0)(f)-dimensional, and almost all weight matrices W_ℓ have R^(0)(f) singular values close to 1 while the others are O(L^(-1/2)). Interestingly, the use of large learning rates is required to guarantee an order O(L) NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
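The claimed bottleneck structure can be probed numerically. Below is a minimal sketch (not code from the paper) that trains a deep, weight-decayed MLP on a target whose bottleneck rank is 1 and then prints the leading singular values of each hidden weight matrix; with enough depth and regularization, roughly R^(0)(f) singular values should sit near 1 while the rest are much smaller. The architecture, widths, hyperparameters, and the use of Adam with weight decay as a stand-in for L2-regularized training are all illustrative assumptions.

```python
# Sketch: inspect singular values of hidden weight matrices after training a
# deep MLP with weight decay on a rank-1 target. All choices below are
# illustrative assumptions, not the paper's experimental setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, width, depth, n = 10, 64, 12, 512      # input dim, width, hidden layers, samples
X = torch.randn(n, d)
u = torch.randn(d, 1) / d**0.5
y = torch.tanh(X @ u)                     # target depends on a single direction (bottleneck rank 1)

layers = [nn.Linear(d, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
net = nn.Sequential(*layers)

# Weight decay plays the role of the L2-regularization in the abstract.
opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()

# Expectation under the bottleneck picture: a few singular values near 1 per
# hidden matrix, the remainder noticeably smaller.
for i, m in enumerate(net):
    if isinstance(m, nn.Linear) and m.weight.shape == (width, width):
        s = torch.linalg.svdvals(m.weight.detach())
        print(f"layer {i}: top singular values {s[:4].numpy().round(2)}")
```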
| Original language | English (US) |
|---|---|
| Journal | Advances in Neural Information Processing Systems |
| Volume | 36 |
| State | Published - 2023 |
| Event | 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States. Duration: Dec 10 2023 → Dec 16 2023 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing