TY - GEN

T1 - Geometry of the Loss Landscape in Overparameterized Neural Networks

T2 - 38th International Conference on Machine Learning, ICML 2021

AU - Şimşek, Berfin

AU - Ged, François

AU - Jacot, Arthur

AU - Spadaro, Francesco

AU - Hongler, Clément

AU - Gerstner, Wulfram

AU - Brea, Johanni

N1 - Funding Information:
The authors thank the authors of Lengyel et al. (2020) for a discussion about neural network invariances at the very beginning of this project. The authors thank Valentin Schmutz for a discussion, Bernd Illing and Levent Sagun for their detailed feedback on the manuscript. This work is partly supported by Swiss National Science Foundation (no. 200020 184615) and ERC SG CONSTAMIS. C. Hongler acknowledges support from the Blavatnik Family Foundation, the Latsis Foundation, and the NCCR SwissMAP.
Publisher Copyright:
Copyright © 2021 by the author(s)

PY - 2021

Y1 - 2021

N2 - We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with L layers of minimal widths r1∗, ..., rL−1∗ reaches a zero-loss minimum at r1∗! · · · rL−1∗! isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width r∗ + h =: m we explicitly describe the manifold of global minima: it consists of T(r∗, m) affine subspaces of dimension at least h that are connected to one another. For a network of width m, we identify the number G(r, m) of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width r < r∗. Via a combinatorial analysis, we derive closed-form formulas for T and G and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small h) and vice versa in the vastly overparameterized regime (h ≫ r∗). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.

AB - We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with L layers of minimal widths r1∗, ..., rL−1∗ reaches a zero-loss minimum at r1∗! · · · rL−1∗! isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width r∗ + h =: m we explicitly describe the manifold of global minima: it consists of T(r∗, m) affine subspaces of dimension at least h that are connected to one another. For a network of width m, we identify the number G(r, m) of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width r < r∗. Via a combinatorial analysis, we derive closed-form formulas for T and G and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small h) and vice versa in the vastly overparameterized regime (h ≫ r∗). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.

UR - http://www.scopus.com/inward/record.url?scp=85161268279&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85161268279&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85161268279

T3 - Proceedings of Machine Learning Research

SP - 9722

EP - 9732

BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021

PB - ML Research Press

Y2 - 18 July 2021 through 24 July 2021

ER -