TY - GEN
T1 - Geometry of the Loss Landscape in Overparameterized Neural Networks
T2 - 38th International Conference on Machine Learning, ICML 2021
AU - Şimşek, Berfin
AU - Ged, François
AU - Jacot, Arthur
AU - Spadaro, Francesco
AU - Hongler, Clément
AU - Gerstner, Wulfram
AU - Brea, Johanni
N1 - Funding Information:
The authors thank the authors of Lengyel et al. (2020) for a discussion about neural network invariances at the very beginning of this project. The authors thank Valentin Schmutz for a discussion, Bernd Illing and Levent Sagun for their detailed feedback on the manuscript. This work is partly supported by Swiss National Science Foundation (no. 200020 184615) and ERC SG CONSTAMIS. C. Hongler acknowledges support from the Blavatnik Family Foundation, the Latsis Foundation, and the NCCR Swissmap.
Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
AB - We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$) and vice versa in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
UR - http://www.scopus.com/inward/record.url?scp=85161268279&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161268279&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85161268279
T3 - Proceedings of Machine Learning Research
SP - 9722
EP - 9732
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
Y2 - 18 July 2021 through 24 July 2021
ER -