TY - GEN
T1 - Comparing Dynamics: Deep Neural Networks versus Glassy Systems
T2 - 35th International Conference on Machine Learning, ICML 2018
AU - Baity-Jesi, Marco
AU - Sagun, Levent
AU - Geiger, Mario
AU - Spigler, Stefano
AU - Ben Arous, Gerard
AU - Cammarota, Chiara
AU - LeCun, Yann
AU - Wyart, Matthieu
AU - Biroli, Giulio
N1 - Publisher Copyright:
© Copyright 2018 by the Authors. All rights reserved.
PY - 2018
Y1 - 2018
N2 - We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
AB - We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
UR - http://www.scopus.com/inward/record.url?scp=85057239948&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057239948&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057239948
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 526
EP - 535
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Krause, Andreas
A2 - Dy, Jennifer
PB - International Machine Learning Society (IMLS)
Y2 - 10 July 2018 through 15 July 2018
ER -