TY - JOUR
T1 - Comparing dynamics
T2 - deep neural networks versus glassy systems
AU - Baity-Jesi, Marco
AU - Sagun, Levent
AU - Geiger, Mario
AU - Spigler, Stefano
AU - Ben Arous, Gérard
AU - Cammarota, Chiara
AU - LeCun, Yann
AU - Wyart, Matthieu
AU - Biroli, Giulio
N1 - Publisher Copyright:
© 2019 IOP Publishing Ltd and SISSA Medialab srl.
PY - 2019/12/20
Y1 - 2019/12/20
N2 - We analyze numerically the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
AB - We analyze numerically the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85079862500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079862500&partnerID=8YFLogxK
U2 - 10.1088/1742-5468/ab3281
DO - 10.1088/1742-5468/ab3281
M3 - Article
AN - SCOPUS:85079862500
SN - 1742-5468
VL - 2019
JO - Journal of Statistical Mechanics: Theory and Experiment
JF - Journal of Statistical Mechanics: Theory and Experiment
IS - 12
M1 - 124013
ER -