TY - JOUR
T1 - A dynamical central limit theorem for shallow neural networks
AU - Chen, Zhengdao
AU - Rotskoff, Grant M.
AU - Bruna, Joan
AU - Vanden-Eijnden, Eric
N1 - Funding Information:
This work benefited from discussions with Lenaic Chizat and Carles Domingo-Enrich. Z.C. acknowledges support from the Henry MacCracken Fellowship. G.M.R. acknowledges support from the James S. McDonnell Foundation. J.B. acknowledges support from the Alfred P. Sloan Foundation, NSF RI-1816753, NSF CAREER CIF 1845360, and the Institute for Advanced Study. E.V.-E. acknowledges support from the National Science Foundation (NSF) Materials Research Science and Engineering Center Program Grant No. DMR-1420073 and NSF Grant No. DMS-1522767.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the width tends towards infinity. At initialization, the random sampling of the parameters leads to deviations from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, since gradient descent induces correlations among the parameters, it is of interest to analyze how these fluctuations evolve. In this work, we derive a dynamical CLT to prove that the asymptotic fluctuations around the mean limit remain bounded in mean square throughout training. The upper bound is given by a Monte-Carlo resampling error, with a variance that depends on the 2-norm of the underlying measure, which also controls the generalization error. This motivates the use of this 2-norm as a regularization term during training. Furthermore, if the mean-field dynamics converges to a measure that interpolates the training data, we prove that the asymptotic deviation eventually vanishes in the CLT scaling. We also complement these results with numerical experiments.
UR - http://www.scopus.com/inward/record.url?scp=85105885570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105885570&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85105885570
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -