TY - JOUR

T1 - A dynamical central limit theorem for shallow neural networks

AU - Chen, Zhengdao

AU - Rotskoff, Grant M.

AU - Bruna, Joan

AU - Vanden-Eijnden, Eric

N1 - Funding Information:
This work benefited from discussions with Lenaic Chizat and Carles Domingo-Enrich. Z.C. acknowledges support from the Henry MacCracken Fellowship. G.M.R. acknowledges support from the James S. McDonnell Foundation. J.B. acknowledges support from the Alfred P. Sloan Foundation, NSF RI-1816753, NSF CAREER CIF 1845360, and the Institute for Advanced Study. E.V.-E. acknowledges support from the National Science Foundation (NSF) Materials Research Science and Engineering Center Program Grant No. DMR-1420073, and from NSF Grant No. DMS-1522767.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.

PY - 2020

Y1 - 2020

N2 - Recent theoretical works have characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic mean-field limit when the width tends towards infinity. At initialization, the random sampling of the parameters leads to deviations from the mean-field limit dictated by the classical Central Limit Theorem (CLT). However, since gradient descent induces correlations among the parameters, it is of interest to analyze how these fluctuations evolve. In this work, we derive a dynamical CLT to prove that the asymptotic fluctuations around the mean limit remain bounded in mean square throughout training. The upper bound is given by a Monte-Carlo resampling error, with a variance that depends on the 2-norm of the underlying measure, which also controls the generalization error. This motivates the use of this 2-norm as a regularization term during training. Furthermore, if the mean-field dynamics converges to a measure that interpolates the training data, we prove that the asymptotic deviation eventually vanishes in the CLT scaling. We also complement these results with numerical experiments.

UR - http://www.scopus.com/inward/record.url?scp=85105885570&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85105885570&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85105885570

VL - 2020-December

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Y2 - 6 December 2020 through 12 December 2020

ER -