TY - GEN
T1 - On Energy-Based Models with Overparametrized Shallow Neural Networks
AU - Domingo-Enrich, Carles
AU - Bietti, Alberto
AU - Vanden-Eijnden, Eric
AU - Bruna, Joan
N1 - Funding Information:
We thank Marylou Gabrié for useful discussions. CD acknowledges partial support from "la Caixa" Foundation (ID 100010434), under agreement LCF/BQ/AA18/11680094. EVE acknowledges partial support from the National Science Foundation (NSF) Materials Research Science and Engineering Center Program grant DMR-1420073, NSF DMS-1522767, and the Vannevar Bush Faculty Fellowship. JB acknowledges partial support from the Alfred P. Sloan Foundation, NSF RI-1816753, NSF CAREER CIF 1845360, NSF CHS-1901091 and Samsung Electronics.
Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - Energy-based models (EBMs) are a simple yet powerful framework for generative modeling. They are based on a trainable energy function which defines an associated Gibbs measure, and they can be trained and sampled from via well-established statistical tools, such as MCMC. Neural networks may be used as energy function approximators, providing both a rich class of expressive models as well as a flexible device to incorporate data structure. In this work we focus on shallow neural networks. Building from the incipient theory of overparametrized neural networks, we show that models trained in the so-called 'active' regime provide a statistical advantage over their associated 'lazy' or kernel regime, leading to improved adaptivity to hidden low-dimensional structure in the data distribution, as already observed in supervised learning. Our study covers both maximum likelihood and Stein Discrepancy estimators, and we validate our theoretical results with numerical experiments on synthetic data.
AB - Energy-based models (EBMs) are a simple yet powerful framework for generative modeling. They are based on a trainable energy function which defines an associated Gibbs measure, and they can be trained and sampled from via well-established statistical tools, such as MCMC. Neural networks may be used as energy function approximators, providing both a rich class of expressive models as well as a flexible device to incorporate data structure. In this work we focus on shallow neural networks. Building from the incipient theory of overparametrized neural networks, we show that models trained in the so-called 'active' regime provide a statistical advantage over their associated 'lazy' or kernel regime, leading to improved adaptivity to hidden low-dimensional structure in the data distribution, as already observed in supervised learning. Our study covers both maximum likelihood and Stein Discrepancy estimators, and we validate our theoretical results with numerical experiments on synthetic data.
UR - http://www.scopus.com/inward/record.url?scp=85161337717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161337717&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85161337717
T3 - Proceedings of Machine Learning Research
SP - 2771
EP - 2782
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -