TY - JOUR
T1 - Efficient training of energy-based models using Jarzynski equality
AU - Carbone, Davide
AU - Hua, Mengjian
AU - Coste, Simon
AU - Vanden-Eijnden, Eric
N1 - Publisher Copyright:
© 2024 The Author(s). Published on behalf of SISSA Medialab srl by IOP Publishing Ltd.
PY - 2024/10/31
Y1 - 2024/10/31
N2 - Energy-based models (EBMs) are generative models inspired by statistical physics with a wide range of applications in unsupervised learning. Their performance is well measured by the cross-entropy (CE) of the model distribution relative to the data distribution. Using the CE as the training objective is, however, challenging because computing its gradient with respect to the model parameters requires sampling the model distribution. Here, we show how results from nonequilibrium thermodynamics based on the Jarzynski equality, together with tools from sequential Monte Carlo sampling, can be used to perform this computation efficiently and to avoid the uncontrolled approximations made by the standard contrastive divergence algorithm. Specifically, we introduce a modification of the unadjusted Langevin algorithm (ULA) in which each walker acquires a weight that enables the estimation of the gradient of the CE at any step during gradient descent, thereby bypassing the sampling biases induced by the slow mixing of the ULA. We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST and CIFAR-10 datasets. We show that the proposed approach outperforms methods based on the contrastive divergence algorithm in all the considered situations.
KW - deep learning
KW - diffusion
KW - machine learning
KW - stochastic thermodynamics
UR - http://www.scopus.com/inward/record.url?scp=85207782323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85207782323&partnerID=8YFLogxK
DO - 10.1088/1742-5468/ad65e0
M3 - Article
AN - SCOPUS:85207782323
SN - 1742-5468
VL - 2024
JO - Journal of Statistical Mechanics: Theory and Experiment
JF - Journal of Statistical Mechanics: Theory and Experiment
IS - 10
M1 - 104019
ER -