TY - JOUR
T1 - Variational inference via χ upper bound minimization
AU - Dieng, Adji B.
AU - Tran, Dustin
AU - Ranganath, Rajesh
AU - Paisley, John
AU - Blei, David M.
N1 - Funding Information:
We thank Alp Kucukelbir, Francisco J. R. Ruiz, Christian A. Naesseth, Scott W. Linderman, Maja Rudolph, and Jaan Altosaar for their insightful comments. This work is supported by NSF IIS-1247664, ONR N00014-11-1-0651, DARPA PPAML FA8750-14-2-0009, DARPA SIMPLEX N66001-15-C-4032, the Alfred P. Sloan Foundation, and the John Simon Guggenheim Foundation.
Publisher Copyright:
© 2017 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2017
Y1 - 2017
N2 - Variational inference (VI) is widely used as an efficient alternative to Markov chain Monte Carlo. It posits a family of approximating distributions q and finds the closest member to the exact posterior p. Closeness is usually measured via a divergence D(q || p) from q to p. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior variance. In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes D_χ(p || q), the χ-divergence from p to q. CHIVI minimizes an upper bound of the model evidence, which we term the χ upper bound (CUBO). Minimizing the CUBO leads to improved posterior uncertainty, and it can also be used with the classical VI lower bound (ELBO) to provide a sandwich estimate of the model evidence. We study CHIVI on three models: probit regression, Gaussian process classification, and a Cox process model of basketball plays. When compared to expectation propagation and classical VI, CHIVI produces better error rates and more accurate estimates of posterior variance.
AB - Variational inference (VI) is widely used as an efficient alternative to Markov chain Monte Carlo. It posits a family of approximating distributions q and finds the closest member to the exact posterior p. Closeness is usually measured via a divergence D(q || p) from q to p. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior variance. In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes D_χ(p || q), the χ-divergence from p to q. CHIVI minimizes an upper bound of the model evidence, which we term the χ upper bound (CUBO). Minimizing the CUBO leads to improved posterior uncertainty, and it can also be used with the classical VI lower bound (ELBO) to provide a sandwich estimate of the model evidence. We study CHIVI on three models: probit regression, Gaussian process classification, and a Cox process model of basketball plays. When compared to expectation propagation and classical VI, CHIVI produces better error rates and more accurate estimates of posterior variance.
UR - http://www.scopus.com/inward/record.url?scp=85047018952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047018952&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85047018952
SN - 1049-5258
VL - 2017-December
SP - 2733
EP - 2742
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
Y2 - 4 December 2017 through 9 December 2017
ER -