Evaluating Approximate Inference in Bayesian Deep Learning

Andrew Gordon Wilson, Sanae Lotfi, Sharad Vikram, Matthew D. Hoffman, Yarin Gal, Yingzhen Li, Melanie F. Pradier, Andrew Foong, Sebastian Farquhar, Pavel Izmailov

Research output: Contribution to journal › Conference article › peer-review


Uncertainty representation is crucial to the safe and reliable deployment of deep learning. Bayesian methods provide a natural mechanism to represent epistemic uncertainty, leading to improved generalization and calibrated predictive distributions. Understanding the fidelity of approximate inference has extraordinary value beyond the standard approach of measuring generalization on a particular task: if approximate inference is working correctly, then we can expect more reliable and accurate deployment across any number of real-world settings. In this competition, we evaluate the fidelity of approximate Bayesian inference procedures in deep learning, using as a reference Hamiltonian Monte Carlo (HMC) samples obtained by parallelizing computations over hundreds of tensor processing unit (TPU) devices. We consider a variety of tasks, including image recognition, regression, covariate shift, and medical applications. All data are publicly available, and we release several baselines, including stochastic MCMC, variational methods, and deep ensembles. The competition resulted in hundreds of submissions across many teams. The winning entries all involved novel multi-modal posterior approximations, highlighting the relative importance of representing multiple modes, and suggesting that we should not consider deep ensembles a “non-Bayesian” alternative to standard unimodal approximations. In the future, the competition will provide a foundation for innovation and continued benchmarking of approximate Bayesian inference procedures in deep learning. The HMC samples will remain available through the competition website.
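As a concrete illustration of what "evaluating fidelity against HMC" can look like, the sketch below compares an approximate posterior's predictive distribution to an HMC reference using two common discrepancy measures: top-1 agreement and mean total-variation distance. This is a minimal, hypothetical example; the function names, metric choices, and toy probabilities are illustrative assumptions, not the competition's official evaluation code.

```python
import numpy as np

def agreement(p_ref, p_approx):
    """Fraction of test inputs where the approximate posterior's
    top-1 class prediction matches the reference's top-1 prediction."""
    return np.mean(np.argmax(p_ref, axis=1) == np.argmax(p_approx, axis=1))

def total_variation(p_ref, p_approx):
    """Total-variation distance between the two predictive
    distributions, averaged over test inputs."""
    return np.mean(0.5 * np.sum(np.abs(p_ref - p_approx), axis=1))

# Toy predictive probabilities over 3 classes for 4 test inputs
# (hypothetical numbers, purely for illustration).
p_hmc = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.25, 0.25]])
p_approx = np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.7, 0.1],
                     [0.4, 0.3, 0.3],
                     [0.5, 0.3, 0.2]])

print(agreement(p_hmc, p_approx))        # they disagree only on the third input
print(total_variation(p_hmc, p_approx))
```

Metrics of this kind reward approximations that track the reference posterior's predictions point by point, rather than merely achieving good test accuracy on their own.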

Original language: English (US)
Pages (from-to): 113-124
Number of pages: 12
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: Dec 6, 2021 - Dec 14, 2021


Keywords

  • Bayesian deep learning
  • Bayesian inference
  • Hamiltonian Monte Carlo
  • approximate inference

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


