TY - GEN
T1 - Dangers of Bayesian Model Averaging under Covariate Shift
AU - Izmailov, Pavel
AU - Nicholson, Patrick
AU - Lotfi, Sanae
AU - Wilson, Andrew Gordon
N1 - Funding Information:
We thank Martin Arjovsky, Behnam Neyshabur, Vaishnavh Nagarajan, Marc Finzi, Polina Kirichenko, Greg Benton and Nate Gruver for helpful discussions. This research is supported with Cloud TPUs from Google’s TPU Research Cloud (TRC), and by an Amazon Research Award, NSF I-DISRE 193471, NIH R01DA048764-01A1, NSF IIS-1910266, and NSF 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science.
Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
AB - Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
UR - http://www.scopus.com/inward/record.url?scp=85126505212&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126505212&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126505212
T3 - Advances in Neural Information Processing Systems
SP - 3309
EP - 3322
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -