Should We Learn Most Likely Functions or Parameters?

Shikai Qiu, Tim G.J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

Research output: Contribution to journal › Conference article › peer-review

Abstract

Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insofar as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks, and we prove conditions under which the procedure is well-behaved, as well as provide a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
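
As a brief illustration of why the parameter-space mode is not invariant to re-parametrization (a sketch in notation chosen here for exposition, not taken from the paper), consider the standard change-of-variables identity for densities:

\[
\theta_{\mathrm{MAP}} = \arg\max_{\theta}\; p(\theta \mid \mathcal{D}) = \arg\max_{\theta}\; p(\mathcal{D} \mid \theta)\, p(\theta).
\]

If the model is re-parametrized through an invertible map \(\theta = g(\psi)\), the posterior density over the new parameters picks up a Jacobian factor,

\[
p(\psi \mid \mathcal{D}) = p\big(g(\psi) \mid \mathcal{D}\big)\,\left|\det \frac{\partial g}{\partial \psi}\right|,
\]

so unless this Jacobian term is constant, \(\arg\max_{\psi} p(\psi \mid \mathcal{D})\) need not correspond to \(\theta_{\mathrm{MAP}}\) under \(g\), even though both parametrizations induce exactly the same set of functions. This is the sense in which the parameter-space mode can be made arbitrary by re-parametrization, motivating estimation of the mode in function space instead.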

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 36
State: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10, 2023 – Dec 16, 2023

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
