Large-scale model selection in misspecified generalized linear models

Emre Demirkaya, Yang Feng, Pallavi Basu, Jinchi Lv

Research output: Contribution to journal › Article › peer-review

Abstract

Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback-Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors.

Original language: English (US)
Pages (from-to): 123-136
Number of pages: 14
Journal: Biometrika
Volume: 109
Issue number: 1
State: Published - Mar 1 2022

Keywords

  • Bayesian principle
  • Big data
  • High dimensionality
  • Kullback-Leibler divergence
  • Model misspecification
  • Model selection
  • Robustness

ASJC Scopus subject areas

  • Statistics and Probability
  • General Mathematics
  • Agricultural and Biological Sciences (miscellaneous)
  • General Agricultural and Biological Sciences
  • Statistics, Probability and Uncertainty
  • Applied Mathematics
