Abstract
Cross-validation (CV) methods are popular for selecting the tuning parameter in high-dimensional variable selection problems. We show that a misalignment of the CV is one possible reason for its over-selection behavior. To fix this issue, we propose using a version of leave-n_v-out CV (CV(n_v)) to select the optimal model from a restricted candidate model set for high-dimensional generalized linear models. By using the same candidate model sequence and a proper order for the construction sample size n_c in each CV split, CV(n_v) avoids potential problems when developing theoretical properties. CV(n_v) is shown to exhibit the restricted model-selection consistency property under mild conditions. Extensive simulations and a real-data analysis support the theoretical results and demonstrate the performance of CV(n_v) in terms of both model selection and prediction.
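The leave-n_v-out idea described above can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's actual procedure: the function `cv_nv_select`, the least-squares fits, and the squared-error loss stand in for the restricted maximum likelihood estimators and GLM setting of the paper. Each candidate model is a fixed tuple of predictor indices, the same candidate sequence is scored on every random split, and the model minimizing the averaged validation error is returned.

```python
import numpy as np

def cv_nv_select(X, y, candidate_models, n_v, n_splits=50, seed=0):
    """Leave-n_v-out CV sketch (hypothetical helper, not the paper's code).

    For each of n_splits random splits, fit every candidate model (a tuple
    of column indices of X) by least squares on the construction sample of
    size n_c = n - n_v, then score squared prediction error on the n_v
    held-out observations. Returns the candidate with the smallest
    accumulated validation error.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = np.zeros(len(candidate_models))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, con = perm[:n_v], perm[n_v:]          # validation / construction
        for k, cols in enumerate(candidate_models):
            Xc = X[np.ix_(con, cols)]
            beta, *_ = np.linalg.lstsq(Xc, y[con], rcond=None)
            pred = X[np.ix_(val, cols)] @ beta
            errors[k] += np.mean((y[val] - pred) ** 2)
    return candidate_models[int(np.argmin(errors))]
```

Note the choice of a large n_v (small construction sample n_c), which is what distinguishes CV(n_v) from ordinary leave-one-out or K-fold CV and is tied to the consistency result in the abstract.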
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1607-1630 |
| Number of pages | 24 |
| Journal | Statistica Sinica |
| Volume | 29 |
| Issue number | 3 |
| DOIs | |
| State | Published - 2019 |
Keywords
- Generalized linear models
- Leave-n_v-out cross-validation
- Restricted maximum likelihood estimators
- Restricted model-selection consistency
- Variable selection
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty