Variable importance in matched case-control studies in settings of high dimensional data

Raji Balasubramanian, E. Andres Houseman, Brent A. Coull, Michael H. Lev, Lee H. Schwamm, Rebecca A. Betensky

Research output: Contribution to journal › Article › peer-review

Abstract

Summary: We propose a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high dimensional data (p ≫ n). In simulated and real data sets, we show that the proposed algorithm performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide-ranging, high-impact clinical studies, including metabolomic and proteomic studies and neuroimaging analyses such as those assessing stroke and Alzheimer's disease. The proposed methods have been implemented in a freely available R library (http://cran.r-project.org/web/packages/RPCLR/index.html).

Original language: English (US)
Pages (from-to): 639-655
Number of pages: 17
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 63
Issue number: 4
State: Published - Aug 2014

Keywords

  • Data mining
  • High dimensional data
  • Matched case-control studies

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
