TY - JOUR
T1 - Contra
T2 - 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021
AU - Sudarshan, Mukund
AU - Puli, Aahlad
AU - Subramanian, Lakshminarayanan
AU - Sankararaman, Sriram
AU - Ranganath, Rajesh
N1 - Funding Information:
The authors would like to thank the reviewers for their thoughtful feedback. Mukund Sudarshan was partially supported by a PhRMA Foundation Predoctoral Fellowship. Mukund Sudarshan, Aahlad Puli, and Rajesh Ranganath were partly supported by NIH/NHLBI Award R01HL148248, and by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science. Sriram Sankararaman was partially supported by NSF Award 1705121 III: Medium: Scalable Machine Learning for Genome-Wide Association Analyses.
Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - The holdout randomization test (HRT) discovers a set of covariates most predictive of a response. Given the covariate distribution, HRTs can explicitly control the false discovery rate (FDR). However, if this distribution is unknown and must be estimated from data, HRTs can inflate the FDR. To alleviate the inflation of FDR, we propose the contrarian randomization test (CONTRA), which is designed explicitly for scenarios where the covariate distribution must be estimated from data and may even be misspecified. Our key insight is to use an equal mixture of two “contrarian” probabilistic models in determining the importance of a covariate. One model is fit with the real data, while the other is fit using the same data, but with the covariate being tested replaced with samples from an estimate of the covariate distribution. CONTRA is flexible enough to achieve a power of 1 asymptotically, can reduce the FDR compared to state-of-the-art CVS methods when the covariate distribution is misspecified, and is computationally efficient in high dimensions and large sample sizes. We further demonstrate the effectiveness of CONTRA on numerous synthetic benchmarks, and highlight its capabilities on a genetic dataset.
AB - The holdout randomization test (HRT) discovers a set of covariates most predictive of a response. Given the covariate distribution, HRTs can explicitly control the false discovery rate (FDR). However, if this distribution is unknown and must be estimated from data, HRTs can inflate the FDR. To alleviate the inflation of FDR, we propose the contrarian randomization test (CONTRA), which is designed explicitly for scenarios where the covariate distribution must be estimated from data and may even be misspecified. Our key insight is to use an equal mixture of two “contrarian” probabilistic models in determining the importance of a covariate. One model is fit with the real data, while the other is fit using the same data, but with the covariate being tested replaced with samples from an estimate of the covariate distribution. CONTRA is flexible enough to achieve a power of 1 asymptotically, can reduce the FDR compared to state-of-the-art CVS methods when the covariate distribution is misspecified, and is computationally efficient in high dimensions and large sample sizes. We further demonstrate the effectiveness of CONTRA on numerous synthetic benchmarks, and highlight its capabilities on a genetic dataset.
UR - http://www.scopus.com/inward/record.url?scp=85127242726&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127242726&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85127242726
SN - 2640-3498
VL - 130
SP - 1900
EP - 1908
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 13 April 2021 through 15 April 2021
ER -