TY - GEN
T1 - Recursive Feature Elimination by Sensitivity Testing
AU - Escanilla, Nicholas Sean
AU - Hellerstein, Lisa
AU - Kleiman, Ross
AU - Kuang, Zhaobin
AU - Shull, James
AU - Page, David
N1 - Funding Information:
ACKNOWLEDGMENT The authors gratefully acknowledge support from the following grants: NLM 5T15LM007359, NIH BD2K U54 AI117924, NIH R01-CA077876, NIH R01-CA204320, and NIH P30-CA014520.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - There is great interest in methods to improve human insight into trained non-linear models. Leading approaches include producing a ranking of the most relevant features, a non-trivial task for non-linear models. We show theoretically and empirically the benefit of a novel version of recursive feature elimination (RFE) as often used with SVMs; the key idea is a simple twist on the kinds of sensitivity testing employed in computational learning theory with membership queries (e.g., [1]). With membership queries, one can check whether changing the value of a feature in an example changes the label. In the real world, we usually cannot get answers to such queries, so our approach instead makes these queries to a trained (imperfect) non-linear model. Because SVMs are widely used in bioinformatics, our empirical results use a real-world cancer genomics problem; because ground truth is not known for this task, we discuss the potential insights provided. We also evaluate on synthetic data where ground truth is known.
AB - There is great interest in methods to improve human insight into trained non-linear models. Leading approaches include producing a ranking of the most relevant features, a non-trivial task for non-linear models. We show theoretically and empirically the benefit of a novel version of recursive feature elimination (RFE) as often used with SVMs; the key idea is a simple twist on the kinds of sensitivity testing employed in computational learning theory with membership queries (e.g., [1]). With membership queries, one can check whether changing the value of a feature in an example changes the label. In the real world, we usually cannot get answers to such queries, so our approach instead makes these queries to a trained (imperfect) non-linear model. Because SVMs are widely used in bioinformatics, our empirical results use a real-world cancer genomics problem; because ground truth is not known for this task, we discuss the potential insights provided. We also evaluate on synthetic data where ground truth is known.
KW - Correlation immunity
KW - Feature ranking
KW - Feature selection
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85062219541&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062219541&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2018.00014
DO - 10.1109/ICMLA.2018.00014
M3 - Conference contribution
AN - SCOPUS:85062219541
T3 - Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
SP - 40
EP - 47
BT - Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
A2 - Wani, M. Arif
A2 - Sayed-Mouchaweh, Moamar
A2 - Lughofer, Edwin
A2 - Gama, Joao
A2 - Kantardzic, Mehmed
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018
Y2 - 17 December 2018 through 20 December 2018
ER -