TY - JOUR
T1 - A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
AU - Cherepanova, Valeriia
AU - Levin, Roman
AU - Somepalli, Gowthami
AU - Geiping, Jonas
AU - Bayan Bruss, C.
AU - Wilson, Andrew Gordon
AU - Goldstein, Tom
AU - Goldblum, Micah
N1 - Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose Deep Lasso - an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
AB - Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose Deep Lasso - an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
UR - http://www.scopus.com/inward/record.url?scp=85189211001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189211001&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85189211001
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -