TY - JOUR
T1 - Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
AU - Fan, Jianqing
AU - Feng, Yang
AU - Jiang, Jiancheng
AU - Tong, Xin
N1 - Publisher Copyright:
© 2016 American Statistical Association.
PY - 2016/1/2
Y1 - 2016/1/2
N2 - We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called feature augmentation via nonparametrics and selection (FANS). We motivate FANS by generalizing the naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression datasets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
KW - Classification
KW - Density estimation
KW - Feature augmentation
KW - Feature selection
KW - High-dimensional space
KW - Nonlinear decision boundary
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=84969895621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969895621&partnerID=8YFLogxK
U2 - 10.1080/01621459.2015.1005212
DO - 10.1080/01621459.2015.1005212
M3 - Article
AN - SCOPUS:84969895621
VL - 111
SP - 275
EP - 287
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 513
ER -