TY - JOUR
T1 - RaSE
T2 - Random subspace ensemble classification
AU - Tian, Ye
AU - Feng, Yang
N1 - Funding Information:
The authors would like to thank the Action Editor and anonymous referees for many constructive comments which have greatly improved the paper. This work was partially supported by National Science Foundation CAREER grant DMS-2013789.
Publisher Copyright:
© 2021 Microtome Publishing. All rights reserved.
PY - 2021
Y1 - 2021
N2 - We propose a flexible ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace optimally selected from a collection of random subspaces. To conduct subspace selection, we propose a new criterion, ratio information criterion (RIC), based on weighted Kullback-Leibler divergence. The theoretical analysis includes the risk and Monte-Carlo variance of the RaSE classifier, establishing the screening consistency and weak consistency of RIC, and providing an upper bound for the misclassification rate of the RaSE classifier. In addition, we show that in a high-dimensional framework, the number of random subspaces needs to be very large to guarantee that a subspace covering signals is selected. Therefore, we propose an iterative version of the RaSE algorithm and prove that under some specific conditions, a smaller number of generated random subspaces are needed to find a desirable subspace through iteration. An array of simulations under various models and real-data applications demonstrate the effectiveness and robustness of the RaSE classifier and its iterative version in terms of low misclassification rate and accurate feature ranking. The RaSE algorithm is implemented in the R package RaSEn on CRAN.
AB - We propose a flexible ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace optimally selected from a collection of random subspaces. To conduct subspace selection, we propose a new criterion, ratio information criterion (RIC), based on weighted Kullback-Leibler divergence. The theoretical analysis includes the risk and Monte-Carlo variance of the RaSE classifier, establishing the screening consistency and weak consistency of RIC, and providing an upper bound for the misclassification rate of the RaSE classifier. In addition, we show that in a high-dimensional framework, the number of random subspaces needs to be very large to guarantee that a subspace covering signals is selected. Therefore, we propose an iterative version of the RaSE algorithm and prove that under some specific conditions, a smaller number of generated random subspaces are needed to find a desirable subspace through iteration. An array of simulations under various models and real-data applications demonstrate the effectiveness and robustness of the RaSE classifier and its iterative version in terms of low misclassification rate and accurate feature ranking. The RaSE algorithm is implemented in the R package RaSEn on CRAN.
KW - Consistency
KW - Ensemble Classification
KW - Feature Ranking
KW - High Dimensional Data
KW - Information Criterion
KW - Random Subspace Method
KW - Sparsity
UR - http://www.scopus.com/inward/record.url?scp=85105879167&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105879167&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85105879167
SN - 1532-4435
VL - 22
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -