TY - JOUR
T1 - A Gene Selection Method Based on Outliers for Breast Cancer Subtype Classification
AU - Mendonca-Neto, Rayol
AU - Li, Zhi
AU - Fenyo, David
AU - Silva, Claudio T.
AU - Nakamura, Fabiola G.
AU - Nakamura, Eduardo F.
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in selecting a specific treatment. Gene expression data are a viable basis for cancer subtype classification because they represent the state of a cell at the molecular level, but such datasets generally have a small number of samples relative to a large number of genes. Gene selection is a promising approach that addresses this imbalanced, high-dimensional matrix of genes versus samples and plays an important role in the development of efficient cancer subtype classification. In this work, an innovative outlier-based gene selection (OGS) method is proposed to select relevant genes for efficiently and effectively classifying breast cancer subtypes. Experiments show that our strategy achieves F1 scores of 1.0 for basal and 0.86 for HER2, the two subtypes with the worst prognoses. Compared to other methods, our proposed method achieves a higher F1 score while using 80% fewer genes. In general, our method selects only a few highly relevant genes, speeding up classification and significantly improving the classifier's performance.
AB - Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in selecting a specific treatment. Gene expression data are a viable basis for cancer subtype classification because they represent the state of a cell at the molecular level, but such datasets generally have a small number of samples relative to a large number of genes. Gene selection is a promising approach that addresses this imbalanced, high-dimensional matrix of genes versus samples and plays an important role in the development of efficient cancer subtype classification. In this work, an innovative outlier-based gene selection (OGS) method is proposed to select relevant genes for efficiently and effectively classifying breast cancer subtypes. Experiments show that our strategy achieves F1 scores of 1.0 for basal and 0.86 for HER2, the two subtypes with the worst prognoses. Compared to other methods, our proposed method achieves a higher F1 score while using 80% fewer genes. In general, our method selects only a few highly relevant genes, speeding up classification and significantly improving the classifier's performance.
KW - Gene expression
KW - breast cancer
KW - gene selection
KW - outlier genes
UR - http://www.scopus.com/inward/record.url?scp=85120895658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120895658&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2021.3132339
DO - 10.1109/TCBB.2021.3132339
M3 - Article
C2 - 34860652
AN - SCOPUS:85120895658
SN - 1545-5963
VL - 19
SP - 2547
EP - 2559
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 5
ER -