TY - JOUR
T1 - Classification of breast cancer subtypes
T2 - A study based on representative genes
AU - Mendonca-Neto, Rayol
AU - Reis, João
AU - Okimoto, Leandro
AU - Fenyö, David
AU - Silva, Claudio
AU - Nakamura, Fabíola
AU - Nakamura, Eduardo
N1 - Funding Information:
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. This work was developed with the support of the Amazonas State Government through the Fundação de Amparo à Pesquisa do Estado do Amazonas (FAPEAM), with the granting of a scholarship. This research, according to Article 48 of Decree nº 6.008/2006, was funded by Samsung Electronics of Amazonia Ltda, under the terms of Federal Law nº 8.387/1991, through agreement nº 003, signed with ICOMP/UFAM.
Publisher Copyright:
© 2022, Brazilian Computing Society. All rights reserved.
PY - 2022/9/22
Y1 - 2022/9/22
N2 - Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in choosing a specific treatment. In this work, we propose an evaluation framework that uses different machine learning techniques for a broader analysis of the PAM50 list in the classification of breast cancer subtypes. The experiments show that the best method for classifying breast cancer subtypes is the SVM with a linear kernel, which achieved an F1 score of 0.98 for the Basal subtype and 0.90 for the HER2 subtype, the two subtypes with the worst prognosis. We also present a gene analysis of the classification methods using SHAP values, which identifies the genes that are important for the classification of each subtype.
AB - Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in choosing a specific treatment. In this work, we propose an evaluation framework that uses different machine learning techniques for a broader analysis of the PAM50 list in the classification of breast cancer subtypes. The experiments show that the best method for classifying breast cancer subtypes is the SVM with a linear kernel, which achieved an F1 score of 0.98 for the Basal subtype and 0.90 for the HER2 subtype, the two subtypes with the worst prognosis. We also present a gene analysis of the classification methods using SHAP values, which identifies the genes that are important for the classification of each subtype.
KW - Breast Cancer
KW - Gene Expression
KW - Subtypes Classification
UR - http://www.scopus.com/inward/record.url?scp=85146225785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146225785&partnerID=8YFLogxK
U2 - 10.5753/jbcs.2022.2209
DO - 10.5753/jbcs.2022.2209
M3 - Article
AN - SCOPUS:85146225785
SN - 0104-6500
VL - 28
SP - 59
EP - 68
JO - Journal of the Brazilian Computer Society
JF - Journal of the Brazilian Computer Society
IS - 1
ER -