TY - JOUR
T1 - Few-shot genes selection
T2 - subset of PAM50 genes for breast cancer subtypes classification
AU - Okimoto, Leandro Y.S.
AU - Mendonca-Neto, Rayol
AU - Nakamura, Fabíola G.
AU - Nakamura, Eduardo F.
AU - Fenyö, David
AU - Silva, Claudio T.
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - Background: In recent years, researchers have made significant strides in understanding the heterogeneity of breast cancer and its various subtypes. However, the wealth of genomic and proteomic data available today necessitates efficient frameworks, instruments, and computational tools for meaningful analysis. Despite its success as a prognostic tool, the PAM50 gene signature’s reliance on many genes presents challenges in terms of cost and complexity. Consequently, there is a need for more efficient methods that accurately classify breast cancer subtypes using a reduced gene set. Results: This study explores the potential of achieving precise breast cancer subtype categorization using a reduced gene set derived from the PAM50 gene signature. By employing a “Few-Shot Genes Selection” method, we randomly select smaller subsets from PAM50 and evaluate their performance using metrics and a linear model, specifically a Support Vector Machine (SVM) classifier. In addition, we aim to assess whether a more compact gene set can maintain performance while simplifying the classification process. Our findings demonstrate that certain reduced gene subsets can perform comparably to, or better than, the full PAM50 gene signature. Conclusions: The identified gene subsets, with 36 genes, have the potential to contribute to the development of more cost-effective and streamlined diagnostic tools in breast cancer research and clinical settings.
KW - Breast cancer subtypes
KW - Explainable AI
KW - Gene expression
KW - PAM50
UR - http://www.scopus.com/inward/record.url?scp=85186610289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186610289&partnerID=8YFLogxK
U2 - 10.1186/s12859-024-05715-8
DO - 10.1186/s12859-024-05715-8
M3 - Article
C2 - 38429657
AN - SCOPUS:85186610289
SN - 1471-2105
VL - 25
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 92
ER -