Classification of breast cancer subtypes: A study based on representative genes

Rayol Mendonca-Neto, João Reis, Leandro Okimoto, David Fenyö, Claudio Silva, Fabíola Nakamura, Eduardo Nakamura

Research output: Contribution to journal › Article › peer-review


Breast cancer is the second most common cancer type and the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in selecting a specific treatment. In this work, we propose an evaluation framework that uses different machine learning techniques for a broader analysis of the PAM50 gene list in the classification of breast cancer subtypes. The experiments show that the best method for classifying breast cancer subtypes is the SVM with a linear kernel, which achieved an F1 score of 0.98 for the Basal subtype and 0.90 for the HER2 subtype, the two subtypes with the worst prognosis. We also present a gene analysis for the classification methods using SHAP values, which identifies the genes that are important for the classification of each subtype.
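The abstract reports F1 scores per subtype rather than a single overall figure. As a minimal sketch of how a per-class F1 score can be computed from true and predicted subtype labels, the snippet below uses only the Python standard library. The function name `per_class_f1`, the label strings, and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only (not data from the study).
y_true = ["Basal", "Basal", "HER2", "HER2", "LumA", "LumA", "LumB", "Normal"]
y_pred = ["Basal", "Basal", "HER2", "LumB", "LumA", "LumA", "LumB", "Normal"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: F1 = {f1:.2f}")
```

Reporting F1 per subtype, as the abstract does, keeps a weak result on a rare subtype from being hidden behind strong results on the more common ones.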

Original language: English (US)
Pages (from-to): 59-68
Number of pages: 10
Journal: Journal of the Brazilian Computer Society
Issue number: 1
State: Published - Sep 22 2022


Keywords

  • Breast Cancer
  • Gene Expression
  • Subtypes Classification

ASJC Scopus subject areas

  • General Computer Science

