Neyman-Pearson Multi-Class Classification via Cost-Sensitive Learning

Ye Tian, Yang Feng

Research output: Contribution to journal › Article › peer-review

Abstract

Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry, two popular paradigms have been developed: the Neyman-Pearson (NP) paradigm and the cost-sensitive (CS) paradigm. Previous studies on the NP paradigm have primarily focused on the binary case, while the multi-class NP problem poses a greater challenge because its feasibility is unknown. In this work, we tackle the multi-class NP problem by establishing a connection with the CS problem via strong duality and propose two algorithms. We extend the concept of NP oracle inequalities, crucial in binary classification, to NP oracle properties in the multi-class context. Our algorithms satisfy these NP oracle properties under certain conditions. Furthermore, we develop practical algorithms to assess feasibility and strong duality in multi-class NP problems, which can show practitioners the landscape of a multi-class NP problem across various target error levels. Simulations and real data studies validate the effectiveness of our algorithms. To our knowledge, this is the first study to address the multi-class NP problem with theoretical guarantees. The proposed algorithms have been implemented in the R package npcs, which is available on CRAN. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
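
The link between the NP and CS problems described above can be sketched in symbols. The notation below is an assumption introduced for illustration and does not appear in the abstract: K classes, class-specific error R_k(φ) = P(φ(X) ≠ k | Y = k), objective weights w_k ≥ 0, a set 𝒜 of constrained classes, and target error levels α_k. The article's own formulation may differ in detail.

```latex
% Illustrative sketch only; all symbols are assumed notation, not quoted from the article.
% R_k(\phi) = P(\phi(X) \neq k \mid Y = k) is the error rate on class k.
\begin{align*}
\text{NP problem:}\quad
  & \min_{\phi}\ \sum_{k=1}^{K} w_k R_k(\phi)
    \quad \text{s.t.}\quad R_k(\phi) \le \alpha_k,\ k \in \mathcal{A}, \\
\text{Lagrangian:}\quad
  & L(\phi, \lambda) = \sum_{k=1}^{K} w_k R_k(\phi)
    + \sum_{k \in \mathcal{A}} \lambda_k \bigl( R_k(\phi) - \alpha_k \bigr),
    \qquad \lambda_k \ge 0, \\
\text{CS problem for fixed } \lambda\text{:}\quad
  & \min_{\phi}\ \sum_{k=1}^{K} c_k(\lambda)\, R_k(\phi),
    \qquad c_k(\lambda) = w_k + \lambda_k \mathbf{1}\{k \in \mathcal{A}\}, \\
\text{Strong duality:}\quad
  & \min_{\phi} \max_{\lambda \ge 0} L(\phi, \lambda)
    = \max_{\lambda \ge 0} \min_{\phi} L(\phi, \lambda).
\end{align*}
```

Under this reading, minimizing the Lagrangian over φ for a fixed λ is a cost-sensitive problem with class costs c_k(λ), since the term involving α_k does not depend on φ. When strong duality holds, solving the NP problem reduces to searching over λ, and an infeasible NP problem shows up as an unbounded dual.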

Original language: English (US)
Pages (from-to): 1164-1177
Number of pages: 14
Journal: Journal of the American Statistical Association
Volume: 120
Issue number: 550
State: Published - 2025

Keywords

  • Confusion matrix
  • Cost-sensitive learning
  • Duality
  • Feasibility
  • Multi-class classification
  • Neyman-Pearson paradigm

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
