An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis

G. T. Toussaint, P. M. Sharpe

Research output: Contribution to journal, Article, peer-review

Abstract

The problem of estimating the performance of a given classifier on a given data set is discussed for the case when no knowledge is available concerning the underlying distributions. A new method of estimating the probability of misclassification is proposed which yields essentially unbiased results similar to Lachenbruch's U-method with far less computation involved. While no theoretical work is presented, a practical rule of thumb is given for choosing the parameters of the estimator. The results are based on experiments performed with a data set concerning six diseases related to epigastric pain, and underline the importance of reporting performance on both the testing data and the training data. Whereas previous papers have continually reported results with a probability of correct classification as high as 74.3 per cent on the raw data and 92.0 per cent on "processed" data, in this paper it is shown that a much more significant estimate of the probability of correct classification based on this data set is 51.0 per cent.
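
The abstract does not describe the proposed estimator, so the sketch below does not attempt to reproduce it. As a minimal illustration of the two reference points the abstract does mention, it contrasts the error measured on the training data (resubstitution) with a leave-one-out estimate in the style of Lachenbruch's U-method, using a 1-nearest-neighbour rule (suggested by the keywords). The toy points, labels, and all function names are hypothetical and are not taken from the paper or its epigastric pain data set.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical toy data (NOT the paper's data): two classes that overlap
# near the middle of the plane.
TOY_X: List[Point] = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # class 0 cluster
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.2),   # class 1 cluster
    (0.5, 0.5), (0.6, 0.6),               # overlapping region
]
TOY_Y: List[int] = [0, 0, 0, 1, 1, 1, 0, 1]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nn_predict(train_X: List[Point], train_y: List[int], x: Point) -> int:
    """Return the label of the training point closest to x (1-NN rule)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: squared_distance(train_X[i], x))
    return train_y[nearest]


def resubstitution_error(X: List[Point], y: List[int]) -> float:
    """Error rate when the classifier is tested on its own training data."""
    errors = sum(nn_predict(X, y, x) != label for x, label in zip(X, y))
    return errors / len(X)


def leave_one_out_error(X: List[Point], y: List[int]) -> float:
    """Leave-one-out error: each sample is classified by all the others."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]   # training set without sample i
        rest_y = y[:i] + y[i + 1:]
        if nn_predict(rest_X, rest_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


print(resubstitution_error(TOY_X, TOY_Y))  # 0.0
print(leave_one_out_error(TOY_X, TOY_Y))   # 0.25
```

On this toy set the training-data error is zero because every point is its own nearest neighbour, while the leave-one-out estimate is 0.25. The gap illustrates why the abstract stresses reporting performance on testing data as well as training data. Leave-one-out retrains (here, re-searches) once per sample, which is the computational cost the abstract says the proposed method reduces.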

Original language: English (US)
Pages (from-to): 269-278
Number of pages: 10
Journal: Computers in Biology and Medicine
Volume: 4
Issue number: 3-4
State: Published - Feb 1975

Keywords

  • Classification
  • Epigastric pain
  • Feature size
  • Nearest Neighbour rule
  • Nonparametric
  • Pattern recognition
  • Probability of misclassification
  • Sample size
  • Symptom diagnosis
  • Testing sets
  • Training sets

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
