The condensed nearest neighbor rule (CNN) was proposed by Hart (1968) as a method to reduce the storage requirements of the original data set D for the efficient implementation of the nearest neighbor decision rule in pattern classification problems. Tomek (1976a) suggested two modifications of CNN in order to improve its performance. As a first step in Tomek's second method he computes a subset C of D, for subsequent use in CNN, and claims that C is training-set-consistent, i.e., that all data points in D are correctly classified by the nearest neighbor rule using C. In this note we provide a counterexample to this claim. We also analyze Tomek's algorithm in the context of more recent graph-theoretical condensing schemes.
ASJC Scopus subject areas
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence