An empirical comparison of support vector machines versus nearest neighbour methods for machine learning applications

Mori Gamboni, Abhijai Garg, Oleg Grishin, Seung Man Oh, Francis Sowani, Anthony Spalvieri-Kruse, Godfried T. Toussaint, Lingliang Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Support vector machines (SVMs) are traditionally considered to be the best classifiers in terms of minimizing the empirical probability of misclassification, although they can be slow when the training datasets are large. Here SVMs are compared to the classic k-Nearest Neighbour (k-NN) decision rule using seven large real-world datasets obtained from the University of California at Irvine (UCI) Machine Learning Repository. To counterbalance the slowness of SVMs on large datasets, three simple and fast methods for reducing the size of the training data, and thus speeding up the SVMs, are incorporated. One is blind random sampling. The other two are new linear-time methods for guided random sampling, which we call Gaussian Condensing and Gaussian Smoothing. In spite of the speedups of SVMs obtained by incorporating Gaussian Smoothing and Condensing, the results show that k-NN methods are superior to SVMs on most of the seven datasets used, and cast doubt on the general superiority of SVMs. Furthermore, random sampling works surprisingly well and is robust, suggesting that it is a worthwhile preprocessing step for either SVMs or k-NN.
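
As a rough illustration of the blind random sampling idea mentioned in the abstract, the sketch below thins a training set uniformly at random and then classifies with a plain k-NN majority vote. It is not the authors' implementation: the helper names (blind_random_sample, knn_predict, make_blobs), the synthetic two-class data, the 10% sampling fraction, and k = 3 are all assumptions chosen for the example. Gaussian Condensing and Gaussian Smoothing are not shown, since the abstract does not define them.

```python
import random
from collections import Counter


def blind_random_sample(X, y, fraction, rng):
    """Keep a uniformly random subset of the training pairs.

    "Blind" here means the choice ignores both labels and geometry.
    """
    n_keep = max(1, int(len(X) * fraction))
    idx = rng.sample(range(len(X)), n_keep)
    return [X[i] for i in idx], [y[i] for i in idx]


def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def make_blobs(n_per_class, rng):
    """Hypothetical toy data: two 2-D Gaussian classes centred at (0, 0) and (3, 3)."""
    X, y = [], []
    for label, centre in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test points whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(X_train, y_train, q, k) == t for q, t in zip(X_test, y_test))
    return hits / len(X_test)


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_blobs(500, rng)
X_test, y_test = make_blobs(100, rng)

# Assumed setting: keep 10% of the training data before classifying.
X_small, y_small = blind_random_sample(X_train, y_train, 0.1, rng)

acc_full = accuracy(X_train, y_train, X_test, y_test)
acc_small = accuracy(X_small, y_small, X_test, y_test)
print(f"full training set ({len(X_train)} points): accuracy {acc_full:.3f}")
print(f"10% random sample ({len(X_small)} points): accuracy {acc_small:.3f}")
```

On well-separated toy data like this, the two accuracies are typically close, which only illustrates the kind of robustness the abstract reports. It says nothing about the seven UCI datasets used in the paper.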

Original language: English (US)
Title of host publication: Pattern Recognition Applications and Methods - 3rd International Conference, ICPRAM 2014, Revised Selected Papers
Editors: Maria de Marsico, Ana Fred, Antoine Tabbone
Publisher: Springer Verlag
Pages: 110-129
Number of pages: 20
ISBN (Print): 9783319255293
State: Published - 2015
Event: 3rd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2014 - Angers, France
Duration: Mar 6 2014 → Mar 8 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9443
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2014
Country/Territory: France
City: Angers
Period: 3/6/14 → 3/8/14

Keywords

  • Blind and guided random sampling
  • Data mining
  • Gaussian Condensing
  • K-Nearest neighbour methods
  • Machine learning
  • SMO
  • Support vector machines
  • Training data condensation
  • Wilson editing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
