TY - GEN
T1 - An empirical comparison of support vector machines versus nearest neighbour methods for machine learning applications
AU - Gamboni, Mori
AU - Garg, Abhijai
AU - Grishin, Oleg
AU - Oh, Seung Man
AU - Sowani, Francis
AU - Spalvieri-Kruse, Anthony
AU - Toussaint, Godfried T.
AU - Zhang, Lingliang
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - Support vector machines (SVMs) are traditionally considered to be the best classifiers in terms of minimizing the empirical probability of misclassification, although they can be slow when the training datasets are large. Here SVMs are compared to the classic k-Nearest Neighbour (k-NN) decision rule using seven large real-world datasets obtained from the University of California at Irvine (UCI) Machine Learning Repository. To counterbalance the slowness of SVMs on large datasets, three simple and fast methods for reducing the size of the training data, and thus speeding up the SVMs, are incorporated. One is blind random sampling. The other two are new linear-time methods for guided random sampling, which we call Gaussian Condensing and Gaussian Smoothing. In spite of the speedups of SVMs obtained by incorporating Gaussian Smoothing and Gaussian Condensing, the results show that k-NN methods are superior to SVMs on most of the seven datasets used, and cast doubt on the general superiority of SVMs. Furthermore, random sampling works surprisingly well and is robust, suggesting that it is a worthwhile preprocessing step for either SVMs or k-NN.
KW - Blind and guided random sampling
KW - Data mining
KW - Gaussian Condensing
KW - k-Nearest neighbour methods
KW - Machine learning
KW - SMO
KW - Support vector machines
KW - Training data condensation
KW - Wilson editing
UR - http://www.scopus.com/inward/record.url?scp=84951854727&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951854727&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25530-9_8
DO - 10.1007/978-3-319-25530-9_8
M3 - Conference contribution
AN - SCOPUS:84951854727
SN - 9783319255293
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 110
EP - 129
BT - Pattern Recognition Applications and Methods - 3rd International Conference, ICPRAM 2014, Revised Selected Papers
A2 - de Marsico, Maria
A2 - Fred, Ana
A2 - Tabbone, Antoine
PB - Springer Verlag
T2 - 3rd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2014
Y2 - 6 March 2014 through 8 March 2014
ER -