TY - GEN
T1 - Large-scale learning with SVM and convolutional nets for generic object categorization
AU - Huang, Fu Jie
AU - LeCun, Yann
PY - 2006
Y1 - 2006
N2 - The detection and recognition of generic object categories with invariance to viewpoint, illumination, and clutter requires the combination of a feature extractor and a classifier. We show that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are good at producing decision surfaces from well-behaved feature vectors, but cannot learn complicated invariances. We present a hybrid system where a convolutional network is trained to detect and recognize generic objects, and a Gaussian-kernel SVM is trained from the features learned by the convolutional network. Results are given on a large generic object recognition task with six categories (human figures, four-legged animals, airplanes, trucks, cars, and "none of the above"), with multiple instances of each object category under various poses, illuminations, and backgrounds. On the test set, which contains different object instances than the training set, an SVM alone yields a 43.3% error rate, a convolutional net alone yields 7.2% and an SVM on top of features produced by the convolutional net yields 5.9%.
UR - http://www.scopus.com/inward/record.url?scp=33845597145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845597145&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2006.164
DO - 10.1109/CVPR.2006.164
M3 - Conference contribution
AN - SCOPUS:33845597145
SN - 0769525970
SN - 9780769525976
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 284
EP - 291
BT - Proceedings - 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006
T2 - 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006
Y2 - 17 June 2006 through 22 June 2006
ER -