TY - GEN
T1 - Learning invariant features through topographic filter maps
AU - Kavukcuoglu, Koray
AU - Ranzato, Marc'Aurelio
AU - Fergus, Rob
AU - LeCun, Yann
PY - 2009
Y1 - 2009
N2 - Several recently-proposed architectures for highperformance object recognition are composed of two main stages: a feature extraction stage that extracts locallyinvariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a non-linear transform, such as a point-wise squashing functions, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally-invariant outputs. The learned feature descriptors give comparable results as SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.
AB - Several recently-proposed architectures for highperformance object recognition are composed of two main stages: a feature extraction stage that extracts locallyinvariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a non-linear transform, such as a point-wise squashing functions, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally-invariant outputs. The learned feature descriptors give comparable results as SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.
UR - http://www.scopus.com/inward/record.url?scp=70450177775&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70450177775&partnerID=8YFLogxK
U2 - 10.1109/CVPRW.2009.5206545
DO - 10.1109/CVPRW.2009.5206545
M3 - Conference contribution
AN - SCOPUS:70450177775
SN - 9781424439935
T3 - 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009
SP - 1605
EP - 1612
BT - 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009
PB - IEEE Computer Society
T2 - 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009
Y2 - 20 June 2009 through 25 June 2009
ER -