TY - GEN
T1 - Learning mid-level features for recognition
AU - Boureau, Y-Lan
AU - Bach, Francis
AU - LeCun, Yann
AU - Ponce, Jean
PY - 2010
Y1 - 2010
N2 - Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
AB - Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
UR - http://www.scopus.com/inward/record.url?scp=77955993281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955993281&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2010.5539963
DO - 10.1109/CVPR.2010.5539963
M3 - Conference contribution
AN - SCOPUS:77955993281
SN - 9781424469840
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 2559
EP - 2566
BT - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
T2 - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Y2 - 13 June 2010 through 18 June 2010
ER -