A theoretical analysis of feature pooling in visual recognition

Y-Lan Boureau, Jean Ponce, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Many modern visual recognition algorithms incorporate a step of spatial 'pooling', where the outputs of several nearby feature detectors are combined into a local or global 'bag of features', in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.
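
As a rough illustration of the two operations the abstract compares, the sketch below applies max pooling and average pooling to a small 2-D feature map over non-overlapping windows. It is not taken from the paper: the function name `pool`, the window size, and the toy feature map are assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np


def pool(features, size, mode="max"):
    """Pool a 2-D feature map over non-overlapping size x size windows.

    Hypothetical helper for illustration only; not the paper's code.
    """
    h, w = features.shape
    # Drop trailing rows/columns that do not fill a complete window.
    h_trim, w_trim = h - h % size, w - w % size
    blocks = features[:h_trim, :w_trim].reshape(
        h_trim // size, size, w_trim // size, size
    )
    if mode == "max":
        # Keep the strongest detector response in each window.
        return blocks.max(axis=(1, 3))
    if mode == "average":
        # Keep the mean detector response in each window.
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Toy 4x4 map of feature-detector outputs (made-up values).
feature_map = np.array(
    [
        [1.0, 0.0, 0.0, 2.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
)

max_pooled = pool(feature_map, 2, mode="max")      # [[3, 2], [1, 4]]
avg_pooled = pool(feature_map, 2, mode="average")  # [[1, 0.5], [0.25, 1]]
print(max_pooled)
print(avg_pooled)
```

With sparse responses like these, max pooling keeps the single strong activation in each window while average pooling dilutes it by the number of samples in the pool, which loosely illustrates why pool cardinality matters when comparing the two.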

Original language: English (US)
Title of host publication: ICML 2010 - Proceedings, 27th International Conference on Machine Learning
Pages: 111-118
Number of pages: 8
State: Published - 2010
Event: 27th International Conference on Machine Learning, ICML 2010 - Haifa, Israel
Duration: Jun 21 2010 to Jun 25 2010

Publication series

Name: ICML 2010 - Proceedings, 27th International Conference on Machine Learning

Other

Other: 27th International Conference on Machine Learning, ICML 2010
Country/Territory: Israel
City: Haifa
Period: 6/21/10 to 6/25/10

ASJC Scopus subject areas

  • Artificial Intelligence
  • Education
