Convolutional learning of spatio-temporal features

Graham W. Taylor, Rob Fergus, Yann LeCun, Christoph Bregler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer Verlag
Pages: 140-153
Number of pages: 14
Edition: PART 6
ISBN (Print): 3642155669, 9783642155666
DOIs: https://doi.org/10.1007/978-3-642-15567-3_11
State: Published - 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 10 2010 to Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 6
Volume: 6316 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th European Conference on Computer Vision, ECCV 2010
Country: Greece
City: Heraklion, Crete
Period: 9/10/10 to 9/11/10

Keywords

  • activity recognition
  • convolutional nets
  • optical flow
  • restricted Boltzmann machines
  • unsupervised learning
  • video analysis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Taylor, G. W., Fergus, R., LeCun, Y., & Bregler, C. (2010). Convolutional learning of spatio-temporal features. In Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings (PART 6 ed., pp. 140-153). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6316 LNCS, No. PART 6). Springer Verlag. https://doi.org/10.1007/978-3-642-15567-3_11