Convolutional learning of spatio-temporal features

Graham W. Taylor, Rob Fergus, Yann LeCun, Christoph Bregler

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer Verlag
Number of pages: 14
Edition: PART 6
ISBN (Print): 3642155669, 9783642155666
State: Published - 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 10 2010 to Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 6
Volume: 6316 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 11th European Conference on Computer Vision, ECCV 2010
City: Heraklion, Crete


Keywords

  • activity recognition
  • convolutional nets
  • optical flow
  • restricted Boltzmann machines
  • unsupervised learning
  • video analysis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

