Deep multi-scale video prediction beyond mean square error

Michael Mathieu, Camille Couprie, Yann LeCun

Research output: Contribution to conference › Paper

Abstract

Learning to predict future images from a video sequence involves building an internal representation that accurately models the evolution of the images and therefore, to some degree, their content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. Moreover, while optical flow has long been a well-studied problem in computer vision, future frame prediction is rarely addressed. Yet many vision applications could benefit from knowledge of the next frames of a video, which does not require the complexity of tracking every pixel trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to previously published results based on recurrent neural networks on the UCF101 dataset.
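The image gradient difference loss mentioned in the abstract penalizes mismatches between the spatial gradients of the predicted and target frames rather than their raw pixel values, which discourages the edge blurring that MSE alone produces. A minimal NumPy sketch of such a per-frame loss is given below; the function and argument names are illustrative, not the authors' code, and `alpha` stands in for the exponent hyperparameter of the loss.

```python
import numpy as np

def gdl_loss(pred, target, alpha=1.0):
    """Gradient difference loss between a predicted and a target frame.

    Compares the absolute horizontal and vertical finite differences of
    the two images, so sharp edges in the target must be matched by
    sharp edges in the prediction. `alpha` is the loss exponent
    (illustrative; a hyperparameter of the method).
    """
    # Vertical and horizontal finite differences of each image.
    pred_dy = np.abs(pred[1:, :] - pred[:-1, :])
    pred_dx = np.abs(pred[:, 1:] - pred[:, :-1])
    tgt_dy = np.abs(target[1:, :] - target[:-1, :])
    tgt_dx = np.abs(target[:, 1:] - target[:, :-1])
    # Sum of per-pixel gradient mismatches, raised to the power alpha.
    return (np.abs(tgt_dy - pred_dy) ** alpha).sum() + \
           (np.abs(tgt_dx - pred_dx) ** alpha).sum()
```

A perfectly predicted frame has zero gradient loss, while a flat (blurred-to-constant) prediction of a textured target is penalized even if its mean pixel value is close to the target's.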

Original language: English (US)
State: Published - Jan 1 2016
Event: 4th International Conference on Learning Representations, ICLR 2016 - San Juan, Puerto Rico
Duration: May 2 2016 - May 4 2016

Conference

Conference: 4th International Conference on Learning Representations, ICLR 2016
Country: Puerto Rico
City: San Juan
Period: 5/2/16 - 5/4/16

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics


Cite this

Mathieu, M., Couprie, C., & LeCun, Y. (2016). Deep multi-scale video prediction beyond mean square error. Paper presented at 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico.