TY - GEN
T1 - Predicting Deeper into the Future of Semantic Segmentation
AU - Luc, Pauline
AU - Neverova, Natalia
AU - Couprie, Camille
AU - Verbeek, Jakob
AU - LeCun, Yann
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/22
Y1 - 2017/12/22
AB - The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task of predicting semantic segmentations of future frames. Given a sequence of video frames, our goal is to predict segmentation maps of not yet observed video frames that lie up to a second or further in the future. We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames. Our results on the Cityscapes dataset show that directly predicting future segmentations is substantially better than predicting and then segmenting future RGB frames. Prediction results up to half a second in the future are visually convincing and are much more accurate than those of a baseline based on warping semantic segmentations using optical flow.
UR - http://www.scopus.com/inward/record.url?scp=85041902695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041902695&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2017.77
DO - 10.1109/ICCV.2017.77
M3 - Conference contribution
AN - SCOPUS:85041902695
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 648
EP - 657
BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017
Y2 - 22 October 2017 through 29 October 2017
ER -