TY - GEN
T1 - Predicting future instance segmentation by forecasting convolutional features
AU - Luc, Pauline
AU - Couprie, Camille
AU - LeCun, Yann
AU - Verbeek, Jakob
N1 - Funding Information:
Acknowledgment. This work has been partially supported by the grant ANR-16-CE23-0006 “Deep in France” and LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01). We thank Matthijs Douze, Xavier Martin, Ilija Radosavovic and Thomas Lucas for their precious comments.
Funding Information:
This work has been partially supported by the grant ANR-16-CE23-0006 “Deep in France” and LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01). We thank Matthijs Douze, Xavier Martin, Ilija Radosavovic and Thomas Lucas for their precious comments.
Publisher Copyright:
© Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-sized convolutional features of the Mask R-CNN instance segmentation model. We apply the “detection head” of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures.
AB - Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-sized convolutional features of the Mask R-CNN instance segmentation model. We apply the “detection head” of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures.
KW - Convolutional neural networks
KW - Deep learning
KW - Instance segmentation
KW - Video prediction
UR - http://www.scopus.com/inward/record.url?scp=85055085511&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055085511&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01240-3_36
DO - 10.1007/978-3-030-01240-3_36
M3 - Conference contribution
AN - SCOPUS:85055085511
SN - 9783030012397
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 593
EP - 608
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Hebert, Martial
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Weiss, Yair
PB - Springer Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -