TY - GEN
T1 - Cross-task weakly supervised learning from instructional videos
AU - Zhukov, Dimitri
AU - Alayrac, Jean-Baptiste
AU - Cinbis, Ramazan Gokberk
AU - Fouhey, David
AU - Laptev, Ivan
AU - Sivic, Josef
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
AB - In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: "pour egg" should be trained jointly with other tasks involving "pour" and "egg". We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit systematic studying of sharing and so we also gather a new dataset aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level and that our component model can parse previously unseen tasks by virtue of its compositionality.
KW - Action Recognition
UR - http://www.scopus.com/inward/record.url?scp=85078751217&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078751217&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2019.00365
DO - 10.1109/CVPR.2019.00365
M3 - Conference contribution
AN - SCOPUS:85078751217
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 3532
EP - 3540
BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
PB - IEEE Computer Society
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Y2 - 16 June 2019 through 20 June 2019
ER -