TY - GEN
T1 - A Closer Look at Spatiotemporal Convolutions for Action Recognition
AU - Tran, Du
AU - Wang, Heng
AU - Torresani, Lorenzo
AU - Ray, Jamie
AU - Lecun, Yann
AU - Paluri, Manohar
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/14
Y1 - 2018/12/14
N2 - In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significantly gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block 'R(2+1)D' which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
AB - In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significantly gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block 'R(2+1)D' which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
UR - http://www.scopus.com/inward/record.url?scp=85055123128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055123128&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2018.00675
DO - 10.1109/CVPR.2018.00675
M3 - Conference contribution
AN - SCOPUS:85055123128
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 6450
EP - 6459
BT - Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
PB - IEEE Computer Society
T2 - 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Y2 - 18 June 2018 through 22 June 2018
ER -