TY - JOUR
T1 - From handcrafted to learned representations for human action recognition
T2 - A survey
AU - Zhu, Fan
AU - Shao, Ling
AU - Xie, Jin
AU - Fang, Yi
N1 - Publisher Copyright:
© 2016 Elsevier B.V.
PY - 2016/11/1
Y1 - 2016/11/1
N2 - Human action recognition is an important branch of research in both human perception and computer vision. Along with the development of artificial intelligence, deep learning techniques have achieved remarkable success in image categorization tasks (e.g., object detection and classification). However, since human actions are normally presented as sequences of image frames, analyzing human action data with deep learning techniques requires significantly more computational power than analyzing still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so that old-fashioned handcrafted human action representations are still widely used for human action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot generalize to diverse realistic scenarios. Consequently, resorting to learned action representations for human action recognition tasks is ultimately a natural choice. In this work, we provide a detailed overview of recent advances in human action representations. As the first survey that covers both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide a comprehensive analysis and comparison of learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.
AB - Human action recognition is an important branch of research in both human perception and computer vision. Along with the development of artificial intelligence, deep learning techniques have achieved remarkable success in image categorization tasks (e.g., object detection and classification). However, since human actions are normally presented as sequences of image frames, analyzing human action data with deep learning techniques requires significantly more computational power than analyzing still images. This challenge has been the bottleneck for migrating learning-based image representation techniques to action sequences, so that old-fashioned handcrafted human action representations are still widely used for human action recognition tasks. On the other hand, since handcrafted representations are usually ad hoc and overfit to specific data, they cannot generalize to diverse realistic scenarios. Consequently, resorting to learned action representations for human action recognition tasks is ultimately a natural choice. In this work, we provide a detailed overview of recent advances in human action representations. As the first survey that covers both handcrafted and learning-based action representations, we explicitly discuss the strengths and limitations of existing techniques of both kinds. The ultimate goal of this survey is to provide a comprehensive analysis and comparison of learning-based and handcrafted action representations, so as to inspire action recognition researchers to study both kinds of representation techniques.
KW - Convolutional neural network
KW - Deep learning
KW - Dictionary learning
KW - Handcrafted features
KW - Human action recognition
UR - http://www.scopus.com/inward/record.url?scp=84979582625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979582625&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2016.06.007
DO - 10.1016/j.imavis.2016.06.007
M3 - Article
AN - SCOPUS:84979582625
SN - 0262-8856
VL - 55
SP - 42
EP - 52
JO - Image and Vision Computing
JF - Image and Vision Computing
ER -