TY - GEN
T1 - Automatic emotion recognition in the wild using an ensemble of static and dynamic representations
AU - Ghazi, Mostafa Mehdipour
AU - Ekenel, Hazim Kemal
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - Automatic emotion recognition from videos in the wild is a very challenging problem because of the inter-class similarities among different facial expressions and the large intra-class variability caused by significant changes in illumination, pose, scene, and expression. In this paper, we present our proposed method for video-based emotion recognition in the EmotiW 2016 challenge. The task addresses the unconstrained emotion recognition problem by training on short video clips extracted from movies and testing on short movie clips and spontaneous video clips from reality TV data. Four different methods are employed to extract both static and dynamic emotion representations from the videos. First, local binary patterns of three orthogonal planes are used to describe spatiotemporal features of the video frames. Second, principal component analysis is applied to image patches in a two-stage convolutional network to learn weights and extract facial features from the aligned faces. Third, the deep convolutional neural network model of VGGFace is deployed to extract deep facial representations from the aligned faces. Fourth, a bag of visual words is computed from dense scale-invariant feature transform descriptors of the aligned face images to form hand-crafted representations. Support vector machines are then utilized to train on and classify the obtained spatiotemporal representations and facial features. Finally, score-level fusion is applied to combine the classification results and predict the emotion labels of the video clips. The results show that the proposed combined method outperforms all of the individual techniques, with overall validation and test accuracies of 43.13% and 40.13%, respectively. This system is a relatively good classifier for the Happy and Angry emotion categories and is unsuccessful in detecting Surprise, Disgust, and Fear.
AB - Automatic emotion recognition from videos in the wild is a very challenging problem because of the inter-class similarities among different facial expressions and the large intra-class variability caused by significant changes in illumination, pose, scene, and expression. In this paper, we present our proposed method for video-based emotion recognition in the EmotiW 2016 challenge. The task addresses the unconstrained emotion recognition problem by training on short video clips extracted from movies and testing on short movie clips and spontaneous video clips from reality TV data. Four different methods are employed to extract both static and dynamic emotion representations from the videos. First, local binary patterns of three orthogonal planes are used to describe spatiotemporal features of the video frames. Second, principal component analysis is applied to image patches in a two-stage convolutional network to learn weights and extract facial features from the aligned faces. Third, the deep convolutional neural network model of VGGFace is deployed to extract deep facial representations from the aligned faces. Fourth, a bag of visual words is computed from dense scale-invariant feature transform descriptors of the aligned face images to form hand-crafted representations. Support vector machines are then utilized to train on and classify the obtained spatiotemporal representations and facial features. Finally, score-level fusion is applied to combine the classification results and predict the emotion labels of the video clips. The results show that the proposed combined method outperforms all of the individual techniques, with overall validation and test accuracies of 43.13% and 40.13%, respectively. This system is a relatively good classifier for the Happy and Angry emotion categories and is unsuccessful in detecting Surprise, Disgust, and Fear.
KW - Automatic emotion recognition
KW - Convolutional neural network
KW - Local binary patterns
KW - Principal component analysis
KW - Scale-invariant feature transform
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=85016593105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016593105&partnerID=8YFLogxK
U2 - 10.1145/2993148.2997634
DO - 10.1145/2993148.2997634
M3 - Conference contribution
AN - SCOPUS:85016593105
T3 - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
SP - 514
EP - 521
BT - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
A2 - Pelachaud, Catherine
A2 - Nakano, Yukiko I.
A2 - Nishida, Toyoaki
A2 - Busso, Carlos
A2 - Morency, Louis-Philippe
A2 - Andre, Elisabeth
PB - Association for Computing Machinery, Inc
T2 - 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Y2 - 12 November 2016 through 16 November 2016
ER -