Automatic emotion recognition in the wild using an ensemble of static and dynamic representations

Mostafa Mehdipour Ghazi, Hazim Kemal Ekenel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic emotion recognition in in-the-wild video datasets is a challenging problem because of the inter-class similarities among different facial expressions and the large intra-class variability caused by significant changes in illumination, pose, scene, and expression. In this paper, we present our method for video-based emotion recognition in the EmotiW 2016 challenge. The task addresses the unconstrained emotion recognition problem by training on short video clips extracted from movies and testing on short movie clips and spontaneous video clips from reality TV data. Four different methods are employed to extract both static and dynamic emotion representations from the videos. First, local binary patterns from three orthogonal planes (LBP-TOP) are used to describe spatiotemporal features of the video frames. Second, principal component analysis is applied to image patches in a two-stage convolutional network to learn filter weights and extract facial features from the aligned faces. Third, the deep convolutional neural network model VGGFace is deployed to extract deep facial representations from the aligned faces. Fourth, a bag of visual words is computed from dense scale-invariant feature transform (SIFT) descriptors of the aligned face images to form hand-crafted representations. Support vector machines are then used to train on and classify the obtained spatiotemporal representations and facial features. Finally, score-level fusion is applied to combine the classification results and predict the emotion labels of the video clips. The results show that the proposed combined method outperforms each of the individual techniques, achieving overall validation and test accuracies of 43.13% and 40.13%, respectively. The system is a relatively good classifier for the Happy and Angry emotion categories but is unsuccessful at detecting Surprise, Disgust, and Fear.
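To illustrate the final step of the pipeline described above, the sketch below shows equal-weight score-level fusion of per-representation SVM scores using scikit-learn. It is a minimal sketch only: the feature dimensionalities, random stand-in data, and uniform fusion weights are hypothetical placeholders, not the authors' actual features or configuration.

```python
# Minimal sketch of score-level fusion over four feature streams.
# All data here is synthetic; stream names/dimensions are illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_val, n_classes = 200, 50, 7  # EmotiW uses 7 emotion categories

# Stand-ins for the four representations (LBP-TOP, PCA-based network,
# VGGFace, dense-SIFT bag of visual words); dimensions are made up.
streams = {"lbp_top": 177, "pca_net": 256, "vggface": 4096, "sift_bow": 1000}
y_train = rng.integers(0, n_classes, n_train)
y_val = rng.integers(0, n_classes, n_val)

fused_scores = np.zeros((n_val, n_classes))
for name, dim in streams.items():
    X_train = rng.normal(size=(n_train, dim))
    X_val = rng.normal(size=(n_val, dim))
    # One SVM per representation; probability estimates give
    # comparable per-class scores that can be summed across streams.
    clf = SVC(kernel="linear", probability=True, random_state=0)
    clf.fit(X_train, y_train)
    fused_scores += clf.predict_proba(X_val)  # equal-weight fusion

y_pred = fused_scores.argmax(axis=1)
print(f"fused accuracy on random data: {(y_pred == y_val).mean():.2%}")
```

In practice the per-stream scores could be combined with learned or validation-tuned weights rather than the equal weights used here; the abstract does not specify the weighting scheme.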

Original language: English (US)
Title of host publication: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
Editors: Catherine Pelachaud, Yukiko I. Nakano, Toyoaki Nishida, Carlos Busso, Louis-Philippe Morency, Elisabeth Andre
Publisher: Association for Computing Machinery, Inc
Pages: 514-521
Number of pages: 8
ISBN (Electronic): 9781450345569
State: Published - Oct 31 2016
Event: 18th ACM International Conference on Multimodal Interaction, ICMI 2016 - Tokyo, Japan
Duration: Nov 12 2016 - Nov 16 2016

Publication series

Name: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction

Conference

Conference: 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Country/Territory: Japan
City: Tokyo
Period: 11/12/16 - 11/16/16

Keywords

  • Automatic emotion recognition
  • Convolutional neural network
  • Local binary patterns
  • Principal component analysis
  • Scale-invariant feature transform
  • Support vector machines

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
