TY - GEN
T1 - G2-VER
T2 - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
AU - Albrici, Tanguy
AU - Fasounaki, Mandana
AU - Salimi, Saleh Bagher
AU - Vray, Guillaume
AU - Bozorgtabar, Behzad
AU - Ekenel, Hazim Kemal
AU - Thiran, Jean Philippe
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - This paper addresses automatic facial expression recognition in videos, where the goal is to predict the discrete emotion labels that best describe the emotions expressed in short video clips. Building on a pre-trained convolutional neural network (CNN) model dedicated to analyzing the video frames and an LSTM network designed to process the trajectories of the facial landmarks, this paper investigates several novel directions. First, improved face descriptors based on 2D CNNs and facial landmarks are proposed. Second, the paper investigates methods for temporally fusing these features, including a novel hierarchical recurrent neural network that combines facial landmark trajectories over time. In addition, we propose a simple modification to state-of-the-art expression recognition architectures that adapts them to video processing. In both ensemble approaches, the temporal information is integrated. Comparative experiments on publicly available video-based facial expression recognition datasets verify that the proposed framework outperforms state-of-the-art methods. Moreover, we introduce a near-infrared video dataset containing facial expressions of subjects driving their cars, recorded in real-world conditions.
AB - This paper addresses automatic facial expression recognition in videos, where the goal is to predict the discrete emotion labels that best describe the emotions expressed in short video clips. Building on a pre-trained convolutional neural network (CNN) model dedicated to analyzing the video frames and an LSTM network designed to process the trajectories of the facial landmarks, this paper investigates several novel directions. First, improved face descriptors based on 2D CNNs and facial landmarks are proposed. Second, the paper investigates methods for temporally fusing these features, including a novel hierarchical recurrent neural network that combines facial landmark trajectories over time. In addition, we propose a simple modification to state-of-the-art expression recognition architectures that adapts them to video processing. In both ensemble approaches, the temporal information is integrated. Comparative experiments on publicly available video-based facial expression recognition datasets verify that the proposed framework outperforms state-of-the-art methods. Moreover, we introduce a near-infrared video dataset containing facial expressions of subjects driving their cars, recorded in real-world conditions.
UR - http://www.scopus.com/inward/record.url?scp=85070473176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070473176&partnerID=8YFLogxK
U2 - 10.1109/FG.2019.8756600
DO - 10.1109/FG.2019.8756600
M3 - Conference contribution
AN - SCOPUS:85070473176
T3 - Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
BT - Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 May 2019 through 18 May 2019
ER -