TY - GEN
T1 - A multimodal fuzzy inference system using a continuous facial expression representation for emotion detection
AU - Soladié, Catherine
AU - Salam, Hanan
AU - Pelachaud, Catherine
AU - Stoiber, Nicolas
AU - Séguier, Renaud
PY - 2012
Y1 - 2012
AB - This paper presents a multimodal fuzzy inference system for emotion detection. The system extracts and merges visual, acoustic, and contextual features. The experiments were performed as part of the AVEC 2012 challenge. Facial expressions play an important role in emotion detection; however, automatically detecting emotional facial expressions on unknown subjects remains a challenging problem. Here, we propose a method that adapts to the subject's morphology and is based on an invariant representation of facial expressions. Our method relies on 8 key emotional expressions of the subject. In our system, each image of a video sequence is defined by its relative position to these 8 expressions. The 8 expressions are synthesized for each subject from plausible distortions learned from other subjects and transferred onto the subject's neutral face. Expression recognition in a video sequence is performed in this space with a basic intensity-area detector. Emotion is described along 4 dimensions: valence, arousal, power, and expectancy. The results show that the duration of high-intensity smiles is a meaningful cue for continuous valence detection and can also be used to improve arousal detection. The main variations in power and expectancy are captured by context data.
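N1 - Method sketch: the abstract describes locating each video frame by its relative position to 8 subject-specific key expressions. The following is a minimal, hypothetical Python sketch of one way such coordinates could be computed; the paper does not specify the distance metric or normalization, so the Euclidean distance and inverse-distance weighting below are assumptions, not the authors' implementation.

      import numpy as np

      def relative_position(frame_feats, key_expressions):
          # frame_feats: (d,) facial feature vector for the current frame.
          # key_expressions: (8, d) feature vectors of the 8 key expressions,
          # synthesized for this subject (assumed representation).
          dists = np.linalg.norm(key_expressions - frame_feats, axis=1)
          weights = 1.0 / (dists + 1e-8)   # nearer key expression -> larger weight
          return weights / weights.sum()   # normalized 8-dim coordinate vector

      Under these assumptions, each frame maps to an 8-dimensional vector summing to 1, so frames from different subjects become comparable in the same expression space.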
KW - Context in emotion recognition
KW - Facial expression representation
KW - Fusion techniques
KW - Fuzzy inference system
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=84870213555&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870213555&partnerID=8YFLogxK
DO - 10.1145/2388676.2388782
M3 - Conference contribution
AN - SCOPUS:84870213555
SN - 9781450314671
T3 - ICMI'12 - Proceedings of the ACM International Conference on Multimodal Interaction
SP - 493
EP - 500
BT - ICMI'12 - Proceedings of the ACM International Conference on Multimodal Interaction
T2 - 14th ACM International Conference on Multimodal Interaction, ICMI 2012
Y2 - 22 October 2012 through 26 October 2012
ER -