TY - GEN
T1 - Advanced Facial Expression Classification with CNN-Transformer Integration for Human-Computer Interaction
AU - Azmoudeh, Ali
AU - Gumussoy, Cigdem Altin
AU - Ekenel, Hazim Kemal
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This paper presents an advanced approach to Facial Expression Classification (FEC) to evaluate user behavior in online shopping environments. In this application, users' videos are captured as they accomplish tasks under varying circumstances, including scenarios with and without moderator aid. We utilized and trained a simplified POSTERv1 model on the AffectNet dataset to analyze the captured videos. The model processes frames, first performing face detection using the MTCNN approach. Then, the detected face is resized and normalized to ensure compatibility with the input requirements of the deep learning architecture. The normalized face image is fed to the facial landmark detector and facial feature extractor networks. The outputs from these two parallel pipelines are provided to the cross-fusion transformer encoder to capture multi-scale features and enhance expression recognition accuracy. Experimental results demonstrate the model's efficacy, achieving notable accuracy across the AffectNet, CK+, and FER2013 datasets. Our approach effectively addresses real-world challenges in FEC by creating a custom dataset and comparing emotional responses in moderated versus non-moderated scenarios, highlighting its potential for Human-Computer Interaction applications.
AB - This paper presents an advanced approach to Facial Expression Classification (FEC) to evaluate user behavior in online shopping environments. In this application, users' videos are captured as they accomplish tasks under varying circumstances, including scenarios with and without moderator aid. We utilized and trained a simplified POSTERv1 model on the AffectNet dataset to analyze the captured videos. The model processes frames, first performing face detection using the MTCNN approach. Then, the detected face is resized and normalized to ensure compatibility with the input requirements of the deep learning architecture. The normalized face image is fed to the facial landmark detector and facial feature extractor networks. The outputs from these two parallel pipelines are provided to the cross-fusion transformer encoder to capture multi-scale features and enhance expression recognition accuracy. Experimental results demonstrate the model's efficacy, achieving notable accuracy across the AffectNet, CK+, and FER2013 datasets. Our approach effectively addresses real-world challenges in FEC by creating a custom dataset and comparing emotional responses in moderated versus non-moderated scenarios, highlighting its potential for Human-Computer Interaction applications.
KW - Convolutional Neural Networks
KW - Facial Expression Classification
KW - Human-Computer Interaction
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85215518989&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215518989&partnerID=8YFLogxK
U2 - 10.1109/UBMK63289.2024.10773557
DO - 10.1109/UBMK63289.2024.10773557
M3 - Conference contribution
AN - SCOPUS:85215518989
T3 - UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering
SP - 800
EP - 805
BT - UBMK 2024 - Proceedings
A2 - Adali, Esref
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Computer Science and Engineering, UBMK 2024
Y2 - 26 October 2024 through 28 October 2024
ER -