TY - CONF
T1 - Predicting latent narrative mood using audio and physiologic data
AU - Al Hanai, Tuka
AU - Ghassemi, Mohammad Mahdi
N1 - Funding Information:
*Both authors contributed equally to this work. Tuka AlHanai would like to acknowledge the Al-Nokhba Scholarship and the Abu Dhabi Education Council for their support. Mohammad Ghassemi would like to acknowledge the Salerno Foundation, the NIH Neuroimaging Training Grant (NTP, T32 EB 001680), and the Advanced Neuroimaging Training Grant (AMNTP, T90 DA 22759). The authors would also like to acknowledge Hao Shen for aiding with data labeling, and Kushal Vora for arranging access to the Simbands.
Publisher Copyright:
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
AB - Inferring the latent emotive content of a narrative requires consideration of paralinguistic cues (e.g. pitch), linguistic content (e.g. vocabulary), and the physiological state of the narrator (e.g. heart rate). In this study, we used a combination of auditory, text, and physiological signals to predict the mood (happy or sad) of 31 narrations from subjects engaged in personal storytelling. We extracted 386 audio and 222 physiological features (using the Samsung Simband) from the data. A subset of 4 audio, 1 text, and 5 physiologic features was identified using Sequential Forward Selection (SFS) for inclusion in a Neural Network (NN). These features included subject movement, cardiovascular activity, energy in speech, probability of voicing, and linguistic sentiment (i.e. negative or positive). We explored the effects of introducing the selected features at various layers of the NN and found that their location in the network topology had a significant impact on model performance. To ensure the real-time utility of the model, classification was performed over 5-second intervals. We evaluated the model's performance using leave-one-subject-out cross-validation and compared it to 20 baseline models and to an NN with all features included in the input layer.
UR - http://www.scopus.com/inward/record.url?scp=85019603994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019603994&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85019603994
SP - 948
EP - 954
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 10 February 2017
ER -
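
The abstract describes selecting a 10-feature subset with Sequential Forward Selection (SFS) and evaluating with leave-one-subject-out cross-validation. Below is a minimal sketch of that pipeline, not the authors' implementation: the synthetic data, the scikit-learn logistic-regression scorer, and the toy feature count of 40 (the study pooled 386 audio and 222 physiologic features) are all assumptions chosen so the greedy search runs quickly.

# Minimal sketch of SFS + leave-one-subject-out CV (assumptions throughout).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_windows, n_features, n_subjects = 600, 40, 31  # study: 386 audio + 222 physiologic features
X = rng.normal(size=(n_windows, n_features))     # one row per 5-second window
y = rng.integers(0, 2, size=n_windows)           # 1 = happy, 0 = sad
groups = rng.integers(0, n_subjects, size=n_windows)  # subject id per window

logo = LeaveOneGroupOut()

def loso_score(feature_idx):
    # Mean accuracy across leave-one-subject-out folds, using only feature_idx.
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, feature_idx], y,
                           groups=groups, cv=logo).mean()

selected, remaining = [], list(range(n_features))
for _ in range(10):  # the paper keeps 10 features (4 audio, 1 text, 5 physiologic)
    best = max(remaining, key=lambda f: loso_score(selected + [f]))
    selected.append(best)
    remaining.remove(best)

print("selected feature indices:", selected)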
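
The abstract also reports that where the selected features enter the network topology significantly affects performance. A minimal sketch of one such topology follows, assuming Keras (the paper does not name a framework) and illustrative layer widths: the 10 SFS-selected features are concatenated into a hidden layer rather than the input layer.

# Minimal sketch (assumptions throughout): selected features injected mid-network.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Dense

rest_in = Input(shape=(598,), name="remaining_features")  # 608 total - 10 selected
sel_in = Input(shape=(10,), name="selected_features")     # SFS-chosen subset

h = Dense(64, activation="relu")(rest_in)
h = Concatenate()([h, sel_in])   # inject the selected features at a hidden layer
h = Dense(32, activation="relu")(h)
out = Dense(1, activation="sigmoid", name="mood")(h)      # happy vs. sad

model = Model(inputs=[rest_in, sel_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # train with model.fit([X_rest, X_sel], y, ...) on 5-second windows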