TY - JOUR
T1 - Causal inference of asynchronous audiovisual speech
AU - Magnotti, John F.
AU - Ma, Wei Ji
AU - Beauchamp, Michael S.
PY - 2013
Y1 - 2013
AB - During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
KW - Bayesian observer
KW - Causal inference
KW - Multisensory integration
KW - Speech perception
KW - Synchrony judgments
UR - http://www.scopus.com/inward/record.url?scp=84889672200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889672200&partnerID=8YFLogxK
U2 - 10.3389/fpsyg.2013.00854
DO - 10.3389/fpsyg.2013.00854
M3 - Article
AN - SCOPUS:84889672200
SN - 1664-1078
VL - 4
JO - Frontiers in Psychology
JF - Frontiers in Psychology
IS - NOV
M1 - 854
ER -